This book presents the design, analysis, and application of nonlinear adaptive filters with the goal of improving efficiency.
Language: English. Pages: 270 [271]. Year: 2023.
Table of contents :
Preface
Acknowledgments
Contents
Abbreviations and Acronyms
Chapter 1: Adaptive Filter
1.1 Introduction
1.2 Linear Adaptive Filters
1.2.1 LMS Algorithm
1.2.2 Affine Projection Algorithm
1.2.3 Recursive Least-Squares Algorithm
1.2.4 Subband Algorithm
1.2.5 Kalman Filter
1.3 Nonlinear Adaptive Filters
1.3.1 Volterra Filter
1.3.2 FLANN Adaptive Filter
1.3.3 Spline Adaptive Filter
1.3.4 Kernel Adaptive Filter
1.4 Summary
References
Chapter 2: Volterra Adaptive Filter
2.1 Introduction
2.2 Volterra Filter Model
2.3 Pipelined Volterra Filter
2.4 Convex Combination of Volterra Filter
2.4.1 The Algorithm I
2.4.2 The Algorithm II
2.5 Robust Volterra Filtering Algorithm
2.6 The Volterra Expansion Model Based Filtered-x Logarithmic Continuous Least Mean p-Norm (VFxlogCLMP) Algorithm for Active Noise Control Application
2.6.1 VFxlogLMP Algorithm
2.6.2 VFxlogCLMP Algorithm
2.6.3 Performance Analysis of the VFxlogCLMP Algorithm
2.6.4 EMSE Analysis
2.6.5 Convergence Condition of the VFxlogCLMP Algorithm
2.7 Diffusion Volterra Nonlinear Filtering Algorithm
2.7.1 Diffusion Least Mean Square (DLMS) Algorithm
2.7.2 Problem Formulation
2.7.3 The DV Filtering Algorithm
2.8 Simulation Results
2.8.1 Pipelined Volterra Filter
2.8.2 Convex Combination of Volterra Filter
2.8.3 Robust Volterra Filtering Algorithm
2.8.4 The VFxlogCLMP Algorithm for ANC Application
2.8.5 Diffusion Volterra Filtering Algorithm
2.9 Summary
References
Chapter 3: FLANN Adaptive Filter
3.1 Introduction
3.2 Neural Network Structures
3.2.1 MLP
3.2.2 ChNN
3.2.3 FLANN
3.2.4 LeNN
3.3 Recursive FLANN
3.3.1 Feedback FLANN Filter
3.3.2 Reduced Feedback FLANN Filter
3.3.3 Recursive FLANN Structure
3.3.3.1 A BIBO Stability Condition
3.4 Convex Combination of FLANN Filter
3.5 Random Fourier Filter
3.5.1 Random Fourier Feature
3.5.2 RF-LMS Algorithm
3.5.3 Cascaded RF-LMS (CRF-LMS) Algorithm
3.5.4 Mean Convergence Analysis
3.5.5 Computational Complexity
3.6 Nonlinear Active Noise Control
3.6.1 Robust Control Algorithms for NANC
3.6.1.1 FsLMP Algorithm
3.6.1.2 FsqLMP Algorithm
3.6.1.3 RFsLMS Algorithm
3.6.1.4 FsMCC Algorithm
3.6.1.5 RFF-FxMCC Algorithm
3.7 Nonlinear Channel Equalization
3.7.1 Communication Channel Equalization
3.7.2 Channel Equalization Using a Generalized NN Model
3.7.3 FLNN Equalizer
3.7.3.1 Adaptive Equalizer with FLNN Cascaded with Chebyshev Orthogonal Polynomial
3.7.3.2 Decision Feedback Equalizer Using the Combination of FIR and FLNN
3.8 Computer Simulation Examples
3.8.1 FLANN-Based NANC with Minimum Phase Secondary Path System
3.8.2 Random Fourier Filter-Based NANC
3.8.2.1 Projection Dimension and Memory Length of Random Fourier Filter
3.8.2.2 Real Example: Random Fourier Filter-Based Active Traction Substation Noise Control
3.8.3 Nonlinear Channel Equalization
3.8.3.1 Channel Equalization Using a Generalized NN Model
3.8.3.2 Adaptive Equalizer Based on the FLNN Cascaded with Chebyshev Orthogonal Polynomial Structure
3.8.3.3 Adaptive Decision Feedback Equalizer with the Combination of FIR Filter and FLANN
3.9 Summary
References
Chapter 4: Spline Adaptive Filter
4.1 Introduction
4.2 Spline Filter Model
4.2.1 Spline Adaptive Filter
4.2.2 Basic Spline Filter Algorithm
4.2.2.1 SAF-LMS Algorithm
4.2.2.2 SAF-NLMS Algorithm
4.2.2.3 SAF-SNLMS Algorithm
4.2.2.4 SAF-VSS-SNLMS Algorithm
4.3 Robust Spline Filtering Algorithm
4.3.1 SAF-MCC Algorithm
4.3.2 Performance Analysis
4.4 Applications
4.4.1 Active Noise Control Based on Spline Filter
4.4.1.1 FcGMCC Algorithm
4.4.1.2 Convergence Analysis
4.4.2 Echo Cancellation Based on Spline Filter
4.4.2.1 The Nonlinear Echo Canceler
4.4.2.2 The Architectures Proposed in
4.5 Computer Simulation Examples
4.5.1 Basic Spline Filter Algorithm Simulation
4.5.2 SAF-MCC Algorithm Simulation
4.5.3 Performance Analysis Simulation
4.5.4 Simulation of ANC
4.5.4.1 Performance of the FcGMCC Algorithm
4.5.5 Simulation of Echo Cancellation
4.6 Summary
References
Chapter 5: Kernel Adaptive Filters
5.1 Introduction
5.2 Kernel Adaptive Filters
5.2.1 Reproducing Kernel Hilbert Space
5.2.2 Kernel Least Mean Square
5.2.2.1 Kernel Selection
5.2.2.2 Step-Size Selection
5.2.2.3 Mean Square Convergence Analysis
5.2.3 Kernel Affine Projection Algorithms
5.2.3.1 Affine Projection Algorithms
5.2.3.2 Kernel Affine Projection Algorithms
5.2.4 Kernel Recursive Least Squares
5.3 Network Optimization
5.3.1 Sparsification Algorithms
5.3.1.1 Novelty Criterion
5.3.1.2 Approximate Linear Dependency
5.3.1.3 Surprise Criterion
5.3.2 Quantization Algorithms
5.3.2.1 On-Line Quantization
5.3.2.2 Off-Line Quantization
5.3.3 Kernel Approximation
5.3.3.1 Nyström Method
5.3.3.2 Random Fourier Feature Method
5.4 Computer Simulation Examples
5.4.1 Comparisons of Different KAFs
5.4.1.1 Mackey-Glass Chaotic Time Series Prediction
5.4.1.2 Nonlinear Channel Equalization
5.4.2 Comparisons of Network Optimization Methods
5.4.2.1 Relation Between Code Book Size and Performance
5.4.2.2 Comparison of Several Network Optimization Methods
5.4.2.3 KRLS with Different Sparsification Methods
5.4.2.4 Comparison of Different Quantization Methods
5.4.2.5 KRR with Different Quantization Methods
5.5 Summary
References
Index
Haiquan Zhao Badong Chen
Efficient Nonlinear Adaptive Filters
Design, Analysis and Applications
Haiquan Zhao School of Electrical Engineering Southwest Jiaotong University Chengdu, China
Badong Chen Inst Artificial Intelligence & Robo Xi'an Jiaotong University Xi'an, China
ISBN 978-3-031-20817-1    ISBN 978-3-031-20818-8 (eBook)
https://doi.org/10.1007/978-3-031-20818-8
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
In recent years, signal-processing technology has taken a great leap forward. In particular, with the development of digital circuit technology, the efficiency of digital signal processing (DSP) has improved greatly. Digital filtering, an important branch of DSP, has been widely studied and applied in many fields; its main aim is to extract the useful information contained in the received signal. In practice, the device that performs filtering is generally called a filter, which extracts the desired information from the input signal. A digital filter processes discrete-time signals. For a linear time-invariant (LTI) filter, the internal parameters and structure are fixed, and the output signal is a linear mapping of the input signal. However, when the statistical characteristics of the signal to be processed are unknown, the LTI filter cannot provide good signal-processing capability. In this situation, an adaptive filter is a very attractive solution, since it can optimize its internal free parameters according to the input signal to provide effective performance. Strictly speaking, an adaptive filter is a kind of nonlinear filter (its characteristics depend on the input signal), so it does not satisfy superposition and homogeneity. Nevertheless, at any given moment the parameters of the filter are fixed, and the output of the filter is then a linear mapping of the input signal. The crux of an adaptive filter is the design of the filtering algorithm, i.e., how the parameters of the filter are adaptively adjusted to meet the performance requirements in response to changes in the environment (input and desired signals). The algorithms discussed in this book are all based on discrete-time signals, because the rapid development of VLSI technology makes the processing of discrete-time signals fast and convenient. An adaptive filter generally involves three aspects: (1) Application. Adaptive filtering technology has been applied in many areas, such as channel equalization, signal prediction, echo cancellation, beamforming, system identification, and signal enhancement. (2) Structure. An adaptive filter can be realized with many structures, and different structures correspond to different computational complexities. According to the form of the impulse response, adaptive filters can be divided into finite impulse response (FIR) filters and infinite impulse response (IIR) filters. The most widely
used FIR filter is the transversal filter; its transfer function has no poles, so there is no system stability issue. For this structure, the output of the filter is a linear combination of the input signals. However, most practical systems are nonlinear, and the linear adaptive filter is not suited to such situations because of its inherent limitations. Nonlinear adaptive filters have therefore been proposed to overcome this problem, such as the Volterra filter, the functional link artificial neural network (FLANN), the spline filter, and kernel-based filters. (3) Algorithm. The algorithm adaptively adjusts the coefficients of the filter to minimize a certain optimization criterion. The theory of linear adaptive filtering is mature, and a large number of journals and books have summarized it in detail. However, there are very few books on nonlinear adaptive filters. Therefore, the core of this book is to introduce nonlinear adaptive filters with complete theoretical treatment, including classical applications, nonlinear filter structures, and algorithms. The first chapter briefly introduces the basics of classical linear adaptive filtering; this background is the basis for the study of the nonlinear adaptive filtering methods in the following chapters. The main contents of this book consist of five chapters, summarized as follows: Chapter 1 introduces the linear adaptive filter and several classical adaptive filtering algorithms, and briefly introduces the nonlinear filters described in the following chapters. Chapter 2 introduces the Volterra filter for nonlinear systems, mainly including the pipelined Volterra filter, the convex combined Volterra filter, the robust Volterra filter, and their corresponding nonlinear filtering algorithms. Moreover, a robust diffusion Volterra (DV) algorithm for distributed nonlinear networks is described in detail. Finally, computer simulations are provided. Chapter 3 describes the functional link artificial neural network (FLANN)-based nonlinear filter, mainly including the structure, principle, and some improved models of the FLANN-based filter. The nonlinear modelling ability of the FLANN-based filter is verified by computer simulations. In Chap. 4, the nonlinear spline filter and its adaptive algorithms are introduced. In addition, the convergence behavior of a robust spline filtering algorithm is analyzed, and the validity of the analysis is verified by computer simulations. Finally, the application of the spline filter to active noise control is given. In Chap. 5, we introduce the kernel adaptive filter and several classical kernel adaptive filtering algorithms. In particular, to reduce the high computational cost and storage caused by the large-scale hidden-layer nodes of these algorithms, several network optimization methods are presented. Finally, computer simulations verify the validity of these optimization methods. This book provides a reference for researchers and students developing advanced adaptive signal processing, and a convenient way for practicing engineers in related fields to understand effective algorithms. Readers of this book need some basic knowledge of digital signal processing, random processes, and matrix theory,
including finite impulse response (FIR) digital filter realization, random variables, and first- and second-order statistics. Readers with such a background will have no difficulty reading this book. In addition, a number of references are given at the end of each chapter to facilitate further study.
Chengdu, China
Xi'an, China
Haiquan Zhao Badong Chen
Acknowledgments
We would like to thank some of our former and current graduate students. In particular, we thank Dr. Yingying Zhu, Dr. Shaohui Lv, Dr. Wenjing Xu, Dr. Dongxu Liu, Dr. Pengfei Li, Dr. Chuang Liu, Ms. Yuan Gao, Ms. Boyu Tian, Ms. Jinwei Lou, Ms. Xinhao Xu, Dr. Zhengda Qin, and Dr. Lei Xing, with whom we have worked on the topics of this book and who contributed to some of the results reported here. This work was partially supported by the National Natural Science Foundation of China (grants 62171388 and 61871461), the Fundamental Research Funds for the Central Universities (grant 2682021ZTPY091), and the Southwest Jiaotong University Graduate Teaching Materials (Monograph) Funding Construction Project (grant SWJTU-ZZ2022-017).
Abbreviations and Acronyms
AEC   Acoustic echo cancellation
ALD   Approximate linear dependency
ANC   Active noise control
AP   Affine projection
APA   Affine projection algorithm
BIBO   Bounded-input bounded-output
BER   Bit error rate
CFsLMS   Convex combined FsLMS algorithm
CRF-LMS   Cascaded random Fourier least mean square algorithm
CRF-FxMCC   Cascaded random Fourier filtered-x maximum correntropy criterion algorithm
ChNN   Chebyshev neural network
DIV   Diffusion interpolated Volterra
DIV-LLMP   Diffusion interpolated Volterra logarithm least mean p-norm
DQ   Density-dependent vector quantization
DV   Diffusion Volterra
DV-LLMP   Diffusion Volterra logarithm least mean p-norm
EMSE   Excess mean square error
FIR   Finite impulse response
FLANN   Functional link artificial neural network
FFLANN   Feedback functional link neural network
FLANN-NLMS   FLANN-based normalized least mean square algorithm
FcLMS   Filtered-c least mean square
FcGMCC   Filtered-c generalized maximum correntropy criterion
FLOM   Fractional lower order moments
FsLMP   Filtered-s least mean p-power algorithm
FsLMS   Filtered-s least mean square algorithm
FsMCC   Filtered-s maximum correntropy criterion algorithm
FsqLMP   Filtered-s q-gradient least mean p-power algorithm
FxLMS   Filtered-x least mean square algorithm
IIR   Infinite impulse response
IIR-SAF   IIR spline adaptive filter
LeNN   Legendre neural network
IVF   Interpolated Volterra filter
IVFF-RLLMP   Improved variable forgetting factor recursive logarithm least mean p-norm
KAF   Kernel adaptive filter
KAPA   Kernel affine projection algorithm
KF   Kalman filtering
KLMS   Kernel least mean square
KRLS   Kernel recursive least square
LMS   Least mean square
LUT   Look-up table
MCC   Maximum correntropy criterion
MLP   Multilayer perceptron
MMSE   Minimum mean square error
MSE   Mean square error
NC   Novelty criterion
NG   Non-Gaussian
NLAEC   Nonlinear acoustic echo cancellation
NLMS   Normalized least mean square
NLMP   Normalized version of the LMP
NN   Neural network
PDF   Probability density function
PRNN   Pipelined recurrent neural network
PRQ   Probability density rank-based quantization
QKLMS   Quantized kernel least mean square
RBF   Radial basis function
RFF   Random Fourier feature
RFFLANN   Reduced feedback functional link neural network
RF-FxMCC   Random Fourier filtered-x maximum correntropy criterion algorithm
RF-LMS   Random Fourier least mean square algorithm
RFsLMS   Robust filtered-s least mean square algorithm
RKHS   Reproducing kernel Hilbert space
RLLMP   Recursive logarithm least mean p-norm
RLS   Recursive least square
SAF   Spline adaptive filter
SAF-LMS   Spline adaptive filter with LMS
SAF-MCC   Spline adaptive filter with MCC
SAF-NLMS   Spline adaptive filter with NLMS
SAF-SNLMS   Spline adaptive filter with sign NLMS
SAF-VSS-SNLMS   Spline adaptive filter with variable step-size SNLMS
SF   Subband filtering
SNR   Signal-to-noise ratio
SOV   Second-order Volterra
SαS   Symmetric α-stable distribution
VFF-RLS   Variable forgetting factor recursive least square
VQ   Vector quantization
VQIT   Vector quantization using information theoretic learning
Chapter 1: Adaptive Filter

1.1 Introduction
In this chapter, conventional linear filters and adaptive filtering algorithms are introduced, including the LMS, RLS, and AP algorithms, as well as subband filtering and Kalman filtering [1]. In addition, several classical nonlinear adaptive filters are briefly introduced; their detailed descriptions are presented in the following chapters.
1.2 Linear Adaptive Filters

1.2.1 LMS Algorithm
The LMS algorithm is the most widely used adaptive filtering algorithm; its main advantages are its simple structure and low computational complexity [2]. Usually, an adaptive filter consists of a transfer filter that processes the input signal and an algorithm unit that updates the transfer filter's coefficients. A general structure of the adaptive filter is illustrated in Fig. 1.1. In Fig. 1.1, x(n) is the input signal; w(n) = [w_0(n), w_1(n), ..., w_{L-1}(n)]^T is the filter coefficient vector; L is the order of the transfer filter; d(n) is the desired signal; y(n) = w^T(n) x(n) is the output signal of the filter; and e(n) is the error signal, given by e(n) = d(n) - y(n). The LMS algorithm is obtained by solving the following mean square error (MSE) minimization problem:

min_w E[e^2(n)]    (1.1)
[Fig. 1.1 The structure of the adaptive filter: the input x(n) is processed by the transfer filter w(n) to give the output y(n); the error e(n) = d(n) - y(n) between the desired signal d(n) and y(n) drives the adaptive algorithm that updates w(n).]
For the LMS algorithm [3–6], the filter coefficients are updated as follows:

w(n) = w(n - 1) + μ x(n) e(n)    (1.2)

where μ is a fixed step size. The normalized LMS (NLMS) algorithm is an improved LMS algorithm, developed to obtain faster convergence and lower steady-state misalignment. The NLMS algorithm is described as follows [7]:

w(n) = w(n - 1) + μ(n) x(n) e(n)    (1.3)

μ(n) = μ / (δ + ||x(n)||^2)    (1.4)

where δ is a regularization parameter. To ensure convergence of the NLMS algorithm in the mean-square sense, the step size must satisfy [8]:

0 < μ < 2
0 for every n, it results that:

|ξ(n)| ≤ Σ_{i=0}^{N_a - 1} (|a_i| M_x) = α M_x    (3.53)

By combining Eqs. (3.52) and (3.53), we finally obtain:

|y(n)| ≤ (1/β)(α M_x + η + θ)    (3.54)
It is worth noting that the BIBO stability condition is essentially that of a recursive linear filter. The recursive FLANN filter is not affected by instabilities as long as the input signal has finite amplitude and the recursive linear part of the filter is stable. This behavior, due to the bounds of the trigonometric functions used in the input-output relationship, contrasts with that of recursive polynomial filters, where the input signals must in general be constrained to well-defined ranges.
3.4 Convex Combination of FLANN Filter
To balance the trade-off between the convergence speed and the steady-state error of the FLANN filter, a convex combination scheme is adopted. Take the nonlinear active noise control (ANC) system as an example; the convex combined ANC system based on the FLANN filter is shown in Fig. 3.8. To obtain good performance from the convex combination scheme, two adaptive FLANN filters with different step sizes are adapted individually. Moreover, the output signals of the component filters are convexly combined by a mixing parameter in such a manner that the advantages of both FLANN filters are retained, i.e., the fast convergence of the large-step-size FLANN filter and the low steady-state error of the small-step-size FLANN filter [16]. As shown
Fig. 3.8 Block diagram of nonlinear ANC systems-based convex combination of two FLANN filters
in Fig. 3.8, the output sy(n) of the nonlinear ANC system at the error microphone can be calculated in convex combination form as:

sy(n) = Â(n) y(n) = Â(n) [λ(n) y_1(n) + (1 - λ(n)) y_2(n)]    (3.55)
where y(n) = λ(n) y_1(n) + (1 - λ(n)) y_2(n) is the output of the adaptive combination FLANN filter, and the output y_j(n) ( j = 1, 2) of the j-th adaptive FLANN controller is given by:

y_j(n) = W_j^T(n) S(n) = Σ_{i=1}^{2p+1} y_{j,i}(n) = Σ_{i=1}^{2p+1} w_{j,i}^T(n) s_i(n)    (3.56)
and W_j(n) denotes the corresponding weight coefficient vector, defined by:
$\mathbf{W}_j(n) = [w_{j,1}(n),\, w_{j,2}(n), \dots, w_{j,2p+1}(n)]^T$  (3.57)
The signal vector S(n), generated by the trigonometric function expansion, is given by:
$\mathbf{S}(n) = [s_1(n),\, s_2(n), \dots, s_{2p+1}(n)]^T$  (3.58)
As for the combination scheme, the mixing parameter λ(n) is kept in the interval (0, 1) by defining it via a sigmoid activation function:
$\lambda(n) = \frac{1}{1 + \exp[-a(n)]}$  (3.59)
Notice that λ(n) increases monotonically with a(n) and lies between 0 and 1. The convex combination W(n) of the weights of the overall filter is obtained as:
$\mathbf{W}(n) = \lambda(n)\mathbf{W}_1(n) + (1-\lambda(n))\mathbf{W}_2(n)$  (3.60)
Thus, the filter bank implementation of W(n) is expressed as:
$W_i(n) = \lambda(n)W_{1,i}(n) + (1-\lambda(n))W_{2,i}(n), \quad i = 1, 2, \dots, 2P+1$  (3.61)
Hence, following the filtered-x LMS (FxLMS) principle, the cost function of the convex combined FsLMS (CFsLMS) algorithm is derived as follows. First, the cost function is given as:
$J(n) = E[e^2(n)]$  (3.62)
Taking the derivative of the above function yields:
$\frac{\partial J(n)}{\partial \mathbf{W}(n)} = -2e(n)\,\frac{\partial sy(n)}{\partial \mathbf{W}(n)}$  (3.63)
where the overall error at time n is defined as:
$e(n) = d(n) - sy(n)$  (3.64)
Subsequently, different conditions of the secondary path are discussed. When the secondary path is linear, the gradient term in Eq. (3.63) is written as:
$\frac{\partial sy(n)}{\partial \mathbf{W}(n)} = \frac{\partial sy(n)}{\partial y(n)}\,\frac{\partial y(n)}{\partial \mathbf{W}(n)} = A_{\text{linear}}(n)\,\mathbf{S}(n)$  (3.65)
When the secondary path is nonlinear, the output becomes a nonlinear function of the current and past controller outputs:
$sy(n) = f\big([y(n),\, y(n-1), \dots, y(n-m+1)]^T\big)$  (3.66)
As a consequence, the gradient term in Eq. (3.63) is given as:
$\frac{\partial sy(n)}{\partial \mathbf{W}(n)} = \sum_{k=0}^{m-1} \frac{\partial sy(n)}{\partial y(n-k)}\,\frac{\partial y(n-k)}{\partial \mathbf{W}(n)}$  (3.67)
Assume that the convex combination W(n) is slowly varying for small step sizes, so that:
$\frac{\partial y(n-k)}{\partial \mathbf{W}(n)} \approx \frac{\partial y(n-k)}{\partial \mathbf{W}(n-k)}$  (3.68)
Defining the nonlinear part as:
$A_{\text{Nonlinear}}(n) = \left[\frac{\partial sy(n)}{\partial y(n)},\, \frac{\partial sy(n)}{\partial y(n-1)}, \dots, \frac{\partial sy(n)}{\partial y(n-m+1)}\right]^T$  (3.69)
and introducing the virtual secondary path, we get:
$\frac{\partial sy(n)}{\partial \mathbf{W}(n)} = A_{\text{Nonlinear}}(n)\,\mathbf{S}(n)$  (3.70)
The weights of each combined FLANN filter are individually adjusted by the FsLMS algorithm with its own error and step size. The update rule of each component filter is given as:
$\mathbf{W}_j(n+1) = \mathbf{W}_j(n) + \mu_j\,e_j(n)\,\mathbf{S}'(n), \quad j = 1, 2$  (3.71)
where the error of each component filter is computed as:
$e_j(n) = \begin{cases} d(n) - A_{\text{linear}}(n)\,y_j(n), & \text{when the secondary path is linear} \\ d(n) - A_{\text{Nonlinear}}(n)\,y_j(n), & \text{when the secondary path is nonlinear} \end{cases}$  (3.72)
and the filtered signal vector S′(n) is calculated by:
$\mathbf{S}'(n) = [s'_1(n),\, s'_2(n), \dots, s'_{2P+1}(n)]^T$  (3.73)
Without loss of generality, it is assumed that μ₁ > μ₂. Compared with the coefficients W_{2,i}(n), the coefficients W_{1,i}(n) converge faster but exhibit a larger steady-state MSE. When the fast FLANN filter significantly outperforms the slow one, the weight update can be modified to improve the performance of the CNFSLMS algorithm as follows:
$W_{2,i}(n+1) = \alpha\,[W_{2,i}(n) + \mu_2\,e_2(n)\,s'_i(n)] + (1-\alpha)\,W_{1,i}(n+1), \quad i = 1, 2, \dots, 2P+1$  (3.74)
where the parameter α (0 < α < 1) is close to 1. For the FLANN combination scheme, the key issue is how to adapt the mixing parameter so that the error of the overall filter is minimized. The error e(n) of the overall filter can be calculated from the convex combination as:
$e(n) = \lambda(n)\,e_1(n) + [1-\lambda(n)]\,e_2(n)$  (3.75)
According to the stochastic gradient descent rule, the parameter a(n) is adapted by:
$a(n+1) = a(n) - \frac{\mu_a}{2}\,\nabla J(n)$  (3.76)
where the step size μ_a must be set to a relatively large value, with the result that the convex combination is adapted even faster than the fast FLANN filter. However, a disadvantage of this scheme is that a(n) stops updating whenever λ(n) is too close to 0 or 1. To deal with this drawback, the values of a(n) can be limited to the interval [−4, 4] [17]. In Eq. (3.76), ∇J(n) denotes the gradient estimator, calculated as ∇J(n) = 2e(n)[e₁(n) − e₂(n)]λ(n)[1 − λ(n)]. The update equation of the mixing parameter is then expressed as:
$a(n+1) = a(n) - \mu_a\,e(n)[e_1(n) - e_2(n)]\,\lambda(n)[1-\lambda(n)]$  (3.77)
The selection of the step size plays a crucial role in obtaining appropriate filter behavior. Thus, a normalized form is used:
$a(n+1) = a(n) - \frac{\mu_a}{\delta + r(n)}\,e(n)[e_1(n) - e_2(n)]\,\lambda(n)[1-\lambda(n)]$  (3.78)
With this update rule, the selection of μ_a is not affected by the signal-to-noise ratio (SNR). Moreover, to further improve performance, a low-pass filtered estimate r(n) is used instead of the instantaneous value of [e₁(n) − e₂(n)]²:
$r(n+1) = \beta\,r(n) + (1-\beta)\,[e_1(n) - e_2(n)]^2$  (3.79)
where the parameter β is a constant close to 1. The CNFSLMS algorithm inevitably requires evaluation of the exponential function, which results in heavy computational complexity. To solve this problem, the sigmoid function is replaced with a modified Versorial function, which yields the modified CNFSLMS (MCNFSLMS) algorithm described in the following parts. The modified Versorial function given in [18] is expressed as:
$\lambda(n) = \begin{cases} 1 - \dfrac{1}{B\,a^2(n) + 2}, & a(n) \ge 0 \\[4pt] \dfrac{1}{B\,a^2(n) + 2}, & a(n) < 0 \end{cases}$  (3.80)
where B is a parameter that adjusts the curve shape; to guarantee convergence, B is required to be larger than 1. To select an appropriate value of B in Eq. (3.80), the λ(n) curve for different B values (B = 1, 2, 3, 4, and 5) is plotted in Fig. 3.9a. It is observed
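A short numerical check illustrates why B = 2 is reported as the best sigmoid substitute: the exponential-free function of Eq. (3.80) with B = 2 stays within a few hundredths of the sigmoid of Eq. (3.59) over the clamped range [−4, 4]. This is a sketch for illustration; the grid and tolerance are assumptions:

```python
import numpy as np

def sigmoid(a):
    # mixing function of Eq. (3.59)
    return 1.0 / (1.0 + np.exp(-a))

def versorial(a, B=2.0):
    # modified Versorial function of Eq. (3.80): no exponential required
    lam = 1.0 / (B * np.asarray(a) ** 2 + 2.0)
    return np.where(np.asarray(a) >= 0, 1.0 - lam, lam)

a = np.linspace(-4.0, 4.0, 201)
gap = np.max(np.abs(versorial(a, B=2.0) - sigmoid(a)))  # worst-case deviation
```

Both curves pass through 0.5 at a = 0 and remain strictly inside (0, 1), so the Versorial replacement preserves the convexity of the combination while removing the exponential from each iteration.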
Fig. 3.9 (a) The modified Versorial function with different B values. (b) The NMSE of the MCNFSLMS algorithm with different B values
Table 3.1 Summary of the MCNFSLMS algorithm
that the modified Versorial function with B = 2 is the best approximation of the sigmoid function. Figure 3.9b shows the normalized mean square error (NMSE) of the MCNFSLMS algorithm with different B values (B = 1, 2, 3, 4, and 5) in a nonlinear ANC case with a minimum-phase secondary path transfer function. The MCNFSLMS algorithm with B = 2 outperforms the other cases. The pseudo-code for the MCNFSLMS algorithm is listed in Table 3.1.
3.5 Random Fourier Filter
In the above sections, we introduced several nonlinear filters based on the FLANN, which uses sines and cosines as basis functions. In this part, we describe another black-box modeling technique. Different from the FLANN, we use random Fourier expansions (RFEs) to model the unknown function, because they offer a unique mix of computational efficiency, theoretical guarantees, and ease of use that makes them ideal for online processing.
3.5.1 Random Fourier Feature
General neural networks are more expressive than random Fourier features, but they are difficult to use and come without theoretical guarantees. Standard kernel methods suffer from high computational complexity because the number of kernels equals the number of measurements. RFEs were originally introduced to reduce the computational burden that comes with kernel methods [19]. The basic scheme of the random Fourier feature filter is shown in Fig. 3.10. Assume that we are provided N scalar measurements y_i taken at measurement points x_i ∈ ℝ^d, as well as a kernel κ(x_i, x_j) that, in a certain sense, measures the closeness of two measurement points. To train the kernel expansion:
$f(x) = \sum_{i=1}^{N} a_i\,\kappa(x, x_i)$  (3.81)
a linear system involving the kernel matrix [κ(x_i, x_j)]_{i,j} has to be solved for the coefficients a_i. The computational costs of training and of evaluating f both grow with the number of data points N. This can be prohibitive for large values of
Fig. 3.10 The random Fourier feature filter
N. We now explain how RFEs can be used to reduce the complexity. Assuming the kernel κ is shift-invariant, its Fourier transform p can be normalized such that p is a probability distribution [20]. That is, we have:
$\kappa(x_i, x_j) = \int p(\omega)\, e^{-i\omega^T(x_i - x_j)}\, d\omega$  (3.82)
We will use several trigonometric identities and the fact that κ is real to continue the derivation. This gives:
$\begin{aligned} \kappa(x_i, x_j) &= \int p(\omega)\, e^{-i\omega^T(x_i - x_j)}\, d\omega \\ &= \int p(\omega)\cos\big(\omega^T(x_i - x_j)\big)\, d\omega \\ &= \frac{1}{2\pi}\int p(\omega) \int_0^{2\pi} \big[\cos\big(\omega^T(x_i - x_j)\big) + \cos\big(\omega^T(x_i + x_j) + 2b\big)\big]\, db\, d\omega \\ &= \frac{1}{2\pi}\int p(\omega) \int_0^{2\pi} 2\cos(\omega^T x_i + b)\cos(\omega^T x_j + b)\, db\, d\omega \\ &= E\big[2\cos(\Omega^T x_i + B)\cos(\Omega^T x_j + B)\big] \\ &\approx \frac{2}{D}\sum_{k=1}^{D} \cos(\omega_k^T x_i + b_k)\cos(\omega_k^T x_j + b_k) \end{aligned}$  (3.83)
where the ω_k are vectors whose elements are independent samples of the random variable Ω with probability density function (PDF) p, and the b_k ∈ [0, 2π] are independent samples of the random variable B with a uniform distribution.
For $c_k = \frac{2}{D}\sum_{i=1}^{N} a_i \cos(\omega_k^T x_i + b_k)$, we thus have:
$f(x) = \sum_{i=1}^{N} a_i\,\kappa(x, x_i) \approx \sum_{k=1}^{D} c_k \cos(\omega_k^T x + b_k)$  (3.84)
It is noted that the number of coefficients D is now independent of the number of measurements N. This is especially advantageous in online applications where the number of measurements N keeps increasing.
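The Monte Carlo approximation in Eq. (3.83) can be checked directly for the Gaussian kernel, whose normalized Fourier transform p is itself Gaussian. The sketch below draws D random frequencies and phases and compares the feature inner product against the exact kernel value; the dimensions, D, the bandwidth σ, and the test points are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, D, sigma = 3, 5000, 1.0

# For kappa(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2)), the normalized
# spectral density p is N(0, I / sigma^2); b is uniform on [0, 2*pi].
omega = rng.normal(0.0, 1.0 / sigma, size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def rff(x):
    # feature map z(x) with E[z(xi)^T z(xj)] ~= kappa(xi, xj), per Eq. (3.83)
    return np.sqrt(2.0 / D) * np.cos(omega @ x + b)

xi = np.array([0.2, -0.5, 1.0])
xj = np.array([0.1, 0.3, 0.7])
k_true = np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))
k_rff = rff(xi) @ rff(xj)   # D-dimensional surrogate for the kernel
```

The approximation error shrinks like $O(1/\sqrt{D})$, which is why D can be chosen independently of N and then held fixed as measurements keep arriving.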
3.5.2 RF-LMS Algorithm
The block diagram of the random Fourier filter is shown in Fig. 3.11. Random Fourier filters simplify the filter iteration by making the implicit kernel mapping explicit. The input vector is x(n) = [x(n), x(n − 1), . . ., x(n − L + 1)]^T,
Fig. 3.11 The block diagram of single-channel NANC system based on random Fourier filter
which is expanded through the RF(ω_k, φ_k) module. Then, after passing through the cosine module, the D-dimensional random Fourier feature vector is obtained as [21]:
$\mathbf{Z}(x(n)) = [\zeta_1,\, \zeta_2, \dots, \zeta_D]^T$  (3.85)
where ζ_i = cos(ω_i^T x(n) + φ_i), i = 1, 2, ⋯, D. The frequency vectors ω_i follow a Gaussian distribution with zero mean and covariance matrix ε²I, denoted ω ~ N(0, ε²I), and the phases φ_i, i = 1, . . ., D, are sampled from the uniform distribution on [0, 2π]. The input signal is expanded to its RFF and then fed into an adaptive filter. With the filter weight vector w(n) = [w₁(n), w₂(n), ⋯, w_D(n)]^T, the output of the random Fourier filter is computed as:
$y(n) = \mathbf{w}^T(n)\,\mathbf{Z}(x(n))$  (3.86)
The residual signal, which is the difference between the desired signal and the filter output, is given as:
$e(n) = d(n) - y(n)$  (3.87)
According to the stochastic gradient descent approach, the cost function of the random Fourier filter-based LMS algorithm is given as:
$J_w = \frac{1}{2}\,E\{e^2(n)\}$  (3.88)
Approximating the expectation by the current values and following the steepest descent recursion yields:
$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\,\nabla J_w$  (3.89)
where ∇J_w is the gradient of the cost function with respect to the filter coefficient vector. Therefore, the update rule is obtained as:
$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,e(n)\,\mathbf{Z}(n)$  (3.90)
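The RF-LMS recursion of Eqs. (3.85)–(3.90) can be sketched in a few lines for a toy system identification task. The plant below (a mild memoryless nonlinearity over a two-tap window), the feature dimension D, the bandwidth ε, and the step size are assumptions chosen for illustration, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
L, D, eps, mu, N = 2, 100, 1.0, 0.01, 5000

# RF(omega_k, phi_k) module: omega ~ N(0, eps^2 I), phi ~ U[0, 2*pi]
omega = rng.normal(0.0, eps, size=(D, L))
phi = rng.uniform(0.0, 2.0 * np.pi, size=D)
w = np.zeros(D)

x = rng.uniform(-1.0, 1.0, N)
err2 = []
for n in range(L, N):
    xv = x[n-L+1:n+1][::-1]                    # input vector x(n)
    Z = np.cos(omega @ xv + phi)               # RFF expansion, Eq. (3.85)
    d = xv[0] - 0.5 * xv[1] + 0.2 * xv[0]**2   # toy nonlinear plant (assumed)
    e = d - w @ Z                              # residual, Eq. (3.87)
    w += mu * e * Z                            # RF-LMS update, Eq. (3.90)
    err2.append(e * e)

early, late = np.mean(err2[:200]), np.mean(err2[-200:])
```

Note that the update touches only D coefficients per step regardless of how many samples have been seen, which is the online advantage discussed above.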
3.5.3 Cascaded RF-LMS (CRF-LMS) Algorithm
It has been verified that the RFF is a powerful tool to relieve the heavy computation of kernel mapping [22]. However, the projection dimension of the RFF demands considerable computation to ensure accuracy. To save the computational cost of the random Fourier filter without degrading its performance, a novel model is designed in this part. The block diagram of the cascaded random Fourier filter-based nonlinear system identification model is shown in Fig. 3.12. The cascaded model can be divided into the main nonlinear RFF filter $\mathbf{w}_a = [a_1, a_2, \dots, a_{D_s}]^T$ and the linear cascaded filter $\mathbf{w}_b = [b_1, b_2, \dots, b_M]^T$ [23]. For the nonlinear module, the input vector x(n) = [x(n), x(n − 1), ⋯, x(n − L_s + 1)]^T is projected to D_s dimensions by the random Fourier transform (RFT), and the expanded vector is given as:
$\mathbf{Z}_{D_s}(n) = [z_1(n),\, z_2(n), \dots, z_{D_s}(n)]^T$  (3.91)
where $z_k(n) = \cos(\mathbf{r}_k^T\mathbf{x}(n) + \psi_k)$, k = 1, 2, . . ., D_s. The vectors $\mathbf{r}_k = [r_1, \dots, r_{L_s}]^T$ follow a Gaussian distribution with zero mean and covariance matrix ε²I, denoted r ~ N(0, ε²I), and the phases ψ_k, k = 1, . . ., D_s, are sampled from the uniform distribution on [0, 2π]. Accordingly, the output of the nonlinear module is obtained as:
$y_x(n) = \sum_{k=1}^{D_s} a_k(n)\cos(\mathbf{r}_k^T\mathbf{x}(n) + \psi_k)$  (3.92)
where a_k(n), k = 1, 2, . . ., D_s, are the adaptive coefficients of w_a(n).
Fig. 3.12 Block diagram of the cascaded feedforward RFF system
The cascaded scheme reduces computation by shortening the memory length of the reference signal and compensates the performance with the linear cascaded filter. The current random Fourier filter output y_x(n) is delayed and fed to the auxiliary linear filter. Thus, the overall output of the designed filter is calculated as:
$y_c(n) = y_x(n) + \sum_{l=1}^{M} b_l(n)\,y_x(n-l+1)$  (3.93)
where b_l(n), l = 1, 2, . . ., M, are the adaptive coefficients of the linear cascaded filter w_b. The residual signal of the presented system is computed as:
$e_c(n) = d(n) - y_c(n)$  (3.94)
Taking the cost function $J = \frac{1}{2}e_c^2(n)$, the gradient estimators of the random Fourier filter and the linear filter are defined separately as:
$\nabla J_{w_a} = \left[\frac{\partial e_c(n)}{\partial a_1(n)},\, \frac{\partial e_c(n)}{\partial a_2(n)}, \dots, \frac{\partial e_c(n)}{\partial a_{D_s}(n)}\right]^T, \qquad \nabla J_{w_b} = \left[\frac{\partial e_c(n)}{\partial b_1(n)},\, \frac{\partial e_c(n)}{\partial b_2(n)}, \dots, \frac{\partial e_c(n)}{\partial b_M(n)}\right]^T$  (3.95)
According to Eq. (3.94), the error gradients can be further expressed as:
$\nabla J_{w_a} = -\left[\frac{\partial y_c(n)}{\partial a_1(n)}, \dots, \frac{\partial y_c(n)}{\partial a_{D_s}(n)}\right]^T, \qquad \nabla J_{w_b} = -\left[\frac{\partial y_c(n)}{\partial b_1(n)}, \dots, \frac{\partial y_c(n)}{\partial b_M(n)}\right]^T$  (3.96)
From Eq. (3.93), the partial derivatives above can be written as:
$y_{a_k}(n) = \frac{\partial y_c(n)}{\partial a_k(n)} = (1 + b_1(n))\cos(\mathbf{r}_k^T\mathbf{x}(n) + \psi_k) + \sum_{l=2}^{M} b_l(n)\,\frac{\partial y_x(n-l+1)}{\partial a_k(n)}, \quad k = 1, 2, \dots, D_s$  (3.97)
$y_{b_m}(n) = \frac{\partial y_c(n)}{\partial b_m(n)} = y_x(n-m+1), \quad m = 1, 2, \dots, M$  (3.98)
According to [24], assuming that the step size is small enough, the approximation is made:
$\frac{\partial y_x(n-l+1)}{\partial a_k(n)} \approx \frac{\partial y_x(n-l+1)}{\partial a_k(n-l+1)}$  (3.99)
Therefore, Eq. (3.97) can be modified as:
$y_{a_k}(n) \approx (1 + b_1(n))\cos(\mathbf{r}_k^T\mathbf{x}(n) + \psi_k) + \sum_{l=2}^{M} b_l(n)\,y_{a_k}(n-l+1), \quad k = 1, 2, \dots, D_s$  (3.100)
Substituting Eq. (3.97) and Eq. (3.98) into Eq. (3.96), the gradient estimators can be rewritten as:
$\nabla J_{w_a} \approx -[y_{a_1}(n),\, y_{a_2}(n), \dots, y_{a_{D_s}}(n)]^T, \qquad \nabla J_{w_b} \approx -[y_{b_1}(n),\, y_{b_2}(n), \dots, y_{b_M}(n)]^T$  (3.101)
Hence, the cascaded random Fourier least mean square (CRF-LMS) algorithm can be derived as:
$\mathbf{w}_a(n+1) = \mathbf{w}_a(n) + \mu_a(n)\,e_c(n)\,[y_{a_1}(n), \dots, y_{a_{D_s}}(n)]^T, \qquad \mathbf{w}_b(n+1) = \mathbf{w}_b(n) + \mu_b\,e_c(n)\,[y_{b_1}(n), \dots, y_{b_M}(n)]^T$  (3.102)
Due to the recursive nature of y_{a_k}(n), once the length M of the linear auxiliary filter is large, each iteration requires considerable storage and computation. Following the assumption in [25], the recursion over past output gradients is neglected; that is, the partial derivative of y_c(n) with respect to a_k(n) at time n is taken to be unrelated to the gradients at previous times. Under this assumption, the gradient estimators become:
$\nabla J_{w_a} \approx -e_c(n)\,(1 + b_1(n))\,\mathbf{Z}_{D_s}(n), \qquad \nabla J_{w_b} \approx -e_c(n)\,[y_x(n),\, y_x(n-1), \dots, y_x(n-M+1)]^T$  (3.103)
Then, based on the stochastic gradient method, the cascaded random Fourier LMS algorithm is given as:
$\mathbf{w}_a(n+1) = \mathbf{w}_a(n) + \mu_a(n)\,e_c(n)\,\mathbf{Z}_{D_s}(n), \qquad \mathbf{w}_b(n+1) = \mathbf{w}_b(n) + \mu_b\,e_c(n)\,\mathbf{Y}(n)$  (3.104)
where μ_a(n) = μ_a(1 + b₁(n)), and μ_a and μ_b are the step sizes of the adaptive filters. $\mathbf{Z}_{D_s}(n) = [z_1(n),\, z_2(n), \dots, z_{D_s}(n)]^T$, and Y(n) is formed from the delayed outputs of w_a(n), represented as:
$\mathbf{Y}(n) = [y_x(n),\, y_x(n-1), \dots, y_x(n-M+1)]^T$  (3.105)
The summary of CRF-FxLMS algorithm is given in Table 3.2.
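The simplified CRF-LMS recursion of Eqs. (3.103)–(3.105) can be sketched on a toy identification problem. The plant, the dimensions (L_s, D_s, M), the bandwidth ε, and the step sizes below are illustrative assumptions; the secondary-path filtering of the full CRF-FxLMS in Table 3.2 is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
Ls, Ds, M, eps = 2, 80, 3, 1.0
mu_a0, mu_b, N = 0.01, 0.001, 6000

r = rng.normal(0.0, eps, size=(Ds, Ls))        # RFT frequencies
psi = rng.uniform(0.0, 2.0 * np.pi, size=Ds)   # RFT phases
wa = np.zeros(Ds)                               # nonlinear RFF filter w_a
wb = np.zeros(M)                                # linear cascaded filter w_b
yx_hist = np.zeros(M)                           # [y_x(n), y_x(n-1), ...], Eq. (3.105)

x = rng.uniform(-1.0, 1.0, N)
err2 = []
for n in range(Ls, N):
    xv = x[n-Ls+1:n+1][::-1]
    Z = np.cos(r @ xv + psi)                    # expanded vector, Eq. (3.91)
    yx = wa @ Z                                 # nonlinear-module output, Eq. (3.92)
    yx_hist = np.roll(yx_hist, 1)
    yx_hist[0] = yx
    yc = yx + wb @ yx_hist                      # overall output, Eq. (3.93)
    d = np.tanh(x[n] - 0.5 * x[n-1])            # toy nonlinear plant (assumed)
    ec = d - yc                                 # residual, Eq. (3.94)
    wa += mu_a0 * (1.0 + wb[0]) * ec * Z        # Eq. (3.104) with mu_a(n)
    wb += mu_b * ec * yx_hist
    err2.append(ec * ec)

early, late = np.mean(err2[:200]), np.mean(err2[-200:])
```

The short memory L_s keeps the expensive RFF expansion small, while the M-tap linear tail reuses already-computed outputs y_x at essentially no extra cost, which is the saving the cascade is designed for.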
3.5.4 Mean Convergence Analysis
According to Eq. (3.104), denote the optimal weight vector of the nonlinear RFF filter by $\mathbf{w}_A^{\star}$ and that of the linear auxiliary filter by $\mathbf{w}_B^{\star}$. The deviation between the weight vectors and the optimal weight vectors is calculated as:
$\tilde{\mathbf{w}}_A(n) = \mathbf{w}_A^{\star} - \mathbf{w}_a(n), \qquad \tilde{\mathbf{w}}_B(n) = \mathbf{w}_B^{\star} - \mathbf{w}_b(n)$  (3.106)
Substituting the update Eq. (3.104) into Eq. (3.106), the following recursion is obtained:
Table 3.2 Summary of CRF-FxLMS algorithm
$\tilde{\mathbf{w}}_A(n+1) = \tilde{\mathbf{w}}_A(n) - \mu_a(n)\,e_c(n)\,\mathbf{Z}_{D_s}(n), \qquad \tilde{\mathbf{w}}_B(n+1) = \tilde{\mathbf{w}}_B(n) - \mu_b\,e_c(n)\,\mathbf{Y}(n)$  (3.107)
Define $E_a(n)$ and $E_b(n)$ as the expectations of the squared Euclidean norms of the weight deviations, i.e., $E_a(n) = E[\|\tilde{\mathbf{w}}_A(n)\|^2]$ and $E_b(n) = E[\|\tilde{\mathbf{w}}_B(n)\|^2]$. Eq. (3.107) can then be written as:
$E_a(n+1) = E_a(n) + E[\mu_a^2(n)\,e_c^2(n)\,\mathbf{Z}_{D_s}^T(n)\mathbf{Z}_{D_s}(n)] - 2E[\mu_a(n)\,e_c(n)\,\mathbf{Z}_{D_s}^T(n)\tilde{\mathbf{w}}_A(n)]$  (3.108)
$E_b(n+1) = E_b(n) + \mu_b^2\,E[e_c^2(n)\,\mathbf{Y}^T(n)\mathbf{Y}(n)] - 2\mu_b\,E[e_c(n)\,\mathbf{Y}^T(n)\tilde{\mathbf{w}}_B(n)]$  (3.109)
To guarantee stability and convergence, $E_j(n+1)$, j = a, b, should be less than $E_j(n)$, j = a, b. Considering $e_c(n) = e_a(n) + e_b(n)$, with $e_a(n) = \tilde{\mathbf{w}}_A^T(n)\mathbf{Z}_{D_s}(n)$ and $e_b(n) = \tilde{\mathbf{w}}_B^T(n)\mathbf{Y}(n)$, we finally obtain the bounds of the step size as follows: 0 < μa
$D_{q,x} f(x) = \begin{cases} \dfrac{f(qx) - f(x)}{(q-1)\,x}, & x \neq 0 \\[4pt] \dfrac{df(0)}{dx}, & x = 0 \end{cases}$  (3.121)
where q is a positive number not equal to 1; as q → 1, the q-derivative reduces to the ordinary derivative. Letting f(x) = xⁿ, Eq. (3.121) becomes:
$D_{q,x}\,x^n = \begin{cases} \dfrac{q^n - 1}{q-1}\,x^{n-1}, & q \neq 1 \\[4pt] n\,x^{n-1}, & q = 1 \end{cases}$  (3.122)
Extending this concept to n-dimensional variables, the q-gradient of f(x) is defined as:
$\nabla_{q,x} f(x) = [D_{q_1,x_1} f(x),\, D_{q_2,x_2} f(x), \dots, D_{q_n,x_n} f(x)]^T$  (3.123)
where q = [q₁, q₂, . . ., q_n]^T and x = [x₁, x₂, . . ., x_n]^T. Based on the above definition, the q-gradient of the cost function with respect to e(n) = [e(n), e(n), . . ., e(n)]^T is obtained as:
$\nabla_{q,e(n)} J(w) = [D_{q_1,e(n)} J(w),\, D_{q_2,e(n)} J(w), \dots, D_{q_n,e(n)} J(w)]^T$  (3.124)
$D_{q,e(n)}|e(n)|^p = \begin{cases} \dfrac{|q\,e(n)|^p - |e(n)|^p}{q\,e(n) - e(n)}, & q \neq 1 \\[4pt] p\,|e(n)|^{p-1}\operatorname{sign}(e(n)), & q = 1 \end{cases}$  (3.125)
To visualize the behavior of the q-gradient, the function D_{q,e(n)}|e(n)|^p is plotted for different p and q in Fig. 3.13. The following conclusions can be drawn from Fig. 3.13:
1. For the same value of q, the larger the value of p, the steeper the gradient.
2. For the same value of p, a larger value of q gives a steeper gradient.
In particular, for q = 1 and p = 2, the q-gradient reduces to the form used by MSE-based algorithms. As can be seen, the gradient D_{q,e(n)}|e(n)|^p with q ≠ 1 is larger than the gradient with q = 1 and p = 2. Such a steeper gradient may lead to improved performance of the algorithm.
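Eq. (3.125) and the two conclusions above can be verified numerically. The helper below evaluates the q-gradient of |e|^p at a sample error value (the value e = 2 is an arbitrary illustration):

```python
import numpy as np

def q_gradient(e, p, q):
    """q-derivative of |e|^p with respect to e, per Eq. (3.125)."""
    if q == 1:
        return p * np.abs(e) ** (p - 1) * np.sign(e)
    return (np.abs(q * e) ** p - np.abs(e) ** p) / (q * e - e)

g_ordinary = q_gradient(2.0, 2, 1)      # ordinary MSE gradient: 2|e| sign(e)
g_near = q_gradient(2.0, 2, 1.001)      # q -> 1 recovers the ordinary case
g_steep = q_gradient(2.0, 2, 1.5)       # q > 1: steeper gradient
```

The q → 1 limit reproduces the regular derivative, and q > 1 enlarges the gradient magnitude, which is the mechanism the FsqLMP algorithm exploits.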
3.6 Nonlinear Active Noise Control
Fig. 3.13 (a) The gradients of D_{q,e(n)}|e(n)|^p (p = 1, 1.5, and 2). (b) The gradients of D_{q,e(n)}|e(n)|^p (p = 2, 3, and 4)
Using the q-derivative descent method, the filter weight coefficients of the FsqLMP algorithm are updated as:
$\mathbf{w}(n+1) = \mathbf{w}(n) - \frac{\mu}{2}\,\nabla_{q,w} J(w)$  (3.126)
where
$\nabla_{q,w} J(w) = -2\,\mathbf{Q}_{\text{FsqLMP}}\,\mathbf{X}(n)$  (3.127)
$\mathbf{Q}_{\text{FsqLMP}}$ is the diagonal matrix of the FsqLMP algorithm, expressed as:
$\mathbf{Q}_{\text{FsqLMP}} = \operatorname{diag}\left[\frac{|q_1 e(n)|^p - |e(n)|^p}{p[q_1 e(n) - e(n)]},\, \frac{|q_2 e(n)|^p - |e(n)|^p}{p[q_2 e(n) - e(n)]}, \dots, \frac{|q_{(2P+1)N}\,e(n)|^p - |e(n)|^p}{p[q_{(2P+1)N}\,e(n) - e(n)]}\right]$  (3.128)
Substituting Eq. (3.128) and Eq. (3.125) into Eq. (3.126), the filter weight update formula of the FsqLMP algorithm is obtained as:
$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\mathbf{Q}_{\text{FsqLMP}}\,\mathbf{X}(n)$  (3.129)
where μ is the step size, X(n) = s(n)x(n), and x(n) is given by Eq. (3.113) (Fig. 3.14).
Fig. 3.14 The block diagram using trigonometric expansion for NANC
3.6.1.3 RFsLMS Algorithm
To improve system stability under alpha-stable distributed noise, speed up convergence, and improve the noise elimination capability, the robust filtered-s LMS (RFsLMS) algorithm has been proposed [30]. Like the FsLMS algorithm, the nonlinear controller of the RFsLMS algorithm is modeled with the FLANN structure. Observing the weight coefficient update formula of the FsLMS algorithm, when the error signal is large, such as in an impulsive noise environment, the FsLMS algorithm may diverge (Fig. 3.15). Therefore, in the RFsLMS algorithm, the cost function is changed to:
$\xi(w) = E\left[\log\left(1 + \frac{e^2(n)}{2\sigma^2(n)}\right)\right]$  (3.130)
In the above formula, σ²(n) is the estimated variance of the error signal, which can be obtained through a sliding estimation window, expressed as:
Fig. 3.15 The FLANN-based robust nonlinear ANC system trained using RFsLMS algorithm
$\sigma^2(n+1) = \sigma^2(n) + \Delta m_e^2(n+1) + \frac{1}{H}\big(e(n+1) - m_e(n+1)\big)^2 - \frac{1}{H}\big(e(n-H+1) - m_e(n+1)\big)^2$  (3.131)
Among them:
$m_e(n+1) = m_e(n) + \Delta m_e(n+1)$  (3.132)
$\Delta m_e(n+1) = \frac{1}{H}\big(e(n+1) - e(n-H+1)\big)$  (3.133)
where H is the estimation window length. Using the window function method, the weight coefficient update formula of the RFsLMS algorithm is:
$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\mu}{2}\,\nabla(n)$  (3.134)
where ∇(n) represents the instantaneous estimate of the negative gradient of the cost function ξ, derived as follows:
$\nabla(n) = -\frac{\partial}{\partial \mathbf{w}(n)}\log\left(1 + \frac{e^2(n)}{2\sigma^2(n)}\right) = \frac{2\,e(n)}{e^2(n) + 2\sigma^2(n)}\,\mathbf{X}(n)$  (3.135)
Therefore, the weight coefficient update formula of the RFsLMS algorithm is:
$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\mu\,e(n)}{e^2(n) + 2\sigma^2(n)}\,\mathbf{X}(n)$  (3.136)
The RFsLMS algorithm uses a function of the error for the weight update instead of the raw error signal employed in the FsLMS algorithm. Figure 3.16 shows the transformation function employed in the RFsLMS algorithm with σ² = 1. It can be observed that for larger values of e(n) the weight update is small, and thus the algorithm is stable. Performance is further improved by the variance term in the denominator, which tends to a small value for non-impulsive samples. In addition, the impact of high-amplitude impulses appearing in the reference signal is significantly reduced by the trigonometric expansion, which limits the amplitude of the expanded reference signal samples to [−1, 1]. However, the terms that appear directly in the expanded signal vector X(n) could affect the performance for very strong disturbances. Such a situation can be avoided if an adaptive threshold scheme, as suggested in [31], is applied.
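The transformation of Fig. 3.16 can be examined numerically. The sketch below evaluates the error gain e/(e² + 2σ²) used in Eq. (3.136) and confirms the two properties claimed above: the gain is bounded, and it shrinks toward zero for impulsive (large) errors. The grid of error values is an assumption for illustration:

```python
import numpy as np

def rfslms_gain(e, sigma2=1.0):
    """Error transformation e / (e^2 + 2 sigma^2) of Eq. (3.136)."""
    return e / (e ** 2 + 2.0 * sigma2)

e = np.linspace(-50.0, 50.0, 2001)
g = rfslms_gain(e)
peak = np.max(np.abs(g))   # theoretical maximum is 1 / (2 sqrt(2 sigma^2))
```

For small errors the gain is approximately e/(2σ²), i.e., the update behaves like plain FsLMS, while an impulse of amplitude 50 is attenuated far below the update produced by a nominal error of 1.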
Fig. 3.16 Schematic diagram of the transformation in the RFsLMS algorithm (for σ² = 1)
3.6.1.4 FsMCC Algorithm
As an information-theoretic quantity, correntropy is a strong tool for developing robust adaptive algorithms. In this part, the filtered-s maximum correntropy criterion (FsMCC) algorithm for the NANC system with a FLANN filter is described. It is an efficient control algorithm for heavy-tailed non-Gaussian noise on account of a robust similarity measure named correntropy [32]. Correntropy is a generalized correlation between two random variables in a small neighborhood, described as:
$V(\alpha, \beta) = E[\kappa(\alpha, \beta)]$  (3.137)
where κ(α, β) is a Mercer kernel. The kernel considered in this part is the normalized Gaussian kernel given by:
$\kappa(\alpha, \beta) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(\alpha - \beta)^2}{2\sigma^2}\right)$  (3.138)
Using the correntropy as a measure of the similarity between d(n) and y(n), the objective function to be maximized for the filter design may be written as:
$\zeta(w) = \sum_{i=0}^{n} \exp\left(-\frac{\big(d(i) - \mathbf{w}^T(i)\mathbf{X}(i)\big)^2}{2\sigma^2}\right)$  (3.139)
The weight update rule using a gradient ascent approach may be written as:
$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\mu}{2}\,\nabla\zeta(w)$  (3.140)
where μ is the step size and ∇ζ(w) denotes the gradient of the cost function with respect to the weight vector. Using Eq. (3.140) and Eq. (3.139), the update rule may be approximated as:
$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\exp\left(-\frac{e^2(n)}{2\sigma^2}\right) e(n)\,\mathbf{X}(n)$  (3.141)
where μ is the step size, X(n) = s(n)x(n), and x(n) is given by Eq. (3.113).
3.6.1.5 RFF-FxMCC Algorithm
The block diagram of the random Fourier filter-based NANC system is shown in Fig. 3.17. Random Fourier filters simplify the filter iteration by making the implicit kernel mapping explicit. P(z) and S(z) represent the primary noise path and the secondary path, respectively. For the random Fourier filter-based NANC system, the nonlinear relation is explicit. The input vector x(n) = [x(n), x(n − 1), ⋯, x(n − L + 1)]^T is processed by the RF(ω_k, φ_k) module directly, and after passing through the cosine module, the D-dimensional expansion vector is obtained as [30]:
$\mathbf{Z}(x(n)) = [\zeta_1,\, \zeta_2, \dots, \zeta_D]^T$  (3.142)
where ζ_i = cos(ω_i^T x(n) + φ_i), i = 1, 2, ⋯, D. The frequency vectors ω_i follow a Gaussian distribution with zero mean and covariance matrix ε²I, denoted ω ~ N(0, ε²I), and the phases φ_i, i = 1, . . ., D, are sampled from the uniform distribution on [0, 2π]. The input signal is expanded to its RFF and then fed into an adaptive filter. With the filter weight vector w(n) = [w₁(n), w₂(n), ⋯, w_D(n)]^T, the output of the random Fourier filter is computed as:
$y(n) = \mathbf{w}^T(n)\,\mathbf{Z}(x(n))$  (3.143)
Fig. 3.17 The block diagram of single-channel NANC system based on random Fourier filter
The residual signal, which is the superposition of the desired signal and the secondary-path-filtered output, can be given as:
$e(n) = d(n) - y(n) * s(n)$  (3.144)
where * denotes linear convolution and s(n) is the impulse response of the secondary path S(z). Through the modeled secondary-path module Ŝ(z), the expanded input signal vector is filtered as $\hat{\mathbf{Z}}_s(n) = [\hat{\zeta}_1,\, \hat{\zeta}_2, \dots, \hat{\zeta}_D]^T$, and each element at the nth iteration can be computed by:
$\hat{\zeta}_i(n) = \sum_{j=0}^{\hat{N}-1} \hat{s}_j\,\zeta_i(n-j), \quad i = 1, 2, \dots, D$  (3.145)
where ζ_i(k) = 0 for k ≤ 0 is set to supplement the convolution. The RFF-MCC algorithm was proposed on the basis of the MCC [20]. The current error e(k) is considered in the MCC cost function, i.e.:
$J(k) = \exp\left(-\frac{e^2(k)}{2\sigma^2}\right)$  (3.146)
Approximating the expectation by the current values and following the steepest ascent recursion on this cost yields:
$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\nabla J$  (3.147)
where ∇J is the gradient of the cost function with respect to the filter coefficient vector. By applying a gradient ascent method, the weight vector of RFF-FxMCC is therefore updated recursively, i.e.: ∂J ðkÞ ∂wð nÞ e2 ðk Þ b s ðnÞ, eðk ÞΖ = wðnÞ + η exp 2σ 2
wðn + 1Þ = wðnÞ + η
ð3:148Þ
where η is the unified step size and Ẑ_s(n) is obtained by passing the RFF of the input vector through the modeled secondary path. As a special case, the RFF-FxMCC algorithm degenerates into the RFF-FxLMS algorithm when $\exp(-e^2(k)/2\sigma^2) = 1$.
3.7 Nonlinear Channel Equalization

3.7.1 Communication Channel Equalization
Figure 3.18 shows the schematic of a wireless digital communication system with an equalizer at the front end of the receiver. The symbol {tk} denotes a sequence of T-spaced complex symbols of an L-QAM constellation in which both in-phase
Fig. 3.18 Schematic diagram of a wireless digital communication system with a channel equalizer
component {t_{k,I}} and quadrature component {t_{k,Q}} take one of the values $\{\pm 1, \pm 3, \dots, \pm(\sqrt{L} - 1)\}$, where 1/T denotes the symbol rate and k denotes the discrete time index. In a 4-QAM constellation, the unmodulated information sequence {t_k} is given by:
$t_k = \pm 1 \pm j1$  (3.149)
where the symbols (1, −1) are assumed to be statistically independent and equiprobable. In Fig. 3.18, the combined effect of the transmitter-side filter and the wireless transmission medium is included in the "channel." A widely used model for a linear dispersive channel is an FIR filter whose output at the kth instant is given by $a_k = \sum_{i=0}^{N_h - 1} h_i\,t_{k-i}$, where h_i denotes the FIR filter weights and N_h denotes the filter order. Considering the channel to be nonlinear, the "NL" block introduces channel nonlinearity to the filter output. The discrete output of the nonlinear channel is given by:
$b_k = \psi\{a_k,\, a_{k-1},\, a_{k-2}, \dots, a_{k-N_h+1};\ h_0,\, h_1, \dots, h_{N_h-1}\}$  (3.150)
where ψ{⋅} is a nonlinear function generated by the "NL" block. The channel output is assumed to be corrupted by additive Gaussian noise q_k with variance σ². The transmitted signal t_k, after passing through the nonlinear channel and being added with the noise, arrives at the receiver as r_k. The received signal at the kth time instant is given by r_k = r_{k,I} + j r_{k,Q}, where r_{k,I} and r_{k,Q} are the in-phase and quadrature components, respectively. The purpose of the equalizer attached at the receiver front end is to recover the transmitted sequence t_k or its delayed version t_{k−τ}, where τ is the propagation delay associated with the physical channel. In the case of a linear channel, an adaptive equalizer (e.g., an adaptive FIR filter) can be used. During the training period, the equalizer takes the corrupted sequence r_k and its delayed versions as input and produces an output y_k. With knowledge of a desired (target) output d_k (d_k = t_{k−τ}), it updates the filter weights so as to minimize the error e_k (e_k = d_k − y_k) using an adaptive algorithm (e.g., the LMS algorithm). After training is completed, the weights are frozen and subsequently used to estimate the transmitted sequence. In this study, since we consider nonlinear channel models, we use NNs as equalizers in place of the adaptive filter.
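The linear baseline just described (training an adaptive FIR equalizer with LMS against a delayed version of the transmitted 4-QAM sequence) can be sketched as follows. The channel taps, equalizer length, delay, step size, and noise level are illustrative assumptions, and the nonlinear "NL" block is omitted so that the linear equalizer can succeed:

```python
import numpy as np

rng = np.random.default_rng(5)
h = np.array([1.0, 0.4])               # linear dispersive channel (assumed)
N, taps, delay, mu = 4000, 8, 1, 0.01

# 4-QAM symbols t_k = +/-1 +/- j1, Eq. (3.149)
t = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)
a = np.convolve(t, h)[:N]              # channel output a_k
r = a + 0.01 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

w = np.zeros(taps, dtype=complex)
err2 = []
for k in range(taps, N):
    xk = r[k-taps+1:k+1][::-1]         # r_k and its delayed versions
    y = w @ xk
    e = t[k - delay] - y               # training target d_k = t_{k - delay}
    w += mu * e * np.conj(xk)          # complex LMS update
    err2.append(abs(e) ** 2)

late = np.mean(err2[-500:])            # steady-state training MSE
```

Once the channel is made nonlinear via ψ{⋅}, this linear filter can no longer invert it, which motivates the NN equalizers of the next subsections.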
3.7.2 Channel Equalization Using a Generalized NN Model
Figure 3.19 depicts a schematic diagram of channel equalization for 4-QAM signals using NN. The in-phase component rk,I and quadrature component rk,Q at kth instant are passed through a delay line to obtain the current and past signals. The current and
Fig. 3.19 Schematic diagram of an NN-based channel equalizer
the delayed signal samples constitute the input signal vector to the equalizer, given by R_k = [r_{k,I}, r_{k,Q}, r_{k−1,I}, r_{k−1,Q}, . . .]^T = [r₁, r₂, r₃, . . .]^T. During the training phase, at the kth instant, R_k is applied to the NN, which produces an output Y_k = [y_{k,I}, y_{k,Q}]^T. The NN output is then compared with the desired output D_k = [d_{k,I}, d_{k,Q}]^T to produce an error signal, which is used in the BP algorithm to update the weights. The training process continues iteratively until the MSE reaches a predefined small value; thereafter, the NN weights are frozen. During the test phase and actual use, the NN weights obtained after training are used for equalization. In this study, we use three different NN structures, i.e., MLP, FLANN, and the proposed LeNN, for the equalization of nonlinear channels. In addition, for comparison, we have simulated a linear FIR-based adaptive equalizer trained with the LMS algorithm. For a 4-QAM signal constellation, channel equalization becomes a four-class classification problem. The NN structures essentially create nonlinear decision boundaries in the input space by generating a discriminant function to classify the received signal into one of the four categories.
3.7.3 FLNN Equalizer
The block diagram of the m-dimensional input FLNN equalizer without hidden layers is shown in Fig. 3.20. Trigonometric functions, Chebyshev and Legendre orthogonal polynomials, and other expansions are used to increase the dimensionality of the input pattern and enhance its representation in a high-dimensional space. In addition, the structure has lower computational complexity and
Fig. 3.20 The FLNN equalizer structure with an m-dimensional input
higher convergence speed compared to conventional neural networks. Here the extended function block is composed of a subset of orthogonal sin and cos basis functions together with the original patterns and their outer products for modeling nonlinear channels. For example, consider a two-dimensional input pattern U(n) = [u₁, u₂]^T = [x(n), x(n − 1)]^T. This vector can be expanded by the trigonometric functions as X(n) = [1, u₁, cos(πu₁), sin(πu₁), ⋯, u₂, cos(πu₂), sin(πu₂), ⋯, u₁u₂]^T. The adaptive algorithm trains the network easily and has low complexity due to the absence of hidden layers. The FLNN-based equalizer outperforms other neural network structures for linear and nonlinear channel models, and its main advantage is the further reduced computational cost. Although the FLNN has advantages such as a simpler structure, faster convergence, and lower computational complexity, its nonlinear approximation capacity is limited because it contains only one nonlinear function tanh(⋅). That is, the adaptive FLNN equalizer can only deal with linear and mildly nonlinear distortions; for severe nonlinear distortions, its performance is very limited. To further improve performance, one can enlarge the dimensionality of the input signal space; however, this significantly increases the number of nodes in the input layer and the complexity of practical implementation. Therefore, it is necessary and important to seek a novel method for improving the nonlinear processing capability of the FLNN.
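The functional expansion described above can be written out explicitly for the two-dimensional example. This sketch builds the expanded vector X(n) for U(n) = [u₁, u₂]^T, using the bias, trigonometric, and outer-product terms named in the text (the exact ordering of the terms is an assumption, since the original lists them with ellipses):

```python
import numpy as np

def flnn_expand(u):
    """Trigonometric functional-link expansion of a 2-D input pattern
    U(n) = [u1, u2]^T, with bias and outer-product (cross) term."""
    u1, u2 = u
    return np.array([
        1.0,                                          # bias term
        u1, np.cos(np.pi * u1), np.sin(np.pi * u1),   # expansion of u1
        u2, np.cos(np.pi * u2), np.sin(np.pi * u2),   # expansion of u2
        u1 * u2,                                      # outer-product term
    ])

X = flnn_expand([0.5, -0.25])
```

The expansion replaces hidden layers with fixed nonlinear features, so the trainable part remains a single linear layer, which is where the structure's low complexity comes from.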
3.7.3.1 Adaptive Equalizer with FLNN Cascaded with Chebyshev Orthogonal Polynomial
It is well known from best approximation theory that the nonlinear approximation capacity of Chebyshev orthogonal polynomials is very powerful. Combining the characteristics of the FLNN and the Chebyshev orthogonal polynomial, a novel nonlinear adaptive equalizer structure, the FLNNCPAE, is depicted in Fig. 3.21. The FLNNCPAE utilizes the FLNN input pattern and the nonlinear approximation capability of the Chebyshev orthogonal polynomial to further improve nonlinear processing performance. The adaptive algorithm for the novel nonlinear adaptive equalizer is given as follows [33]. Owing to the good performance of the NLMS algorithm among existing adaptive algorithms, the coefficient vectors W(k) and A(k) are adjusted by the low-complexity NLMS algorithm. Let e(k) = d(k) - y(k); the instantaneous error E(k) is written as

$$E(k) = \frac{1}{2} e^2(k) \quad (3.151)$$

The NLMS algorithm yields the following updating equations:

$$A(k+1) = A(k) - \eta_1 \frac{\partial E(k)/\partial A(k)}{1 + \|C(k)\|^2}, \qquad W(k+1) = W(k) - \eta_2 \frac{\partial E(k)/\partial W(k)}{1 + \|X(k)\|^2} \quad (3.152)$$
Fig. 3.21 The diagram of functional link neural network cascaded with Chebyshev orthogonal polynomial adaptive equalizer
where η1 and η2 are the positive step sizes of the updating equations, both assumed to be small positive real values. Taking partial derivatives of E(k), we obtain the following equalities:

$$\frac{\partial E(k)}{\partial A(k)} = -e(k) C(k) \quad (3.153)$$

$$\frac{\partial E(k)}{\partial W(k)} = -e(k) \frac{\partial y(k)}{\partial z(k)} \frac{\partial z(k)}{\partial W(k)} = -e(k) X(k) \frac{4 e^{-2u(k)}}{\left(1 + e^{-2u(k)}\right)^2} \sum_{j=1}^{N_2} a_j(k) g_j'(z(k)) \quad (3.154)$$

The coefficient adaptation can thus be divided into two parts. The first adapts the FLNN filter weights W(k):

$$W(k+1) = W(k) + \eta_2 e(k) X(k) \frac{4 e^{-2u(k)}}{\left(1 + e^{-2u(k)}\right)^2} \sum_{j=1}^{N_2} a_j(k) g_j'(z(k)) \frac{1}{1 + \|X(k)\|^2} \quad (3.155)$$

The second adapts the Chebyshev orthogonal polynomial coefficients A(k):

$$A(k+1) = A(k) + \eta_1 e(k) C(k) \frac{1}{1 + \|C(k)\|^2} \quad (3.156)$$
Selecting the step sizes in the range 0 < η1, η2 < 1/tr(R), where R is the autocorrelation matrix of the input signal, results in stable operation of the system and guarantees convergence of the adaptive algorithm. The adaptive algorithm for the novel nonlinear adaptive equalizer is summarized in Table 3.5.
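The two-part update (3.155)-(3.156) can be sketched as a single iteration; the function name, toy dimensions, and default step sizes are illustrative, and the activation derivative and Chebyshev basis derivatives are passed in precomputed.

```python
def flnncpae_update(W, A, X, C, e, dSdu, g_prime_z, eta1=0.05, eta2=0.05):
    """One FLNNCPAE coefficient update, following (3.155)-(3.156).
    W, X       : FLNN weight and expanded-input vectors
    A, C       : Chebyshev coefficient and basis-output vectors
    e          : instantaneous error d(k) - y(k)
    dSdu       : activation derivative 4e^{-2u}/(1+e^{-2u})^2
    g_prime_z  : Chebyshev basis derivatives g_j'(z(k))"""
    s = dSdu * sum(a * gp for a, gp in zip(A, g_prime_z))
    nx = 1.0 + sum(x * x for x in X)          # NLMS normalization 1 + ||X||^2
    nc = 1.0 + sum(c * c for c in C)          # NLMS normalization 1 + ||C||^2
    W_new = [w + eta2 * e * x * s / nx for w, x in zip(W, X)]
    A_new = [a + eta1 * e * c / nc for a, c in zip(A, C)]
    return W_new, A_new
```

When the error e(k) is zero, both coefficient vectors are left unchanged, as expected from (3.155)-(3.156).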
Table 3.5 Summary of the adaptive algorithm for the novel nonlinear adaptive equalizer

3.7.3.2 Decision Feedback Equalizer Using the Combination of FIR and FLNN

The combination of a finite impulse response (FIR) filter and a functional link neural network with a decision feedback structure (CFFLNNDFE) is depicted in Fig. 3.22. The CFFLNNDFE equalizer compensates linear and nonlinear distortions and tracks the characteristics of time-varying channels. It adequately utilizes the advantages of the FLNN and the characteristics of the linear filter to improve performance. Furthermore, the equalizer based on the decision feedback (DF) structure achieves better performance, especially on time-varying channels with spectral nulls in their amplitude characteristics. The major purpose of feeding the feedback signals directly into the input layer of the FLNN, instead of into the functional expansion blocks, is to reduce the number of nodes in the input layer. By using this decision feedback structure, we benefit from the feedback behavior without increasing the number of input signals [34]. The nonlinear equalizer consists of two subsections: an FLNN with decision feedback equalizer (FLNNDFE) and an FIR filter. For the former, the input signals consist of the received signal vector RX(k) = [x(k), x(k - 1), ..., x(k - m + 1)]^T, where m = N_1 is the length of the input signals. In addition, the feedback output signals are fed directly into the input layer of the FLNN instead of being taken as input signals, and the feedback signal vector from the decision device is defined by

$$V(k) = [v_1(k), v_2(k), \ldots, v_q(k)]^T = [\mathrm{Sign}(y(k-1)), \mathrm{Sign}(y(k-2)), \ldots, \mathrm{Sign}(y(k-q))]^T \quad (3.157)$$
Fig. 3.22 The diagram of the CFFLNNDFE equalizer
where q is the length of the feedback signals and Sign(.) is the decision function defined by

$$\mathrm{Sign}(y(k-1)) = \begin{cases} -1, & y(k-1) < 0 \\ 1, & y(k-1) \ge 0 \end{cases} \quad (3.158)$$
Consider a set of basis functions B = {φ_i ∈ L(A)}_{i∈I}, I = {1, 2, ...}, with the following properties for the FLNN subsection in Fig. 3.4, where L(A) denotes the space of (Lebesgue) measurable functions:

1. φ_1 = 1.
2. The subset B_{N_1} = {φ_i ∈ B}_{i=1}^{N_1} is a linearly independent set, i.e., if Σ_{i=1}^{N_1} w_i φ_i = 0, then w_i = 0 for all i = 1, 2, ..., N_1.
3. sup_j (Σ_{i=1}^{j} ||φ_i||_A^2)^{1/2} < ∞, where "sup" denotes the supremum.

Let B_N = {φ_i ∈ B}_{i=1}^{N_1} be the set of basis functions to be considered. The FLNN uses the N_1 basis functions to make up the vector φ(k) ∈ B_N: φ(k) = [φ_1(k), φ_2(k), ..., φ_{N_1}(k)]^T. Thus the output signal z(k) of the FLNNDFE is given by

$$z(k) = \gamma(k)\, S(u(k)) = \gamma(k)\, S\!\left(\sum_{i=1}^{N_1} w_i(k) \varphi_i(RX(k)) + \sum_{i=1}^{q} b_i(k) v_i(k)\right) = \gamma(k)\, S\!\left(W(k)^T \varphi(k) + B(k)^T V(k)\right) = \gamma(k)\, S\!\left(W_1(k)^T X(k)\right) \quad (3.159)$$

where B(k) = [b_1(k), b_2(k), ..., b_q(k)]^T is the weight coefficient vector of the feedback signal vector V(k), and the weight and input vectors of the FLNNDFE are redefined as W_1(k) = [W(k)^T, B(k)^T]^T and X(k) = [φ(k)^T, V(k)^T]^T, respectively. W(k) = [w_1(k), w_2(k), ..., w_{N_1}(k)]^T denotes the coefficient vector of the FLNN, and the lengths of W_1(k) and X(k) are both N_1 + q. γ(k) represents an adjustable amplitude gain, and the nonlinear function S(.) is defined by

$$S(u(k)) = \frac{2}{1 + e^{-2u(k)}} - 1 \quad (3.160)$$
In this study, the function is implemented as S(x) = tanh(x) in MATLAB, because the derivative of S(.) is then easily obtained as S'(x) = 1 - tanh^2(x). The output of the FIR subsection is

$$z_1(k) = \sum_{i=1}^{m} w_{N_1+i}(k)\, x(k-i+1) = W_2(k)^T RX(k) \quad (3.161)$$

where W_2(k) = [w_{N_1+1}(k), w_{N_1+2}(k), ..., w_{N_1+m}(k)]^T is the coefficient vector of the FIR filter. The overall output signal y(k) of the equalizer is then given by

$$y(k) = \lambda(k) z(k) + (1 - \lambda(k)) z_1(k) = \lambda(k) \gamma(k) S\!\left(W_1(k)^T X(k)\right) + (1 - \lambda(k)) W_2(k)^T RX(k) \quad (3.162)$$
where λ(k) is a convex combination parameter. The role of λ(k) in (3.162) is that its extreme values lead to either a pure FLNNDFE or a pure FIR equalizer (λ(k) = 1 or λ(k) = 0, respectively).
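A minimal sketch of the combined output (3.162), using S(.) = tanh(.) as stated above; the function name and arguments are illustrative.

```python
import math

def cffllndfe_output(W1, X, W2, RX, lam, gamma=1.0):
    """Overall CFFLNNDFE output (3.162): a convex combination of the
    FLNNDFE branch z(k) = gamma * tanh(W1^T X) and the FIR branch
    z1(k) = W2^T RX, weighted by lambda(k)."""
    z = gamma * math.tanh(sum(w * x for w, x in zip(W1, X)))   # FLNNDFE branch
    z1 = sum(w * r for w, r in zip(W2, RX))                    # FIR branch
    return lam * z + (1.0 - lam) * z1
```

Setting lam = 0.0 reduces the output to the pure FIR branch, and lam = 1.0 to the pure FLNNDFE branch, as noted above.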
3.8 Computer Simulation Examples
In this part, we carry out MATLAB experiments to test the robustness of different control algorithms. The averaged noise reduction (ANR) is used as the quantitative index; it is obtained by averaging over 100 independent runs of the algorithm, and its expression is

$$\mathrm{ANR}(n) = 20 \log \frac{A_e(n)}{A_d(n)} \quad (3.163)$$

where

$$A_e(n) = \lambda A_e(n-1) + (1 - \lambda)|e(n)|, \quad A_e(0) = 0 \quad (3.164)$$

$$A_d(n) = \lambda A_d(n-1) + (1 - \lambda)|d(n)|, \quad A_d(0) = 0 \quad (3.165)$$

and λ is a smoothing parameter, set as λ = 0.999 in this experiment.
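The ANR recursion (3.163)-(3.165) can be sketched as below, assuming the logarithm is base 10 (dB); the function name is illustrative.

```python
import math

def anr_curve(e, d, lam=0.999):
    """Averaged noise reduction (3.163)-(3.165): exponentially smoothed
    magnitudes of the residual error e(n) and primary noise d(n),
    reported in dB."""
    ae = ad = 0.0
    out = []
    for en, dn in zip(e, d):
        ae = lam * ae + (1.0 - lam) * abs(en)   # A_e(n)
        ad = lam * ad + (1.0 - lam) * abs(dn)   # A_d(n)
        out.append(20.0 * math.log10(ae / ad))  # ANR(n) in dB
    return out
```

For example, if the residual error is always one-tenth of the primary noise, the ANR is 20 log10(0.1) = -20 dB at every step.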
3.8.1 FLANN-Based NANC with Minimum Phase Secondary Path System
The robust noise mitigation capability of the FsLMP, FsqLMP, RFsLMS, and FsMCC algorithms is compared, considering a second-order FLANN model. The performance is evaluated through NANC systems with a minimum phase secondary path. To verify the performance of the algorithms under impulsive noise, a symmetric α-stable (SαS) distribution is modeled by the characteristic function

$$\phi(t) = e^{-|t|^{\alpha}} \quad (3.166)$$

The performance is discussed under three different cases of impulsive noise as shown in Fig. 3.23: Case (a): α = 1.9, Case (b): α = 1.7, and Case (c): α = 1.5, where Case (c) corresponds to the most impulsive environment and Case (a) is closest to white Gaussian noise (α = 2).
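SαS samples with the characteristic function (3.166) can be drawn with the Chambers-Mallows-Stuck method; this sketch (unit scale, zero location) is an assumption about how the noise is generated, since the book does not give its generator.

```python
import math, random

def sas_sample(alpha, rng=random):
    """Draw one symmetric alpha-stable sample via the
    Chambers-Mallows-Stuck method (unit scale, zero location)."""
    V = rng.uniform(-math.pi / 2, math.pi / 2)   # uniform phase
    W = rng.expovariate(1.0)                     # unit exponential variate
    if alpha == 1.0:
        return math.tan(V)                       # Cauchy special case
    return (math.sin(alpha * V) / math.cos(V) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * V) / W) ** ((1.0 - alpha) / alpha))

noise = [sas_sample(1.5) for _ in range(1000)]   # Case (c) impulsive noise
```

At alpha = 2 the method reduces to a Gaussian variate, consistent with the remark that Case (a) is closest to white Gaussian noise.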
Fig. 3.23 Waveforms of impulsive noise signal (a) α = 2; (b) α = 1.9; (c) α = 1.7; (d) α = 1.5
The primary noise observed at the error microphone is

$$d(n) = u(n-2) + \delta_1 u^2(n-2) - \delta_2 u^3(n-1) \quad (3.167)$$

where δ_i, i = 1, 2, measures the nonlinearity of the primary path; it is set as δ1 = 0.08 and δ2 = 0.04 in this part. The input signal is α-stable distributed noise, and the transfer functions of the primary and secondary paths with minimum phase characteristics are given as P(z) = z^{-3} - 0.3z^{-4} + 0.2z^{-5} and S(z) = z^{-2} + 0.5z^{-3}, respectively. Simulation results of the algorithms under different intensities of impulsive noise are shown in Figs. 3.24, 3.25, 3.26, and 3.27. They show that as the occurrence probability and intensity of the noise increase, the control ability of the above algorithms generally decreases. Among the five comparative algorithms, the FxLMS algorithm does not possess robustness in an impulsive environment. By contrast, the FxMCC algorithm has the best control effect on strong-intensity α-stable noise. The RFsLMS algorithm is the second most robust algorithm in countering
Fig. 3.24 ANR comparison curves under impulsive noise with α = 2
Fig. 3.25 ANR comparison curves under impulsive noise with α = 1.9
the impulsive noise environment among the five algorithms. The FxLMP algorithm offers little improvement over FxLMS; it is still unstable and has a slow convergence rate. The Jackson derivation improves the FxLMP algorithm to a large extent (Figs. 3.25, 3.26, and 3.27).
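The primary noise model (3.167) with δ1 = 0.08 and δ2 = 0.04 can be sketched as follows; the function name and zero initial conditions are assumptions.

```python
def primary_noise(u, d1=0.08, d2=0.04):
    """Primary noise at the error microphone, per (3.167):
    d(n) = u(n-2) + d1*u^2(n-2) - d2*u^3(n-1)."""
    d = []
    for n in range(len(u)):
        u2 = u[n - 2] if n >= 2 else 0.0   # u(n-2), zero initial state
        u1 = u[n - 1] if n >= 1 else 0.0   # u(n-1)
        d.append(u2 + d1 * u2 ** 2 - d2 * u1 ** 3)
    return d
```

For a unit impulse input, the linear and quadratic terms of the primary path appear two samples later, while the cubic term appears after one sample.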
Fig. 3.26 ANR comparison curves under impulsive noise with α = 1.7
Fig. 3.27 ANR comparison curves under impulsive noise with α = 1.5
3.8.2 Random Fourier Filter-Based NANC

3.8.2.1 Projection Dimension and Memory Length of Random Fourier Filter
In this part, we measure the steady-state ANR and the time consumption of the RF-FxLMS algorithm for different settings of D to determine the effect of the projection dimension on algorithm performance. Another factor critical to the calculation amount, the memory length, is also discussed. The memory length of the input signal
Fig. 3.28 Comparison curves of RF-FxLMS algorithm steady-state ANR and consuming time under different D values
determines the calculation amount of the nonlinear expansion, while the length of the linear auxiliary filter, denoted M, governs system performance and stability. The logistic chaotic noise is adopted as input, given by x(n) = λx(n - 1)[1 - x(n - 1)], n = 1, 2, 3, ..., where λ = 4 and the initial value is x(0) = 0.9. The memory length is set as L = 10. The transfer functions of the linear primary and secondary paths are P(z) = z^{-3} - 0.3z^{-4} + 0.2z^{-5} and S(z) = z^{-2} + 0.5z^{-3}. The steady-state ANR is obtained by averaging over the final 200 iterations. Figure 3.28 compares the ANR level and the consumed time of RF-FxLMS under different D values. The timing is based on a 1.8-GHz Intel Core i7 processor with 8 GB of RAM, running MATLAB R2017b on Windows 10. It is clear that as the projection order increases, the noise reduction ability of the RF-FxLMS algorithm is enhanced. The projection order D of the random Fourier filter affects the accuracy of the algorithm, but a larger D is accompanied by a higher amount of calculation. It is also observed that the control effect of the algorithm stabilizes once the projection order exceeds 300, while the consumed time keeps increasing linearly. This is because a higher projection order D demands a higher-order FIR filter and therefore greater computation in the nonlinear module. The influence of different memory lengths on the RF-FxLMS algorithm is shown in Fig. 3.29, and in Fig. 3.30, we compare the performance of different delay lengths of the cascaded filter. In Fig. 3.29, it is observed that the memory length significantly affects the performance of the algorithm. The diamond-marked pink curve (L = 10) corresponds to the lowest residual noise. However, on further increasing L, the noise reduction level decreases. In Fig. 3.30, it is clear that the CRF-FxLMS algorithm with a cascaded segment and a shorter memory length (L = 5) of the input signal not only attains but exceeds the best ANR level of the RF-FxLMS algorithm (L = 10).
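The logistic chaotic input can be generated as below, assuming the initial value x(0) = 0.9; the function name is illustrative.

```python
def logistic_noise(n_samples, lam=4.0, x0=0.9):
    """Logistic chaotic sequence x(n) = lam * x(n-1) * (1 - x(n-1)),
    used here as the NANC input signal (x(0) = 0.9 assumed)."""
    x = [x0]
    for _ in range(n_samples - 1):
        x.append(lam * x[-1] * (1.0 - x[-1]))
    return x
```

With lam = 4 the map is fully chaotic and the sequence stays within [0, 1], which makes it a convenient bounded broadband test signal.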
Fig. 3.29 Learning curves of RF-FxLMS algorithm with different memory length settings
3.8.2.2 Real Example: Random Fourier Filter-Based Active Traction Substation Noise Control

The traction substation noise shown in Fig. 3.31 was sampled through MATLAB and a sound card with a sampling frequency of 8 kHz at 16 bits. The memory length of the random Fourier filter is chosen as L = 200, and the linear auxiliary filter of the cascaded random Fourier filter is selected as M = 70. To study the control effect of the RF-FxLMS, CRF-FxLMS, RF-FxMCC, and CRF-FxMCC algorithms, the robustness of the algorithms to primary path variation is considered. The primary and secondary paths are modeled by 256-tap and 100-tap FIR filters, whose magnitude and phase responses are shown in Fig. 3.32. In this simulation, the secondary path model is fixed, while the primary path changes from primary path 1 (Fig. 3.32a) to primary path 2 (Fig. 3.32b) at the 10,000th iteration. The power spectrum within 100-800 Hz of the noise and residual noise is shown in Fig. 3.33. The traction substation noise is a broadband noise whose power is spread over the range of 40-60 dB. It is worth noting that, because of the few outliers in this noise sample, the control effects of the RF-FxLMS and RF-FxMCC algorithms are very similar. The RF-FxLMS and RF-FxMCC algorithms achieve a noise reduction of 10-20 dB, while the CRF-FxLMS and CRF-FxMCC algorithms achieve 20-30 dB. From 400 to 800 Hz, the residual noise of the four control algorithms is similar, reduced to about 10 dB. From 800 to 1000 Hz, the RF-FxLMS and RF-FxMCC algorithms have little control effect, while the CRF-FxLMS and CRF-FxMCC algorithms still achieve an averaged noise power level of about 10 dB.
Fig. 3.30 Comparison of the RF-FxLMS and the CRF-FxLMS algorithm with different settings
Fig. 3.31 Waveform of traction substation noise
The corresponding noise power level is shown in Fig. 3.34. It can be seen that the change of the primary path causes an abrupt change in the noise reduction level of the control algorithms at the 10,000th iteration. The CRF-FxLMS and CRF-FxMCC
Fig. 3.32 Magnitude response and phase response of the acoustic paths of primary path and secondary path
algorithms leave a lower residual noise. Moreover, the control curves of the CRF-FxLMS and CRF-FxMCC algorithms are more stable in the face of noise power variation.
3.8.3 Nonlinear Channel Equalization

Simulations were carried out extensively for the channel equalization problem, with several practical channels and nonlinear models, using three different NN structures and an FIR adaptive filter. The channel impulse response used for this study is given by [32]:
Fig. 3.33 PSD of traction substation noise and residual noise controlled by RF-FxLMS, CRF-FxLMS, RF-FxMCC, and CRF-FxMCC algorithm
Fig. 3.34 ANR curves of RF-FxLMS, CRF-FxLMS, RF-FxMCC, and CRF-FxMCC algorithm for traction substation noise
$$h(i) = \begin{cases} \dfrac{1}{2}\left[1 + \cos\!\left(\dfrac{2\pi}{\Lambda}(i-2)\right)\right], & i = 1, 2, 3 \\ 0, & \text{otherwise} \end{cases} \quad (3.168)$$
The input correlation matrix is given by R = E[R_k R_k^T], where E[.] is the expectation operator. The eigenvalue ratio (EVR) of a channel is defined as λ_max/λ_min, where λ_max
and λ_min are the largest and smallest eigenvalues of R. The higher the EVR, the worse the channel in terms of channel spread and the more difficult it is to equalize. The parameter Λ determines the EVR of the channel: the larger the value of Λ, the higher the EVR. In order to study channels under different EVR conditions, Λ was varied between 2.9 and 3.5 in steps of 0.2. Thus, the EVR values become 6.08, 11.12, 21.71, and 46.82 for Λ values of 2.9, 3.1, 3.3, and 3.5, respectively [35]. The corresponding normalized channel impulse responses in the z-transform domain are given by:

CH = 1, Λ = 2.9: 0.209 + 0.955z^{-1} + 0.209z^{-2}
CH = 2, Λ = 3.1: 0.260 + 0.930z^{-1} + 0.260z^{-2}
CH = 3, Λ = 3.3: 0.304 + 0.903z^{-1} + 0.304z^{-2}
CH = 4, Λ = 3.5: 0.341 + 0.876z^{-1} + 0.341z^{-2}   (3.169)
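The coefficients in (3.169) can be reproduced from the raised-cosine response (3.168) by scaling h to unit norm; the unit-norm scaling is an assumption inferred from the word "normalized", and the function name is illustrative.

```python
import math

def channel_coeffs(lmbda):
    """Normalized raised-cosine channel of (3.168)-(3.169):
    h(i) = 0.5 * (1 + cos(2*pi*(i - 2)/lmbda)) for i = 1, 2, 3,
    scaled to unit Euclidean norm (assumed normalization)."""
    h = [0.5 * (1.0 + math.cos(2.0 * math.pi * (i - 2) / lmbda))
         for i in (1, 2, 3)]
    norm = math.sqrt(sum(c * c for c in h))
    return [c / norm for c in h]

h2 = channel_coeffs(3.1)   # CH = 2 coefficients
```

For Λ = 3.1 this gives approximately [0.260, 0.930, 0.260], matching the CH = 2 entry above; the response is symmetric about its middle tap.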
The transmitted message is a 4-QAM signal constellation of the form ±1 ± j1. Each symbol was drawn from a uniform distribution. A zero-mean white Gaussian noise was added to the channel output. The received signal was normalized to unity so that the signal-to-noise ratio (SNR) becomes equal to the reciprocal of the noise variance. The current received symbol r_k and the past three symbols were used as input in the FIR-LMS- and MLP-based equalizers, whereas for the FLANN- and LeNN-based equalizers, the current symbol r_k and the past two symbols were used as input (see Fig. 3.19). Thus, the FIR-based adaptive equalizer has 16 weights. (It is observed that increasing the number of weights does not improve the equalizer performance.) A number of experiments were carried out to determine the optimum NN architecture, the learning rate α, and the momentum parameter β. A two-layer MLP with an {8 - 8 - 2} architecture, i.e., with 8, 8, and 2 nodes in the input, hidden, and output layers (excluding the bias unit), respectively, is selected. The 6-dimensional input has been expanded to an 18-dimensional enhanced pattern by using trigonometric and Legendre polynomials for FLANN and LeNN, respectively. Thus, both FLANN and LeNN have an architecture of {18 - 2}. The backpropagation algorithm was used to train the NNs. Details of the architectures of the four equalizers and the chosen learning parameters are provided in [6]. The nonlinear tanh(.) function was used in all the nodes (except the input nodes) of the NNs. The delay parameter was selected as 1. The details of the functional expansion are given in [6]. The three nonlinear models and one linear model used in this study (see the "NL" block of Fig. 3.17) are given by [36]:

NL = 0: b_k = a_k
NL = 1: b_k = tanh(a_k)
NL = 2: b_k = a_k + 0.2a_k^2 - 0.1a_k^3
NL = 3: b_k = a_k + 0.2a_k^2 - 0.1a_k^3 + 0.5cos(πa_k)   (3.170)

The linear channel model is represented by NL = 0. A nonlinear channel model that may occur due to the saturation of the transmitter amplifier is represented by
NL = 1. The nonlinear models NL = 2 and NL = 3 denote two arbitrary nonlinear channels [36]. The main reason for using these channel and nonlinear models is that they have been widely used by other researchers [33, 37, 38].
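The channel nonlinearities (3.170) can be sketched as a single function; it is shown for real-valued symbols for simplicity, while the study applies them to complex 4-QAM symbols.

```python
import math

def nl_channel(a, nl):
    """Apply the channel nonlinearity of (3.170) to a symbol a
    (real-valued here for illustration)."""
    if nl == 0:
        return a                                   # NL = 0: linear channel
    if nl == 1:
        return math.tanh(a)                        # NL = 1: amplifier saturation
    b = a + 0.2 * a**2 - 0.1 * a**3                # NL = 2: arbitrary nonlinearity
    if nl == 3:
        b += 0.5 * math.cos(math.pi * a)           # NL = 3: most severe model
    return b
```

For example, a unit symbol passed through NL = 2 yields 1 + 0.2 - 0.1 = 1.1, and through NL = 3 yields 1.1 + 0.5 cos(π) = 0.6.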
3.8.3.1 Channel Equalization Using a Generalized NN Model
(i) MSE Performance
To study the convergence characteristics and MSE performance of the equalizers, each equalizer was trained with 3000 iterations. To smooth out the randomness of the NN simulation, the MSE was averaged over 500 independent runs. The MSE characteristics for CH = 2 with 15 dB additive noise are shown in Fig. 3.35. It may be
Fig. 3.35 MSE characteristics of the NN-based equalizers for CH = 2 with SNR = 15 dB. (a) NL = 0, (b) NL = 1, (c) NL = 2, (d) NL = 3. Here “FLN” and “LEG” denote FLANN and LeNN, respectively. Note: the MSE characteristics of LeNN and FLANN almost overlap
noticed that the MSE characteristics of LeNN and FLANN almost overlap each other. It is clear that the performance of the FIR-LMS-based linear equalizer is the worst among the four equalizers. Its MSE settles between -7 and -10 dB for the four NL models. In addition, it provides the slowest convergence rate. The MLP-based equalizer performs much better than the FIR equalizer. Its MSE settles between -15 and -20 dB for the four NL models. The performances of LeNN and FLANN are found to be similar. The MSE floor for both the LeNN- and FLANN-based equalizers is about -23 dB for all four NL models. The MSE convergence rate is also the fastest for the LeNN- and FLANN-based equalizers. The MSE floor settles at about 1500 iterations for the LeNN and FLANN equalizers, whereas the MLP takes about 3000 iterations. Similar performances are also observed for other channels with other values of additive noise.

(ii) BER Performance
All four equalizer structures were trained for 3000 iterations, and their weights were frozen and stored. Thereafter, to calculate the bit error rate, the stored weights are loaded into the NN, and new test symbols are transmitted. Based on the new received samples, the equalizer estimates the transmitted symbol. If there is a mismatch between the transmitted symbol (delayed by 1) and the NN equalizer output, a bit error is counted. The BER was computed with 2 × 10^6 test symbols. This process was repeated for different values of additive noise ranging from 10 to 20 dB in steps of 1 dB. The BER performance of CH = 2 for the four NL models is shown in Fig. 3.36. As expected, the BER decreases as the SNR increases. The NL = 3 case is the most severe nonlinear model. It can be seen that the NN-based equalizers perform much better than the FIR-LMS-based equalizer. Among the three NN-based equalizers, the performance of the MLP-based equalizer is inferior to the other two. Interestingly, the performances of the LeNN- and FLANN-based equalizers are quite similar. In the case of the MLP-based equalizer, for NL = 3, when the SNR rises from 15 to 20 dB, log10(BER) falls from -1.68 to -3.22. In the same situation, for the FLANN- and LeNN-based equalizers, the fall is from -1.92 to -3.54 and from -1.92 to -3.47, respectively (see Fig. 3.36d). In the case of more severe nonlinearity in channel CH = 3, under a similar situation, the falls for the MLP-, FLANN-, and LeNN-based equalizers are from -1.03 to -1.51, from -1.21 to -1.76, and from -1.22 to -1.75, respectively (due to space constraints the figure is not provided). In order to show the performance of the equalizers under different channel conditions, we have plotted the BER against EVR at SNR = 15 dB in Fig. 3.37. We have stated that CH = 1, CH = 2, CH = 3, and CH = 4 correspond to EVR values of 6.08, 11.12, 21.71, and 46.82, respectively. The higher the EVR, the more difficult the channel is to equalize. It is observed that as the EVR increases, the BER also rises. However, the rise in BER is less severe for the FLANN- and LeNN-based equalizers than for the MLP-based equalizer. It can be seen that for NL = 3, as the EVR increases from 6.08 (CH = 1) to 46.82 (CH = 4), the rise of log10(BER) for the MLP-, FLANN-, and LeNN-based equalizers is
Fig. 3.36 BER performance of the NN-based equalizers for CH = 2. (a) NL = 0, (b) NL = 1, (c) NL = 2, (d) NL = 3. Here "FLN" and "LEG" denote FLANN and LeNN, respectively. Note: the BER performance of LeNN and FLANN almost overlap
from -2.77 to -0.69, from -3.06 to -0.85, and from -3.04 to -0.87, respectively (see Fig. 3.37d). Thus, the LeNN- and FLANN-based equalizers perform better than the MLP-based equalizer as the EVR increases.
As the performance of ANN algorithms depends on the random initialization of the network parameters, it is important to analyze the statistical behavior of an algorithm by repeating the experiments several times. We need to calculate the 95% or 99% confidence interval of the results to ascertain that it is narrow enough for the algorithm to be used reliably. To calculate the confidence interval of our algorithm and to compare the results with other competitive algorithms, we selected two cases of the BER experiments, SNR = 12 and 16 dB (see Fig. 3.36d). Out of the four algorithms, we chose the proposed LeNN algorithm and the FLANN algorithm; the FLANN algorithm was chosen because its performance is closest to that of LeNN. We repeated the experiments 20 times, starting from random initialization of the parameters
Fig. 3.37 EVR performance the NN-based equalizers for CH = 2 with SNR = 15 dB. (a) NL = 0, (b) NL = 1, (c) NL = 2, (d) NL = 3. Here “FLN” and “LEG” denote FLANN and LeNN, respectively. Note. The EVR performance of LeNN and FLANN almost overlap
followed by training of the parameters and finally finding the BER. Thus, we obtain two sets of BER results, for LeNN and FLANN, at SNR = 12 dB; let us name them ΓL12 and ΓF12. Similarly, experiments were performed at SNR = 16 dB, and the corresponding two sets of results are named ΓL16 and ΓF16 for the LeNN and FLANN algorithms, respectively. To ascertain the normal distribution of the data, we used the normal probability plot method. Considering n data points, this is done by sorting the data of each set in ascending order, assigning an index i to each data element, calculating f_i = (i - 0.375)/(n + 0.25) for each element, and finally plotting the data value x_i versus f_i. An approximately straight line for each data set confirms the normal distribution of the data. The means and standard deviations were μL12 = -1.3557, μF12 = -1.3500, σF12 = 0.0069, μL16 = -2.1749, σL12 = 0.0169, μF16 = -2.1733, and σF16 = 0.0159 for the four sets of data ΓL12,
Fig. 3.38 The signal constellation produced by the equalizers for CH = 2 with SNR = 15 dB and NL = 3: (a) FIR–LMS, (b) MLP, (c) FLANN, (d) LeNN. Here “FLN” and “LEG” denote FLANN and LeNN, respectively
ΓF12, ΓL16, and ΓF16, respectively. The corresponding 99% confidence intervals are computed as ±0.0043, ±0.0040, ±0.0097, and ±0.0091, respectively. The very small widths of the confidence intervals ensure very high confidence in our algorithm. Though the confidence interval for SNR = 16 dB is slightly wider than that for SNR = 12 dB, both are still extremely small. Moreover, the intervals for LeNN and FLANN are almost the same.

(iii) Signal Constellation Diagrams
The signal constellation diagram provides a visual representation of the performance of an equalizer. To obtain the signal constellation diagrams, we trained the equalizers with 3000 samples. Thereafter, we fed 1000 new samples to each equalizer to obtain the equalizer outputs. The equalizer outputs for these 1000 samples are plotted in Fig. 3.38 for CH = 2 with additive noise of 15 dB and NL = 3. It can
Fig. 3.39 The signal constellation produced by the MLP- and LeNN-based equalizers for CH = 4 with SNR = 15dB: (a) MLP, NL = 2, (b) MLP, NL = 3 (c) LeNN, NL = 2, (d) LeNN, NL = 3. Here “LEG” denotes LeNN
be seen that the signal constellation of the FIR-LMS-based equalizer is the most scattered. The signal constellation of the MLP-based equalizer is less clean than those of the FLANN- and LeNN-based equalizers. However, it can be seen that the signal constellations produced by the LeNN- and FLANN-based equalizers are similar. To demonstrate the superior performance of LeNN over MLP, the signal constellation diagrams for a high-EVR channel (CH = 4) with the NL = 2 and 3 nonlinear models are shown in Fig. 3.39. It can be seen that the LeNN-based equalizer provides a much cleaner signal constellation than the MLP-based equalizer for a high-EVR channel with severe nonlinearity. Generally, the signal constellation diagram provides a qualitative assessment of equalizer performance. In order to also have a quantitative assessment of the constellation diagram, we carried out the following computation. After plotting the 1000 equalizer outputs as shown in Fig. 3.38 or 3.39, we counted the number of data points whose in-phase and quadrature components are both greater than 0.25 in magnitude (i.e., the number of data points lying near the four corners of the square). A larger number of data points concentrated in these regions implies better performance of the equalizer. In Fig. 3.38, the number of such data points is found to be 839, 968, 993, and 991 for the FIR-LMS-, MLP-, FLANN-, and LeNN-based equalizers, respectively. The number of such data points in Fig. 3.39a-d is found to be 947, 803, 981, and 927, respectively. The larger number of data points concentrated near the four corners of the square indicates that the performance of the LeNN- and FLANN-based equalizers is similar, but the LeNN-based equalizer performs much better than the MLP-based equalizer, especially for high-EVR channels with severe nonlinearities.
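The 99% confidence intervals quoted in (ii) can be approximated with the normal-theory half-width z·σ/√n, with z = 2.576 for the 99% level; this reconstruction of how the intervals were obtained is an assumption.

```python
import math

def ci99_half_width(sigma, n):
    """Half-width of a 99% confidence interval for a mean estimated
    from n i.i.d. runs with standard deviation sigma."""
    z99 = 2.576                        # standard normal quantile, 99% level
    return z99 * sigma / math.sqrt(n)

# e.g., sigma_F12 = 0.0069 over n = 20 repeated runs
hw = ci99_half_width(0.0069, 20)
```

With sigma = 0.0069 and n = 20 runs this gives a half-width of about 0.0040, of the same order as the intervals reported above.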
3.8.3.2 Adaptive Equalizer Based on the FLNN Cascaded with Chebyshev Orthogonal Polynomial Structure
(i) MSE Performance
The convergence characteristics of the MSE for CH = 3 at an SNR of 15 dB are plotted in Fig. 3.40 [33]. For the different linear and nonlinear channels, the results obtained
Fig. 3.40 Convergence characteristics of the four nonlinear equalizers for CH = 3 with different nonlinear models at SNR of 15 dB: (a) NL = 0, (b) NL = 1, (c) NL = 2, and (d) NL = 3
Fig. 3.41 BER performance of the four nonlinear equalizers for CH = 3 with variation of SNR: (a) NL = 0, (b) NL = 1, (c) NL = 2, and (d) NL = 3
from the simulations show that the convergence rate of the FLNNCPAE is superior to those of the FLNN, MLP, and RBF. As the nonlinear intensity increases, the steady-state errors of the FLNN, MLP, and RBF also increase. However, the FLNNCPAE always maintains a lower steady-state error than the other equalizers. In particular, for very severe nonlinear distortions, the MSE of the FLNNCPAE is about -40 dB, an improvement of roughly 15 dB over the other equalizers. The simulations thus show that the novel adaptive equalizer can remove different linear and nonlinear distortions.

(ii) BER Performance
The BER performance of the four nonlinear equalizers for CH = 3 is plotted in Fig. 3.41 for both the linear (NL = 0) and the three nonlinear channel models (NL = 1, 2, and 3), with the SNR varied from 8 to 16 dB. Though the computational complexity of the FLNNCPAE is higher than that of the FLNN, the FLNNCPAE is superior to the FLNN, MLP, and RBF in terms of BER performance. Especially for the severe nonlinear channel model (NL = 3), the FLNNCPAE substantially outperforms the other structures. The performance of the FLNN, RBF, and MLP is more or less similar over a wide range of SNR values.
Fig. 3.42 Effect of variation of EVR on BER performance at SNR of 10 dB: (a) NL = 0, (b) NL = 1, (c) NL = 2, and (d) NL = 3
For the four different equalizers, the effect of the variation of EVR on the BER performance at an SNR of 10 dB is depicted in Fig. 3.42. For all the equalizers, the BER increases with an increase in EVR, for both the linear and nonlinear models. However, the performance degradation caused by the increase in EVR is less severe for the FLNNCPAE than for the other equalizers. The performance of all the equalizers deteriorates with the introduction of nonlinear distortions. However, the simulation results show that the FLNNCPAE is superior to the FLNN, MLP, and RBF in BER performance, with the linear and the three nonlinear models, over a wide variation of EVR from 6.08 to 46.82, and the performance of the MLP-based equalizer is found to be similar to that of the RBF-based one.

(iii) The Eye Patterns
The eye patterns (the equalizer output values) provide another indication of the effectiveness of the equalization process. The eye patterns of the equalizer input and output signals for CH = 3, NL = 3, and SNR = 15 dB are plotted in Fig. 3.43. For ease of visualization, only 2500 sample points are drawn in the figures. The input signals of the equalizer, distorted by the noise, are shown in Fig. 3.43a. The outputs of the equalizers exploiting the FLNNCPAE, FLNN, RBF, and MLP are given in
Fig. 3.43 Eye patterns of the four equalizers with 2500 symbols for CH = 3 with the nonlinear model (NL = 3) at SNR of 15 dB: (a) noisy input, (b) FLNNCPAE, (c) FLNN, (d) RBF, and (e) MLP
Fig. 3.43b–e, respectively. It can be seen from Fig. 3.43 that the FLNNCPAE output signals are well concentrated at the desired values ±1, and the FLNN is found to be slightly worse. However, for the MLP and RBF, the equalizer output values are widely spread around +1 and -1, and the two are very close to each other. These results clearly demonstrate that the effectiveness of channel equalization using the FLNNCPAE is superior to the other three equalizers. Similar observations can be made for all four channels with the linear and the nonlinear models studied.
3.8.3.3 Adaptive Decision Feedback Equalizer with the Combination of FIR Filter and FLANN
For a nonlinear time-variant channel, the channel coefficients a_i(k) (i = 1, 2) vary with time k. The time-varying coefficients are generated by the application of a second-order Markov model, in which a white Gaussian noise source drives a second-order Butterworth low-pass filter [34].

Channel model 1 (minimum phase): The nonlinear time-variant channel model with minimum phase is given by:

r(k) = a_1(k)x(k) + a_2(k)x(k-1) - 0.9[a_1(k)x(k) + a_2(k)x(k-1)]³ + v(k)

Note that a_1(k) is centered about 1 and a_2(k) about 0.5.

Channel model 2 (non-minimum phase): The nonlinear time-variant channel model with non-minimum phase is given by:

r(k) = a_1(k)x(k) + a_2(k)x(k-1) + a_3(k)x(k-2) + 0.2[a_1(k)x(k) + a_2(k)x(k-1) + a_3(k)x(k-2)]² + v(k)

Note that a_1(k) is centered about 0.3482, a_2(k) about 0.8704, and a_3(k) about 0.3482.

Numerous experiments were carried out to obtain the best result for each of the four nonlinear equalizers. In order to have a fair comparison among the FLNN, RBF, LMSDFE, and CFFLNNDFE, a two-layer structure is selected for the RBF-based equalizer, in which the numbers of nodes excluding the bias unit in the input, hidden, and output layers are 2, 30, and 1, respectively. Trigonometric functions are used for the functional expansion of the input pattern, so in both cases the input pattern is expanded from 5 dimensions to 26 dimensions.
(i) MSE Performance
The convergence characteristics include the convergence speed and the steady-state error of the four equalizers. Simulation results are averaged over 100 independent experiments on the time-invariant and time-variant channels for different BPSK random sequences. Figure 3.44 depicts the convergence curves for different BPSK random sequences and random initializations with β = 0.1, at an SNR of 20 dB.
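The second-order Markov model used above for the time-varying channel coefficients can be sketched as follows; the drift scale and the normalized cutoff of 0.1 are illustrative assumptions, not values from the text, and the helper names are hypothetical:

```python
import numpy as np

# Second-order Butterworth low-pass filter, normalized cutoff 0.1
# (coefficients as produced by, e.g., scipy.signal.butter(2, 0.1)).
B = [0.0200834, 0.0401667, 0.0200834]
A = [1.0, -1.5610181, 0.6413515]

def markov_coeffs(n, center, spread=0.01, seed=0):
    """Slowly time-varying coefficient a(k): white Gaussian noise is passed
    through the second-order Butterworth low-pass filter above, and the
    filtered drift is centered about the nominal coefficient value."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = np.zeros(n)
    for k in range(n):                      # direct-form IIR filtering
        y[k] = B[0] * x[k]
        if k >= 1:
            y[k] += B[1] * x[k - 1] - A[1] * y[k - 1]
        if k >= 2:
            y[k] += B[2] * x[k - 2] - A[2] * y[k - 2]
    return center + spread * y

# Nonlinear time-variant minimum-phase channel (model 1), noiseless part:
k = 2000
a1 = markov_coeffs(k, center=1.0, seed=0)
a2 = markov_coeffs(k, center=0.5, seed=1)
x = np.sign(np.random.default_rng(2).standard_normal(k + 1))  # BPSK input
lin = a1 * x[1:] + a2 * x[:-1]
r = lin - 0.9 * lin ** 3                    # add v(k) for the noisy output
```

Because the low-pass filter is narrowband, the coefficients drift slowly around their nominal centers, as the Markov model intends.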
From the figures, we observe that the MSE curves of the FLNN and RBF are nearly indistinguishable, and that both outperform the LMSDFE. Furthermore, the convergence performance of the CFFLNNDFE is clearly superior to the others for both the time-invariant and time-variant channel models. The fast convergence of the CFFLNNDFE comes from the superiority of the SMNLMS algorithm and the combination scheme.
Fig. 3.44 Convergence properties of the equalizers under SNR = 20 dB. (a) Time-invariant channel model 1. (b) Time-invariant channel model 2. (c) Time-variant channel model 1. (d) Time-variant channel model 2
(ii) BER Performance
In the first experiment, we fix beta = 0.1 and run simulations for nine different SNR values ranging from SNR = 4 dB to SNR = 20 dB at 2 dB intervals (4:2:20). BER performance comparisons are presented in Fig. 3.45. In each trial, 5000 BPSK samples are used for training, and the next 10,000 symbols are used for testing. The coefficient vectors of the equalizers are frozen after the training stage, and then the test is continued. It is clearly seen that the CFFLNNDFE shows better performance than the others. In the second experiment, we fix the SNR at 20 dB and run simulations for eight different beta values ranging from beta = 0.04 to beta = 0.32 with step size 0.04 (0.04:0.04:0.32). The BER and the standard deviation of the BER with respect to the different beta values are depicted in Fig. 3.46. For the LMSDFE, the BER performance is much worse than that of the other three nonlinear equalizers, and the FLNN is almost the same as the RBF. Moreover, the results in the figure show that the CFFLNNDFE is better than the others in terms of BER over the various standard deviation values for the time-variant nonlinear channels.
Fig. 3.45 BER performance of the equalizers. (a) Time-invariant channel model 1. (b) Time-invariant channel model 2. (c) Time-variant channel model 1. (d) Time-variant channel model 2
Fig. 3.46 BER performance comparison with changing standard deviation. (a) BER performance comparison of time-variant nonlinear channel model 1. (b) BER performance comparison of time-variant nonlinear channel model 2
Fig. 3.47 Eye patterns of the four equalizers with 5000 symbols for time-variant nonlinear channel model 2 at SNR of 20 dB. (a) Noisy input. (b) CFFLNNDFE. (c) FLNN. (d) RBF. (e) LMSDFE
(iii) The Eye Patterns
As noted before, another indication of the effectiveness of the equalization process is given by the eye patterns (the equalizer output values). Figure 3.47 depicts the eye patterns of the equalizer input and output signals for time-variant nonlinear channel model 2, where 5000 sample points are plotted in the figures. The input signals of the equalizers distorted by the noise are shown in Fig. 3.47a. The
outputs of the equalizers using CFFLNNDFE, FLNN, RBF, and LMSDFE are given in Fig. 3.47b–e, respectively. It is observed from Fig. 3.47 that the output values of the CFFLNNDFE are well concentrated at the desired values ±1, whereas the outputs of the FLNN and RBF are slightly worse. The output of the LMSDFE, in contrast, is widely spread around +1 and -1. Therefore, channel equalization using the CFFLNNDFE is superior to the other three equalizers.
3.9 Summary
In this chapter, the FLANN filter for nonlinear systems has been introduced. The FLANN filter provides stronger nonlinearity than the Volterra expansion of Chap. 2. It is a single-layer learning network without a hidden layer, and hence it also has a low computational burden. Two typical improvements of the FLANN filter were described. Moreover, some robust FLANN filtering algorithms were introduced and derived for the FLANN filter in the NANC application. In the next chapter, a similar nonlinear approximation method based on the spline adaptive filter will be presented.
References

1. Y. Tian and Z. Zhang, "Identification of nonlinear dynamic systems using neural networks," Proc. Int. Symp. Test Meas., vol. 2, no. 2, pp. 997–1000, 2003.
2. G. L. Sicuranza and A. Carini, "A generalized FLANN filter for nonlinear active noise control," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 8, pp. 2412–2417, 2011.
3. K. Burse, R. N. Yadav, and S. C. Shrivastava, "Channel equalization using neural networks: A review," IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., vol. 40, no. 3, pp. 352–357, 2010.
4. C. B. Borkowf, "Neural Networks: A Comprehensive Foundation (2nd Edition)," Technometrics, vol. 44, no. 2, pp. 194–195, 2002.
5. J. C. Patra, "Chebyshev neural network-based model for dual-junction solar cells," IEEE Trans. Energy Convers., vol. 26, no. 1, pp. 132–139, 2011.
6. J. C. Patra, P. K. Meher, and G. Chakraborty, "Nonlinear channel equalization for wireless communication systems using Legendre neural networks," Signal Process., vol. 89, no. 11, pp. 2251–2262, 2009.
7. I. Shingareva and C. Lizárraga-Celaya, "Special functions and orthogonal polynomials," in Maple and Mathematica: A Problem Solving Approach for Mathematics, I. K. Shingareva and C. Lizárraga-Celaya, Eds. Vienna: Springer Vienna, 2009, pp. 261–268.
8. D. P. Das and G. Panda, "Active mitigation of nonlinear noise processes using a novel filtered-s LMS algorithm," IEEE Trans. Speech Audio Process., vol. 12, no. 3, pp. 313–322, 2004.
9. R. M. A. Zahoor and I. M. Qureshi, "A modified least mean square algorithm using fractional derivative and its application to system identification," Eur. J. Sci. Res., vol. 35, no. 1, pp. 14–21, 2009.
10. D. C. Le, J. Zhang, and Y. Pang, "A bilinear functional link artificial neural network filter for nonlinear active noise control and its stability condition," Appl. Acoust., vol. 132, pp. 19–25, 2018.
11. L. Luo, W. Zhu, and A. Xie, "A novel acoustic feedback compensation filter for nonlinear active noise control system," Mech. Syst. Signal Process., vol. 158, p. 107675, 2021.
12. H. Zhao, X. Zeng, Z. He, T. Li, and W. Jin, "Nonlinear adaptive filter-based simplified bilinear model for multichannel active control of nonlinear noise processes," Appl. Acoust., vol. 74, no. 12, pp. 1414–1421, 2013.
13. H. Zhao, X. Zeng, Z. He, and T. Li, "Adaptive RSOV filter using the FELMS algorithm for nonlinear active noise control systems," Mech. Syst. Signal Process., vol. 34, no. 1–2, pp. 378–392, 2013.
14. R. Majhi, G. Panda, and G. Sahoo, "Development and performance evaluation of FLANN based model for forecasting of stock markets," Expert Syst. Appl., vol. 36, no. 3, pp. 6800–6808, 2009.
15. G. L. Sicuranza and A. Carini, "Adaptive recursive FLANN filters for nonlinear active noise control," in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 4312–4315.
16. H. Zhao, X. Zeng, Z. He, S. Yu, and B. Chen, "Improved functional link artificial neural network via convex combination for nonlinear active noise control," Appl. Soft Comput., vol. 42, pp. 351–359, 2016.
17. M. Ferrer, A. Gonzalez, and M. de Diego, "Convex combination filtered-x algorithms for active noise control systems," vol. 21, no. 1, pp. 156–167, 2013.
18. J. Kivinen, A. J. Smola, and R. C. Williamson, "Online learning with kernels," IEEE Trans. Signal Process., 2004.
19. T. Deb, D. Ray, and N. V. George, "A reduced complexity random Fourier filter based nonlinear multichannel narrowband active noise control system," IEEE Trans. Circuits Syst. II Express Briefs, vol. 68, no. 1, pp. 516–520, 2021.
20. X. Xu and W. Ren, "Random Fourier feature kernel recursive maximum mixture correntropy algorithm for online time series prediction," ISA Trans., 2021.
21. A. Rahimi and B. Recht, "Random features for large-scale kernel machines," in Advances in Neural Information Processing Systems 20 – Proceedings of the 2007 Conference, 2009.
22. Y. Zhu, H. Zhao, X. He, Z. Shu, and B. Chen, "Cascaded random Fourier filter for robust nonlinear active noise control," IEEE/ACM Trans. Audio Speech Lang. Process., 2021.
23. K. Pelekanakis and M. Chitre, "Adaptive sparse channel estimation under symmetric alpha-stable noise," IEEE Trans. Wirel. Commun., vol. 13, no. 6, pp. 3183–3195, 2014.
24. H. Zhao, X. Zeng, Z. He, T. Li, and W. Jin, "Nonlinear adaptive filter-based simplified bilinear model for multichannel active control of nonlinear noise processes," Appl. Acoust., vol. 74, no. 12, pp. 1414–1421, 2013.
25. L. Eriksson, M. Allie, and R. Greiner, "The selection and application of an IIR adaptive filter for use in active sound attenuation," IEEE Trans. Acoust., vol. 35, no. 4, pp. 433–437, 1987.
26. M. Shao and C. L. Nikias, "Signal processing with fractional lower order moments: Stable processes and their applications," Proc. IEEE, vol. 81, no. 7, pp. 986–1010, 1993.
27. L. Lu and H. Zhao, "Adaptive Volterra filter with continuous lp-norm using a logarithmic cost for nonlinear active noise control," J. Sound Vib., vol. 364, pp. 14–29, 2016.
28. K. Yin, H. Zhao, and L. Lu, "Functional link artificial neural network filter based on the q-gradient for nonlinear active noise control," J. Sound Vib., vol. 435, pp. 205–217, 2018.
29. N. V. George and G. Panda, "A robust filtered-s LMS algorithm for nonlinear active noise control," Appl. Acoust., vol. 73, no. 8, pp. 836–841, 2012.
30. N. C. Kurian, K. Patel, and N. V. George, "Robust active noise control: An information theoretic learning approach," Appl. Acoust., vol. 117, pp. 180–184, 2017.
31. E. Roy, R. W. Stewart, and T. S. Durrani, "High-order system identification with an adaptive recursive second-order polynomial filter," IEEE Signal Process. Lett., vol. 3, no. 10, pp. 276–279, 1996.
32. S. Wang, L. Dang, B. Chen, S. Duan, L. Wang, and C. K. Tse, "Random Fourier filters under maximum correntropy criterion," IEEE Trans. Circuits Syst. I Regul. Pap., vol. 65, no. 10, pp. 3390–3403, 2018.
33. H. Zhao and J. Zhang, "Functional link neural network cascaded with Chebyshev orthogonal polynomial for nonlinear channel equalization," Signal Process., vol. 88, no. 8, pp. 1946–1957, 2008.
34. H. Zhao, X. Zeng, X. Zhang, J. Zhang, Y. Liu, and T. Wei, "An adaptive decision feedback equalizer based on the combination of the FIR," Digit. Signal Process., vol. 21, no. 6, pp. 679–689, 2011.
35. S. Haykin, Adaptive Filter Theory, 3rd ed., 2002.
36. J. C. Patra and R. N. Pal, "A functional link artificial neural network for adaptive channel equalization," Signal Process., 1995.
37. C. T. Yen, W. De Weng, and Y. T. Lin, "FPGA realization of a neural-network-based nonlinear channel equalizer," IEEE Trans. Ind. Electron., 2004.
38. W. De Weng, C. S. Yang, and R. C. Lin, "A channel equalizer using reduced decision feedback Chebyshev functional link artificial neural networks," Inf. Sci., 2007.
Chapter 4
Spline Adaptive Filter

4.1 Introduction
In Chaps. 2 and 3, we introduced the nonlinear model based on the Volterra expansion [1] and the functional link artificial neural network (FLANN) [2], respectively. In practice, one of the most used structures in nonlinear filtering is the so-called block-oriented representation. In this chapter, we will further introduce this type of nonlinear filter. There are several basic types of block-oriented nonlinear structures, including the Wiener model [3], the Hammerstein model [4, 5], and the variants originating from these two classes in accordance with different topologies (i.e., parallel, feedback, and cascade) [6]. Specifically, the Wiener model consists of a cascade of a linear time-invariant (LTI) filter followed by a static nonlinear function, which is sometimes deemed a linear-nonlinear (LN) model, while the Hammerstein model comprises a static nonlinear function connected in front of an LTI filter, which is usually considered a nonlinear-linear (NL) model [7, 8]. In recent years, some scholars have combined the block-oriented architecture with spline functions and proposed a new Wiener model for LN blocks, called the Wiener spline adaptive filter (SAF) [9]. The spline adaptive filter is composed of a linear combiner followed by an adaptable lookup table (LUT), addressed by the linear combiner output and interpolated by a local low-order polynomial spline curve. Both the weights of the linear filter and the interpolating points of the LUT can be adapted by minimizing a specified cost function [10]. In addition, there also exist Hammerstein spline filters, cascade spline filters, and the IIR spline adaptive filter (IIR-SAF) [11, 12]. The main abbreviations and symbols used in this chapter are given in Table 4.1.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Zhao, B. Chen, Efficient Nonlinear Adaptive Filters, https://doi.org/10.1007/978-3-031-20818-8_4
Table 4.1 Reverberation time T60 and filter length used in experiments

T60 [ms]    N
Anechoic    256
50          512
100         1024
200         1280
350         2048
Fig. 4.1 Structure of SAF
4.2 Spline Filter Model

4.2.1 Spline Adaptive Filter
Figure 4.1 shows the spline adaptive filter, which is essentially a linear-nonlinear network whose linear part is an FIR filter. The nonlinear network consists of an adaptive lookup table (LUT) and a spline interpolation network. As shown in the figure, at time instant n, the input of the system is x(n), the output of the linear network is s(n), and y(n) is the output of the spline filter. Therefore:

s(n) = w^T(n) x(n)    (4.1)

where w(n) = [w_0, w_1, ..., w_{N-1}]^T is the adaptive weight vector of the FIR filter, x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T is the tap-delay input signal vector, and N is the number of filter taps. s(n) and y(n) are related by a nonlinear function. In fact, y(n)
Fig. 4.2 Schematic structure of the SAF. Block S1 computes the parameters u and i by (4.2) and (4.3), while S2 computes the SAF output through the spline patch determined by S1
Fig. 4.3 Example of q_{y,i} control-point interpolation using a CR-spline function with a fixed step for the x-axis control points, Δx = q_{x,i} - q_{x,i-1}
depends on s(n) through a function determined by the span index i and the local parameter u, where u ∈ [0, 1]. In the simple case of uniformly spaced knots, and referring to the top of Fig. 4.2, we constrain the control-point abscissas to be equidistant and, most importantly, not adaptable [14]. Moreover, for the sake of efficiency, another constraint is imposed on the control points, forcing the sampling interval to be centered on the x-axis origin (see Fig. 4.3).
The local parameter u is calculated as:

u(n) = s(n)/Δx - ⌊s(n)/Δx⌋    (4.2)

And the span index i is obtained by:

i = ⌊s(n)/Δx⌋ + (Q - 1)/2    (4.3)

where Q is the total number of control points, Δx is the gap between the control points, and ⌊·⌋ is the floor operator. The output of the spline adaptive filter is given by:

y(n) = φ_i(u) = u^T C q_{i,n}    (4.4)
where u = [u³(n), u²(n), u(n), 1]^T, q_{i,n} = [q_i, q_{i+1}, q_{i+2}, q_{i+3}]^T is the control-point vector, and C is the spline basis matrix. Among spline nonlinear filters, the CR-spline basis matrix and the B-spline basis matrix are the most widely used, given by:

C_CR = (1/2) [ -1  3 -3  1 ;  2 -5  4 -1 ; -1  0  1  0 ;  0  2  0  0 ]

C_B  = (1/6) [ -1  3 -3  1 ;  3 -6  3  0 ; -3  0  3  0 ;  1  4  1  0 ]

where the semicolons separate the rows of each 4 × 4 matrix.
4.2.2 Basic Spline Filter Algorithm

In this section, we introduce some basic algorithms based on the spline filter, such as SAF-LMS [10], SAF-NLMS [14], SAF-SNLMS [11], and SAF-VSS-SNLMS [11]. Their simulation results are given in Sect. 4.5.1.
4.2.2.1 SAF-LMS Algorithm
In this section, we briefly introduce the most basic spline nonlinear adaptive algorithm, called SAF-LMS. The online learning rules for nonlinear spline adaptive filters can be derived by minimizing a cost function; the most typical definition is J(w_n, q_{i,n}) = E[|e(n)|²]. In general, the cost function can be approximated by the instantaneous error:

J(w_n, q_{i,n}) = e²(n)    (4.5)
where e(n) is the a priori error signal. Referring to Figs. 4.1 and 4.2 and formula (4.4), its expression is:

e(n) = d(n) - φ_i(u)    (4.6)
In order to minimize (4.5), we use the stochastic gradient method to find the derivative with respect to the weight vector w:

∂J(w_n, q_{i,n})/∂w_n = -2e(n) (∂φ_i(u)/∂u)(∂u/∂s(n))(∂s(n)/∂w_n)    (4.7)

where s(n) = w_n^T x_n. From (4.4), the local derivative over the i-th span is ∂φ_i(u)/∂u = φ′_i(u) = u̇^T C q_{i,n}, where u̇ = ∂u/∂u = [3u²(n), 2u(n), 1, 0]^T. From expression (4.2), we have ∂u/∂s(n) = 1/Δx. Hence (4.7) becomes:

∂J(w_n, q_{i,n})/∂w_n = -(2/Δx) e(n) φ′_i(u) x_n    (4.8)
For the derivative of (4.5) with respect to the control points q_{i,n}, we have:

∂J(w_n, q_{i,n})/∂q_{i,n} = (∂(d(n) - φ_i(u))²/∂φ_i(u)) (∂φ_i(u)/∂q_{i,n})    (4.9)

where, from (4.4), ∂φ_i(u)/∂q_{i,n} = C^T u, so we can write:

∂J(w_n, q_{i,n})/∂q_{i,n} = -2e(n) C^T u    (4.10)
At this point, we obtain the update equations for the weight vector and the control points, respectively:

w_{n+1} = w_n + μ_w e(n) φ′_i(u) x_n    (4.11)

q_{n+1} = q_n + μ_q e(n) C^T u    (4.12)

where the parameters μ_w and μ_q represent the learning rates for the weights and for the control points, respectively. The algorithm convergence conditions are:

0 < μ_w ≤ 2 / (φ′_i²(u) ‖x_n‖²)    (4.13)

0 < μ_q ≤ 2 / ‖Cu‖²    (4.14)

A summary of the SAF-LMS algorithm is as follows:
4.2.2.2 SAF-NLMS Algorithm

The updating process of the above SAF-LMS algorithm is affected by the eigenvalue spread of the autocorrelation matrix of the input signal, and the stability of an adaptive filtering algorithm is an important indicator for its verification [13]. Therefore, a normalized variant of the SAF-LMS, called the normalized LMS algorithm based on the SAF model (SAF-NLMS), has been investigated [14].
Using the Lagrange multiplier method, the cost function for the SAF-NLMS can be defined as:

J(q_{i,n+1}) = (1/2) e²(n) + (1/(2 u_n^T u_n)) ‖q_{i,n+1} - q_{i,n}‖²    (4.15)

where (1/2) × e(n)/(u_n^T u_n) can be viewed as the Lagrange multiplier. Taking the derivative of (4.15) with respect to q_{i,n+1} and w_{n+1}, respectively, and setting them to zero, we obtain two recursive equations of the tap weights and control points for the SAF-NLMS algorithm:

w_{n+1} = w_n + μ_w (e(n)/(u_n^T u_n + ε)) (1/Δx) (u̇_n^T C q_{i,n}) x_n    (4.16)

q_{i,n+1} = q_{i,n} + μ_q (e(n)/(u_n^T u_n + ε)) C^T u_n    (4.17)
where μ_w and μ_q are the step sizes for the linear and nonlinear networks, respectively, and the small positive constant ε is used to avoid division by zero. The algorithm convergence conditions are:

0 < μ_w ≤ 2(ε + ‖u_n‖²) / ((1/Δx)² (u̇_n^T C q_{i,n})² ‖x_n‖² + δ_w(n))    (4.18)

0 < μ_q ≤ 2(ε + ‖u_n‖²) / ‖C^T u_n‖²    (4.19)
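The normalized updates (4.16)-(4.17) can be sketched as follows, acting on the four active control points q_{i,n} of the current span; the helper signature is illustrative:

```python
import numpy as np

C = 0.5 * np.array([[-1., 3., -3., 1.], [2., -5., 4., -1.],
                    [-1., 0., 1., 0.], [0., 2., 0., 0.]])   # CR-spline basis

def saf_nlms_update(w, q4, x, e, u, dx, mu_w, mu_q, eps=1e-4):
    """SAF-NLMS updates (4.16)-(4.17): the LMS corrections are normalized
    by u^T u + eps, reducing the sensitivity to the scaling of u."""
    u_vec = np.array([u**3, u**2, u, 1.0])
    u_dot = np.array([3 * u**2, 2 * u, 1.0, 0.0])
    norm = u_vec @ u_vec + eps
    phi_prime = u_dot @ C @ q4                  # phi'_i(u) = u_dot^T C q_i
    w_new = w + mu_w * (e / norm) * phi_prime * x / dx      # (4.16)
    q4_new = q4 + mu_q * (e / norm) * (C.T @ u_vec)         # (4.17)
    return w_new, q4_new
```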
4.2.2.3 SAF-SNLMS Algorithm
In [11], Liu et al. proposed the SAF-SNLMS algorithm based on the SAF-NLMS algorithm. The updating equation of q_{i,n+1} in the SAF-SNLMS algorithm can be formulated as the following constrained optimization problem:

min_{q_{i,n+1}} |e_p(n)| = |d(n) - y(n+1)| = |d(n) - u_n^T C q_{i,n+1}|
subject to ‖q_{i,n+1} - q_{i,n}‖² ≤ β²    (4.20)

where e_p(n) is the a posteriori error, β² is chosen to be small to ensure that the update of q_{i,n} does not change drastically, |·| is the absolute value operation, and ‖·‖ denotes the Euclidean norm of a vector.
Then, using the Lagrange multiplier method, the cost function can be expressed as:

J(q_{i,n+1}) = |e_p(n)| + ρ_0 [‖q_{i,n+1} - q_{i,n}‖² - β²]    (4.21)

where ρ_0 denotes the Lagrange multiplier. Setting the derivative of the cost function J(q_{i,n+1}) with respect to q_{i,n+1} equal to zero, we have:

q_{i,n+1} = q_{i,n} + (1/(2ρ_0)) C^T u_n sgn[e_p(n)]    (4.22)

where sgn[·] is the sign function. Substituting (4.22) into the constraint condition in (4.20), we obtain:

1/(2ρ_0) = β / ‖C^T u_n‖    (4.23)
Note that C^T is a constant matrix and ‖C^T u_n‖ ≤ ‖C^T‖ ‖u_n‖, where ‖C^T‖ is defined as the spectral norm of the matrix C^T, ‖C^T‖ := sup_{u_n ≠ 0} ‖C^T u_n‖ / ‖u_n‖. Thus, (4.23) can be rewritten as:

1/(2ρ_0) ≥ β_0 / √(u_n^T u_n + ε_0)    (4.24)

where β_0 = β/‖C^T‖ and ε_0 is a small positive constant used to avoid division by zero. Considering the lower bound of 1/(2ρ_0) in (4.24), the updating equation of q_{i,n} can be derived as:

q_{i,n+1} = q_{i,n} + μ_q (sgn[e_p(n)] / √(u_n^T u_n + ε_0)) C^T u_n    (4.25)
Similarly, the cost function associated with the weight vector of the FIR filter w_n can be formulated as:

J(w_{n+1}) = |e_p(n)| + ρ_0 [‖w_{n+1} - w_n‖² - β²]    (4.26)

The updating equation of w_n can be expressed as:

w_{n+1} = w_n + μ_w (sgn[e_p(n)] / √(u_n^T u_n + ε_0)) (1/Δx) (u̇_n^T C q_{i,n+1}) x_n    (4.27)
Replacing the a posteriori error e_p(n) approximately by the a priori error e(n), the updating equations can be rewritten as:

q_{i,n+1} = q_{i,n} + μ_q (sgn[e(n)] / √(u_n^T u_n + ε_0)) C^T u_n    (4.28)

w_{n+1} = w_n + μ_w (sgn[e(n)] / √(u_n^T u_n + ε_0)) (1/Δx) (u̇_n^T C q_{i,n+1}) x_n    (4.29)
4.2.2.4 SAF-VSS-SNLMS Algorithm
In [11], Liu et al. added a variable step-size scheme to the SAF-SNLMS algorithm and proposed the SAF-VSS-SNLMS algorithm. In this work, the adjustments of the variable step sizes are controlled by the squared value of the impulse-free error, i.e.:

μ_w(n) = α μ_w(n-1) + (1-α) min[ê_0²(n), μ_w(n-1)]    (4.30)

μ_q(n) = α μ_q(n-1) + (1-α) min[ê_0²(n), μ_q(n-1)]    (4.31)

where α is a forgetting factor approaching one, and ê_0²(n) is the estimate of the squared value of the impulse-free error, which can be obtained by:

ê_0²(n) = λ ê_0²(n-1) + c_1 (1-λ) med(γ_n)    (4.32)

where λ is another forgetting factor close to but smaller than one, c_1 = 1.483(1 + 5/(N_w - 1)) is a finite correction factor, N_w is the length of the data window, γ_n = [e²(n), e²(n-1), ..., e²(n-N_w+1)], and med(·) denotes the median operator. In [11], the initial step sizes are chosen as μ_w(0) = μ_q(0) = 0.05 for all the simulations. The SAF-VSS-SNLMS algorithm is as follows:
4.3 Robust Spline Filtering Algorithm

4.3.1 SAF-MCC Algorithm
The nonlinear spline filter algorithms mentioned above achieve ideal performance for the Wiener system. However, when the desired signal is disturbed by non-Gaussian noise, especially in the presence of large outliers (observed values that deviate significantly from the bulk of the data), their performance may deteriorate rapidly. The main reason is that the above algorithms are based on the mean square error (MSE) criterion, which only captures the second-order statistics of the data and relies heavily on the assumption of a Gaussian distribution. In most practical situations, however, the Gaussian assumption does not hold [15]. Therefore, some scholars have proposed a new nonlinear adaptive filter, called the nonlinear spline adaptive filter under the maximum correntropy criterion
(SAF-MCC) [15]. Instead of the MSE, correntropy is used as the cost function to identify the spline adaptive filter. Correntropy is a nonlinear similarity measure between two signals. The MCC aims at maximizing the similarity (measured by correntropy) between the model output and the desired response, such that the adaptive model is as close as possible to the unknown system [16]. Given two random variables X and Y, the correntropy is [17]:

V(X, Y) = E[κ(X, Y)] = ∫∫ κ(x, y) f_XY(x, y) dx dy    (4.33)
where E[·] denotes the expectation operator and κ(·,·) is a shift-invariant Mercer kernel. The most widely used kernel in correntropy is the Gaussian kernel, given by:

κ_σ(x, y) = (1/(σ√(2π))) exp(-e²/(2σ²))    (4.34)
where e = x - y and σ stands for the kernel bandwidth. In practical situations, the joint distribution of X and Y is usually unknown, and only a finite number of data {(x(i), y(i))}, i = 1, ..., K, are available. In these cases, we can use a sample-mean estimator of the correntropy:

V̂_{K,σ}(X, Y) = (1/K) Σ_{i=1}^{K} κ_σ(x(i), y(i))    (4.35)
Correntropy can be used as a cost function for adaptive system training. For example, under the maximum correntropy criterion (MCC), an adaptive model can be learned by maximizing the correntropy between the desired response and the model output. The optimization criterion in MCC training is:

max J_MCC = max V̂_{K,σ}(X, Y) = max (1/K) Σ_{i=1}^{K} κ_σ(e(i))    (4.36)
where e(i) = x(i) - y(i) is an error sample. We can calculate the sensitivity of the MCC cost with respect to the error e(i) by taking the derivative of (4.36) (with the Gaussian kernel and K = 1):

∂J_MCC/∂e(i) = ∂V̂_{K,σ}(X, Y)/∂e(i) = -(1/(σ³√(2π))) exp(-e²(i)/(2σ²)) e(i)    (4.37)
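A small numerical illustration of (4.34)-(4.36): with the Gaussian kernel, a single large outlier contributes almost nothing to the sample-mean correntropy, whereas it would dominate an MSE cost (the signal values are arbitrary):

```python
import math

def gaussian_kernel(e, sigma):
    """Gaussian kernel (4.34) evaluated at the error e = x - y."""
    return math.exp(-e * e / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def correntropy(xs, ys, sigma):
    """Sample-mean correntropy estimator (4.35)."""
    return sum(gaussian_kernel(x - y, sigma) for x, y in zip(xs, ys)) / len(xs)

clean = correntropy([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], sigma=1.0)      # maximal
outlier = correntropy([1.0, 2.0, 3.0], [1.0, 2.0, 100.0], sigma=1.0)  # one outlier
# clean = 1/sqrt(2*pi); the outlier only removes one term from the mean
```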
Derivative curves with different kernel widths are shown in Fig. 4.4. We can see the relationship between the derivative and the error. When the magnitude of the error is very large, the derivative becomes very small, especially if the kernel width
Fig. 4.4 Derivative curves of JMCC with respect to e(i) for different kernel widths
is smaller. Therefore, MCC training will be insensitive (hence robust) to impulsive noise, which usually causes large errors. Using the MCC instead of the MSE as the cost function, the following cost function can be obtained:

J(w_n, q_{i,n}) = (1/K) Σ_{j=n-K+1}^{n} κ_σ(d(j), y(j)) = (1/(σ√(2π))) (1/K) Σ_{j=n-K+1}^{n} exp(-e²(j)/(2σ²))    (4.38)

where e(j) = d(j) - y(j). To obtain the optimal weight vector, we can use a gradient-based approach to maximize the above cost. For online adaptation, an instantaneous correntropy (with K = 1) can be used to derive a stochastic gradient:
∂J(w_n, q_{i,n})/∂w_n = (e(n)/(σ³√(2π))) exp(-e²(n)/(2σ²)) (∂φ_i(u)/∂u)(∂u/∂s(n))(∂s(n)/∂w_n)
  = (e(n)/(σ³√(2π))) exp(-e²(n)/(2σ²)) (1/Δx) φ′_i(u) x_n    (4.39)
Absorbing the constant factor 1/(σ³√(2π)) into the step size μ_w, the gradient-ascent update of the weight vector is:

w_{n+1} = w_n + μ_w ∂J(w_n, q_{i,n})/∂w_n = w_n + μ_w exp(-e²(n)/(2σ²)) e(n) φ′_i(u) x_n (1/Δx)    (4.40)

Similarly, for the control points, we derive:
∂J(w_n, q_{i,n})/∂q_{i,n} = (e(n)/(σ³√(2π))) exp(-e²(n)/(2σ²)) ∂φ_i(u)/∂q_{i,n}
  = (e(n)/(σ³√(2π))) exp(-e²(n)/(2σ²)) C^T u    (4.41)

q_{i,n+1} = q_{i,n} + μ_q ∂J(w_n, q_{i,n})/∂q_{i,n} = q_{i,n} + μ_q exp(-e²(n)/(2σ²)) e(n) C^T u    (4.42)
The above update equations constitute the SAF-MCC algorithm. The MCC is strongly robust to impulsive noise, especially impulsive non-Gaussian noise. The simulation results of this algorithm are given in Sect. 4.5.2.
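With the constant factor absorbed into the step sizes as above, the SAF-MCC updates (4.40) and (4.42) can be sketched as follows (the helper signature is illustrative):

```python
import numpy as np

C = 0.5 * np.array([[-1., 3., -3., 1.], [2., -5., 4., -1.],
                    [-1., 0., 1., 0.], [0., 2., 0., 0.]])   # CR-spline basis

def saf_mcc_update(w, q4, x, e, u, dx, mu_w, mu_q, sigma):
    """SAF-MCC updates (4.40) and (4.42): each LMS-type correction is scaled
    by exp(-e^2 / (2 sigma^2)), so impulsive (large) errors barely move the
    parameters -- the source of the algorithm's robustness."""
    u_vec = np.array([u**3, u**2, u, 1.0])
    u_dot = np.array([3 * u**2, 2 * u, 1.0, 0.0])
    gate = np.exp(-e * e / (2.0 * sigma * sigma))
    phi_prime = u_dot @ C @ q4                  # phi'_i(u) = u_dot^T C q_i
    w_new = w + mu_w * gate * e * phi_prime * x / dx        # (4.40)
    q4_new = q4 + mu_q * gate * e * (C.T @ u_vec)           # (4.42)
    return w_new, q4_new
```

For |e| much larger than σ the exponential gate tends to zero and the updates freeze, while for σ → ∞ the gate tends to one and the updates reduce to the SAF-LMS ones (4.11)-(4.12).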
4.3.2 Performance Analysis
Although the MCC algorithm can achieve excellent performance in non-Gaussian environments, its performance analysis had so far not been addressed [18]. Therefore, in this section we introduce the steady-state performance of the spline nonlinear filter under the MCC criterion in detail, using energy conservation arguments [19]. We will derive the theoretical excess mean square error (EMSE) for Gaussian noise and non-Gaussian noise; through this theoretical performance analysis, the steady-state performance of the system can be accurately predicted [20]. The steady-state excess mean square error (EMSE) is a significant measure of performance [21], defined as:

ς = MSE - σ_v²    (4.43)
where MSE denotes the mean square error. Figure 4.5 depicts the structure of the SAF-MCC for performance analysis, which includes the adaptive part and the unknown spline parameters to be estimated (denoted with subscript o). The a priori error of the whole system is defined as ε(n) = d(n) - y(n). When the spline control points are fixed, the a priori error for the linear filter is denoted ε_w; in contrast, when the linear filter is fixed, the a priori error for the control points is denoted ε_q. Here w_o and q_{i,o} are the optimal solutions of the linear filter weights and the spline control points for the spline nonlinear system, respectively.

Fig. 4.5 SAF-MCC model for performance analysis

In the steady state, the mean values of the weight vector and the spline control points are:

lim_{n→∞} E{w_n} = w_o    (4.44a)

lim_{n→∞} E{q_{i,n}} = q_{i,o}    (4.44b)
According to Fig. 4.5, the a priori error is:

ε(n) = d(n) - y(n)
     = φ_o(l(n)) - φ(s(n))
     = φ_i(u_l) - φ_i(u_s)
     = u_l^T C q_{i,o} - u_s^T C q_{i,n}
     = ε_w(n) + ε_q(n)    (4.45)
where u_l = [u_l³, u_l², u_l, 1]^T and u_s = [u_s³, u_s², u_s, 1]^T, with u_l and u_s denoting the local variable u for l(n) and s(n), respectively, which satisfy the relationship:

u_l - u_s = [l(n)/Δx - ⌊l(n)/Δx⌋] - [s(n)/Δx - ⌊s(n)/Δx⌋]
         ≈ (l(n) - s(n))/Δx
         = (w_o - w_n)^T x(n)/Δx
         = v_n^{(w)T} x(n)/Δx    (4.46)
where v_n^{(w)} = w_o - w_n denotes the weight error vector. The spline function φ_i(u) in (4.4) can be rewritten as:

φ_i(u) = u³c_1 q_{i,n} + u²c_2 q_{i,n} + u c_3 q_{i,n} + c_4 q_{i,n} ≈ u c_3 q_{i,n} + c_4 q_{i,n}    (4.47)
where $C_k$ denotes the $k$-th row of matrix $C$. The a priori error $\varepsilon_w(n)$ can then be expressed as:

$$\varepsilon_w(n) = \left(u_l^T - u_s^T\right) C q_{i,o} \approx (u_l - u_s)\, C_3 q_{i,o} = \frac{C_3 q_{i,o}}{\Delta x}\, v_n^{(w)T} x(n) \tag{4.48}$$

Similarly:

$$\varepsilon_q(n) = u_s^T C \left(q_{i,o} - q_{i,n}\right) = u_s^T C v_n^{(q)} = v_n^{(q)T} C^T u_s \tag{4.49}$$
For convenience and simplification of the analysis, the following assumptions are made:

A1. The noise $\{v(n)\}$ is independent, zero mean, and independent of $x(n)$, $s(n)$, $\varepsilon_q(n)$, $\varepsilon_w(n)$, and $\varepsilon(n)$.

A2. The a priori error $\varepsilon(n)$ and the error $e(n)$ are independent of $\varphi_i(u)$, $\varphi_i'(u)$, $\|x(n)\|^2$, and $\|C^T u\|^2$.

A3. At steady state, the update term $e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)$ in the weight and control-point recursions is asymptotically uncorrelated with $\varphi_i(u)$, $\varphi_i'(u)$, $\|x(n)\|^2$, and $\|C^T u\|^2$.
(A) Steady-state performance for the linear filter

Subtracting both sides of (4.11) from $w_o$ yields:

$$v_{n+1}^{(w)} = v_n^{(w)} - \mu_w \frac{\varphi_i'(u)}{\Delta x}\, e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right) x(n) \tag{4.50}$$

Taking the squared norm of both sides and then the expectation:

$$E\left\{\left\|v_{n+1}^{(w)}\right\|^2\right\} = E\left\{\left\|v_n^{(w)}\right\|^2\right\} - 2\mu_w E\left\{\frac{\varphi_i'(u)}{C_3 q_{i,o}}\, \varepsilon_w(n)\, e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\right\} + \mu_w^2 E\left\{\frac{\varphi_i'^2(u)}{\Delta x^2}\left[e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\right]^2 \|x(n)\|^2\right\} \tag{4.51}$$

Assuming the spline adaptive filter is stable, $E\{\|v_{n+1}^{(w)}\|^2\} = E\{\|v_n^{(w)}\|^2\}$ as $n \to \infty$, and (4.51) reduces to:

$$2\Delta x^2 \mu_w E\left\{\frac{\varphi_i'(u)}{C_3 q_{i,o}}\, \varepsilon_w(n)\, e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\right\} = \mu_w^2 E\left\{\varphi_i'^2(u)\left[e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\right]^2 \|x(n)\|^2\right\} \tag{4.52}$$
Applying assumptions A2–A3, the expectation on the left side of (4.52) can be expressed as:

$$E\left\{\frac{\varphi_i'(u)}{C_3 q_{i,n}}\, \varepsilon_w(n)\, e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\right\} = E\left\{\frac{\varphi_i'(u)}{C_3 q_{i,n}}\right\} E\left\{e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\varepsilon_w(n)\right\} \tag{4.53}$$
For the expectation on the right side of (4.52), we have to consider the following two situations.

1. Case 1: $v(n)$ is Gaussian. Let $g(e(n)) = e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)$. Assuming that $\varepsilon_q(n) \approx 0$ at steady state for $n \to \infty$, the expectation $E\{g(e(n))\varepsilon_w(n)\}$ in (4.53) can be calculated as:

$$E\{g(e(n))\varepsilon_w(n)\} = E\{g(\varepsilon_w(n) + v(n))\varepsilon_w(n)\} = E\{g'(e(n))\} E\{\varepsilon_w^2(n)\}$$

$$= E\{\varepsilon_w^2(n)\} \int_{-\infty}^{\infty} \left(1 - \frac{e^2(n)}{\sigma^2}\right)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\frac{1}{\sigma_e\sqrt{2\pi}}\exp\left(-\frac{e^2(n)}{2\sigma_e^2}\right)\, de(n) = \frac{\sigma^3 E\{\varepsilon_w^2(n)\}}{\left(\sigma^2 + E\{\varepsilon_w^2(n)\} + \sigma_v^2\right)^{3/2}} \tag{4.54}$$

where $\sigma_e^2$ denotes the variance of the error $e(n)$, given by $\sigma_e^2 = E\{\varepsilon_w^2(n)\} + \sigma_v^2$. Applying assumptions A2–A3, the expectation on the right side of (4.52) can be calculated as:

$$E\left\{\varphi_i'^2(u)\left[e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\right]^2 \|x(n)\|^2\right\} = E\left\{\varphi_i'^2(u)\|x(n)\|^2\right\} \frac{\sigma^3\left(E\{\varepsilon_w^2(n)\} + \sigma_v^2\right)}{\left(2E\{\varepsilon_w^2(n)\} + 2\sigma_v^2 + \sigma^2\right)^{3/2}} \tag{4.55}$$
Substituting (4.53) and (4.55) into (4.52) yields:

$$2\Delta x^2 \mu_w E\left\{\frac{\varphi_i'(u)}{C_3 q_{i,n}}\right\} \frac{\sigma^3 E\{\varepsilon_w^2(n)\}}{\left(\sigma^2 + E\{\varepsilon_w^2(n)\} + \sigma_v^2\right)^{3/2}} = \mu_w^2 E\left\{\varphi_i'^2(u)\|x(n)\|^2\right\} \frac{\sigma^3\left(E\{\varepsilon_w^2(n)\} + \sigma_v^2\right)}{\left(2E\{\varepsilon_w^2(n)\} + 2\sigma_v^2 + \sigma^2\right)^{3/2}} \tag{4.56}$$

Rearranging (4.56) and letting $n \to \infty$, we have:

$$E\{\varepsilon_w^2(\infty)\} = \frac{\mu_w E\left\{\varphi_i'^2(u)\|x(n)\|^2\right\}}{2\Delta x^2 E\left\{\varphi_i'(u)/(C_3 q_{i,n})\right\}} \times \frac{\left(E\{\varepsilon_w^2(\infty)\} + \sigma_v^2\right)\left(\sigma^2 + E\{\varepsilon_w^2(\infty)\} + \sigma_v^2\right)^{3/2}}{\left(2E\{\varepsilon_w^2(\infty)\} + 2\sigma_v^2 + \sigma^2\right)^{3/2}} \tag{4.57}$$

The steady-state EMSE can be obtained by solving the fixed-point equation (4.57).
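As a numerical illustration, the fixed-point equation (4.57) can be solved by simple iteration, since for small step sizes its right-hand side is a contraction. In the sketch below the two signal-dependent expectations are collapsed into scalars, $A \approx E\{\varphi_i'(u)/(C_3 q_{i,n})\}$ and $B \approx E\{\varphi_i'^2(u)\|x(n)\|^2\}$; these names and the numeric values used with them are illustrative assumptions, not values from the text.

```python
def emse_linear_gaussian(mu_w, A, B, sigma, sigma_v2, dx, iters=200):
    """Solve the fixed-point equation (4.57) for S = E{eps_w^2(inf)} by
    fixed-point iteration S <- F(S), starting from S = 0.

    A ~ E{phi_i'(u)/(C_3 q_{i,n})},  B ~ E{phi_i'^2(u)*||x(n)||^2}
    (both assumed to be estimated beforehand; illustrative placeholders)."""
    c = mu_w * B / (2.0 * A * dx * dx)   # constant prefactor of (4.57)
    S = 0.0
    for _ in range(iters):
        num = (S + sigma_v2) * (sigma**2 + S + sigma_v2) ** 1.5
        den = (2.0 * S + 2.0 * sigma_v2 + sigma**2) ** 1.5
        S = c * num / den
    return S
```

For a sufficiently small step size the iteration settles in a few steps, and the resulting EMSE grows monotonically with $\mu_w$, matching the trend observed in the simulations of Sect. 4.5.3.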
2. Case 2: $v(n)$ is non-Gaussian. In this case, we use a Taylor series expansion of $g(\cdot)$ to calculate the EMSE of the algorithm. In terms of $g(\cdot)$, (4.52) can be written as:

$$2\Delta x^2 \mu_w E\left\{\frac{\varphi_i'(u)}{C_3 q_{i,n}}\right\} E\{g(e(n))\varepsilon_w(n)\} = \mu_w^2 E\left\{\varphi_i'^2(u)\|x(n)\|^2\right\} E\{g^2(e(n))\} \tag{4.58}$$

We approximate $g(e(n))$ by a second-order Taylor series expansion around $v(n)$:

$$g(e(n)) = g(\varepsilon_w(n) + v(n)) \approx g(v(n)) + g'(v(n))\varepsilon_w(n) + \frac{1}{2} g''(v(n))\varepsilon_w^2(n) \tag{4.59}$$

where

$$g'(v(n)) = \exp\left(-\frac{v^2(n)}{2\sigma^2}\right)\left(1 - \frac{v^2(n)}{\sigma^2}\right) \tag{4.60}$$

$$g''(v(n)) = \exp\left(-\frac{v^2(n)}{2\sigma^2}\right)\left(\frac{v^3(n)}{\sigma^4} - \frac{3v(n)}{\sigma^2}\right) \tag{4.61}$$

Omitting higher-order terms and using (4.60), (4.61), and assumptions A1–A3, the two expectations in (4.58) can be approximated by:

$$E\{g(e(n))\varepsilon_w(n)\} \approx E\{g(v(n))\varepsilon_w(n) + g'(v(n))\varepsilon_w^2(n)\} \approx E\{g'(v(n))\} E\{\varepsilon_w^2(n)\} \tag{4.62}$$

$$E\{g^2(e(n))\} = E\{g^2(\varepsilon_w(n) + v(n))\} \approx E\{g^2(v(n))\} + \left(E\{|g'(v(n))|^2\} + E\{g(v(n))g''(v(n))\}\right) E\{\varepsilon_w^2(n)\} \tag{4.63}$$

Finally, substituting (4.62) and (4.63) into (4.58) yields:

$$E\{\varepsilon_w^2(\infty)\} = \frac{\dfrac{\mu_w}{\Delta x^2}\, \dfrac{E\{\varphi_i'^2(u)\|x(n)\|^2\}}{E\{\varphi_i'(u)/(C_3 q_{i,n})\}}\, E\left\{v^2(n)\exp\left(-v^2(n)/\sigma^2\right)\right\}}{2E\left\{\exp\left(-\dfrac{v^2(n)}{2\sigma^2}\right)\left(1 - \dfrac{v^2(n)}{\sigma^2}\right)\right\} - \dfrac{\mu_w}{\Delta x^2}\, \dfrac{E\{\varphi_i'^2(u)\|x(n)\|^2\}}{E\{\varphi_i'(u)/(C_3 q_{i,n})\}}\, E\left\{\exp\left(-\dfrac{v^2(n)}{\sigma^2}\right)\left(1 + \dfrac{2v^4(n)}{\sigma^4} - \dfrac{5v^2(n)}{\sigma^2}\right)\right\}} \tag{4.64}$$
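For non-Gaussian noise the three expectations over $v(n)$ in (4.64) generally have no closed form, but they can be replaced by sample averages over draws of the noise. The sketch below does exactly that; as before, $A$ and $B$ are illustrative placeholders for the two signal-dependent expectations, and the noise parameters are made up for the example.

```python
import math, random

def emse_linear_nongaussian(mu_w, A, B, sigma, dx, noise_samples):
    """Evaluate (4.64) with the expectations over v(n) replaced by
    sample averages over draws of the (possibly non-Gaussian) noise."""
    s2 = sigma * sigma
    def avg(f):
        return sum(f(v) for v in noise_samples) / len(noise_samples)
    e_g2  = avg(lambda v: v * v * math.exp(-v * v / s2))                    # E{g^2(v)}
    e_gp  = avg(lambda v: math.exp(-v * v / (2 * s2)) * (1 - v * v / s2))   # E{g'(v)}
    e_hot = avg(lambda v: math.exp(-v * v / s2)
                * (1 + 2 * v**4 / s2**2 - 5 * v * v / s2))                  # higher-order term
    r = mu_w * B / (dx * dx * A)       # common prefactor in (4.64)
    return r * e_g2 / (2 * e_gp - r * e_hot)

# Example with uniform noise, as in the verification of Fig. 4.24
# (amplitude and step size here are illustrative)
random.seed(1)
uniform_v = [random.uniform(-0.05, 0.05) for _ in range(20000)]
emse = emse_linear_nongaussian(0.008, 1.0, 5.0, 1.0, 0.2, uniform_v)
```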
(B) Steady-state performance for the spline control points

Subtracting both sides of (4.12) from $q_{i,o}$ yields:

$$v_{n+1}^{(q)} = v_n^{(q)} - \mu_q\, e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right) C^T u \tag{4.65}$$

Similarly, we take the squared norm of both sides and then the expectation. Using A2–A3 and assuming the spline adaptive filter is stable, so that $E\{\|v_{n+1}^{(q)}\|^2\} = E\{\|v_n^{(q)}\|^2\}$ holds at steady state for $n \to \infty$, we have:

$$2\mu_q E\left\{e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\varepsilon_q(n)\right\} = \mu_q^2 E\left\{\left[e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\right]^2\right\} E\left\{\|C^T u\|^2\right\} \tag{4.66}$$
To obtain $E\{\varepsilon_q^2(n)\}$, we consider the following two cases.

1. Case 1: $v(n)$ is Gaussian. Using an approach similar to the derivation of (4.54) and (4.55), and assuming that $\varepsilon_w(n) \approx 0$, the expectation on the left of (4.66) can be expressed as:

$$\lim_{n \to \infty} E\left\{e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\varepsilon_q(n)\right\} = \frac{\sigma^3 E\{\varepsilon_q^2(n)\}}{\left(\sigma^2 + E\{\varepsilon_q^2(n)\} + \sigma_v^2\right)^{3/2}} \tag{4.67}$$

The first expectation on the right of (4.66) is given by:

$$E\left\{\left[e(n)\exp\left(-\frac{e^2(n)}{2\sigma^2}\right)\right]^2\right\} = \frac{\sigma^3\left(E\{\varepsilon_q^2(n)\} + \sigma_v^2\right)}{\left(2E\{\varepsilon_q^2(n)\} + 2\sigma_v^2 + \sigma^2\right)^{3/2}} \tag{4.68}$$
Substituting (4.67) and (4.68) into (4.66) and letting $n \to \infty$ yields:

$$E\{\varepsilon_q^2(\infty)\} = \frac{\mu_q E\{\|C^T u\|^2\}}{2} \times \frac{\left(E\{\varepsilon_q^2(\infty)\} + \sigma_v^2\right)\left(\sigma^2 + E\{\varepsilon_q^2(\infty)\} + \sigma_v^2\right)^{3/2}}{\left(2E\{\varepsilon_q^2(\infty)\} + 2\sigma_v^2 + \sigma^2\right)^{3/2}} \tag{4.69}$$
The steady-state EMSE can be obtained by solving the fixed-point equation (4.69).

2. Case 2: $v(n)$ is non-Gaussian. In terms of $g(\cdot)$, (4.66) becomes:

$$2E\{g(e(n))\varepsilon_q(n)\} = \mu_q E\left\{\|C^T u\|^2\right\} E\{g^2(e(n))\} \tag{4.70}$$

Using the same method as in the EMSE calculation for the linear filter, we obtain the EMSE for the spline control points:

$$E\{\varepsilon_q^2(\infty)\} = \frac{\mu_q E\{\|C^T u\|^2\}\, E\left\{v^2(n)\exp\left(-v^2(n)/\sigma^2\right)\right\}}{2E\left\{\exp\left(-\dfrac{v^2(n)}{2\sigma^2}\right)\left(1 - \dfrac{v^2(n)}{\sigma^2}\right)\right\} - \mu_q E\{\|C^T u\|^2\}\, E\left\{\exp\left(-\dfrac{v^2(n)}{\sigma^2}\right)\left(1 + \dfrac{2v^4(n)}{\sigma^4} - \dfrac{5v^2(n)}{\sigma^2}\right)\right\}} \tag{4.71}$$
(C) Steady-state EMSE of the whole SAF-MCC

Using (4.43), (4.44a), and (4.44b), the EMSE can be calculated as:

$$\varsigma = E\{\varepsilon^2(\infty)\} = E\left\{\left(\varepsilon_w(\infty) + \varepsilon_q(\infty)\right)^2\right\} = E\{\varepsilon_w^2(\infty)\} + E\{\varepsilon_q^2(\infty)\} + 2E\{\varepsilon_w(\infty)\varepsilon_q(\infty)\} \tag{4.72}$$

where $E\{\varepsilon_w(\infty)\varepsilon_q(\infty)\}$ denotes the cross-EMSE. Since at steady state $u_l \approx u_s$ and $q_{i,o} \approx q_{i,n}$, it follows from (4.48) and (4.49) that the cross-EMSE can be neglected. Therefore, the EMSE of the whole SAF-MCC algorithm is:

$$\varsigma \approx E\{\varepsilon_w^2(\infty)\} + E\{\varepsilon_q^2(\infty)\} \tag{4.73}$$
4.4 Applications
With the development of spline filters, their range of applications has grown considerably. In this section, we introduce two applications based on spline filters: active noise control and echo cancellation.
4.4.1 Active Noise Control Based on Spline Filter
Active noise control (ANC), which is based on the destructive superposition of sound waves, has proven highly effective at reducing unwanted noise [22]. However, when impulsive noise occurs, the performance of conventional ANC systems degrades [22]. In an effort to improve noise cancellation in nonlinear ANC systems, Patel et al. first proposed a spline adaptive filter-based nonlinear ANC system [24]. Building on [24], a filtered-c generalized maximum correntropy criterion (FcGMCC) algorithm based on the nonlinear spline adaptive filter was developed. The kernel function of the GMCC criterion makes it less sensitive to abnormal data, so the FcGMCC algorithm outperforms the filtered-c least mean square (FcLMS) algorithm in impulsive noise environments [25].
4.4.1.1 FcGMCC Algorithm
Figure 4.6 depicts the schematic diagram of the nonlinear ANC system based on the spline adaptive filter. The residual noise sensed by the error microphone is:

$$e(n) = d(n) - y(n) * s_N(n) \tag{4.74}$$

where $*$ represents the linear convolution operator and $s_N(n)$ is the impulse response of the secondary path. To provide an efficient and robust solution for active impulsive noise control, a novel cost function based on the generalized maximum correntropy criterion (GMCC) is used; the GMCC has been proven to be extremely efficient in dealing with outliers. The cost function is:

$$J = E_{\alpha,\beta}[G(e(n))] = \gamma_{\alpha,\beta} E\left[\exp\left(-\lambda |e(n)|^\alpha\right)\right] \tag{4.75}$$

where $\gamma_{\alpha,\beta}$ is the normalization constant, defined as:
Fig. 4.6 Diagram of ANC system for GMCC [27]
$$\gamma_{\alpha,\beta} = \frac{\alpha}{2\beta\,\Gamma(1/\alpha)} \tag{4.76}$$
where $\Gamma(\cdot)$ represents the gamma function, $\alpha > 0$ is the shape parameter, $\beta$ is the scale parameter, and $\lambda = 1/\beta^\alpha$ is the kernel parameter. When $\alpha = 2$ in (4.75), the GMCC algorithm degenerates into the MCC algorithm. From the gradient descent method, the weight update formula is obtained as:

$$w_{n+1} = w_n - \mu_w \frac{\partial J(w_n, q_{i,n})}{\partial w_n} = w_n + \mu_w f(e(n)) \frac{\partial [y(n) * s_N(n)]}{\partial w_n} = w_n + \mu_w f(e(n)) \left[\frac{1}{\Delta x}\dot{u}^T C q_i\, x(n)\right] * s_N(n) = w_n + \mu_w f(e(n)) [x_{nt} * s_N(n)] = w_n + \mu_w f(e(n))\, x'(n) \tag{4.77}$$

where $f(e(n)) = \alpha\lambda\gamma_{\alpha,\beta} \exp(-\lambda|e(n)|^\alpha)\,|e(n)|^{\alpha-1}\operatorname{sign}(e(n))$, $x_{nt} = \frac{1}{\Delta x}\dot{u}^T C q_i\, x(n)$, and $x'(n)$ is the signal generated by filtering $x_{nt}$ through the estimate $\hat{S}(z)$ of the secondary path $S(z)$. Similarly, the update for the control points is expressed as:
$$q_{i,n+1} = q_{i,n} - \mu_q \frac{\partial J(n)}{\partial q_{i,n}} = q_{i,n} + \mu_q f(e(n)) \frac{\partial [y(n) * s_N(n)]}{\partial q_{i,n}} = q_{i,n} + \mu_q f(e(n)) \left[C^T u * s_N(n)\right] = q_{i,n} + \mu_q f(e(n))\, C^T u' \tag{4.78}$$

where $u' = u * s_N(n)$ and $\mu_q$ is the learning rate for updating the control points. Equations (4.77) and (4.78) are the update formulas of the robust filtered-c generalized maximum correntropy criterion (FcGMCC) algorithm.
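The only nonstandard ingredient in (4.77)–(4.78) is the error nonlinearity $f(e)$. A minimal sketch of it, together with the normalization constant (4.76), is given below; the function and parameter names are our own.

```python
import math

def gamma_ab(alpha, beta):
    """Normalization constant (4.76): gamma_{alpha,beta} = alpha / (2*beta*Gamma(1/alpha))."""
    return alpha / (2.0 * beta * math.gamma(1.0 / alpha))

def f_gmcc(e, alpha, beta):
    """Error nonlinearity of the FcGMCC updates (4.77)-(4.78):
    f(e) = alpha*lam*gamma_{alpha,beta}*exp(-lam*|e|^alpha)*|e|^(alpha-1)*sign(e),
    with lam = 1/beta**alpha."""
    if e == 0.0:
        return 0.0
    lam = 1.0 / beta**alpha
    sign = 1.0 if e > 0.0 else -1.0
    return (alpha * lam * gamma_ab(alpha, beta)
            * math.exp(-lam * abs(e)**alpha) * abs(e)**(alpha - 1) * sign)
```

Note how $f(e)$ grows with small errors but decays exponentially for large ones, which is precisely what desensitizes the update to impulsive bursts in the residual noise.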
4.4.1.2 Convergence Analysis
In this subsection, we provide a simple convergence analysis of the FcGMCC algorithm. The error signal can be expanded in a Taylor series as:

$$e(n+1) = e(n) + \frac{\partial e(n)}{\partial w_n^T}\Delta w_n + \eta \tag{4.79}$$

where $\eta$ collects the higher-order terms of the Taylor series expansion and $\Delta w_n = w_{n+1} - w_n$. Using (4.74) and (4.79), we have:

$$\frac{\partial e(n)}{\partial w_n^T} = -\frac{\partial [y(n) * s_N(n)]}{\partial w_n^T} = -x'(n)^T \tag{4.80}$$

and, from (4.77):

$$\Delta w_n = \mu_w f(e(n))\, x'(n) \tag{4.81}$$

Substituting (4.80) and (4.81) into (4.79) and neglecting $\eta$, we get:

$$e(n+1) = e(n) - \mu_w \|x'(n)\|^2 f(e(n)) \tag{4.83}$$

In order to ensure the convergence of the algorithm, we require:

$$|e(n+1)| = \left|e(n) - \mu_w \|x'(n)\|^2 f(e(n))\right| \le |e(n)| \tag{4.84}$$

So we have:
$$\left|1 - \alpha\lambda\gamma_{\alpha,\beta}\,\mu_w \|x'(n)\|^2 \exp(-\lambda|e(n)|^\alpha)\,|e(n)|^{\alpha-2}\right| \le 1 \tag{4.85}$$

According to (4.85), we get:

$$0 < \mu_w \le \frac{2}{\alpha\lambda\gamma_{\alpha,\beta}\,\|x'(n)\|^2 \exp(-\lambda|e(n)|^\alpha)\,|e(n)|^{\alpha-2}} \tag{4.86}$$

Similarly, the range of $\mu_q$ that makes the algorithm converge is:

$$0 < \mu_q \le \frac{2}{\alpha\lambda\gamma_{\alpha,\beta}\,\|C^T u'\|^2 \exp(-\lambda|e(n)|^\alpha)\,|e(n)|^{\alpha-2}} \tag{4.87}$$

4.4.2 Echo Cancellation Based on Spline Filter
In recent years, there has been increasing interest in acoustic echo cancellation (AEC) due to teleconferencing and hands-free telephone systems. Unfortunately, the well-known linear algorithms proposed in the literature fall short in many practical situations because their performance is limited by the presence of nonlinearity. Therefore, some scholars have applied spline filters and proposed echo cancellation schemes based on them. This section introduces echo cancellation based on the Hammerstein filter and the Wiener filter, respectively, and compares their performance in simulation [28]. The simulation results are shown in Sect. 4.5.5.
4.4.2.1 The Nonlinear Echo Canceler
An acoustic echo canceler is an adaptive system designed to reduce the echo that arises when the sound produced by a loudspeaker is picked up by microphones in the same room. The difficulty in acoustic echo cancellation is that the surrounding space alters the original sound, coloring the signal that reenters the microphone. In addition, low-cost audio equipment can introduce nonlinear distortion into the signal. In the experiments, we simulate the effects of amplifier and loudspeaker distortion by applying a nonlinear function (NL) to the input signal, while the influence of the environment is described by a room impulse response (RIR) h(n). When the nonlinearity precedes the RIR, this nonlinear AEC (NAEC) model is referred to in the literature as a Hammerstein system, as shown in Fig. 4.7a. Alternatively, the distortion can be modeled as a cascade of the RIR h(n) followed by a nonlinear function; this NAEC model is called a Wiener system, as shown in Fig. 4.7c.
Fig. 4.7 Two different implementations of a distorting echo path (b): (a) cascade of a nonlinear function (NL) and a room impulse response (RIR) (Hammerstein system) and (c) cascade of a RIR and a NL (Wiener system) [28]
Fig. 4.8 The Hammerstein NAEC architecture
The Hammerstein system is depicted in Fig. 4.8, where x(n) is the excitation signal, d(n) is the reference signal, and b(n) is local background noise. In the Hammerstein NAEC architecture, s(n) = f(x(n)) is the distorted signal and y(n) = wᵀs is the output of the adaptive filter, i.e., the estimate of the distorted echo signal, where w = [w₁, w₂, ..., w_N] are the N coefficients of the adaptive filter, s = [s(n), s(n−1), ..., s(n−N+1)], and e(n) = d(n) − y(n) is the error signal. As shown in Fig. 4.8, the architecture consists of a cascade of a nonlinear function and a linear adaptive filter, whose adaptation rules are described in the next section. Alternatively, Fig. 4.9 shows the Wiener counterpart. In this scheme, s(n) = wᵀx is the output of the linear adaptive filter, while y(n) = f(s(n)) is the distorted output signal; w = [w₁, w₂, ..., w_N] are the N coefficients of the adaptive filter and x = [x(n), x(n−1), ..., x(n−N+1)], while e(n) = d(n) − y(n) is the error signal. In addition, additive environmental noise b(n) can be considered.

Fig. 4.9 The proposed Wiener NAEC architecture
4.4.2.2 The Architectures Proposed in [28]
In this experiment, the basic least mean square (LMS) algorithm is used; thus the cost function adopted is $J(n) = |e(n)|^2 = |d(n) - y(n)|^2$.

(A) Hammerstein System

The Hammerstein model consists of a static nonlinear function followed by an LTI filter, known as a nonlinear-linear (N-L) model. Its update rules are therefore similar to those of the Wiener-filter-based SAF-LMS algorithm of Sect. 4.2.2.1:

$$w_{n+1} = w_n + \mu_w e(n)\, s_n \tag{4.88}$$

$$q_{n+1} = q_n + \mu_q e(n)\, C^T U_{i,n} w_n \tag{4.89}$$

where $U_{i,n} \in \mathbb{R}^{4 \times N} = [u_{i,n}, u_{i,n-1}, \ldots, u_{i,n-N+1}]$ is a matrix which collects the $N$ past vectors $u_{i,n-k}$.
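One Hammerstein update step (4.88)–(4.89) can be sketched as follows. The span addressing that produces $U_{i,n}$ is assumed to have been done already, q_act holds only the four control points active in the current span, and all names and dimensions are illustrative.

```python
def hammerstein_lms_step(w, q_act, s_vec, U, C, e, mu_w, mu_q):
    """One iteration of (4.88)-(4.89).
    w: N filter weights; q_act: 4 active control points; s_vec: N past spline
    outputs s(n-k); U: 4xN matrix of past u-vectors; C: 4x4 basis; e: error."""
    N = len(w)
    # (4.88): w_{n+1} = w_n + mu_w * e(n) * s_n
    w_new = [w[k] + mu_w * e * s_vec[k] for k in range(N)]
    # (4.89): q_{n+1} = q_n + mu_q * e(n) * C^T U_{i,n} w_n
    Uw = [sum(U[r][k] * w[k] for k in range(N)) for r in range(4)]      # U w_n
    CtUw = [sum(C[r][j] * Uw[r] for r in range(4)) for j in range(4)]   # C^T (U w_n)
    q_new = [q_act[j] + mu_q * e * CtUw[j] for j in range(4)]
    return w_new, q_new
```

Note that, unlike in the Wiener structure below, the filter update (4.88) uses the spline outputs s(n−k) as its regressor, while the control-point gradient is propagated through the linear filter via $U_{i,n} w_n$.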
(B) Wiener System

The update strategy of the Wiener spline filter is given in Sect. 4.2.2.1. Its update formulas are:
$$w_{n+1} = w_n + \eta_w e(n)\, \varphi_i'(u)\, x_n \tag{4.90}$$

$$q_{n+1} = q_n + \eta_q e(n)\, C^T u \tag{4.91}$$

4.5 Computer Simulation Examples
System identification builds a mathematical model describing system behavior from the time functions of the input and output. Its purpose is to estimate the model parameters inherent in the system using an adaptive filtering algorithm so as to reproduce the desired output, as shown in Fig. 4.5. Sections 4.5.1, 4.5.2, and 4.5.3 are all simulated in a system identification setting.
4.5.1 Basic Spline Filter Algorithm Simulation
In this section, we compare the performance of the algorithms of Sect. 4.2.2 in different environments using the Matlab simulation platform [5]. All simulations are based on Wiener-type system identification, and all results are obtained by averaging over 100 Monte Carlo trials. The performance is measured by the mean square error (MSE), defined as 10 log₁₀[e²(n)]. The input signal is generated by the process x(n) = ωx(n−1) + √(1−ω²) a(n), where a(n) is a zero-mean, unit-variance white Gaussian signal and the parameter ω, selected in the range [0, 0.95], can be interpreted as the degree of correlation between adjacent samples. Real speech inputs are also applied. The FIR filter coefficients of the SAF are initialized as w₋₁ = [1, 0, ..., 0]ᵀ with length N = 5, while the spline model is initialized as a straight line with unitary slope. For convenience, only the B-spline basis is applied in the simulations; similar results can also be achieved using the CR-spline basis. The unknown Wiener spline model comprises an FIR filter w_o = [0.6, −0.4, 0.25, −0.15, 0.1]ᵀ and a nonlinear spline function represented by a LUT q_o with 23 control points, with Δx set to 0.2 and q_o = [−2.2, ..., −0.8, −0.91, −0.4, −0.2, 0.05, 0.0, −0.4, 0.58, 1.0, 1.0, 1.2, ..., 2.2]. Independent white Gaussian background noise v(n) is added to the output of the unknown system at a 30 dB signal-to-noise ratio (SNR), defined as SNR = 10 log₁₀(σ_d²/σ_v²), where σ_d² is the variance of the noise-free d̃(n). The impulsive noise is modeled either as contaminated Gaussian (CG) impulses or as symmetric α-S noise. For the symmetric α-S noise, the fractional-order signal-to-noise ratio (FSNR) is defined as FSNR = 10 log₁₀(E{|d̃(n)|^{a₀}}/E{|η₀(n)|^{a₀}}), where η₀(n) denotes the symmetric α-S noise and 0 < a₀ < α₀, with α₀ the characteristic exponent of the symmetric α-S noise; α₀ is set to 0.8 and a₀ to 0.7 in the simulations. The other parameters are set as follows: μ_w = μ_q = 0.01, ε = 0.001, ε₀ = 0.001, α = λ = 0.99, N_w = 11, μ_w(0) = μ_q(0) = 0.05, and ê₀²(0) = σ_x², where σ_x² denotes the variance of the input.

Fig. 4.10 The variation of the step sizes for white Gaussian input in the absence of impulsive noise (SNR = 30 dB) [11]

Figure 4.10 shows the variation of the step sizes of the SAF-VSS-SNLMS for white Gaussian input: the step sizes are higher in the beginning, which leads to a faster convergence rate, and as the filter approaches its steady state they become lower to ensure a small error.

Figures 4.11 and 4.12 show the MSE learning curves of the SAF-LMS [3], SAF-NLMS [14], SAF-SNLMS [11], and SAF-VSS-SNLMS [11] in the absence of impulsive noise. The input signal is a white Gaussian sequence (ω set to zero) in Fig. 4.11, and a colored input (ω set to 0.9) is used in Fig. 4.12. It can clearly be seen that the SAF-SNLMS algorithm suffers from steady-state performance deterioration due to the sign operation on the error. The SAF-VSS-SNLMS, however, nearly matches the steady-state performance of the SAF-LMS and SAF-NLMS algorithms and, thanks to the variable step-size scheme, obtains better tracking ability than both. From the inset in the top left corner of Fig. 4.11, we can also see that the SAF-VSS-SNLMS obtains the fastest convergence rate in the beginning of adaptation (about the first 1000 samples of filtering).

Fig. 4.11 MSE curves for white Gaussian input in the absence of impulsive noise (SNR = 30 dB) [11]

Fig. 4.12 MSE curves for colored input in the absence of impulsive noise (SNR = 30 dB) [11]

Fig. 4.13 MSE curves for colored input in CG impulsive noise (SNR = 30 dB, t = 100,000, p = 0.01) [11]

Figures 4.13 and 4.14 show the learning curves of the four algorithms in the case of CG impulsive noise; the input is the colored signal with ω set to 0.9. In this case the SAF-SNLMS algorithms clearly outperform the other cited algorithms, obtaining lower steady-state MSE and better tracking ability; in addition, the SAF-VSS-SNLMS achieves the best performance. Figures 4.15, 4.16, and 4.17 show the MSE learning curves of the four algorithms in the symmetric α-S noise environment at different FSNRs; the other simulation parameters are the same as in Fig. 4.13. As can be seen for FSNR = 0 dB in Fig. 4.15 and FSNR = 20 dB in Fig. 4.16, the SAF-SNLMS algorithm does not achieve satisfactory steady-state performance; due to the variable step-size solution, however, the SAF-VSS-SNLMS provides good tracking and steady-state performance. At low FSNR (−5 dB) in Fig. 4.17, the SAF-LMS and SAF-NLMS fail to track the unknown nonlinear system, but the SAF-SNLMS algorithms remain robust against the impulsive noise.

Fig. 4.14 MSE curves for colored input in CG impulsive noise (SNR = 30 dB, t = 10,000, p = 0.1) [11]

Fig. 4.15 MSE curves for colored input in symmetric α-S noise (SNR = 30 dB, FSNR = 0 dB) [11]

Fig. 4.16 MSE curves for colored input in symmetric α-S noise (SNR = 30 dB, FSNR = 20 dB) [11]

Fig. 4.17 MSE curves for colored input in symmetric α-S noise (SNR = 30 dB, FSNR = −5 dB) [11]

Fig. 4.18 Speech signal [11]

Figure 4.19 shows the MSE learning curves of the four algorithms for the speech-signal input shown in Fig. 4.18. The other simulation parameters are the same as in Fig. 4.12, and the impulsive noise is CG noise. From Fig. 4.19, the SAF-SNLMS algorithms perform better than the other cited algorithms, which demonstrates their effectiveness for speech input.
Fig. 4.19 MSE curves for speech input in CG impulsive noise (SNR = 30 dB, t = 100,000, p = 0.01) [11]

4.5.2 SAF-MCC Algorithm Simulation

In this section, simulation results are presented to illustrate the performance of the SAF-MCC algorithm of Sect. 4.3.1. In the experiment, we identify an unknown Wiener system consisting of a linear component with parameter vector w = [0.2, −0.1, 0.25, −0.15] and a nonlinear spline function implemented by a 21-point LUT q_o with sampling interval Δx = 0.12, given by q_o = [−1.20, −1.08, −0.96, −0.84, −0.72, ..., −0.36, −0.24, −0.12, 0.00, 0.12, 0.24, 0.36, ..., 0.72, 0.84, 0.96, 1.08, 1.20]. Only the CR spline is considered here, whose basis matrix C_CR is:
$$C_{CR} = \frac{1}{2}\begin{pmatrix} -1 & 3 & -3 & 1 \\ 2 & -5 & 4 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 0 \end{pmatrix}$$
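For reference, the forward evaluation of the spline nonlinearity with this CR basis can be sketched as below. The span-addressing convention (how s(n) is mapped to a span index and a local abscissa u) varies slightly across the literature; the version here is one consistent choice, selected so that the straight-line LUT q_o above reproduces the identity.

```python
import math

C_CR = [[-0.5,  1.5, -1.5,  0.5],
        [ 1.0, -2.5,  2.0, -0.5],
        [-0.5,  0.0,  0.5,  0.0],
        [ 0.0,  1.0,  0.0,  0.0]]   # (1/2) times the basis matrix above

def spline_output(s, q, dx):
    """Evaluate phi(s) = u^T C_CR q_i from the LUT q (knots spaced dx apart,
    centered on zero)."""
    c = (len(q) - 1) / 2.0
    t = s / dx + c                       # abscissa measured in knot units
    j = int(math.floor(t))               # knot index just below s
    j = max(1, min(j, len(q) - 3))       # keep the 4-point window inside q
    u = t - j                            # local abscissa in [0, 1)
    i = j - 1                            # window start: q[i], ..., q[i+3]
    u_vec = [u**3, u**2, u, 1.0]
    return sum(u_vec[r] * sum(C_CR[r][k] * q[i + k] for k in range(4))
               for r in range(4))
```

Because the CR spline interpolates its control points with linear precision, the equispaced straight-line LUT q_o of this section maps s to itself, which is also the standard initialization of the SAF nonlinearity.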
The input signal x(n) is generated by:

$$x(n) = a\,x(n-1) + \sqrt{1 - a^2}\,\xi(n) \tag{4.92}$$

where ξ(n) is zero-mean, unit-variance white Gaussian noise and a is a parameter that determines the level of correlation between adjacent samples. In this simulation, we employ the alpha-stable distribution to generate the disturbance noise at the desired output. The characteristic function of the alpha-stable distribution is:

$$f(t) = \exp\left\{j\delta t - \gamma|t|^\alpha\left[1 + j\beta\,\mathrm{sgn}(t)\,S(t,\alpha)\right]\right\} \tag{4.93}$$

where

$$S(t,\alpha) = \begin{cases} \tan\dfrac{\alpha\pi}{2}, & \alpha \ne 1 \\[4pt] \dfrac{2}{\pi}\log|t|, & \alpha = 1 \end{cases} \tag{4.94}$$

Here α is the characteristic exponent, also called the stability index, which satisfies α ∈ (0, 2]; β ∈ [−1, 1] is the symmetry parameter, γ > 0 is the dispersion parameter, and δ ∈ ℝ is the location parameter. When β = 0, the alpha-stable distribution is called a symmetric alpha-stable (SαS) distribution. We set the parameters as α = 1.6, β = 0, γ = 0.05, δ = 0. The filter length is 4, and the input vector is x(n) = [x(n−1), x(n−2), ..., x(n−4)]. The simulation results are averaged over 50 independent Monte Carlo runs. A segment of 8000 samples is used as training data and another 100 samples as testing data. The Gaussian kernel is used in the MCC, with kernel width σ = 0.1.

Figure 4.20 shows the performance comparison between SAF-LMS and SAF-MCC. The learning rates are set at μ_w = 0.02, μ_q = 0.01 for SAF-LMS and μ_w = 0.14, μ_q = 0.05 for SAF-MCC. It is obvious that SAF-MCC performs better than the original SAF-LMS, with a faster convergence rate and a smaller test error. We also investigate the convergence curves of the algorithm with different sampling intervals Δx = [0.08, 0.12, 0.14, 0.16] and different kernel widths σ = [0.05, 0.10, 0.50, 1.00]; the step sizes are chosen such that all the algorithms have the same initial convergence rate. The results are shown in Figs. 4.21 and 4.22, respectively. From the simulation results, we can see that the convergence performance is best when Δx = 0.12. The width of the kernel affects both the steady-state performance and the convergence speed of the algorithm, so how to choose the best value of σ remains a challenging open problem.
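The correlated input process (4.92) is easy to reproduce; a minimal sketch:

```python
import math, random

def correlated_input(n, a, seed=0):
    """Generate n samples of x(k) = a*x(k-1) + sqrt(1-a^2)*xi(k), Eq. (4.92).
    With unit-variance white Gaussian xi, x is (asymptotically) unit-variance
    with correlation coefficient a between adjacent samples."""
    rng = random.Random(seed)
    g = math.sqrt(1.0 - a * a)
    x, prev = [], 0.0
    for _ in range(n):
        prev = a * prev + g * rng.gauss(0.0, 1.0)
        x.append(prev)
    return x
```

Generating the alpha-stable disturbance (4.93) itself requires the Chambers–Mallows–Stuck method or a library routine and is not reproduced here.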
Fig. 4.20 Convergence curves of SAF-LMS and SAF-MCC
Fig. 4.21 Convergence curves of SAF-MCC with different Δx value
Fig. 4.22 Convergence curves of SAF-MCC with different σ value
4.5.3 Performance Analysis Simulation
In this section, we carry out a simulation analysis of the algorithm discussed in Sect. 4.3.2.

(A) Performance Result

Here the performance of the SAF-MCC algorithm is simulated and analyzed [15]. The input signal x(n) is generated by (4.92), (4.93), and (4.94), with α = 1.6, β = 0, γ = 0.05, δ = 0. We demonstrate the performance comparison between SAF-LMS and SAF-MCC. The learning rates are set at μ_w = 0.02, μ_q = 0.01 for SAF-LMS and μ_w = 0.14, μ_q = 0.05 for SAF-MCC, and the Gaussian kernel width is σ = 0.1. It is obvious that SAF-MCC achieves a much faster convergence speed and smaller testing error than the SAF-LMS. We also investigate the convergence curves of the algorithm with different sampling intervals Δx = [0.08, 0.12, 0.14, 0.16] and different kernel widths σ = [0.05, 0.10, 0.50, 1.00]; the step sizes are chosen such that all the algorithms have the same initial convergence rate. The convergence performance is best when Δx = 0.12; if the kernel width is too large or too small, the performance becomes poor. However, this parameter is set manually in the simulation, so how to choose the best σ remains a challenging research topic.

(B) Verification of Analysis Results

In Sect. 4.3.2 we introduced the steady-state performance analysis of the SAF-MCC. In this subsection, we conduct simulations with two types of noise, Gaussian and non-Gaussian, to verify the correctness of the analysis. The unknown system is a Wiener system consisting of the linear component w_o = [0.6, −0.4, 0.25, −0.15, 0.1, −0.05, 0.001] and a nonlinear spline function expressed by a LUT q_o with 23 control points. The sampling interval is Δx = 0.2, and q_o is given by q_o = [−2.2, −2.0, −1.8, ..., −1.0, −0.8, −0.91, −0.40, 0.20, −0.05, 0.0, −0.15, 0.58, 1.0, 1.0, 1.2, 1.4, ..., 0.02]. The input signal is a Gaussian process with zero mean and unit variance, and the step sizes take the 17 values in the range μ_w = μ_q = μ = {0.004, 0.005, ..., 0.02} [18].

Fig. 4.23 Theoretical and simulated EMSEs with Gaussian noise: (a) EMSE versus step size, σ_v² = 10⁻³; (b) EMSE versus SNR, μ = 0.006 [20]

Fig. 4.24 Theoretical and simulated EMSEs with uniform noise: (a) EMSE versus step size, σ_v² = 3.33 × 10⁻⁴; (b) EMSE versus SNR, μ = 0.008 [20]

Fig. 4.25 Theoretical and simulated EMSEs versus step size with binary noise, σ_v² = 4 × 10⁻⁴ [20]
Figure 4.23 checks the validity of the analytical results when the noise is Gaussian. The theoretical and simulated steady-state EMSEs versus the step size are depicted in Fig. 4.23a, where σ_v² is set to 10⁻³; the theoretical calculations match the simulated results well. The theoretical and simulated steady-state EMSEs versus the signal-to-noise ratio (SNR) are depicted in Fig. 4.23b, where μ is set to 0.008; again, the simulated results match the theoretical ones very well. Figure 4.24 considers uniform noise, a typical non-Gaussian noise, and plots the theoretical and simulated EMSEs versus the step size and the SNR. The simulated EMSEs converge to the theoretical ones, which again indicates the effectiveness of the analysis. In addition, as the step size increases, the steady-state EMSE gradually increases, consistent with (4.64) and (4.71). Figure 4.25 depicts the theoretical and simulated EMSEs versus step size for binary noise; the comments on Fig. 4.25 are similar to those for Fig. 4.24.
4.5.4 Simulation of ANC
In this section, we verify the effectiveness of the algorithm of Sect. 4.4 in an ANC application. The performance of the algorithms is evaluated by the averaged noise reduction (ANR) factor:

$$\mathrm{ANR}(n) = \frac{A_e(n)}{A_d(n)} \tag{4.95}$$

where $A_e(n) = \lambda A_e(n-1) + (1-\lambda)|e(n)|$ and $A_d(n) = \lambda A_d(n-1) + (1-\lambda)|d(n)|$, with λ the forgetting factor. The initial values of $A_e(n)$ and $A_d(n)$ are zero.
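The ANR recursion (4.95) in code form (a small sketch; a forgetting factor close to 1, such as 0.999, is typical):

```python
def anr_curve(e, d, lam=0.999):
    """Averaged noise reduction (4.95): ANR(n) = A_e(n) / A_d(n), where
    A_e(n) = lam*A_e(n-1) + (1-lam)*|e(n)| and likewise for A_d(n),
    both initialized to zero."""
    ae = ad = 0.0
    out = []
    for en, dn in zip(e, d):
        ae = lam * ae + (1.0 - lam) * abs(en)
        ad = lam * ad + (1.0 - lam) * abs(dn)
        out.append(ae / ad if ad > 0.0 else 0.0)
    return out
```

An ANR of 1 means no attenuation; values below 1 (negative on a dB scale) indicate that the residual noise e(n) is smaller than the primary noise d(n).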
4.5.4.1 Performance of the FcGMCC Algorithm
The reference signal x(n) is modeled by the standard symmetric α-stable (SαS) distribution:

$$\varphi_{S\alpha S}(t) = \exp\{-|t|^\alpha\} \tag{4.96}$$

where α is the characteristic exponent; a small value of α indicates a peaky and heavy-tailed distribution. In this example, α is set to 1.1.
The primary noise sensed by the error microphone is given by:

$$d(n) = u(n-2) + 0.8u^2(n-2) - 0.4u^3(n-1) \tag{4.97}$$

where $u(n) = x(n) * p(n)$, with p(n) the impulse response of the transfer function [27]:

$$P(z) = z^{-3} - 0.3z^{-4} + 0.2z^{-5} \tag{4.98}$$

The transfer function of the secondary path is [27]:

$$S_N(z) = z^{-2} - 0.5z^{-3} \tag{4.99}$$
Figure 4.26 shows the ANR learning curves of the FcLMS [26], FcSNLMS [29], FcMCC, and FcGMCC algorithms, where Fig. 4.26a, b use α = 1.9 and α = 1.8, respectively. The simulation parameters used in (a) are:

N = 6, Q = 11, Δx = 0.4, μ_q = 1 × 10⁻⁵, μ_w = 0.3, p = 1.3, β = 1.1 (FcGMCC)
N = 6, Q = 11, Δx = 0.4, μ_q = 1 × 10⁻⁵, μ_w = 0.3 (FcLMS)
N = 6, Q = 11, Δx = 0.4, μ_q = 1 × 10⁻⁵, μ_w = 0.3 (FcSNLMS)
N = 6, Q = 11, Δx = 0.4, μ_q = 1 × 10⁻⁵, μ_w = 0.3, σ = 1 (FcMCC)

In (b), the simulation parameters are:

N = 6, Q = 11, Δx = 0.4, μ_q = 1 × 10⁻⁵, μ_w = 0.1, p = 1.3, β = 1.1 (FcGMCC)
N = 6, Q = 11, Δx = 0.4, μ_q = 1 × 10⁻⁵, μ_w = 0.1 (FcLMS)
N = 6, Q = 11, Δx = 0.4, μ_q = 1 × 10⁻⁵, μ_w = 0.1 (FcSNLMS)
N = 6, Q = 11, Δx = 0.4, μ_q = 1 × 10⁻⁵, μ_w = 0.1, σ = 1 (FcMCC)

The B-spline basis matrix C used here is:

$$C_B = \frac{1}{6}\begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 0 & 3 & 0 \\ 1 & 4 & 1 & 0 \end{pmatrix}$$
As can be seen from Fig. 4.26, the FcGMCC algorithm achieves better stability and a lower ANR (i.e., stronger noise reduction) than the FcLMS, FcSNLMS [29], and FcMCC algorithms. In addition, a good noise-removal effect is achieved even when the noise source contains impulsive interference. The simulation experiments thus confirm the good performance of the FcGMCC algorithm in nonlinear and non-Gaussian noise environments.

Fig. 4.26 Comparison of ANR in the NANC system with nonlinear secondary path for SαS primary noise: (a) α = 1.9 and (b) α = 1.8 [27]
4.5.5 Simulation of Echo Cancellation
In this section, some experimental results are presented to validate the algorithms of Sect. 4.4.2. The results are compared with a standard linear echo canceler using the LMS algorithm. Performance is evaluated in terms of the echo return loss enhancement (ERLE), defined as:

$$\mathrm{ERLE} = 10\log_{10}\frac{E\{d^2(n)\}}{E\{e^2(n)\}} \tag{4.100}$$
The experimental tests were carried out in simulated environments with different reverberation times T60. The impulse responses were generated using the Matlab toolbox Roomsim for a room of dimensions 6 × 4 × 2.5 m, changing the wall absorption coefficients. The receiver is positioned at [1.56, 1.88, 1.1] m, while the source is located 1 m in front of the microphone along the x-direction. In the first experiment, we apply Gaussian white noise with unit variance. The length of the signal is 50,000 samples. The learning rates are set to μw = 10^{-3} and ηw = 10^{-2}, while μq = ηq = 10^{-1}. A B-spline basis is used; the control points are equispaced with Δx = 0.1, and their number is set to 21. The filter coefficients are initialized as w = [0, 0, . . ., 0]^T. The filter length N depends on the reverberation time used and is listed in Table 4.1.

Figure 4.27 shows the ERLE comparison for the case of an anechoic environment. The results show that the method proposed in [28] is superior to the linear AEC method. In particular, the Hammerstein system outperforms the Wiener system in terms of the ERLE index. Figure 4.28 shows an ERLE comparison using spline functions in an anechoic environment. It is clear from the figure that the ERLE is lower than in the previous case and the convergence rate is slower. This can be explained by noting that in the latter case a spline nonlinearity with 21 control points must be adapted instead of a sigmoid with only 2 parameters. Figure 4.29 depicts the error signal of a Hammerstein system using an S-shaped compensating nonlinear function. In addition, Fig. 4.30 shows the profile of the spline compensating nonlinearity at the end of convergence. We can see that this method can recover the original distortion function.
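In practice, the expectations in Eq. (4.100) are replaced by sample averages, typically over blocks of the signals. A minimal sketch of such an ERLE estimator follows; the block length and the synthetic signals are illustrative choices, not values from the text:

```python
import numpy as np

def erle_db(d, e, win=1024):
    """Estimate ERLE = 10*log10(E{d^2}/E{e^2}) over consecutive blocks.

    d   : echo (desired) signal at the microphone
    e   : residual error signal after echo cancellation
    win : block length used to approximate the expectations
    """
    d, e = np.asarray(d, float), np.asarray(e, float)
    n = (len(d) // win) * win
    # Block-wise mean powers approximate E{d^2(n)} and E{e^2(n)}.
    pd = np.mean(d[:n].reshape(-1, win) ** 2, axis=1)
    pe = np.mean(e[:n].reshape(-1, win) ** 2, axis=1)
    return 10.0 * np.log10(pd / (pe + 1e-12))

# Example: a canceler whose residual is 10% of the echo amplitude
# removes 99% of the echo power, i.e., about 20 dB of ERLE.
rng = np.random.default_rng(0)
d = rng.standard_normal(4096)
e = 0.1 * d
print(erle_db(d, e))  # ≈ 20 dB in each block
```

Higher ERLE values thus indicate stronger echo attenuation, which is why the Hammerstein curves in Figs. 4.27 and 4.28 lie above the linear AEC curves.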
Fig. 4.27 ERLE comparison (ERLE [dB] vs. samples) for the anechoic environment using the sigmoid function; curves: Hammerstein, Wiener, and linear AEC
Fig. 4.28 ERLE comparison (ERLE [dB] vs. samples) for the anechoic environment using the spline function; curves: Hammerstein, Wiener, and linear AEC
Fig. 4.29 Error signal of the Hammerstein system with sigmoid function in an anechoic environment
Fig. 4.30 Profile of the estimated loudspeaker nonlinearity using a spline function for the anechoic environment
4.6 Summary
In this chapter, we mainly introduced the nonlinear spline filter. In the first section, we introduced the spline filter model. It is essentially a linear-nonlinear network whose linear part is an FIR filter; the nonlinear part consists of an adaptive lookup table (LUT) and a spline interpolation network. In addition, the SAF-LMS algorithm under the MSE criterion was introduced. However, the performance of the SAF-LMS algorithm deteriorates rapidly under non-Gaussian noise, especially in the presence of large outliers (significant deviations from the observed values). Therefore, in the second section we introduced a robust spline filtering algorithm based on the MCC criterion, the spline adaptive filtering (SAF-MCC) algorithm, and simulated and analyzed its performance. Simulation results show that SAF-MCC performs well in non-Gaussian environments. The third section presented a detailed theoretical analysis of the steady-state performance of the spline nonlinear filter under the MCC criterion, and in the fourth section the validity of the mean-value analysis was confirmed by simulation. Finally, we introduced the application of the spline filter in the field of active noise control.
References

1. Moodi H, Bustan D. On identification of nonlinear systems using Volterra kernels expansion on Laguerre and wavelet function. In: Control & Decision Conference. IEEE, 2010.
2. Le D C, Zhang J, Li D, et al. A generalized exponential functional link artificial neural networks filter with channel-reduced diagonal structure for nonlinear active noise control. Applied Acoustics, 2018, 139:174–181.
3. Lindsten F, Schon T B, Jordan M I. Bayesian semiparametric Wiener system identification. Automatica, 2013, 49:2053–2063.
4. Rasouli M, Westwick D, Rosehart W. Quasiconvexity analysis of the Hammerstein model. Automatica, 2014, 50:277–281.
5. Scarpiniti M, Comminiello D, Parisi R, et al. Hammerstein uniform cubic spline adaptive filters: Learning and convergence properties. Signal Processing, 2014, 100:112–123.
6. Scarpiniti M, Comminiello D, Parisi R, et al. Novel cascade spline architectures for the identification of nonlinear systems. IEEE Transactions on Circuits and Systems I: Regular Papers, 2015.
7. Liu C, Peng C, Tang X, et al. Two variants of the IIR spline adaptive filter for combating impulsive noise. EURASIP Journal on Advances in Signal Processing, 2019, 2019(1).
8. Liu C, Zhang Z, Tang X. Sign normalised Hammerstein spline adaptive filtering algorithm in an impulsive noise environment. Neural Processing Letters, 2019, 50(1):477–496.
9. Yang Y, Yang B, Niu M. Spline adaptive filter with fractional-order adaptive strategy for nonlinear model identification of magnetostrictive actuator. Nonlinear Dynamics, 2017, 90(1):1647–1659.
10. Scarpiniti M, Comminiello D, Parisi R, et al. Nonlinear spline adaptive filtering. Signal Processing, 2013, 93(4):772–783.
11. Liu C, Zhang Z, et al. Sign normalised spline adaptive filtering algorithms against impulsive noise. Signal Processing, 2018.
12. Uncini A, Parisi R, et al. Nonlinear system identification using IIR spline adaptive filters. Signal Processing, 2015, 108:30–35.
13. Liu C, Zhang Z, Tang X. Sign-normalized IIR spline adaptive filtering algorithms for impulsive noise environments. Circuits, Systems & Signal Processing, 2018.
14. Guan S, Li Z. Normalised spline adaptive filtering algorithm for nonlinear system identification. Neural Processing Letters, 2017, 46(2):595–607.
15. Peng S, Wu Z, Zhang X, et al. Nonlinear spline adaptive filtering under maximum correntropy criterion. In: TENCON IEEE Region 10 Conference. IEEE, 2016.
16. Wu Z, Peng S, Chen B, et al. Robust Hammerstein adaptive filtering under maximum correntropy criterion. Entropy, 2015, 17(10):7149–7166.
17. Singh A, Príncipe J C. Using correntropy as a cost function in linear adaptive filters. In: International Joint Conference on Neural Networks. IEEE, 2009.
18. Chen B, Xing L, Liang J, et al. Steady-state mean-square error analysis for adaptive filtering under the maximum correntropy criterion. IEEE Signal Processing Letters, 2014, 21(7):880–884.
19. Sayed A H. Fundamentals of Adaptive Filtering. John Wiley & Sons, 2003.
20. Wang W, Zhao H, Zeng X, et al. Steady-state performance analysis of nonlinear spline adaptive filter under maximum correntropy criterion. IEEE Transactions on Circuits and Systems II: Express Briefs, 2020, 67(6):1154–1158.
21. Arenas-García J, Figueiras-Vidal A R, Sayed A H. Steady state performance of convex combinations of adaptive filters. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05). IEEE, 2005.
22. Wu L, Qiu X, Guo Y. A generalized leaky FxLMS algorithm for tuning the waterbed effect of feedback active noise control systems. Mechanical Systems & Signal Processing, 2018, 106:13–23.
23. Patel V, George N V. Compensating acoustic feedback in feed-forward active noise control systems using spline adaptive filters. Signal Processing, 2016, 120:448–455.
24. George N V, Patel V, et al. Nonlinear active noise control using spline adaptive filters. Applied Acoustics, 2015.
25. Gao Y, Zhao H, Lou J. Robust spline adaptive filtering algorithm based on GMCC for nonlinear active noise control. Submitted to Applied Acoustics.
26. Luo L, Sun J, Huang B. A novel feedback active noise control for broadband chaotic noise and random noise. Applied Acoustics, 2017, 116:229–237.
27. Lu L, Zhao H. Adaptive Volterra filter with continuous lp-norm using a logarithmic cost for nonlinear active noise control. Journal of Sound & Vibration, 2016, 364:14–29.
28. Scarpiniti M, Comminiello D, Parisi R, et al. Comparison of Hammerstein and Wiener systems for nonlinear acoustic echo cancelers in reverberant environments. In: International Conference on Digital Signal Processing. IEEE, 2011.
29. Liu C, Zhang Z, Tang X. Sign normalised spline adaptive filtering algorithms against impulsive noise. Signal Processing, 2018, 148:234–240.
Chapter 5: Kernel Adaptive Filters

5.1 Introduction
The kernel method is a nonlinear, nonparametric modeling tool. The key idea is to transform the input data into a high-dimensional feature space via a reproducing kernel and then apply appropriate linear methods to the transformed data. The kernel method identifies the inner products in the formulation and replaces them with a kernel function. This methodology, called the "kernel trick," has been widely used in many well-known algorithms, including the support vector machine (SVM), principal component analysis (PCA), and Fisher discriminant analysis. Moreover, the reproducing kernel Hilbert space (RKHS) plays a central role in providing linearity, convexity, and universal approximation capability. Kernel adaptive filters, which create a growing linear-in-the-parameters (LIP) model, are developed by implementing well-established linear adaptive filters in kernel space.

The KAF algorithms suffer from a computational bottleneck, requiring linear or superlinear time and space with respect to the sample number n. Besides, a network that grows with the number of training data raises challenges for applying KAFs to non-stationary signal processing tasks. On the one hand, a fundamental question is whether it is necessary to memorize all the past inputs; by removing redundant data, it is possible to keep a minimal set of centers that covers the region where inputs are likely to appear (imagine each kernel as a sphere in the input space with the kernel bandwidth as its radius). On the other hand, a sparse model (a network with as few nodes as possible) is desirable, because it reduces the complexity in terms of computation and memory, and it usually gives better generalization ability. There are many approaches to simplify the network, which we divide into sparsification, quantization, and kernel approximation methods. In this chapter, we will introduce some classic approaches from each class.
In addition, the main abbreviations in this chapter are shown in Table 5.1.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Zhao, B. Chen, Efficient Nonlinear Adaptive Filters, https://doi.org/10.1007/978-3-031-20818-8_5
Table 5.1 Abbreviations in the chapter

KAF: Kernel adaptive filter
KLMS: Kernel least mean square
KRLS: Kernel recursive least square
KAPA: Kernel affine projection algorithm
RKHS: Reproducing kernel Hilbert space
LMS: Least mean square
RLS: Recursive least square
APA: Affine projection algorithm
RBF: Radial basis function
NC: Novelty criterion
ALD: Approximate linear dependency
VQ: Vector quantization
QKLMS: Quantized kernel least mean square
DQ: Density-dependent vector quantization
VQIT: Vector quantization using information theoretic learning
PRQ: Probability density rank based quantization
RFF: Random Fourier feature

5.2 Kernel Adaptive Filters

5.2.1 Reproducing Kernel Hilbert Space
A Hilbert space is an inner-product space that has an orthonormal basis {x_k}_{k=1}^∞. Let {x_k}_{k=1}^∞ be a basis and H be the largest and most inclusive space of vectors. Any vector, not necessarily lying in the original inner-product space, can be represented as

x = Σ_{k=1}^∞ a_k x_k,   (5.1)

where x is spanned by the basis {x_k}_{k=1}^∞ and the a_k are the coefficients of the representation. Define a new vector

y_m = Σ_{k=1}^m a_k x_k.   (5.2)
We can calculate the Euclidean distance between the vectors y_n and y_m as follows:

‖y_n − y_m‖² = ‖Σ_{k=1}^n a_k x_k − Σ_{k=1}^m a_k x_k‖²
             = ‖Σ_{k=m+1}^n a_k x_k‖²
             = Σ_{k=m+1}^n a_k².   (5.3)
Therefore, for the definition of x to be meaningful, the following conditions must hold:

1. Σ_{k=m+1}^n a_k² → 0 as both n, m → ∞.
2. Σ_{k=1}^m a_k² < ∞.
In other words, {y_m} forms a Cauchy sequence: a vector x can be expanded on the basis {x_k}_{k=1}^∞ if and only if x is a linear combination of the basis vectors and the associated coefficients {a_k}_{k=1}^∞ are square summable. Obviously, the resulting space H is more "complete" than the original inner-product space. We may therefore state the following important result: an inner-product space H is complete if every Cauchy sequence taken from H converges to a limit in H; a complete inner-product space is called a Hilbert space.

A Mercer kernel is a continuous, symmetric, positive-definite function k : X × X → R, where X is the input domain, a subset of R^L, and L is the input dimension. The well-known Gaussian kernel is

k(x_1, x_2) = exp(−a ‖x_1 − x_2‖²).   (5.4)
Let H be the vector space of all real-valued functions of the input x that are generated by the kernel k(x, ·). Suppose two functions h(·) and g(·) are picked from the space H that are, respectively, represented by

h = Σ_{i=1}^l a_i k(x_i, ·)   (5.5)

and

g = Σ_{j=1}^m b_j k(x_j, ·),   (5.6)
where the a_i and b_j are expansion coefficients and both x_i and x_j ∈ X for all i and j. The bilinear form defined as

<h, g> = Σ_{i=1}^l Σ_{j=1}^m a_i k(x_i, x_j) b_j   (5.7)
satisfies the following properties:

1. Symmetry:
   <h, g> = <g, h>   (5.8)

2. Scaling and distributive property:
   <(cf + dg), h> = c<f, h> + d<g, h>   (5.9)

3. Squared norm:
   ‖f‖² = <f, f> ≥ 0   (5.10)
Accordingly, the bilinear term <h, g> is indeed an inner product. One additional property follows directly. Specifically, setting g(·) = k(x, ·), we obtain

<h, k(x, ·)> = Σ_{i=1}^l a_i k(x_i, x) = h(x).   (5.11)

This property is known as the reproducing property. The kernel k(x_i, x), representing a function of the two vectors x_i and x, is called a reproducing kernel of the vector space H if it satisfies the following two conditions:

• For every x_i ∈ X, k(x_i, x), being a function of the vector x, belongs to H.
• It satisfies the reproducing property.

These two conditions are indeed satisfied by the Mercer kernel, thereby endowing it with the designation "reproducing kernel." If the inner-product space H, in which the reproducing kernel space is defined, is also complete, then it is called a reproducing kernel Hilbert space, for which we use the acronym RKHS hereafter. The analytic power of the RKHS is expressed in an important theorem called the Mercer theorem. The Mercer theorem states that any reproducing kernel k(x_i, x) can be expanded as follows:

k(x_i, x) = Σ_{j=1}^∞ λ_j φ_j(x_i) φ_j(x),   (5.12)
where the λ_j and φ_j(·) denote the eigenvalues and eigenfunctions, respectively. If the eigenvalues are non-negative, a mapping φ can be constructed as

φ : X → F,   (5.13)
φ(x) = [√λ_1 φ_1(x), √λ_2 φ_2(x), ⋯]^T.   (5.14)
The dimensionality of F is determined by the number of strictly positive eigenvalues, which can be infinite in the Gaussian kernel case. In the machine learning literature, φ is usually treated as the feature mapping, and φ(x) is the transformed feature vector lying in the feature space F (which is an inner-product space). Therefore, an important implication is

k(x_i, x) = φ(x_i)^T φ(x).   (5.15)
It is obvious that F is essentially the same as the RKHS induced by the kernel, by identifying φ(x) = k(x, ·). Sometimes one can find an explicit expression for φ, but in most cases it is hard to express φ explicitly. Here, we use an example to illustrate the mapping φ. Define

k(x, y) = (1 + x^T y)²,   (5.16)

where x = [x_1, x_2]^T and y = [y_1, y_2]^T with y_1 and y_2 being constant values. We have

k(x, y) = 1 + x_1² y_1² + 2 x_1 x_2 y_1 y_2 + x_2² y_2² + 2 x_1 y_1 + 2 x_2 y_2.   (5.17)

Therefore, the mapping φ of the input vector x can be written as

φ(x) = [1, x_1², √2 x_1 x_2, x_2², √2 x_1, √2 x_2]^T.   (5.18)

It is easy to verify that

φ(x)^T φ(y) = k(x, y).   (5.19)
In general, the dimensionality of φ scales as O(L^p), where L is the dimension of the input vectors and p is the order of the polynomial kernel.
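The identity in Eq. (5.19) is easy to check numerically for the second-order polynomial kernel of Eq. (5.16) and the explicit mapping of Eq. (5.18); the test vectors below are arbitrary:

```python
import numpy as np

def poly_kernel(x, y):
    # k(x, y) = (1 + x^T y)^2, Eq. (5.16)
    return (1.0 + x @ y) ** 2

def phi(x):
    # Explicit feature mapping of Eq. (5.18) for 2-D inputs
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1**2, s * x1 * x2, x2**2, s * x1, s * x2])

x = np.array([0.7, -1.2])
y = np.array([2.0, 0.3])
print(poly_kernel(x, y), phi(x) @ phi(y))  # the two values coincide
```

The same check fails for mappings that drop the √2 factors, which is why they appear in Eq. (5.18).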
5.2.2 Kernel Least Mean Square
A simple linear finite-impulse-response filter is obtained with the least mean square (LMS) algorithm, which uses the stochastic gradient to optimize the cost function. If the mapping between d and x is highly nonlinear, very poor performance can be expected from LMS. Therefore, we can use the kernel method to transform the input x into a high-dimensional feature space as φ(x); owing to the increased dimensionality, w^T φ(x) is a more powerful model. Denote φ(x_i) by φ_i for simplicity. Basically, the LMS algorithm operates by minimizing the instantaneous cost function

J(i) = (1/2) e_i².   (5.20)
Applying the LMS algorithm in the kernel space yields

w_0 = 0,
e_i = d_i − w_{i−1}^T φ_i,   (5.21)
w_i = w_{i−1} + η e_i φ_i,
where w_i denotes the estimate of the weight vector in the feature space F. Since φ is usually in a very high-dimensional space and cannot be written explicitly, we carry out the computation in an alternative way. Repeated application of the weight update in Eq. (5.21) through the iterations yields

w_i = w_{i−1} + η e_i φ_i
    = [w_{i−2} + η e_{i−1} φ_{i−1}] + η e_i φ_i
    = w_{i−2} + [η e_{i−1} φ_{i−1} + η e_i φ_i]
    = ⋯
    = w_0 + η Σ_{j=1}^i e_j φ_j
    = η Σ_{j=1}^i e_j φ_j,   (5.22)
where i denotes the i-th training step; the weight estimate is expressed as a linear combination of all previous and present inputs, weighted by the training errors and the step-size η. Thus, we can compute the output of the filter as

w_i^T φ(x) = [η Σ_{j=1}^i e_j φ_j]^T φ(x) = η Σ_{j=1}^i e_j φ_j^T φ(x).   (5.23)
We can efficiently compute the inner products in the feature space by a kernel function as follows:

w_i^T φ(x) = η Σ_{j=1}^i e_j k(x_j, x).   (5.24)

Fig. 5.1 Network topology of KLMS at iteration i
Comparing the above iterations with LMS, we find that no explicit weight vector appears in the model, thanks to the kernel method. Instead, we have the sum of all past errors multiplied by kernel functions evaluated on the previously received data, which is equivalent to the weights in Eq. (5.22). Therefore, the model no longer iterates the weights and never computes inner products in the feature space explicitly. The new algorithm is named kernel least mean square (KLMS). It is the form of LMS in the RKHS: for each new training datum it allocates a new kernel unit, with the input x_i as a new center and ηe_i as the corresponding coefficient. The algorithm is summarized in Algorithm 1, and the corresponding topology is shown in Fig. 5.1.
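Equations (5.21)-(5.24) translate directly into a short implementation: the filter stores the past centers x_j with coefficients ηe_j and evaluates Eq. (5.24) at each new input. The following is a minimal sketch with a Gaussian kernel; the step-size, bandwidth, and toy identification task are illustrative choices, not values from the text:

```python
import numpy as np

class KLMS:
    def __init__(self, eta=0.2, a=1.0):
        self.eta, self.a = eta, a          # step-size and kernel parameter
        self.centers, self.coeffs = [], [] # growing network of kernel units

    def _kernel(self, x, y):
        return np.exp(-self.a * np.sum((x - y) ** 2))  # Gaussian, Eq. (5.4)

    def predict(self, x):
        # Filter output, Eq. (5.24): eta * sum_j e_j k(x_j, x)
        return sum(c * self._kernel(xj, x)
                   for xj, c in zip(self.centers, self.coeffs))

    def update(self, x, d):
        e = d - self.predict(x)            # prediction error, Eq. (5.21)
        self.centers.append(np.asarray(x, float))
        self.coeffs.append(self.eta * e)   # allocate a new kernel unit
        return e

# Toy nonlinear system identification: d = sin(2 x1) + 0.5 x2
rng = np.random.default_rng(0)
f = KLMS(eta=0.5, a=2.0)
errs = []
for _ in range(500):
    x = rng.uniform(-1, 1, size=2)
    d = np.sin(2 * x[0]) + 0.5 * x[1]
    errs.append(f.update(x, d) ** 2)
print(np.mean(errs[:50]), np.mean(errs[-50:]))  # squared error decreases
```

Note how the per-iteration cost of predict grows with the number of stored centers, which is exactly the O(i) complexity (and the growing-memory problem) discussed below.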
The KLMS and radial basis function (RBF) networks have similar topologies. The differences include: (1) the weight before each kernel function is the corresponding training error; (2) KLMS has a growing network, where a new unit is placed at each new input; (3) the kernel function is not limited to radial basis functions and can be any other Mercer kernel. KLMS requires O(i) operations for each iteration and weight update, but several aspects are still unspecified: first, how to select the kernel k(·, ·); second, how to select the step-size parameter η; and finally, how to cope with the growing memory and computation requirements in online operation.
5.2.2.1 Kernel Selection
The type of kernel function is very important: it defines the similarity measure on the data and ultimately affects performance. In the following, a brief discussion of kernel and parameter selection is provided.

To apply the kernel method, we first need to pick a kernel function. In the existing work on nonparametric regression, it is known that any bell-shaped weight function (the Gaussian function, the tri-cube function, etc.) leads to equivalent asymptotic accuracy. Actually, as we have discussed, weight functions are not necessarily reproducing kernels and vice versa. The RKHS approach examines more closely the eigenfunctions of the kernel and their richness for approximation. It is known that the Gaussian kernel creates a reproducing kernel Hilbert space with universal approximating capability, whereas a polynomial kernel of finite order does not, because the Taylor expansion of the Gaussian has infinite order. The Gaussian kernel is also widely used for its excellent mathematical properties: model functions composed of Gaussian kernels are usually very smooth, which brings advantages in numerical calculation. By contrast, the approximating capability of a polynomial kernel of order p is limited to polynomial functions of degree less than or equal to p. Unless it is clear from the problem domain that the target function is a polynomial function, or can be well approximated by one, the Gaussian kernel is the usual default choice; studies in many approximation fields show that it has universal approximating capability, is numerically stable, and usually gives reasonable results. The well-known Gaussian kernel function is

k(x_i, x_j) = exp(−‖x_i − x_j‖² / σ²),   (5.25)
where σ is the kernel bandwidth. Up to now, many methods for selecting the kernel size of the Gaussian kernel have been borrowed from statistics, nonparametric regression, and kernel density estimation. Available methods include cross-validation, nearest neighbors, and penalizing functions. Cross-validation is easy and effective, but its computational complexity is very high, especially when the data set is large or several hyperparameters need to be chosen. Penalizing functions [1] and plug-in methods [2] are also used to select the kernel bandwidth, but both cost substantial computational resources. Silverman's rule is widely accepted in kernel density estimation, although it is derived under a Gaussian assumption and is usually not appropriate for multimodal distributions [3]. Silverman's rule uses the mean integrated square error (MISE) between the estimated and the actual PDF to obtain the optimal bandwidth [3]:

σ_opt = σ_X { 4 N^{-1} (2d + 1)^{-1} }^{1/(d+4)},   (5.26)

where d is the dimensionality of the data and σ_X² = d^{-1} Σ_i X_ii, with X_ii being the diagonal elements of the sample covariance matrix. In some situations, researchers use statistical quantities: the empirical kernel bandwidth is usually related to the mean or the variance of the data [4, 5]. Some adaptive kernel bandwidth selection methods are applied to kernel density estimation [6, 7]. Other adaptive methods optimize the kernel bandwidth by gradient descent [8–10]; they are often used in online learning because of their cheap computation, but they have the disadvantage of slow convergence. Multi-kernel learning is also often used to address the bandwidth selection problem [11, 12]: it replaces a single Gaussian kernel with multiple Gaussian kernels of different bandwidths, and during training the kernel with the more appropriate bandwidth receives a higher weight. However, multi-kernel learning is a suboptimal solution for finding the best bandwidth and brings unnecessary computational burden. Although a large number of methods are available, cross-validation remains the most common approach in practical problems because it is very straightforward, even though it always requires heavy computation.

From the perspective of functional analysis, the kernel function defines the inner products in the RKHS, which serve as its similarity measure. Therefore, with different kernel bandwidths, the same input data may be mapped to very different functions. In practical problems, if the kernel size is too large, all the data in the RKHS look similar (the inner products are all close to 1) and the system reduces to linear regression. If the kernel size is too small, all the data look different (the inner products are all close to 0) and the system cannot generalize to unseen samples between the training points.
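Silverman's rule is cheap to evaluate once the sample covariance is available. A sketch following the form of Eq. (5.26), with σ_X² taken as the average of the diagonal covariance entries and synthetic data for illustration:

```python
import numpy as np

def silverman_bandwidth(X):
    """Kernel bandwidth by Silverman's rule, in the form of Eq. (5.26).

    X : (N, d) data matrix. sigma_X^2 is the average of the diagonal
    elements of the sample covariance matrix.
    """
    N, d = X.shape
    sigma_x = np.sqrt(np.mean(np.diag(np.cov(X, rowvar=False))))
    return sigma_x * (4.0 / (N * (2.0 * d + 1.0))) ** (1.0 / (d + 4.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))  # unit-variance Gaussian data
print(silverman_bandwidth(X))       # roughly 0.3 for N = 1000, d = 2
```

The exponent 1/(d+4) makes the bandwidth shrink very slowly with N, which is consistent with the rule's kernel-density-estimation origin; for multimodal data it tends to oversmooth, as noted above.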
5.2.2.2 Step-Size Selection
After choosing the kernel and its free parameter, we need to find a suitable step-size. Since KLMS is the form of the LMS algorithm in the RKHS, we can analyze the step-size in the same way. In particular, the step-size is a compromise between convergence time and misadjustment (i.e., increasing the step-size parameter decreases the convergence time but increases the misadjustment). Moreover, the step-size is upper bounded by the reciprocal of the largest eigenvalue of the transformed-data autocorrelation matrix. Consider the data set {(x_i, d_i)}_{i=1}^N with i being the time index. Define the data matrix in the feature space Φ = [φ_1, φ_2, ⋯, φ_N]; let R_φ be the autocorrelation matrix and G_φ be the Gram matrix:

R_φ = (1/N) Φ Φ^T,   (5.27)
G_φ = (1/N) Φ^T Φ,   (5.28)

where G_φ is an N × N matrix with k(x_i, x_j) as its (i, j)-th element. To keep the algorithm stable, the step-size is required to satisfy the condition η