Efficient Nonlinear Adaptive Filters: Design, Analysis and Applications 303120817X, 9783031208171

This book presents the design, analysis, and application of nonlinear adaptive filters with the goal of improving efficiency.


English Pages 270 [271] Year 2023


Table of contents :
Preface
Acknowledgments
Contents
Abbreviations and Acronyms
Chapter 1: Adaptive Filter
1.1 Introduction
1.2 Linear Adaptive Filters
1.2.1 LMS Algorithm
1.2.2 Affine Projection Algorithm
1.2.3 Recursive Least-Squares Algorithm
1.2.4 Subband Algorithm
1.2.5 Kalman Filter
1.3 Nonlinear Adaptive Filters
1.3.1 Volterra Filter
1.3.2 FLANN Adaptive Filter
1.3.3 Spline Adaptive Filter
1.3.4 Kernel Adaptive Filter
1.4 Summary
References
Chapter 2: Volterra Adaptive Filter
2.1 Introduction
2.2 Volterra Filter Model
2.3 Pipelined Volterra Filter
2.4 Convex Combination of Volterra Filter
2.4.1 The Algorithm I
2.4.2 The Algorithm II
2.5 Robust Volterra Filtering Algorithm
2.6 The Volterra Expansion Model Based Filtered-x Logarithmic Continuous Least Mean p-Norm (VFxlogCLMP) Algorithm for Active Noise Control Application
2.6.1 VFxlogLMP Algorithm
2.6.2 VFxlogCLMP Algorithm
2.6.3 Performance Analysis of the VFxlogCLMP Algorithm
2.6.4 EMSE Analysis
2.6.5 Convergence Condition of the VFxlogCLMP Algorithm
2.7 Diffusion Volterra Nonlinear Filtering Algorithm
2.7.1 Diffusion Least Mean Square (DLMS) Algorithm
2.7.2 Problem Formulation
2.7.3 The DV Filtering Algorithm
2.8 Simulation Results
2.8.1 Pipelined Volterra Filter
2.8.2 Convex Combination of Volterra Filter
2.8.3 Robust Volterra Filtering Algorithm
2.8.4 The VFxlogCLMP Algorithm for ANC Application
2.8.5 Diffusion Volterra Filtering Algorithm
2.9 Summary
References
Chapter 3: FLANN Adaptive Filter
3.1 Introduction
3.2 Neural Network Structures
3.2.1 MLP
3.2.2 ChNN
3.2.3 FLANN
3.2.4 LeNN
3.3 Recursive FLANN
3.3.1 Feedback FLANN Filter
3.3.2 Reduced Feedback FLANN Filter
3.3.3 Recursive FLANN Structure
3.3.3.1 A BIBO Stability Condition
3.4 Convex Combination of FLANN Filter
3.5 Random Fourier Filter
3.5.1 Random Fourier Feature
3.5.2 RF-LMS Algorithm
3.5.3 Cascaded RF-LMS (CRF-LMS) Algorithm
3.5.4 Mean Convergence Analysis
3.5.5 Computational Complexity
3.6 Nonlinear Active Noise Control
3.6.1 Robust Control Algorithms for NANC
3.6.1.1 FsLMP Algorithm
3.6.1.2 FsqLMP Algorithm
3.6.1.3 RFsLMS Algorithm
3.6.1.4 FsMCC Algorithm
3.6.1.5 RFF-FxMCC Algorithm
3.7 Nonlinear Channel Equalization
3.7.1 Communication Channel Equalization
3.7.2 Channel Equalization Using a Generalized NN Model
3.7.3 FLNN Equalizer
3.7.3.1 Adaptive Equalizer with FLNN Cascaded with Chebyshev Orthogonal Polynomial
3.7.3.2 Decision Feedback Equalizer Using the Combination of FIR and FLNN
3.8 Computer Simulation Examples
3.8.1 FLANN-Based NANC with Minimum Phase Secondary Path System
3.8.2 Random Fourier Filter-Based NANC
3.8.2.1 Projection Dimension and Memory Length of Random Fourier Filter
3.8.2.2 Real Example: Random Fourier Filter-Based Active Traction Substation Noise Control
3.8.3 Nonlinear Channel Equalization
3.8.3.1 Channel Equalization Using a Generalized NN Model
3.8.3.2 Adaptive Equalizer Based on the FLNN Cascaded with Chebyshev Orthogonal Polynomial Structure
3.8.3.3 Adaptive Decision Feedback Equalizer with the Combination of FIR Filter and FLANN
3.9 Summary
References
Chapter 4: Spline Adaptive Filter
4.1 Introduction
4.2 Spline Filter Model
4.2.1 Spline Adaptive Filter
4.2.2 Basic Spline Filter Algorithm
4.2.2.1 SAF-LMS Algorithm
4.2.2.2 SAF-NLMS Algorithm
4.2.2.3 SAF-SNLMS Algorithm
4.2.2.4 SAF-VSS-SNLMS Algorithm
4.3 Robust Spline Filtering Algorithm
4.3.1 SAF-MCC Algorithm
4.3.2 Performance Analysis
4.4 Applications
4.4.1 Active Noise Control Based on Spline Filter
4.4.1.1 FcGMCC Algorithm
4.4.1.2 Convergence Analysis
4.4.2 Echo Cancellation Based on Spline Filter
4.4.2.1 The Nonlinear Echo Canceler
4.4.2.2 The Architectures Proposed in
4.5 Computer Simulation Examples
4.5.1 Basic Spline Filter Algorithm Simulation
4.5.2 SAF-MCC Algorithm Simulation
4.5.3 Performance Analysis Simulation
4.5.4 Simulation of ANC
4.5.4.1 Performance of the FcGMCC Algorithm
4.5.5 Simulation of Echo Cancellation
4.6 Summary
References
Chapter 5: Kernel Adaptive Filters
5.1 Introduction
5.2 Kernel Adaptive Filters
5.2.1 Reproducing Kernel Hilbert Space
5.2.2 Kernel Least Mean Square
5.2.2.1 Kernel Selection
5.2.2.2 Step-Size Selection
5.2.2.3 Mean Square Convergence Analysis
5.2.3 Kernel Affine Projection Algorithms
5.2.3.1 Affine Projection Algorithms
5.2.3.2 Kernel Affine Projection Algorithms
5.2.4 Kernel Recursive Least Squares
5.3 Network Optimization
5.3.1 Sparsification Algorithms
5.3.1.1 Novelty Criterion
5.3.1.2 Approximate Linear Dependency
5.3.1.3 Surprise Criterion
5.3.2 Quantization Algorithms
5.3.2.1 On-Line Quantization
5.3.2.2 Off-Line Quantization
5.3.3 Kernel Approximation
5.3.3.1 Nyström Method
5.3.3.2 Random Fourier Feature Method
5.4 Computer Simulation Examples
5.4.1 Comparisons of Different KAFs
5.4.1.1 Mackey-Glass Chaotic Time Series Prediction
5.4.1.2 Nonlinear Channel Equalization
5.4.2 Comparisons of Network Optimization Methods
5.4.2.1 Relation Between Code Book Size and Performance
5.4.2.2 Comparison of Several Network Optimization Methods
5.4.2.3 KRLS with Different Sparsification Methods
5.4.2.4 Comparison of Different Quantization Methods
5.4.2.5 KRR with Different Quantization Methods
5.5 Summary
References
Index


Haiquan Zhao Badong Chen

Efficient Nonlinear Adaptive Filters

Design, Analysis and Applications


Haiquan Zhao, School of Electrical Engineering, Southwest Jiaotong University, Chengdu, China

Badong Chen, Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China

ISBN 978-3-031-20817-1    ISBN 978-3-031-20818-8 (eBook)
https://doi.org/10.1007/978-3-031-20818-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

In recent years, signal-processing technology has taken a great leap forward. Especially with the development of digital circuit technology, the efficiency of digital signal processing (DSP) has been greatly improved. Digital filtering technology, an important branch of DSP, has been widely studied and applied in many fields; it mainly aims to extract the useful information contained in the received signal. In practice, the device that achieves the filtering function is generally called a filter, which can extract the desired information from the input signal. A digital filter is used to process discrete-time signals. For a linear time-invariant (LTI) filter, the internal parameters and structure are fixed, and the output signal is a linear mapping of the input signal. However, when the statistical characteristics of the signal to be processed are unknown, the LTI filter cannot provide good signal-processing capability. In this case, an adaptive filter is a very attractive solution, since it can optimize its internal free parameters according to the input signal to provide effective performance. Strictly speaking, an adaptive filter is a kind of nonlinear filter (its characteristics depend on the input signal), so it does not satisfy superposition and homogeneity. However, at a given moment, the parameters of the filter are fixed, and the output of the filter is a linear mapping of the input signal. At the crux of adaptive filters is the design of the filtering algorithm, i.e., how the parameters of the filter are adaptively adjusted to meet the performance requirements in response to changes in the environment (input and desired signal). The algorithms discussed in this book are all based on discrete-time signals, because the rapid development of VLSI technology makes the processing of discrete-time signals more rapid and convenient.

An adaptive filter generally involves three aspects: (1) Application. Adaptive filtering technology has been applied in many areas, such as channel equalization, signal prediction, echo cancellation, beamforming, system identification, and signal enhancement. (2) Structure. An adaptive filter can be composed of many structures, and different structures correspond to different computational complexities. According to the form of the impulse response, adaptive filters can be divided into finite impulse response (FIR) filters and infinite impulse response (IIR) filters. The most widely used FIR filter is the transversal filter; its transfer function has no poles, so there is no system stability issue. For this structure, the output of the filter is a linear combination of the input signals. However, most actual systems are nonlinear, and the linear adaptive filter is not suitable for this kind of situation because of its inherent defects, so nonlinear adaptive filters have been proposed to overcome the abovementioned problem, such as the Volterra filter, the functional link artificial neural network (FLANN), the spline filter, and kernel function–based filters. (3) Algorithm. The algorithm adaptively adjusts the coefficients of the filter to minimize a certain optimization criterion.

In fact, the theory of linear adaptive filtering is mature enough, and a large number of journals and books have summarized it in detail. However, there are very few books on nonlinear adaptive filters. Therefore, the core content of this book is to introduce some nonlinear adaptive filters with complete theoretical systems, including some classical applications, nonlinear filter structures, and algorithms. The first chapter of this book briefly introduces the basic knowledge of classical linear adaptive filtering. The understanding of this basic knowledge is the basis for further study of the nonlinear adaptive filtering methods in the following chapters. The main contents of this book consist of five chapters, which are summarized as follows:

Chapter 1 mainly introduces the linear adaptive filter and several classical adaptive filtering algorithms. Finally, a brief introduction is given to the nonlinear filters that will be described in the following chapters.

Chapter 2 introduces the Volterra filter for nonlinear systems, mainly including the pipelined Volterra filter, the convex combined Volterra filter, the robust Volterra filter, and their corresponding nonlinear filtering algorithms. Moreover, a robust diffusion Volterra (DV) algorithm for distributed nonlinear networks is also described in detail. Finally, computer simulations are provided.

Chapter 3 describes the functional link artificial neural network (FLANN)-based nonlinear filter, mainly including the structure, principle, and some improved models of the FLANN-based filter. The nonlinear property and modeling ability of the FLANN-based filter are verified by computer simulations.

In Chap. 4, the nonlinear spline filter and its adaptive algorithms are introduced. In addition, the convergence behavior of a robust spline filtering algorithm is analyzed, and the validity of the analysis results is verified by computer simulations. Finally, the application of the spline filter in active noise control is given.

In Chap. 5, we introduce the kernel adaptive filter and several classical kernel adaptive filtering algorithms. In particular, in order to reduce the high computational cost and storage space caused by the large-scale hidden layer nodes of these algorithms, several network optimization methods are presented. Finally, computer simulations are provided to verify the validity of these optimization methods.

This book provides a reference for researchers and students in the field of developing and researching advanced signal processing with adaptive filters, and it also provides a convenient way for practicing engineers in related fields to understand effective algorithms. The readers of this book need to understand some basic principles of digital signal processing, random processes, and matrix theory, including finite impulse response (FIR) digital filter realization, random variables, and first-order and second-order statistics. Assuming that the readers have such a background, they will have no problem reading this book. In addition, a number of references are given at the end of each chapter to facilitate the readers' further study of a chapter.

Chengdu, China    Haiquan Zhao
Xi'an, China      Badong Chen

Acknowledgments

We would like to thank some of our former and current graduate students. In particular, we would like to thank PhD. Yingying Zhu, PhD. Shaohui Lv, PhD. Wenjing Xu, PhD. Dongxu Liu, PhD. Pengfei Li, Dr. Chuang Liu, Ms. Yuan Gao, Ms. Boyu Tian, Ms. Jinwei Lou, Ms. Xinhao Xu, Dr. Zhengda Qin, and Dr. Lei Xing, with whom we have worked on the topics of this book and who contributed to some of the results reported here. This work was partially supported by the National Natural Science Foundation of China (grants 62171388 and 61871461), the Fundamental Research Funds for the Central Universities (grant 2682021ZTPY091), and the Southwest Jiaotong University Graduate Teaching Materials (Monograph) Funding Construction Project (grant SWJTU-ZZ2022-017).


Abbreviations and Acronyms

AEC    Acoustic echo cancellation
ALD    Approximate linear dependency
ANC    Active noise control
AP    Affine projection
APA    Affine projection algorithm
BIBO    Bounded-input bounded-output
BER    Bit error rate
CFsLMS    Convex combined FsLMS algorithm
CRF-LMS    Cascaded random Fourier least mean square algorithm
CRF-FxMCC    Cascaded random Fourier filtered-x maximum correntropy criterion algorithm
ChNN    Chebyshev neural network
DIV    Diffusion interpolated Volterra
DIV-LLMP    Diffusion interpolated Volterra logarithm least mean p-norm
DQ    Density-dependent vector quantization
DV    Diffusion Volterra
DV-LLMP    Diffusion Volterra logarithm least mean p-norm
EMSE    Theoretical mean square error
FIR    Finite impulse response
FLANN    Functional link artificial neural network
FFLANN    Feedback functional link neural network
FLANN-NLMS    FLANN-based normalized least mean square algorithm
FcLMS    Filtered-c least mean square
FcGMCC    Filtered-c generalized maximum correntropy criterion
FLOM    Fractional lower order moments
FsLMP    Filtered-s least mean p-power algorithm
FsLMS    Filtered-s least mean square algorithm
FsMCC    Filtered maximum correntropy criterion algorithm
FsqLMP    Filtered-s q-gradient least mean p-power algorithm
FxLMS    Filtered-x least mean square algorithm
IIR    Infinite impulse response
IIR-SAF    IIR spline adaptive filter
LeNN    Legendre neural network
IVF    Interpolated Volterra filter
IVFF-RLLMP    Improved variable forgetting factor recursive logarithm least mean p-norm
KAF    Kernel adaptive filter
KAPA    Kernel affine projection algorithm
KF    Kalman filtering
KLMS    Kernel least mean square
KRLS    Kernel recursive least square
LMS    Least mean square
LUT    Look-up table
MCC    Maximum correntropy criterion
MLP    Multilayer perceptron
MMSE    Minimum mean square error
MSE    Mean square error
NC    Novelty criterion
NG    Non-Gaussian
NLAEC    Nonlinear acoustic echo cancellation
NLMS    Normalized least-mean-square
NLMP    Normalized version of the LMP
NN    Neural network
PDF    Probability density function
PRNN    Pipelined recurrent neural network
PRQ    Probability density rank-based quantization
QKLMS    Quantized kernel least mean square
RBF    Radial basis function
RFF    Random Fourier feature
RFFLANN    Reduced feedback functional link neural network
RF-FxMCC    Random Fourier filtered-x maximum correntropy criterion algorithm
RF-LMS    Random Fourier least mean square algorithm
RFsLMS    Robust filtered-s least mean square algorithm
RKHS    Reproducing kernel Hilbert space
RLLMP    Recursive logarithm least mean p-norm
RLS    Recursive least square
SAF    Spline adaptive filter
SAF-LMS    Spline adaptive filter with LMS
SAF-MCC    Spline adaptive filter with MCC
SAF-NLMS    Spline adaptive filter with NLMS
SAF-SNLMS    Spline adaptive filter with sign NLMS
SAF-VSS-SNLMS    Spline adaptive filter with variable step-size SNLMS
SF    Subband filtering
SNR    Signal-to-noise ratio
SOV    Second-order Volterra
SαS    Symmetric α-stable distribution
VFF-RLS    Variable forgetting factor recursive least square
VQ    Vector quantization
VQIT    Vector quantization using information theoretic learning

Chapter 1: Adaptive Filter

1.1 Introduction

In this chapter, conventional linear filters and adaptive filtering algorithms are mainly introduced, including LMS, RLS, AP algorithms, subband filtering, and Kalman filtering algorithms [1]. In addition, several classical nonlinear adaptive filters are briefly introduced, and their detailed descriptions are presented in the following chapters.

1.2 Linear Adaptive Filters

1.2.1 LMS Algorithm

The LMS algorithm is the most widely used adaptive filtering algorithm; its main advantages are a simple structure and low computational complexity [2]. Usually, an adaptive filter consists of a transfer filter that processes the input signal and an algorithm unit that updates the transfer filter's coefficients. A general structure of the adaptive filter is illustrated in Fig. 1.1. In Fig. 1.1, x(n) is the input signal; w(n) = [w_0(n), w_1(n), ..., w_{L-1}(n)] is the filter coefficient vector; L is the order of the transfer filter; d(n) is the desired signal; y(n) = w^T(n)x(n) is the output signal of the filter; and e(n) is the error signal, given by e(n) = d(n) - y(n). The LMS algorithm is obtained by solving the following mean square error (MSE) minimization problem:

$$ \min_{w} \; E\left[ e^2(n) \right] \qquad (1.1) $$

Fig. 1.1 The structure of the adaptive filter: the input x(n) is processed by the transfer filter w(n) to produce y(n), which is subtracted from the desired signal d(n) to form the error e(n) that drives the adaptive algorithm

For the LMS algorithm [3–6], the filter coefficients can be updated as follows:

$$ w(n) = w(n-1) + \mu x(n) e(n) \qquad (1.2) $$

where μ is the fixed step size. The normalized LMS (NLMS) algorithm is an improved LMS algorithm, developed to obtain faster convergence and lower steady-state misalignment. The NLMS algorithm can be described as follows [7]:

$$ w(n) = w(n-1) + \mu(n) x(n) e(n) \qquad (1.3) $$

$$ \mu(n) = \frac{\mu}{\delta + \|x(n)\|^2} \qquad (1.4) $$

where δ is a regularization parameter. To ensure convergence of the NLMS algorithm in the mean-square sense, the step size should satisfy [8] 0 < μ < 2.
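For readers who want to experiment with the two recursions above, the following short Python sketch implements the LMS update of Eq. (1.2) and its normalized variant of Eqs. (1.3)-(1.4) for a toy system-identification task; the plant w_true, step sizes, and signal lengths are arbitrary choices for illustration, not values used in the book.

```python
import numpy as np

def lms_identify(x, d, L=8, mu=0.01, normalized=False, delta=1e-3):
    """Identify an FIR system with the (N)LMS updates of Eqs. (1.2)-(1.4)."""
    w = np.zeros(L)                      # adaptive weight vector w(n)
    e = np.zeros(len(x))
    for n in range(L, len(x)):
        xn = x[n - L + 1:n + 1][::-1]    # regressor x(n) = [x(n), ..., x(n-L+1)]
        y = w @ xn                       # filter output y(n)
        e[n] = d[n] - y                  # error e(n) = d(n) - y(n)
        step = mu / (delta + xn @ xn) if normalized else mu
        w = w + step * e[n] * xn         # LMS / NLMS coefficient update
    return w, e

# toy usage: identify an unknown 8-tap FIR plant (assumed for the example)
rng = np.random.default_rng(0)
w_true = rng.standard_normal(8)
x = rng.standard_normal(5000)
d = np.convolve(x, w_true)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w_hat, err = lms_identify(x, d, L=8, mu=0.05, normalized=True)
```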

… > 0 for every n, it results that:

$$ |\xi(n)| \le \sum_{i=0}^{N_a - 1} \left( |a_i| M_x \right) = \alpha M_x \qquad (3.53) $$

By combining Eq. (3.52) and Eq. (3.53), we can finally obtain:

$$ |y(n)| \le \frac{1}{\beta} \left( \alpha M_x + \eta + \theta \right) \qquad (3.54) $$

It is worth noting that the BIBO stability condition is essentially that of a recursive linear filter. The recursive FLANN filter is not affected by instabilities when the input signal has a finite amplitude, and the recursive linear part of the filter is stable. This behavior, due to the bounds of the trigonometric functions used in the input-output relationship, is in contrast, in general, with what happens for recursive polynomial filters where the input signals need to be constrained in well-defined ranges.

3.4 Convex Combination of FLANN Filter

To cope with the compromise between the speed of convergence and the steady-state error of the FLANN filter, the convex combination scheme is adopted. Take the nonlinear active noise control system as an example. The convex combined active noise control system based on the FLANN filter is shown in Fig. 3.8. To achieve a good performance from the convex combination scheme, two adaptive FLANN filters with different step sizes are adapted individually. Moreover, the output signals of the component filters are convex combined by a mixing parameter in such a manner that the advantages of both FLANN filters are kept, i.e., the fast convergence speed of the large-step-size adaptive FLANN filter and the low steady-state error of the small-step-size adaptive FLANN filter [16].

Fig. 3.8 Block diagram of the nonlinear ANC system based on the convex combination of two FLANN filters

As shown in Fig. 3.8, the output sy(n) of the nonlinear ANC system at the error microphone can be calculated using the convex combination form as follows:

$$ sy(n) = \hat{A}(n) * y(n) = \hat{A}(n) * \left[ \lambda(n) y_1(n) + (1 - \lambda(n)) y_2(n) \right] \qquad (3.55) $$

where y(n) = λ(n)y_1(n) + (1 − λ(n))y_2(n) is the output of the adaptive combination of FLANN filters, and the output y_j(n) (j = 1, 2) of the jth adaptive FLANN controller is given as follows:

$$ y_j(n) = W_j^T S(n) = \sum_{i=1}^{2p+1} y_{j,i}(n) = \sum_{i=1}^{2p+1} w_{j,i}^T s_i(n) \qquad (3.56) $$

and W_j(n) denotes the corresponding weight coefficient vector, defined by:

$$ W_j(n) = \left[ w_{j,1}(n), w_{j,2}(n), \cdots, w_{j,2p+1}(n) \right]^T \qquad (3.57) $$

The signal matrix S(n) generated by the trigonometric function expansion is given by:

$$ S(n) = \left[ s_1(n), s_2(n), \cdots, s_{2p+1}(n) \right]^T \qquad (3.58) $$

As for the combination scheme, the mixing parameter λ(n) is kept in the interval (0, 1) by defining it via a Sigmoid activation function as:

$$ \lambda(n) = \frac{1}{1 + \exp[-a(n)]} \qquad (3.59) $$

Notice that λ(n) increases monotonically with a(n) and lies between 0 and 1. The convex combination W(n) of the weights of the overall filter is also obtained as:

$$ W(n) = \lambda(n) W_1(n) + (1 - \lambda(n)) W_2(n) \qquad (3.60) $$

Thus, the filter bank implementation of W(n) is expressed as:

$$ W_i(n) = \lambda(n) W_{1,i}(n) + (1 - \lambda(n)) W_{2,i}(n), \quad i = 1, 2, \ldots, 2P+1 \qquad (3.61) $$

Hence, according to the FxLMS algorithm, the cost function of the convex combined FsLMS algorithm (CFsLMS) is derived as follows. First, the cost function is given as:

$$ J(n) = E\left[ e^2(n) \right] \qquad (3.62) $$

Taking the derivative of the above function yields:

$$ \frac{\partial J(n)}{\partial W(n)} = -2 e(n) \frac{\partial sy(n)}{\partial W(n)} \qquad (3.63) $$

where the overall error at time n is defined as:

$$ e(n) = d(n) - sy(n) \qquad (3.64) $$

Subsequently, different conditions of the secondary path are discussed. When the secondary path is linear, the gradient term in Eq. (3.63) is written as:

$$ \frac{\partial sy(n)}{\partial W(n)} = \frac{\partial sy(n)}{\partial y(n)} \frac{\partial y(n)}{\partial W(n)} = A_{\mathrm{linear}}(n) * S(n) \qquad (3.65) $$

When the secondary path is nonlinear, the output becomes a nonlinear function, which is described as:

$$ sy(n) = f\left( [y(n), y(n-1), \ldots, y(n-m+1)]^T \right) \qquad (3.66) $$

As a consequence, the gradient term in Eq. (3.63) is given as:

$$ \frac{\partial sy(n)}{\partial W(n)} = \sum_{k=1}^{m-1} \frac{\partial sy(n)}{\partial y(n-k)} \frac{\partial y(n-k)}{\partial W(n)} \qquad (3.67) $$

Assume that the convex combination W(n) is slowly varying for small step sizes, so that:

$$ \frac{\partial y(n-k)}{\partial W(n)} \approx \frac{\partial y(n-k)}{\partial W(n-k)} \qquad (3.68) $$

Defining the nonlinear part as follows:

$$ A_{\mathrm{Nonlinear}}(n) = \left[ \frac{\partial sy(n)}{\partial y(n)}, \frac{\partial sy(n)}{\partial y(n-1)}, \ldots, \frac{\partial sy(n)}{\partial y(n-m+1)} \right]^T \qquad (3.69) $$

the virtual secondary path is introduced, and we can get:

$$ \frac{\partial sy(n)}{\partial W(n)} = A_{\mathrm{Nonlinear}}(n) * S(n) \qquad (3.70) $$

The weight of each combined FLANN filter is individually adjusted by the FsLMS algorithm with its own error and step size, and the update rule of each component filter is given as:

$$ W_j(n+1) = W_j(n) + \mu_j e_j(n) S'(n), \quad j = 1, 2 \qquad (3.71) $$

where the error of each component filter is computed as:

$$ e_j(n) = \begin{cases} d(n) - A_{\mathrm{linear}}(n) * y_j(n), & \text{when the secondary path is linear} \\ d(n) - A_{\mathrm{Nonlinear}}(n) * y_j(n), & \text{when the secondary path is nonlinear} \end{cases} \qquad (3.72) $$

and the filtered signal matrix S′(n) is calculated by:

$$ S'(n) = \left[ s'_1(n), s'_2(n), \ldots, s'_{2P+1}(n) \right]^T \qquad (3.73) $$

Without loss of generality, it is assumed that μ_1 > μ_2. Compared with the coefficients W_{2,i}(n), the coefficients W_{1,i}(n) have a faster speed of convergence but a larger steady-state MSE. On condition that the fast FLANN filter significantly outperforms the slow one, the weight update can be modified to improve the performance of the CNFSLMS algorithm as follows:

$$ W_{2,i}(n+1) = \alpha \left[ W_{2,i}(n) + \mu_2 e_2(n) S'(n) \right] + (1 - \alpha) W_{1,i}(n+1), \quad i = 1, 2, \cdots, 2P+1 \qquad (3.74) $$

where the parameter α (0 < α < 1) is close to 1. For the particular case of the FLANN combination scheme, the key issue is how to adapt the mixing parameter so that the error of the overall filter is minimized. It is obvious that the error e(n) of the overall filter can be calculated using the convex combination form:

$$ e(n) = \lambda(n) e_1(n) + [1 - \lambda(n)] e_2(n) \qquad (3.75) $$

According to the stochastic gradient descent rule, the parameter a(n) is adapted by:

$$ a(n+1) = a(n) - \frac{\mu_a}{2} \nabla J(n) \qquad (3.76) $$

where the step size μ_a is used to keep λ(n) in the interval [0, 1] and must be fixed to a very high value, with the result that the convex combination filter is adapted even faster than the fast FLANN filter. However, a disadvantage of this scheme is that a(n) stops updating whenever λ(n) is too close to 0 or 1. To deal with this drawback, the values of a(n) can be limited to the interval [-4, 4] [17]. In Eq. (3.76), ∇J(n) denotes the gradient estimator and is calculated by ∇J(n) = 2e(n)[e_1(n) - e_2(n)]λ(n)[1 - λ(n)]. Then, the update equation of the parameter a(n) is expressed as:

$$ a(n+1) = a(n) - \mu_a e(n) \left[ e_1(n) - e_2(n) \right] \lambda(n) \left[ 1 - \lambda(n) \right] \qquad (3.77) $$

The selection of the step size plays a crucial role in obtaining appropriate filter behavior. Thus, a normalized form is calculated as follows:

$$ a(n+1) = a(n) - \frac{\mu_a e(n) \left[ e_1(n) - e_2(n) \right] \lambda(n) \left[ 1 - \lambda(n) \right]}{\delta + r(n)} \qquad (3.78) $$

By using this update rule, the selection of μ_a is not affected by the signal-to-noise ratio (SNR). Moreover, to further improve performance, a low-pass filtered estimate r(n) is used instead of the instantaneous value of [e_1(n) - e_2(n)]^2, as follows:

$$ r(n+1) = \beta r(n) + (1 - \beta) \left[ e_1(n) - e_2(n) \right]^2 \qquad (3.79) $$

where the parameter β is a constant close to 1. Inevitably, the CNFSLMS algorithm requires computation of the exponential function, which results in heavy computational complexity. Therefore, to solve this problem, we use the modified Versorial function to replace the Sigmoid function and obtain a modified CNFSLMS (MCNFSLMS) algorithm in the following parts. The modified Versorial function given in [18] is expressed as follows:

$$ \lambda(n) = \begin{cases} 1 - \dfrac{1}{B a^2(n) + 2}, & a(n) \ge 0 \\[1.5ex] \dfrac{1}{B a^2(n) + 2}, & a(n) < 0 \end{cases} \qquad (3.80) $$

where B is a parameter to adjust the curve shape. Moreover, to guarantee convergence, B is required to be larger than 1. To select an appropriate value of the parameter B in Eq. (3.80), the λ(n) curve with different B values (B = 1, 2, 3, 4, and 5) is plotted in Fig. 3.9a. It is observed that the modified Versorial function with B = 2 is the best approximation of the Sigmoid function. Figure 3.9b describes the normalized mean square error (NMSE) of the MCNFSLMS algorithm with different B values (B = 1, 2, 3, 4, and 5) in a nonlinear ANC case with a minimum-phase secondary path transfer function. It is shown that the MCNFSLMS algorithm with B = 2 outperforms the other cases. The pseudo-code for the MCNFSLMS algorithm is listed in Table 3.1.

Fig. 3.9 (a) The modified Versorial function with different B values. (b) The NMSE of the MCNFSLMS algorithm with different B values

Table 3.1 Summary of the MCNFSLMS algorithm
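Since Table 3.1 itself is not reproduced here, the core of the combination step can be sketched in a few lines of Python. The snippet below evaluates the modified Versorial mixing function of Eq. (3.80) and applies the mixing-parameter update of Eq. (3.77) with the clipping of a(n) to [-4, 4]; the component outputs y1, y2 and errors e1, e2 are assumed to come from two FsLMS-adapted FLANN filters that are not shown, and the step size is purely illustrative.

```python
import numpy as np

def versoria_lambda(a, B=2.0):
    """Modified Versorial mixing function of Eq. (3.80); B = 2 best approximates the Sigmoid."""
    return 1.0 - 1.0 / (B * a**2 + 2.0) if a >= 0 else 1.0 / (B * a**2 + 2.0)

def combine_and_adapt(a, y1, y2, e1, e2, mu_a=0.5, B=2.0):
    """One combination step: mix the component outputs and adapt a(n) as in Eqs. (3.75)-(3.77)."""
    lam = versoria_lambda(a, B)
    y = lam * y1 + (1.0 - lam) * y2                      # combined controller output
    e = lam * e1 + (1.0 - lam) * e2                      # overall error, Eq. (3.75)
    a = a - mu_a * e * (e1 - e2) * lam * (1.0 - lam)     # stochastic-gradient update, Eq. (3.77)
    a = float(np.clip(a, -4.0, 4.0))                     # keep a(n) in [-4, 4] so adaptation never freezes
    return y, e, a
```

In a full simulation the normalized update of Eq. (3.78) would replace the plain gradient step, but the structure of the loop is unchanged.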

3.5 Random Fourier Filter

In the above sections, we have introduced several nonlinear filters based on the FLANN, which uses sines and cosines as basis functions. In this part, we continue to describe another black-box modeling technique. Different from the FLANN, we use random Fourier expansions (RFEs) to model the unknown function, because they offer a unique mix of computational efficiency, theoretical guarantees, and ease of use that makes them ideal for online processing.

3.5.1 Random Fourier Feature

General neural networks are more expressive than random Fourier features, but they are difficult to use and come without theoretical guarantees. Standard kernel methods suffer from high computational complexity because the number of kernels equals the number of measurements. RFEs were originally introduced to reduce the computational burden that comes with kernel methods [19]. The basic scheme of the random Fourier feature filter is shown in Fig. 3.10. Assume that we are provided N scalar measurements y_i taken at measurement points x_i ∈ ℝ^d, as well as a kernel κ(x_i, x_j) that, in a certain sense, measures the closeness of two measurement points. To train the kernel expansion:

N X

ai κ ðx, xi Þ

ð3:81Þ

i=1

a linear system involving the kernel matrix [κ(xi, xj)]i, j has to be solved for the coefficients ai. The computational costs of training and evaluating grow linearly in the number of data points N, respectively. This can be prohibitive for large values of

Fig. 3.10 The random Fourier feature filter

108

3

FLANN Adaptive Filter

N. We now explain how RFEs can be used to reduce the complexity. Assuming the kernel k is shift-invariant and has Fourier transform p, it can be normalized such that p is a probability distribution [20]. That is, we have:   κ xi , x j =

Z

T pðωÞe - iw ðxi - xj Þ dω

ð3:82Þ

We will use several trigonometric properties and the fact that k is real to continue the derivation. This gives:   κ xi , xj =

Z Z

= =

1 2π

T pðωÞe - iw ðxi - xj Þ dω

   pðωÞ cos wT xi - xj + pðωÞ Z

Z



    cos wT xi - xj + 2b dbdω

0

Z



pðωÞ

       cos wT xi - xj + cos wT xi - xj + 2b dbdω

0

Z Z 2π     1 pðωÞ 2 cos wT xi + b cos wT xj + b dbdω = 2π   0    = E 2 cos ΩT xi + B cos ΩT xj + B D     2 X ≈ cos ωk T xi + bk cos ωk T xj + bk D k=1 ð3:83Þ where ωk is vector in which elements are independent samples of the random variable Ω with probability distribution function (PDF) p, and bk 2 [0, 2π] are independent samples of the random variable B with a uniform distribution. N P 2 T For ck = D ai cos ðωk xi + bk Þ, we thus have: i=1

f ð xÞ =

N X

ai κ ðx, xi Þ ≈

i=1

D X

  ck cos ωk T xi + bk

ð3:84Þ

k=1

It is noted that the number of coefficients D is now independent of the number of measurements N. This is especially advantageous in online applications where the number of measurements N keeps increasing.

3.5.2

RF-LMS Algorithm

The block diagram of the random Fourier filter is shown in Fig. 3.11. The random Fourier filters can simplify the filter iteration by converting the implicit kernel method apparent. The input vector is x(n) = [x(n), x(n - 1), . . ., x(n - L + 1)]T,

3.5

Random Fourier Filter

109

Fig. 3.11 The block diagram of single-channel NANC system based on random Fourier filter

which will be expanded through the RF(ωk, φk) module. Then, after passing through the cosine module, the D-dimensional random Fourier feature vector can be obtained as [21]: Ζð x ð n Þ Þ = ½ ζ 1 , ζ 2 , . . . , ζ D  T ,

ð3:85Þ

where ζ i = cos (ωix(n) + φi), i = 1, 2, ⋯, D. ωk = [ω1, . . ., ωD] follows Gaussian distribution with zero mean and covariance matrix ε2I, which is denoted by ω~N(0, ε2I), and φk, k = 1, . . ., D is sampled from the uniform distribution [0, 2π]. The input signal is expanded to its RFF and then injected into an adaptive filter. Considering the filter weight vector as w(n) = [w1(n), w2(n), ⋯, wD(n)], the output of the random Fourier filter is computed as: yðnÞ = wT ðnÞΖðxðnÞÞ:

ð3:86Þ

The residual signal, which is the superposition of the desired and output signal, can be given as: eðnÞ = d ðnÞ - yðnÞ,

ð3:87Þ

According to the stochastic gradient descent approach, the updating cost function of the random Fourier filter-based LMS algorithm is given as:

110

3

FLANN Adaptive Filter

n o 1 J w = E e 2 ð nÞ , 2

ð3:88Þ

Approximating the expectation by the current values, and following the steepest descent recursion, yields: wðn + 1Þ = wðnÞ - μ∇Jw :

ð3:89Þ

where ∇Jw is the gradient of the cost function with respect to the filter coefficient vector. Therefore, the update rule is obtained as: wðn + 1Þ = wðnÞ + μeðnÞΖðnÞ

3.5.3

ð3:90Þ

Cascaded RF-LMS (CRF-LMS) Algorithm

It has been verified that the RFF is a powerful tool to release the heavy computation of kernel mapping [22]. However, the projection dimension of RFF demands considerable calculations to ensure accuracy. To save the computation cost of the random Fourier filter without degrading its performance, a novel model is designed in this part. The block diagram of the cascaded random Fourier filter-based nonlinear system identification model is shown in Fig. 3.12. The cascaded model can be divided into the main nonlinear RFF filter wa = ½a1 , a2 , . . . , aDs T and the linear cascaded filter wb = [b1, b2, . . ., bM]T [23]. For the nonlinear module, the input vector x(n) = [x(n), x(n - 1), ⋯, x(n - Ls + 1)]T is projected to Ds dimensions by random Fourier transform (RFT), and the expanded vector is given as: ΖDs ðnÞ = ½z1 ðnÞ, z2 ðnÞ, . . . , zDs ðnÞT

ð3:91Þ

where zk(n) = cos (rkx(n) + ψ k), k = 1, 2, . . ., Ds. rk = ½r 1 , . . . , r Ls  follows Gaussian distribution with zero mean and covariance matrix ε2I which is denoted by r~N(0, ε2I), and ψ k, k = 1, . . ., Ds is sampled from the uniform distribution [0, 2π]. Accordingly, the output of the nonlinear module is obtained as: yx ð n Þ =

Ds X

ak ðnÞ cos ðrk xðnÞ + ψ k Þ

k=1

where ak(n), k = 1, 2, . . ., Ds is the update coefficients of wa(n).

ð3:92Þ

3.5

Random Fourier Filter

111

Fig. 3.12 Block diagram of the cascaded feedforward RFF system

The cascaded scheme reduces calculations by shortening the memory length of the reference signal and compensating the performance by the linear cascaded filter. The current random Fourier filter output yx(n) is delayed and employed to the auxiliary linear filter. Thus, the overall output of the designed filter can be calculated as: yc ðnÞ = yx ðnÞ +

M X

bl ðnÞyx ðn - l + 1Þ

ð3:93Þ

l=1

where bl(n), l = 1, 2, . . ., M is the update coefficients of the linear cascaded filter wb. The residual signal of the presented system is computed as: ec ðnÞ = d ðnÞ - yc ðnÞ

ð3:94Þ

Taking the cost function J = 12 ec 2 ðnÞ , the gradient estimators of the random Fourier filter and the linear filter can be defined apart as: 8 h ∂ec ðnÞ > < ∇J wa = ∂a , 1 ðnÞ h > : ∇J w = ∂ec ðnÞ , b ∂b1 ðnÞ

∂ec ðnÞ , ∂a2 ðnÞ

...,

∂ec ðnÞ ∂aDS ðnÞ

∂ec ðnÞ , ∂b2 ðnÞ

...,

∂ec ðnÞ ∂bM ðnÞ

iT

iT

According to Eq. (3.94), the error gradients can be further expressed as:

ð3:95Þ

112

3

FLANN Adaptive Filter

8 h iT ∂yc ðnÞ ∂yc ðnÞ > < ∇J wa = - ∂a ⋯ ∂a ð n Þ ð n Þ 1 DS h iT > : ∇J w = - ∂yc ðnÞ ⋯ ∂yc ðnÞ b ∂b1 ðnÞ ∂bM ðnÞ

ð3:96Þ

From Eq. (3.93), the partial derivative above can be defined as: yak ðnÞ =

∂yc ðnÞ ∂ak ðnÞ

= ð1 + b1 ðnÞÞ cos ðrk xðnÞ + ψ k Þ +

M X l=2

bl ð nÞ

∂yx ðn - l + 1Þ ,k = 1,2, . . . ,Ds ∂ak ðnÞ ð3:97Þ

ybm ðnÞ =

∂yc ðnÞ = yx ðn - m + 1Þ,m = 1,2, . . . ,M ∂bm ðnÞ

ð3:98Þ

According to [24], assuming that the step size is small enough, the approximation is made as: ∂yx ðn - l + 1Þ ∂yx ðn - l + 1Þ ≈ : ∂ak ðnÞ ∂ak ðn - l + 1Þ

ð3:99Þ

Therefore, Eq. (3.97) can be modified as: yak ðnÞ =

∂yc ðnÞ ∂ak ðnÞ

≈ ð1 + b1 ðnÞÞ cos ðrk xðnÞ + ψ k Þ +

M X

bl ðnÞyak ðn - l + 1Þ,k

= 1,2, . . . ,Ds

l=2

ð3:100Þ Substituting Eq. (3.97) and Eq. (3.98) into Eq. (3.96), the gradient estimators can be rewritten as: (

 T ∇J wa ≈ - ya1 ðnÞ ya2 ðnÞ⋯yaDs ðnÞ  T ∇J wb ≈ - yb1 ðnÞ yb2 ðnÞ⋯ybM ðnÞ

ð3:101Þ

Hence, the cascaded random Fourier least mean square (CRF-LMS) algorithm can be derived as:

3.5

Random Fourier Filter

113

(  T wa ðn + 1Þ = wa ðnÞ + μa ðnÞec ðnÞ ya1 ðnÞ⋯yaDs ðnÞ  T wb ðn + 1Þ = wb ðnÞ + μb ec ðnÞ yb1 ðnÞ⋯ybM ðnÞ

ð3:102Þ

Due to the regression nature of yak(n), once the length of linear auxiliary filter M is large, each iteration will require a lot of storage and computing. According to the assumption in [25], we assume that the recursion based on the past output gradients is negligible, that is the partial derivative of yc(n) with respect to ak(n) in time n is unrelated with gradients of previous times. Based on the above assumption, the gradient operator is obtained as: 8    >

: ∇J wb ≈ - ec ðnÞ½yx ðnÞ yx ðn - 1Þ⋯ yx ðn - M + 1Þ

, ð3:103Þ

Then, based on the stochastic gradient method, as a consequence, the cascaded random Fourier LMS algorithm is given as:  wa ðn + 1Þ = wa ðnÞ + μa ðnÞec ðnÞZDs ðnÞ wb ðn + 1Þ = wb ðnÞ + μb ec ðnÞYðnÞ

ð3:104Þ

where μa(n) = μa(1 + b1(n)) and μa and μb are the step sizes for adaptive filters. ZDs ðnÞ = ½z1 ðnÞ z2 ðnÞ⋯zDs ðnÞT and Y(n) is attained by the delayed current output of wa(n), which is represented as: YðnÞ = ½yx ðnÞ yx ðn - 1Þ⋯yx ðn - M + 1ÞT

ð3:105Þ

The summary of CRF-FxLMS algorithm is given in Table 3.2.

3.5.4

Mean Convergence Analysis

According to Eq. (3.104), the optimal weight vector of the nonlinear RFF filter is assumed as wA , and the linear auxiliary filter is wB . The aberration between the weight vector and the optimal weight vector is calculated as:  e A ðnÞ = wA ðnÞ - wA ðnÞ w e B ðnÞ = wB - wb ðnÞ w

:

ð3:106Þ

After subtracting both sides of Eq. (3.106) from the optimal weight vectors, the following equation is obtained:

114

3

FLANN Adaptive Filter

Table 3.2 Summary of CRF-FxLMS algorithm

 e A ðnÞ - μa ðnÞec ðnÞZDs ðnÞ e A ð n + 1Þ = w w e B ðnÞ - μb ec ðnÞYðnÞ e B ð n + 1Þ = w w

ð3:107Þ

e e of the squared Euclidean Continue to assume h E a ðnÞ,i Eb ðnÞ as the h expectation i 2 e a ðnÞ=E kw e b ðnÞ=E kw e a ð nÞ k , E e b ðnÞk2 , Eq. (3.107) can be written as: norm, i.e.,E   ea ðnÞ + E μa 2 ðnÞe2 c ðnÞZT Ds ðnÞZDs ðnÞ e a ð n + 1Þ = Ε E

w A ð nÞ - 2E μa ðnÞec ðnÞZT Ds ðnÞe h i e b ðnÞ + μb 2 E ½ec ðnÞ2 YT ðnÞYðnÞ e b ð n + 1Þ = E E   - 2μb E ec ðnÞYT ðnÞe wb ðnÞ

ð3:108Þ

ð3:109Þ

e j ðn + 1Þ, j = a, b should be less than To guarantee the stability and convergence, Ε e e Ta ðnÞZDs ðnÞ, and Ε j ðnÞ, j = a, b. Considering ec(n) = ea(n) + eb(n), ea ðnÞ = w T e b ðnÞYðnÞ, we finally obtain the bounds of the step size as follows: e b ð nÞ = w 0 < μa
> df ð0Þ : , x=0 dx

ð3:121Þ

where q is a normal number that is not equal to 1, and when q → 1, the q derivative becomes a regular derivative. Let f(x) = xn, Eq. (3.121) becomes 8 n < q - 1 xn - 1 , q ≠ 1 Dq,x xn = q - 1 : n-1 nx , q=1

ð3:122Þ

Extending this concept to n dimensional variables, the q derivative of f(x) is defined as: T  ∇q,x f ðxÞ = Dq1 ,x1 f ðxÞ, Dq2 ,x2 f ðxÞ, . . . , Dqn ,xn f ðxÞ

ð3:123Þ

where q = [q1, q2, . . ., qn]T, x = [x1, x2, . . ., xn]T. Based on the above definition, the q derivative of Eq. (3.123) with respect to e(n) = [e(n), e(n), . . ., e(n)]T is obtained as:  T ∇q,eðnÞ J ðwÞ = Dq1 ,eðnÞ J ðwÞ, Dq2 ,eðnÞ J ðwÞ, . . . , Dqn ,eðnÞ J ðwÞ 8 p p < jqeðnÞj - jeðnÞj , q≠1 p qeðnÞ - eðnÞ Dq,eðnÞ jeðnÞj = : pjeðnÞjp - 1 signðeðnÞÞ, q = 1

ð3:124Þ ð3:125Þ

To visually see the property of the q-gradient function, we plot the function Dq,e(n)je(n)jp for different p and q in Fig. 3.13. The following conclusions can be observed from Fig. 3.13: 1. For the same value of q, the larger value of p, the steeper the gradient is. 2. For the same value of p, a large value of q indicates a steeper gradient. Specially, for q = 1 and p = 2, the q-gradient reduces to the similar form of the MSE-based algorithm. As can be seen, the value of the gradient Dq, e(n)je(n)jp with q ≠ 1 is larger than that of the gradient of the value q = 1 and p = 2. Therefore, such steeper gradient may lead to improved performance of the algorithm.

3.6

Nonlinear Active Noise Control

119

Fig. 3.13 (a) The gradients of Dq, e(n)je(n)jp ( p = 1, 1.5, and 2). (b) The gradients of Dq, e(n)je(n)|p ( p = 2, 3, and 4)

120

3

FLANN Adaptive Filter

Using q derivative descent method, the filter weight coefficient of FsqLMP algorithm is updated as: μ ∇ J ðw Þ 2 q,w

ð3:126Þ

  ∇q,w J ðwÞ = - 2 QFsqLMP X ðnÞ

ð3:127Þ

wðn + 1Þ = wðnÞ where

QFsqLMP is the diagonal matrix in FsqLMP algorithm, which is expressed as: QFsqLMP 2 3 p p p p jqð2P + 1ÞN eðnÞjp - jeðnÞjp e ð n Þj ð n Þj e ð n Þj ð n Þj jq jq je je 1 2 i5 =4 , , ..., h p½q1 eðnÞ - eðnÞ p½q2 eðnÞ - eðnÞ p qð2P + 1ÞN eðnÞ - eðnÞ ð3:128Þ Substituting Eq. (3.128) and Eq. (3.125) into Eq. (3.126), the filter weight updating formula of FsqLMP algorithm is obtained as: wðn + 1Þ = wðnÞ + μQFsqLMP X ðnÞ

ð3:129Þ

where μ is the step size, X(n) = s(n)x(n), x(n) is given as Eq. (3.113) (Fig. 3.14).

Fig. 3.14 The block diagram using trigonometric expansion for NANC

3.6

Nonlinear Active Noise Control

3.6.1.3

121

RFsLMS Algorithm

To improve the stability of the system under the alpha-stable distributed noise environment and speed up the algorithm convergence and improve the noise elimination ability, the robust filtered-s LMS (RFsLMS) algorithm has been proposed [30]. Similar to the FsLMS algorithm, the nonlinear controller of the RFsLMS algorithm also uses FLANN structure modeling. Observe the weight coefficient update formula of the FsLMS algorithm indicated in the formula. When the error signal is large, such as in an impact noise environment, the FsLMS algorithm may diverge (Fig. 3.15). Therefore, in the RFsLMS algorithm, the cost function is changed to:   e 2 ð nÞ ξðwÞ = E log 1 + 2 2σ ðnÞ

ð3:130Þ

In the above formula, it is the estimated variance of the error signal, which can be obtained through the sliding estimation window, expressed as:

Fig. 3.15 The FLANN-based robust nonlinear ANC system trained using RFsLMS algorithm

122

3

FLANN Adaptive Filter

σ 2 ðn + 1Þ = σ 2 ðnÞ + Δm2e ðn + 1Þ ð3:131Þ 1 1 + ðeðn + 1Þ - me ðn + 1ÞÞ2 - ðeðn - H + 1Þ - me ðn + 1ÞÞ2 H H Among them me ðn + 1Þ = me ðnÞ + Δme ðn + 1Þ 1 ð e ð n + 1 Þ - e ð n - H + 1Þ Þ H

Δme ðn + 1Þ =

ð3:132Þ ð3:133Þ

where H is the estimated window length. Using the window function method, the weight coefficient update formula of the RFsLMS algorithm is: wðn + 1Þ = wðnÞ +

μ ∇ðnÞ 2

ð3:134Þ

∇(n) represents the instantaneous estimate of the cost function ξ, and the derivation process is as follows:

∇ ð nÞ =

n

∂ log 1 +

o

e2 ðnÞ 2σ 2 ðnÞ

∂wðnÞ



 e ð nÞ = -2 2 X ð nÞ e ðnÞ + 2σ 2 ðnÞ

ð3:135Þ

Therefore, the weight coefficient update formula of the RFsLMS algorithm is: 

 e ð nÞ X ð nÞ wðn + 1Þ = wðnÞ + 2 e ðnÞ + 2σ 2 ðnÞ

ð3:136Þ

The RFsLMS algorithm uses a function of the error for weight update instead of the direct use of error signal employed in FsLMS algorithm. Figure 3.16 shows the transformation function employed in RFsLMS algorithm with σ 2 = 1. It can be observed that for larger values of e(n), the weight update is small and thus the algorithm is stable. The performance is also improved by the presence of the variance term in the denominator, which tends to a small value for non-impulsive samples. Further, the impact of high-amplitude impulses appearing in the reference signal has been significantly reduced by trigonometric expansion, which limits the strength of the expanded reference signal samples to [-1, 1]. However, the terms which appear directly in the expanded signal vector X(n) could affect the performance for very strong disturbances. But such a situation can also be avoided if an adaptive threshold scheme as suggested in [31] is applied.

3.6

Nonlinear Active Noise Control

123

Fig. 3.16 Schematic diagram of the transformation in the RFsLMS algorithm [for σ 2 = 1]

3.6.1.4

FsMCC Algorithm

As an information theoretic parameter, correntropy is a strong tool to develop robust adaptive algorithms. In this part, the filtered-s maximum correntropy criterion (FsMCC) algorithm for NANC system with FLANN filter is described. It has been an efficient control algorithm for heavy tail non-Gaussian noise in account of a robust similarity measure named correntropy [32]. The correntropy is the correlation between two random variables in a small neighborhood, which is described as: ð3:137Þ

V ðα, βÞ = E ½κ ðα, βÞ

where κ(α, β) is a Mercer’s kernel. The kernel considered in this part is the normalized Gaussian kernel given by: 1 κ ðα, βÞ = pffiffiffiffiffi exp σ 2π

-

ðα - β Þ2 2σ 2

ð3:138Þ

124

3

FLANN Adaptive Filter

Using the correntropy as a measure of the correlation between the d(n) and y(n). Thus the objective function that needs to be maximized to achieve the filter design may be written as: ζ ðwÞ =

n X



exp

i=0

dðnÞ - wðnÞXT ðnÞ 2σ 2

2 ! ð3:139Þ

The weight update rule using a gradient ascent approach may be written as: wðn + 1Þ = wðnÞ -

μ ∇ζ ðwÞ 2

ð3:140Þ

where μ is the step size and ∇ζ(w) denotes the gradient of the cost function nn with respect to the weight vector. Using Eq. (3.140) and Eq. (3.139), the update rule may be approximated as: wðn + 1Þ = wðnÞ + μ exp

e 2 ð nÞ eðnÞXðnÞ 2σ 2

ð3:141Þ

where μ is the step size, X(n) = s(n)x(n), and x(n) is given as Eq. (3.113).

3.6.1.5

RFF-FxMCC Algorithm

The block diagram of the random Fourier filter-based NANC system is shown in Fig. 3.17. Random Fourier filters simplify the filter iteration by making the implicit kernel mapping explicit. P(z) and S(z) represent the primary noise path and the secondary path, respectively. For the random Fourier filter-based NANC system, the nonlinear relation is explicit. The input vector x(n) = [x(n), x(n − 1), ⋯, x(n − L + 1)]ᵀ is processed by the module RF(ω_k, φ_k) directly, and after passing through the cosine module, the D-dimensional expansion vector is obtained as [30]:

Z(x(n)) = [ζ₁ ζ₂ ⋯ ζ_D]ᵀ,    (3.142)

where ζ_i = cos(ω_iᵀx(n) + φ_i), i = 1, 2, ⋯, D. Each ω_i follows a Gaussian distribution with zero mean and covariance matrix ε²I, denoted ω ~ N(0, ε²I), and each φ_i, i = 1, ⋯, D, is sampled from the uniform distribution on [0, 2π]. The input signal is expanded to its RFF and then injected into an adaptive filter. Considering the filter weight vector w(n) = [w₁(n), w₂(n), ⋯, w_D(n)]ᵀ, the output of the random Fourier filter is computed as:

y(n) = wᵀ(n)Z(x(n)).    (3.143)
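As an illustration of Eqs. (3.142) and (3.143), the sketch below draws the random parameters and forms the D-dimensional expansion; the values of L, D, and the kernel scale ε are example choices, not prescriptions from the text.

import numpy as np

rng = np.random.default_rng(0)
L, D, eps = 10, 300, 1.0                      # memory length, projection dimension, kernel scale
Omega = rng.normal(0.0, eps, size=(D, L))     # omega_i ~ N(0, eps^2 I)
Phi = rng.uniform(0.0, 2.0 * np.pi, size=D)   # phi_i ~ U[0, 2*pi]

def rff_expand(x_vec):
    # Eq. (3.142): D-dimensional random Fourier expansion of the input vector.
    return np.cos(Omega @ x_vec + Phi)

w = np.zeros(D)
x_vec = rng.standard_normal(L)                # stand-in for [x(n), ..., x(n-L+1)]
y = w @ rff_expand(x_vec)                     # Eq. (3.143)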


Fig. 3.17 The block diagram of single-channel NANC system based on random Fourier filter

The residual signal, which is the superposition of the desired and output signals, can be given as:

e(n) = d(n) − y(n) ∗ s(n)    (3.144)

where ∗ denotes linear convolution and s(n) is the impulse response of the secondary path S(z). Through the modeled secondary path Ŝ(z), the expanded input signal vector is filtered as Ẑ_s(n) = [ζ̂₁ ζ̂₂ ⋯ ζ̂_D]ᵀ, and each element at the nth iteration can be computed by:

ζ̂_i(n) = Σ_{j=0}^{N̂−1} ŝ_j ζ_i(n − j), i = 1, 2, ⋯, D,    (3.145)

where ζ_i(k) = 0 for k ≤ 0 is set to supplement the convolution. The RFF-MCC algorithm was proposed on the basis of the MCC [20]. The Gaussian kernel of the current error e(k) is taken as the cost function in MCC, i.e.:

J(k) = exp(−e²(k)/(2σ²))    (3.146)


Approximating the expectation by the current values and following the steepest descent recursion yields:

w(n + 1) = w(n) − μ∇J    (3.147)

where ∇J is the gradient of the cost function with respect to the filter coefficient vector. By applying a gradient ascent method, the weight vector of RFF-FxMCC is therefore updated recursively, i.e.:

w(n + 1) = w(n) + η ∂J(k)/∂w(n) = w(n) + η exp(−e²(k)/(2σ²)) e(k) Ẑ_s(n),    (3.148)

where η is the unified step size and Ẑ_s(n) is obtained by passing the RFF of the input vector through the modeled secondary path. In a special case, the RFF-FxMCC algorithm degenerates into the RFF-FxLMS algorithm when exp(−e²(k)/(2σ²)) = 1.
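A minimal sketch of one RFF-FxMCC iteration, combining Eqs. (3.145) and (3.148), is given below. It assumes a buffer Z_hist whose row j holds the RFF vector at time n − j and a modeled secondary-path impulse response s_hat; both names are illustrative.

import numpy as np

def rff_fxmcc_step(w, Z_hist, e_k, s_hat, eta=0.01, sigma=1.0):
    # Z_hist: (N_hat, D) array of the most recent RFF vectors, row 0 = current.
    # s_hat:  (N_hat,) modeled secondary-path impulse response.
    Zs = s_hat @ Z_hist                                  # Eq. (3.145), filtered expansion
    kernel = np.exp(-e_k ** 2 / (2.0 * sigma ** 2))      # correntropy kernel of the error
    return w + eta * kernel * e_k * Zs                   # Eq. (3.148)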

3.7 Nonlinear Channel Equalization

3.7.1 Communication Channel Equalization

Figure 3.18 shows the schematic of a wireless digital communication system with an equalizer at the front end of the receiver. The symbol {tk} denotes a sequence of T-spaced complex symbols of an L-QAM constellation in which both in-phase

Fig. 3.18 Schematic diagram of a wireless digital communication system with a channel equalizer


component {t_{k,I}} and quadrature component {t_{k,Q}} take one of the values ±1, ±3, …, ±(√L − 1), where 1/T denotes the symbol rate and k denotes the discrete time index. In a 4-QAM constellation, the unmodulated information sequence {t_k} is given by:

t_k = ±1 ± j1    (3.149)

where the symbols (1, −1) are assumed to be statistically independent and equiprobable. In Fig. 3.18, the combined effect of the transmitter-side filter and the wireless transmission medium is included in the "channel." A widely used model for a linear dispersive channel is an FIR filter whose output at the kth instant is given by a_k = Σ_{i=0}^{N_h−1} h_i t_{k−i}, where h_i denotes the FIR filter weights and N_h denotes the filter order. Considering the channel to be a nonlinear one, the "NL" block introduces channel nonlinearity to the filter output. The discrete output of the nonlinear channel is given by:

b_k = ψ{a_k, a_{k−1}, a_{k−2}, …, a_{k−N_h+1}; h_0, h_1, …, h_{N_h−1}},    (3.150)

where ψ{·} is a nonlinear function generated by the "NL" block. The channel output is assumed to be corrupted by additive Gaussian noise q_k with variance σ². The transmitted signal t_k, after being passed through the nonlinear channel and corrupted by the additive noise, arrives at the receiver as r_k. The received signal at the kth time instant is given by r_k = r_{k,I} + jr_{k,Q}, where r_{k,I} and r_{k,Q} are the in-phase and quadrature components, respectively. The purpose of the equalizer attached at the receiver front end is to recover the transmitted sequence t_k or its delayed version t_{k−τ}, where τ is the propagation delay associated with the physical channel. In the case of a linear channel, an adaptive equalizer (e.g., an adaptive FIR filter) can be used. During the training period, the equalizer takes the corrupted sequence r_k and its delayed versions as input and produces an output y_k. With the knowledge of a desired (or target) output d_k (d_k = t_{k−τ}), it updates the filter weights so as to minimize the error e_k (e_k = d_k − y_k) using an adaptive algorithm (e.g., the LMS algorithm). After the completion of training, the weights are frozen, and subsequently these weights are used to estimate the transmitted sequence. In this study, since we consider nonlinear channel models, we use NNs as equalizers in place of the adaptive filter.
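To make the training procedure concrete, the sketch below trains a complex-valued LMS equalizer on a 4-QAM sequence passed through one of the FIR channels of Eq. (3.169); the study itself uses a real-valued equalizer operating on separate in-phase and quadrature inputs, so this is only an illustrative simplification with assumed parameter values.

import numpy as np

rng = np.random.default_rng(1)
N, tau, snr_db = 3000, 1, 15
h = np.array([0.209, 0.995, 0.209])                       # example channel from Eq. (3.169)
t = (rng.integers(0, 2, N) * 2 - 1) + 1j * (rng.integers(0, 2, N) * 2 - 1)   # 4-QAM symbols
a = np.convolve(t, h)[:N]                                 # linear dispersive channel output
noise_std = np.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)       # rough noise level for the chosen SNR
r = a + noise_std * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

m, mu = 4, 0.01                                           # equalizer taps and step size (assumed)
w = np.zeros(m, dtype=complex)
for k in range(m, N):
    Rk = r[k - m + 1:k + 1][::-1]                         # current and past received samples
    y = np.vdot(w, Rk)                                    # equalizer output, y = w^H R_k
    e = t[k - tau] - y                                    # error against delayed training symbol
    w = w + mu * Rk * np.conj(e)                          # complex LMS update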

3.7.2 Channel Equalization Using a Generalized NN Model

Figure 3.19 depicts a schematic diagram of channel equalization for 4-QAM signals using NN. The in-phase component rk,I and quadrature component rk,Q at kth instant are passed through a delay line to obtain the current and past signals. The current and


Fig. 3.19 Schematic diagram of an NN-based channel equalizer

the delayed signal samples constitute the input signal vector to the equalizer, given by R_k = [r_{k,I}, r_{k,Q}, r_{k−1,I}, r_{k−1,Q}, …]ᵀ = [r₁, r₂, r₃, …]ᵀ. During the training phase, at the kth instant, R_k is applied to the NN, and the NN produces an output Y_k = [y_{k,I}, y_{k,Q}]ᵀ. The NN output is then compared with the desired output D_k = [d_{k,I}, d_{k,Q}]ᵀ to produce an error signal. The error signal is used in the BP algorithm to update the weights. The training process continues iteratively until the MSE reaches a predefined small value. Thereafter, the NN weights are frozen. During the test phase and actual use, the NN weights obtained after training are used for equalization. In this study, we use three different NN structures, i.e., the MLP, FLANN, and the proposed LeNN, for equalization of nonlinear channels. In addition, for comparison purposes, we have simulated a linear FIR-based adaptive equalizer trained with the LMS algorithm. In the case of a 4-QAM signal constellation, channel equalization becomes a four-class classification problem. The NN structures basically create nonlinear decision boundaries in the input space by generating a discriminant function to classify the received signal into one of the four categories.

3.7.3 FLNN Equalizer

The block diagram of the m-dimensional-input FLNN equalizer without hidden layers is shown in Fig. 3.20. Trigonometric functions, Chebyshev and Legendre orthogonal polynomials, and other methods can be used to dimensionally expand the input pattern and enhance its representation in a high-dimensional space. In addition, the structure has lower computational complexity and


Fig. 3.20 The FLNN equalizer structure with an m-dimensional input

higher convergence speed compared to conventional neural networks. Here the extended function block is composed of a subset of orthogonal sin and cos basis functions together with the original patterns and their outer products for modeling nonlinear channels. For example, consider a two-dimensional input pattern U(n) = [u₁, u₂]ᵀ = [x(n), x(n − 1)]ᵀ. This vector can be expanded by the trigonometric functions into X(n) = [1, u₁, cos(πu₁), sin(πu₁), ⋯, u₂, cos(πu₂), sin(πu₂), ⋯, u₁u₂]ᵀ. The adaptive algorithm used to train the network is simple and has low complexity due to the absence of hidden layers. The FLNN-based equalizer outperforms other neural network structures for linear and nonlinear channel models, and its main advantage is the further reduction of computational cost. Though the FLNN has some advantages, such as a simpler structure, faster convergence, and lower computational complexity, its nonlinear approximation capacity is limited because it contains only one nonlinear function, tanh(·). That is, the adaptive FLNN equalizer can only deal with linear and mild nonlinear distortions. For severe nonlinear distortions, the performance of the FLNN equalizer is very limited. To further improve the performance, one can enlarge the dimensionality of its input signal space. However, this significantly increases the number of nodes in the input layer and the complexity of a practical implementation. Therefore, it is necessary and important to seek a novel method for improving the nonlinear processing capability of the FLNN.
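For instance, the trigonometric expansion of the two-dimensional pattern described above can be sketched as follows (the exact number of harmonics used in the study may differ):

import numpy as np

def flnn_expand(u1, u2):
    # First-order trigonometric functional expansion of [u1, u2] = [x(n), x(n-1)],
    # including the bias term and the outer-product (cross) term.
    return np.array([1.0,
                     u1, np.cos(np.pi * u1), np.sin(np.pi * u1),
                     u2, np.cos(np.pi * u2), np.sin(np.pi * u2),
                     u1 * u2])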

3.7.3.1 Adaptive Equalizer with FLNN Cascaded with Chebyshev Orthogonal Polynomial

It is well known from best approximation theory that the nonlinear approximation capacity of the Chebyshev orthogonal polynomial is very powerful. Combining the characteristics of the FLNN and the Chebyshev orthogonal polynomial, a novel nonlinear adaptive equalizer structure, the FLNNCPAE, is depicted in Fig. 3.21. The FLNNCPAE utilizes the FLNN input pattern and the nonlinear approximation capabilities of the Chebyshev orthogonal polynomial to further improve the nonlinear processing performance. The adaptive algorithm for this nonlinear adaptive equalizer is given as follows [33]. Owing to the good performance of the instantaneous-error NLMS among existing adaptive algorithms, the coefficient vectors W(k) and A(k) are adjusted by the low-complexity NLMS algorithm. Let e(k) = d(k) − y(k); the instantaneous error E(k) is written as follows:

E(k) = (1/2)e²(k)    (3.151)

We can get the following updating equations by the NLMS algorithm:

A(k + 1) = A(k) − η₁ [∂E(k)/∂A(k)] / (1 + ‖C(k)‖²)

W(k + 1) = W(k) − η₂ [∂E(k)/∂W(k)] / (1 + ‖X(k)‖²)    (3.152)

Fig. 3.21 The diagram of functional link neural network cascaded with Chebyshev orthogonal polynomial adaptive equalizer


where η₁ and η₂ are the positive step sizes of the updating equations, both assumed to be small positive real values. Taking the partial derivatives of E(k), we obtain the following equalities:

∂E(k)/∂A(k) = −e(k)C(k)    (3.153)

∂E(k)/∂W(k) = −e(k) [∂y(k)/∂z(k)] [∂z(k)/∂W(k)] = −e(k)X(k) [4e^{−2u(k)}/(1 + e^{−2u(k)})²] Σ_{j=1}^{N₂} a_j(k) g′_j(z(k))    (3.154)

The coefficient adaptation can thus be divided into two parts. The first adapts the FLNN filter weight W(k) and is written as:

W(k + 1) = W(k) + η₂ e(k) X(k) [4e^{−2u(k)}/(1 + e^{−2u(k)})²] Σ_{j=1}^{N₂} a_j(k) g′_j(z(k)) / (1 + ‖X(k)‖²)    (3.155)

The second part adapts the Chebyshev orthogonal polynomial coefficient A(k) and is given by:

A(k + 1) = A(k) + η₁ e(k) C(k) / (1 + ‖C(k)‖²)    (3.156)

Selecting the step sizes in the range 0 < η₁, η₂ < 1/tr(R), where R is the autocorrelation matrix of the input signal, results in stable operation of the system and guarantees convergence of the adaptive algorithm. A summary of the adaptive algorithm for the novel nonlinear adaptive equalizer is given in Table 3.5.
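A hedged sketch of one FLNNCPAE update following Eqs. (3.151)–(3.156) is given below; it assumes that the functions g_j are Chebyshev polynomials of the first kind T_j, that the FLNN output is u(k) = Wᵀ(k)X(k), and that S(u) = tanh(u) as in Eq. (3.160), so S′(u) = 4e^{−2u}/(1 + e^{−2u})² = 1 − z². These choices are consistent with the equations above but are illustrative.

import numpy as np
from numpy.polynomial import chebyshev as C

def flnncpae_step(W, A, X_k, d_k, eta1=0.1, eta2=0.1):
    # X_k: functionally expanded input, W: FLNN weights, A: coefficients a_1..a_N2.
    u = W @ X_k
    z = np.tanh(u)                                            # z(k) = S(u(k)), Eq. (3.160)
    N2 = len(A)
    Tj = [np.eye(N2 + 1)[j] for j in range(1, N2 + 1)]        # one-hot Chebyshev coefficients
    Ck = np.array([C.chebval(z, c) for c in Tj])              # C(k) = [T_1(z), ..., T_N2(z)]
    dCk = np.array([C.chebval(z, C.chebder(c)) for c in Tj])  # [T_1'(z), ..., T_N2'(z)]
    e = d_k - A @ Ck                                          # e(k) = d(k) - y(k)
    W = W + eta2 * e * (A @ dCk) * (1.0 - z ** 2) * X_k / (1.0 + X_k @ X_k)   # Eq. (3.155)
    A = A + eta1 * e * Ck / (1.0 + Ck @ Ck)                                    # Eq. (3.156)
    return W, A, e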

3.7.3.2 Decision Feedback Equalizer Using the Combination of FIR and FLNN

The combination of a finite impulse response (FIR) filter and a functional link neural network equalizer with a decision feedback structure (CFFLNNDFE) is depicted in Fig. 3.22. The CFFLNNDFE equalizer compensates for linear and nonlinear distortions and tracks the characteristics of time-varying channels. The CFFLNNDFE adequately utilizes the advantages of the FLNN and the characteristics of the linear


Table 3.5 Summary of the adaptive algorithm for the novel nonlinear adaptive equalizer

filter to improve the performance. Furthermore, the proposed equalizer based on the decision feedback (DF) structure achieves better performance, especially on time-varying channels with spectral nulls in their amplitude characteristics. The major purpose of feeding the feedback signals directly into the input layer of the FLNN instead of into the functional expansion blocks is to reduce the number of nodes in the input layer. By using this decision feedback structure, we can benefit from the feedback behavior without increasing the number of input signals [34]. It is observed that the nonlinear equalizer consists of two subsections. One is an FLNN with a decision feedback equalizer (FLNNDFE), and the other is an FIR filter. For the former, the input signals consist of the received signal vector RX(k) = [x(k), x(k − 1), …, x(k − m + 1)]ᵀ, where m = N₁ is the length of the input signals. In addition, the feedback output signals are directly fed into the input layer of the FLNN instead of being taken as input signals, and the feedback signal vector from the decision device is defined by:

V(k) = [v₁(k), v₂(k), …, v_q(k)]ᵀ = [Sign(y(k − 1)), Sign(y(k − 2)), …, Sign(y(k − q))]ᵀ    (3.157)


Fig. 3.22 The diagram of the CFFLNNDFE equalizer

where q is the length of the feedback signals and Sign(·) is a decision function defined by:

Sign(y(k − 1)) = −1 if y(k − 1) < 0, and 1 if y(k − 1) ≥ 0    (3.158)

Consider a set of basis functions B = {φ_i ∈ L(A)}_{i∈I}, I = {1, 2, …}, with the following properties for the FLNN subsection in Fig. 3.4, where L(A) denotes the space of (Lebesgue) measurable functions:


1. φ₁ = 1
2. The subset B_j = {φ_i ∈ B}_{i=1}^{N₁} is a linearly independent set, i.e., if Σ_{i=1}^{N₁} w_i φ_i = 0, then w_i = 0 for all i = 1, 2, …, N₁
3. sup_j [Σ_{i=1}^{j} ‖φ_i‖²_A]^{1/2} < ∞, where "sup" represents the supremum operation

Let B_N = {φ_i ∈ B}_{i=1}^{N₁} be the set of basis functions to be considered. The FLNN uses these N₁ basis functions to make up the vector φ(k) ∈ B_N: φ(k) = [φ₁(k), φ₂(k), …, φ_{N₁}(k)]ᵀ. Thus the output signal z(k) of the FLNNDFE is given as follows:

z(k) = γ(k) · S(u(k))
     = γ(k) · S(Σ_{i=1}^{N₁} w_i(k) φ_i(RX(k)) + Σ_{i=1}^{q} b_i(k) v_i(k))
     = γ(k) · S(W(k)ᵀφ(k) + B(k)ᵀV(k)) = γ(k) · S(W₁(k)ᵀX(k))    (3.159)

where B(k) = [b₁(k), b₂(k), …, b_q(k)]ᵀ is the weight coefficient vector of the feedback signal vector V(k), and the weight and input vectors of the FLNNDFE are redefined as W₁(k) = [W(k)ᵀ, B(k)ᵀ]ᵀ and X(k) = [φ(k)ᵀ, V(k)ᵀ]ᵀ, respectively. W(k) = [w₁(k), w₂(k), …, w_{N₁}(k)]ᵀ denotes the coefficient vector of the FLNN, and the lengths of W₁(k) and X(k) are both N₁ + q. γ(k) represents the adjusted amplitude gain, and the nonlinear function S(·) is defined by:

S(u(k)) = 2/(1 + e^{−2u(k)}) − 1    (3.160)

In this study, the function is chosen as S(x) = tanh(x) (the MATLAB tanh function), because the derivative of S(·) is then easily obtained as S′(x) = 1 − tanh²(x). The output z₁(k) of the FIR filter subsection is:

z₁(k) = Σ_{i=1}^{m} w_{N₁+i}(k) x(k − i + 1) = W₂(k)ᵀRX(k)    (3.161)

where W₂(k) = [w_{N₁+1}(k), w_{N₁+2}(k), …, w_{N₁+m}(k)]ᵀ is the coefficient vector of the FIR filter. The overall output signal y(k) of the proposed equalizer is then given by:

y(k) = λ(k)z(k) + (1 − λ(k))z₁(k) = λ(k)γ(k)S(W₁(k)ᵀX(k)) + (1 − λ(k))W₂(k)ᵀRX(k)    (3.162)


where λ(k) is a convex combination parameter. The role of λ(k) in (3.162) is that its extreme values lead to either a pure FLNNDFE or a pure FIR equalizer (λ(k) = 1 or λ(k) = 0, respectively).
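The overall output computation of Eqs. (3.159)–(3.162) can be sketched as follows, with S(·) = tanh(·) as chosen above; the variable names are illustrative.

import numpy as np

def cfflnndfe_output(W1, W2, phi_k, V_k, RX_k, lam, gamma):
    # phi_k: functional expansion of the received vector RX_k,
    # V_k:   vector of past hard decisions (Eq. (3.157)),
    # W1:    stacked FLNNDFE weights [W; B], W2: FIR weights,
    # lam:   convex combination parameter, gamma: amplitude gain.
    X_k = np.concatenate([phi_k, V_k])
    z = gamma * np.tanh(W1 @ X_k)          # Eq. (3.159) with S(x) = tanh(x)
    z1 = W2 @ RX_k                         # Eq. (3.161), FIR subsection
    return lam * z + (1.0 - lam) * z1      # Eq. (3.162)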

3.8 Computer Simulation Examples

In this part, we carry out experiments in MATLAB to test the robustness of different control algorithms. The standard averaged noise reduction (ANR) is used as the quantitative index, obtained by averaging over 100 independent runs of each algorithm, and its expression is:

ANR(n) = 20 log [A_e(n)/A_d(n)]    (3.163)

in which

A_e(n) = λA_e(n − 1) + (1 − λ)|e(n)|, A_e(0) = 0    (3.164)

A_d(n) = λA_d(n − 1) + (1 − λ)|d(n)|, A_d(0) = 0    (3.165)

where λ is a smoothing parameter, which is set as λ = 0.999 in this experiment.
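A direct transcription of Eqs. (3.163)–(3.165) into a small helper is sketched below, given the residual sequence e and the primary noise sequence d.

import numpy as np

def anr_curve(e, d, lam=0.999):
    # ANR(n) in dB from Eqs. (3.163)-(3.165); a zero guard protects the first samples.
    Ae = Ad = 0.0
    out = np.zeros(len(e))
    for n in range(len(e)):
        Ae = lam * Ae + (1.0 - lam) * abs(e[n])
        Ad = lam * Ad + (1.0 - lam) * abs(d[n])
        out[n] = 20.0 * np.log10(Ae / Ad) if Ad > 0 else 0.0
    return out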

3.8.1 FLANN-Based NANC with Minimum Phase Secondary Path System

The robust noise mitigation capability of the FsLMP, FsqLMP, RFsLMS, and FsMCC algorithms has been compared, considering a second-order FLANN model. The performance is evaluated through NANC systems with a minimum phase secondary path. To verify the performance of the algorithms under impulsive noise, a symmetric α-stable (SαS) distribution is modeled by the following characteristic function:

φ(t) = e^{−|t|^α}    (3.166)

The performance is discussed under three different cases of impulsive noise, as shown in Fig. 3.23: case (a): α = 1.9, case (b): α = 1.7, and case (c): α = 1.5, where a smaller α corresponds to a more impulsive environment, so case (c) is the most impulsive and case (a) is closest to white Gaussian noise.
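Assuming SciPy's levy_stable distribution is available, the three SαS noise cases can be generated as in the sketch below (β = 0 gives the symmetric case; scale and location are left at their defaults, which is an assumption).

from scipy.stats import levy_stable

# Symmetric alpha-stable samples for the three cases considered above.
samples = {alpha: levy_stable.rvs(alpha, 0.0, size=5000, random_state=0)
           for alpha in (1.9, 1.7, 1.5)}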


Fig. 3.23 Waveforms of impulsive noise signal (a) α = 2; (b) α = 1.9; (c) α = 1.7; (d) α = 1.5

The primary noise is observed at the error microphone, and its expression is:

d(n) = u(n − 2) + δ₁u²(n − 2) − δ₂u³(n − 1)    (3.167)

where δ_i, i = 1, 2, measures the nonlinearity of the primary path and is set as δ₁ = 0.08, δ₂ = 0.04 in this part. The input signal is α-stable distributed noise, and the transfer functions of the primary path and of the secondary path with minimum phase characteristics are given as P(z) = z⁻³ − 0.3z⁻⁴ + 0.2z⁻⁵ and S(z) = z⁻² + 0.5z⁻³, respectively. Simulation results of the algorithms under different intensities of impulsive noise are shown in Figs. 3.24, 3.25, 3.26, and 3.27. It is shown that, as the noise occurrence probability and intensity increase, the control ability of the abovementioned algorithms generally decreases. Among the five compared algorithms, the FxLMS algorithm does not possess robustness in the impulsive environment. By contrast, the FxMCC algorithm has the best control effect on strong α-stable noise. The RFsLMS algorithm is the second-ranked robust algorithm to counter


Fig. 3.24 ANR comparison curves under impulsive noise with α = 2

Fig. 3.25 ANR comparison curves under impulsive noise with α = 1.9

the impulsive noise environment among the five algorithms. FxLMP offers little improvement over FxLMS; moreover, it is still unstable and has a slow convergence rate. The Jackson derivation improves the FxLMP algorithm to a large extent (Figs. 3.25, 3.26, and 3.27).


Fig. 3.26 ANR comparison curves under impulsive noise with α = 1.7

Fig. 3.27 ANR comparison curves under impulsive noise with α = 1.5

3.8.2 Random Fourier Filter-Based NANC

3.8.2.1 Projection Dimension and Memory Length of Random Fourier Filter

In this part, we measure the steady-state ANR and time consumption of the RF-FxLMS algorithm for different settings of D to examine the effect of the projection dimension on algorithm performance. Another important factor for the computational cost, the memory length, is also discussed. The memory length of the input signal


Fig. 3.28 Comparison curves of RF-FxLMS algorithm steady-state ANR and consuming time under different D values

determines the computational cost of the nonlinear expansion, while the length of the linear auxiliary filter, denoted M, governs the system performance and stability. The logistic chaotic noise 1 is adopted as input, given by x(n) = λx(n − 1)[1 − x(n − 1)], n = 1, 2, 3, ⋯, with λ = 4 and x(n) = 0.9 for 0 ≤ n ≤ 1. The memory length is set as L = 10. Besides, the transfer functions of the linear primary path and secondary path are given as P(z) = z⁻³ − 0.3z⁻⁴ + 0.2z⁻⁵ and S(z) = z⁻² + 0.5z⁻³. The steady-state ANR is obtained by averaging over the final 200 iterations. Figure 3.28 compares the ANR level and consumed time of RF-FxLMS under different D values. The consumed time is measured on a 1.8-GHz Intel Core i7 processor with 8 GB of RAM, running MATLAB R2017b under Windows 10. It is clear that, as the projection order increases, the noise reduction ability of the RF-FxLMS algorithm is enhanced. The projection order D of the random Fourier filter affects the accuracy of the algorithm, while a larger D is accompanied by a higher amount of computation. It is also observed that the control effect of the algorithm gradually stabilizes once the projection order exceeds 300, while the consumed time still increases linearly. This is because a higher projection order D requires a longer FIR filter and therefore greater computation in the nonlinear module. The influence of different memory lengths on the RF-FxLMS algorithm is shown in Fig. 3.29, and in Fig. 3.30, we compare the performance of different delay lengths of the cascaded filter. In Fig. 3.29, it is observed that the memory length significantly affects the performance of the algorithm. The diamond-marked pink curve (L = 10) corresponds to the lowest residual noise. However, when L is increased further, the noise reduction level decreases. In Fig. 3.30, it is evident that the CRF-FxLMS algorithm with a cascaded segment and a shorter memory length (L = 5) of the input signal not only attains but exceeds the best ANR level of the RF-FxLMS algorithm (L = 10).
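The logistic chaotic input described above can be generated as in the following sketch.

import numpy as np

def logistic_noise(n_samples, lam=4.0, x0=0.9):
    # x(n) = lam * x(n-1) * (1 - x(n-1)), with initial value x(0) = 0.9 as above.
    x = np.empty(n_samples)
    x[0] = x0
    for n in range(1, n_samples):
        x[n] = lam * x[n - 1] * (1.0 - x[n - 1])
    return x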


Fig. 3.29 Learning curves of RF-FxLMS algorithm with different memory length settings

3.8.2.2 Real Example: Random Fourier Filter-Based Active Traction Substation Noise Control

The traction substation noise is shown in Fig. 3.31; it was sampled through MATLAB and a sound card with a sampling frequency of 8 kHz at 16 bits. The memory length of the random Fourier filter is chosen as L = 200, and the linear auxiliary filter of the cascaded random Fourier filter is selected as M = 70. To study the control effect of the RF-FxLMS, CRF-FxLMS, RF-FxMCC, and CRF-FxMCC algorithms, the robustness of the algorithms to primary path variation is considered. The primary and secondary paths are modeled by FIR filters of orders 256 and 100, whose magnitude and phase responses are shown in Fig. 3.32. In this simulation, the secondary path model is fixed, while the primary path changes from primary path 1 (Fig. 3.32a) to primary path 2 (Fig. 3.32b) at the 10,000th iteration. The power spectrum within 100–800 Hz of the noise and residual noise is shown in Fig. 3.33. The traction substation noise is a broadband noise whose power is spread over the range of 40–60 dB. It is worth noting that, because of the few outliers in this noise sample, the control effects of the RF-FxLMS and RF-FxMCC algorithms are very similar. The RF-FxLMS and RF-FxMCC algorithms can achieve a noise reduction of 10–20 dB, while the CRF-FxLMS and CRF-FxMCC algorithms can achieve 20–30 dB. From 400 to 800 Hz, the residual noise of the four control algorithms is similar and is reduced to about 10 dB. From 800 to 1000 Hz, the RF-FxLMS and RF-FxMCC algorithms have little control effect, while the CRF-FxLMS and CRF-FxMCC algorithms still achieve an averaged noise power level of about 10 dB.


Fig. 3.30 Comparison of the RF-FxLMS and the CRF-FxLMS algorithm with different settings

Fig. 3.31 Waveform of traction substation noise

The corresponding noise power level is shown in Fig. 3.34. It can be seen that the change of the primary path results in an abrupt change of the noise reduction level of the control algorithms at the 10,000th iteration. The CRF-FxLMS and CRF-FxMCC


Fig. 3.32 Magnitude response and phase response of the acoustic paths of primary path and secondary path

algorithms achieve a lower residual noise. Moreover, the control curves of the CRF-FxLMS and CRF-FxMCC algorithms are more stable in the face of noise power variation.

3.8.3 Nonlinear Channel Equalization

Simulations were carried out extensively for the channel equalization problem, with several practical channels and nonlinear models, using three different NN structures and an FIR adaptive filter. The channel impulse response used for this study is given by [32]:


Fig. 3.33 PSD of traction substation noise and residual noise controlled by RF-FxLMS, CRF-FxLMS, RF-FxMCC, and CRF-FxMCC algorithm

Fig. 3.34 ANR curves of RF-FxLMS, CRF-FxLMS, RF-FxMCC, and CRF-FxMCC algorithm for traction substation noise

h(i) = (1/2)[1 + cos(2π(i − 2)/Λ)]  for i = 1, 2, and 3;  h(i) = 0 otherwise    (3.168)

  The input correlation matrix is given by ℛ = E Rk RTk , where E is the expectation operator. The eigenvalue ratio (EVR) of a channel is defined as λmax/λmin, where λmax


and λ_min are the largest and smallest eigenvalues of R. The higher the value of EVR, the worse the channel in terms of channel spread and the more difficult it is to equalize. The parameter Λ determines the EVR of the channel: the larger the value of Λ, the higher the EVR. In order to study the channels under different EVR conditions, Λ is varied between 2.9 and 3.5 in steps of 0.2. Thus, the values of EVR become 6.08, 11.12, 21.71, and 46.79 for Λ values of 2.9, 3.1, 3.3, and 3.5, respectively [35]. The corresponding normalized channel impulse responses in the z-transform domain are given by:

CH = 1, Λ = 2.9: 0.209 + 0.995z⁻¹ + 0.209z⁻²
CH = 2, Λ = 3.1: 0.260 + 0.930z⁻¹ + 0.260z⁻²
CH = 3, Λ = 3.3: 0.304 + 0.903z⁻¹ + 0.304z⁻²
CH = 4, Λ = 3.5: 0.341 + 0.876z⁻¹ + 0.341z⁻²    (3.169)

The transmitted message is a 4-QAM signal constellation of the form ±1 ± j1. Each symbol was drawn from a uniform distribution. A zero-mean white Gaussian noise was added to the channel output. The received signal was normalized to unity so that the signal-to-noise ratio (SNR) becomes equal to the reciprocal of the noise variance. The current received symbol r_k and the past three symbols were used as input in the FIR–LMS- and MLP-based equalizers, whereas for the FLANN- and LeNN-based equalizers, the current symbol r_k and the past two symbols were used as input (see Fig. 3.19). Thus, the FIR-based adaptive equalizer has 16 weights. (It is observed that increasing the number of weights does not improve the equalizer performance.) A number of experiments were carried out to determine the optimum NN architecture, the learning rate α, and the momentum parameter β. A 2-layer MLP with an {8 - 8 - 2} architecture, i.e., with the number of nodes in the input, hidden, and output layers as 8, 8, and 2 (excluding the bias unit), respectively, is selected. The 6-dimensional input has been expanded to an 18-dimensional enhanced pattern by using trigonometric and Legendre polynomials for FLANN and LeNN, respectively. Thus, both FLANN and LeNN have an architecture of {18 - 2}. The backpropagation algorithm was used to train the NNs. Details of the architecture of the four equalizers and the chosen learning parameters are provided in [6]. The nonlinear tanh(·) function was used in all the nodes (except the input nodes) of the NNs. The delay parameter was selected as 1. The details of the functional expansion are given in [6]. The three nonlinear models and the linear model used in this study (see the "NL" block of Fig. 3.18) are given by [36]:

NL = 0: b_k = a_k
NL = 1: b_k = tanh(a_k)
NL = 2: b_k = a_k + 0.2a_k² − 0.1a_k³
NL = 3: b_k = a_k + 0.2a_k² − 0.1a_k³ + 0.5 cos(πa_k)    (3.170)

The linear channel model is represented by NL = 0. A nonlinear channel model that may occur due to the saturation of the transmitter amplifier is represented by


NL = 1. The nonlinear models NL = 2 and 3 denote two arbitrary nonlinear channels [36]. The main reason for using the channel models and nonlinear models is that these models have been widely used by other researchers [33, 37, 38].
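The four distortion models of Eq. (3.170) translate directly into the following helper, shown here for a real-valued channel output a_k (applying it to complex QAM samples would require a convention not stated in the text).

import numpy as np

def nl_channel(a, model):
    # Nonlinear distortions NL = 0..3 of Eq. (3.170) applied to the linear channel output a_k.
    if model == 0:
        return a
    if model == 1:
        return np.tanh(a)
    if model == 2:
        return a + 0.2 * a ** 2 - 0.1 * a ** 3
    return a + 0.2 * a ** 2 - 0.1 * a ** 3 + 0.5 * np.cos(np.pi * a)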

3.8.3.1 Channel Equalization Using a Generalized NN Model

(i) MSE Performance

To study the convergence characteristics and MSE performance of the equalizers, each equalizer was trained with 3000 iterations. To smooth out the randomness of the NN simulation, the MSE was averaged over 500 independent runs. The MSE characteristics for CH = 2 with 15 dB additive noise are shown in Fig. 3.35. It may be

Fig. 3.35 MSE characteristics of the NN-based equalizers for CH = 2 with SNR = 15 dB. (a) NL = 0, (b) NL = 1, (c) NL = 2, (d) NL = 3. Here “FLN” and “LEG” denote FLANN and LeNN, respectively. Note: the MSE characteristics of LeNN and FLANN almost overlap


noticed that the MSE characteristics of LeNN and FLANN almost overlap each other. It is clear that the performance of the FIR–LMS-based linear equalizer is the worst among the four equalizers. Its MSE settles between −7 and −10 dB for the four NL models. In addition, it provides the slowest convergence rate. The MLP-based equalizer performs much better than the FIR equalizer. Its MSE settles between −15 and −20 dB for the four NL models. The performances of LeNN and FLANN are found to be similar. The MSE floor for both LeNN- and FLANN-based equalizers is about −23 dB for all four NL models. The MSE convergence rate is also the fastest for the LeNN- and FLANN-based equalizers. The MSE floor settles at about 1500 iterations for the LeNN and FLANN equalizers, whereas in the case of the MLP, it takes about 3000 iterations. Similar performances are also observed for other channels and other values of additive noise.

(ii) BER Performance

All four equalizer structures were trained for 3000 iterations, and their weights were frozen and stored. Thereafter, to calculate the bit error rate, the stored weights are loaded into the NN, and new test symbols are transmitted. Based on the new received samples, the equalizer estimates the transmitted symbol. If there is a mismatch between the transmitted symbol (delayed by 1) and the NN equalizer output, it gives a bit error. The BER was computed with 2 × 10⁶ test symbols. This process was repeated for different values of additive noise ranging from 10 to 20 dB in steps of 1 dB. The BER performance of CH = 2 for the four NL models is shown in Fig. 3.36. As expected, the BER decreases as the SNR increases. The NL = 3 case is the most severe nonlinear model. It can be seen that the NN-based equalizers perform much better than the FIR–LMS-based equalizer. Among the three NN-based equalizers, the performance of the MLP-based equalizer is inferior to the other two. Interestingly, the performances of the LeNN- and FLANN-based equalizers are quite similar. In the case of the MLP-based equalizer, for NL = 3, when the SNR rises from 15 to 20 dB, log₁₀(BER) falls from −1.68 to −3.22. In the same situation, for the FLANN- and LeNN-based equalizers, the fall is from −1.92 to −3.54 and from −1.92 to −3.47, respectively (see Fig. 3.36d). In the case of the more difficult channel CH = 3, under a similar situation, the BER falls for the MLP-, FLANN-, and LeNN-based equalizers are from −1.03 to −1.51, from −1.21 to −1.76, and from −1.22 to −1.75, respectively (due to space constraints, the figure is not provided). In order to show the performance of the equalizers under different channel nonlinearities, we have plotted BER against EVR at SNR = 15 dB in Fig. 3.37. We have stated that CH = 1, CH = 2, CH = 3, and CH = 4 correspond to EVR values of 6.08, 11.12, 21.71, and 46.82, respectively. The higher the EVR, the more difficult the channel is to equalize. It is observed that as the EVR increases, the BER also rises. However, the rise of the BER is less severe for the FLANN- and LeNN-based equalizers than for the MLP-based equalizer. It can be seen that, in the case of NL = 3, as the EVR increases from 6.08 (CH = 1) to 46.82 (CH = 4), the rise of log₁₀(BER) for the MLP-, FLANN-, and LeNN-based equalizers is given


Fig. 3.36 BER performance of the NN-based equalizers for CH = 2. (a) NL = 0, (b) NL = 1, (c) NL = 2, (d) NL = 3. Here "FLN" and "LEG" denote FLANN and LeNN, respectively. Note: the BER performance of LeNN and FLANN almost overlaps

from −2.77 to −0.69, from −3.06 to −0.85, and from −3.04 to −0.87, respectively (see Fig. 3.37d). Thus, the LeNN- and FLANN-based equalizers perform better than the MLP-based equalizer as the EVR increases. As the performance of ANN algorithms depends on the random initialization of the network parameters, it is important to analyze the statistical behavior of the algorithm by repeating the experiments several times. We need to calculate the 95% or 99% confidence interval of the results to ascertain that it is narrow enough for the algorithm to be used reliably. To calculate the confidence interval of our algorithm and to compare the results with other competitive algorithms, we selected two cases of BER experiments, for SNR = 12 and 16 dB (please refer to Fig. 3.36d). Out of the four algorithms, we choose our proposed LeNN algorithm and the FLANN algorithm; the FLANN algorithm was chosen because its performance is closely similar to that of LeNN. We repeated the experiments 20 times, starting from random initialization of parameters


Fig. 3.37 EVR performance of the NN-based equalizers for CH = 2 with SNR = 15 dB. (a) NL = 0, (b) NL = 1, (c) NL = 2, (d) NL = 3. Here "FLN" and "LEG" denote FLANN and LeNN, respectively. Note: the EVR performance of LeNN and FLANN almost overlaps

followed by training of the parameters and finally finding the BER. Thus, we obtain two sets of BER results, for LeNN and FLANN, at SNR = 12 dB; let us name them ΓL12 and ΓF12. Similarly, experiments were performed for SNR = 16 dB, and the corresponding two sets of results are named ΓL16 and ΓF16 for the LeNN and FLANN algorithms, respectively. To ascertain the normal distribution of the data, we have used the normal probability plot method. Considering n data points, this is done by sorting the data of each set in ascending order, assigning the index i to each data element, calculating f_i = (i − 0.375)/(n + 0.25) for each data point, and finally plotting the data value x_i versus f_i. An approximately straight line for each data set ensures the normal distribution of the data. The mean and standard deviation values obtained were μ_L12 = −1.3557, σ_L12 = 0.0169, μ_F12 = −1.3500, σ_F12 = 0.0069, μ_L16 = −2.1749, μ_F16 = −2.1733, and σ_F16 = 0.0159 for the data sets ΓL12,


Fig. 3.38 The signal constellation produced by the equalizers for CH = 2 with SNR = 15 dB and NL = 3: (a) FIR–LMS, (b) MLP, (c) FLANN, (d) LeNN. Here “FLN” and “LEG” denote FLANN and LeNN, respectively

ΓF12, ΓL16, and ΓF16, respectively. The corresponding 99% confidence intervals are computed and obtained as −0.0043, −0.0040, −0.0097, and −0.0091, respectively. The very low values of the confidence intervals ensure very high confidence in our algorithm. Though the confidence interval for SNR = 16 dB is slightly larger than that for SNR = 12 dB, their absolute values are still extremely small. Moreover, the intervals for LeNN and FLANN are almost the same.

(iii) Signal Constellation Diagrams

The signal constellation diagram provides a visual representation of the performance of the equalizer. To obtain the signal constellation diagram, we trained the equalizers with 3000 samples. Thereafter, we fed 1000 new samples to the equalizer to obtain the equalizer outputs. The equalizer output for these 1000 samples is plotted in Fig. 3.38 for CH = 2 with additive noise of 15 dB and NL = 3. It can


Fig. 3.39 The signal constellation produced by the MLP- and LeNN-based equalizers for CH = 4 with SNR = 15dB: (a) MLP, NL = 2, (b) MLP, NL = 3 (c) LeNN, NL = 2, (d) LeNN, NL = 3. Here “LEG” denotes LeNN

be seen that the signal constellation of the FIR–LMS-based equalizer is the most dispersed. The signal constellation of the MLP-based equalizer is less clean than those of the FLANN- and LeNN-based equalizers. However, it can be seen that the signal constellations produced by the LeNN- and FLANN-based equalizers are similar. To demonstrate the superior performance of LeNN over MLP, the signal constellation diagrams for a high-EVR channel (CH = 4) with the NL = 2 and 3 nonlinear models are shown in Fig. 3.39. It can be seen that the LeNN-based equalizer provides a much cleaner signal constellation than the MLP-based equalizer for a high-EVR channel with severe nonlinearity. Generally, the signal constellation diagram provides a qualitative assessment of the equalizer performance. However, in order to have a quantitative assessment of the constellation diagram, we carried out the computations as follows. After plotting the 1000 equalizer outputs as shown in Fig. 3.38 or 3.39, we counted the number of data points whose in-phase and quadrature component values are both greater than −


0.25 (i.e., the number of data points lying near the four corners of the square). A larger number of data points concentrated in these regions implies better performance of the equalizer. In Fig. 3.38, the number of such data points is found to be 839, 968, 993, and 991 for the FIR–LMS-, MLP-, FLANN-, and LeNN-based equalizers, respectively. The number of such data points in Fig. 3.39a–d is found to be 947, 803, 981, and 927, respectively. The higher number of data points concentrated around the four corners of the square indicates that the performance of the LeNN- and FLANN-based equalizers is similar, and that the LeNN-based equalizer performs much better than the MLP-based equalizer, especially for high-EVR channels with severe nonlinearities.

3.8.3.2 Adaptive Equalizer Based on the FLNN Cascaded with Chebyshev Orthogonal Polynomial Structure

(i) MSE Performance

The convergence characteristics of the MSE for CH = 3 at an SNR of 15 dB are plotted in Fig. 3.40 [33]. For different linear and nonlinear channels, the results obtained

Fig. 3.40 Convergence characteristics of the four nonlinear equalizers for CH = 3 with different nonlinear models at SNR of 15 dB: (a) NL = 0, (b) NL = 1, (c) NL = 2, and (d) NL = 3


Fig. 3.41 BER performance of the four nonlinear equalizers for CH = 3 with variation of SNR: (a) NL = 0, (b) NL = 1, (c) NL = 2, and (d) NL = 3

from the simulations show that the convergence rate of the FLNNCPAE is superior to those of the FLNN, MLP, and RBF, and it converges faster. As the nonlinear intensity increases, the steady-state errors of the FLNN, MLP, and RBF also increase. However, the FLNNCPAE always maintains a lower steady-state error than the other equalizers. In particular, for very severe nonlinear distortions, the MSE of the FLNNCPAE is about −40 dB and degrades by only 15 dB compared to the other equalizers. The simulations thus show that the novel adaptive equalizer can remove the different linear and nonlinear distortions.

(ii) BER Performance

The BER performance of the four nonlinear equalizers for CH = 3 for both linear and nonlinear channel models is plotted in Fig. 3.41, with the SNR varied from 8 to 16 dB, for the linear (NL = 0) and the three nonlinear channel models (NL = 1, 2, and 3). Though the computational complexity of the FLNNCPAE is not lower than that of the FLNN, the FLNNCPAE is superior to the FLNN, MLP, and RBF in terms of BER performance. Especially for the severe nonlinear channel model (NL = 3), the FLNNCPAE greatly outperforms the other structures. Also, the performance of the FLNN, RBF, and MLP is more or less similar over a wide variation of SNR values.


Fig. 3.42 Effect of variation of EVR on BER performance at SNR of 10 dB: (a) NL = 0, (b) NL = 1, (c) NL = 2, and (d) NL = 3

For the four different equalizers, the effect of the variation of EVR on the BER performance at an SNR of 10 dB is depicted in Fig. 3.42. For all the equalizers, the BER increases with an increase in EVR for both the linear and nonlinear models. However, the performance degradation caused by the increase of EVR is less severe for the FLNNCPAE than for the other equalizers. The performance of all the equalizers deteriorates with the introduction of nonlinear distortions. However, the simulation results show that the FLNNCPAE is superior in BER performance to the FLNN, MLP, and RBF with the linear and the three nonlinear models over a wide variation of EVR from 1 to 46.8, and the performance of the MLP-based equalizers is found to be similar to that of the RBF-based ones.

(iii) The Eye Patterns

The eye patterns (the equalizer output values) provide another indication of the effectiveness of the equalization process. The eye patterns of the equalizer input and output signals for CH = 3, NL = 3, and SNR = 15 dB are plotted in Fig. 3.43. For ease of visualization, only 2500 sample points are drawn in the figures. The input signals of the equalizer distorted by the noise are shown in Fig. 3.43a. The outputs of the equalizers using the FLNNCPAE, FLNN, RBF, and MLP are given in


Fig. 3.43 Eye patterns of the four equalizers with 2500 symbols for CH = 3 with the nonlinear model (NL = 3) at SNR of 15 dB: (a) noisy input, (b) FLNNCPAE, (c) FLNN, (d) RBF, and (e) MLP

Fig. 3.43b–e, respectively. It can be seen from Fig. 3.43 that the FLNNCPAE equalizer output signals are well concentrated at the desired values ±1, and the FLNN is found to be slightly worse. However, for the MLP and RBF, the equalizer output values are widely spread around +1 and −1, and both are very close to each other. These results clearly demonstrate that the effectiveness of channel equalization using the FLNNCPAE is superior to that of the other three equalizers. Similar observations can be made for all four channels with the linear and the nonlinear models studied.

3.8.3.3 Adaptive Decision Feedback Equalizer with the Combination of FIR Filter and FLANN

For a nonlinear time-variant channel, the channel coefficients a_i(k) vary with the time index k. The time-varying coefficients are generated by a second-order Markov model in which a white Gaussian noise source drives a second-order Butterworth low-pass filter [34].

Channel model 1 (minimum phase channel): the nonlinear time-variant channel model with minimum phase is given by:

r(k) = a₁(k)x(k) + a₂(k)x(k − 1) − 0.9[a₁(k)x(k) + a₂(k)x(k − 1)]³ + v(k)

Note that a₁(k) is centered about 1 and a₂(k) about 0.5.

Channel model 2 (non-minimum phase channel): the nonlinear time-variant channel model with non-minimum phase is given by:

r(k) = a₁(k)x(k) + a₂(k)x(k − 1) + a₃(k)x(k − 2) + 0.2[a₁(k)x(k) + a₂(k)x(k − 1) + a₃(k)x(k − 2)]² + v(k)

Note that a₁(k) is centered about 0.3482, a₂(k) about 0.8704, and a₃(k) about 0.3482.

Numerous experiments were carried out to obtain the best results for the four nonlinear equalizers. In order to have a fair comparison among the FLNN, RBF, LMSDFE, and CFFLNNDFE, a two-layer structure is selected for the RBF-based equalizer, in which the numbers of nodes (excluding the bias unit) in the input, hidden, and output layers are 2, 30, and 1, respectively. Trigonometric functions are used for the functional expansion of the input pattern, and in both cases the input pattern is expanded from 5 dimensions to a 26-dimensional input.

(i) MSE Performance

The convergence characteristics include the convergence speed and the steady-state error of the four equalizers. Simulation results are averaged over 100 independent experiments on time-invariant and time-variant channels for different BPSK random sequences. Figure 3.44 depicts the convergence properties for different BPSK random sequences and random initializations with β = 0.1; each run uses its own equalizer coefficients, and an SNR of 20 dB is applied. From the figures, we observe that the MSE curves of the FLNN and RBF are not distinguishable, and the MSE performance of both the FLNN and RBF outperforms the LMSDFE. Furthermore, the convergence performance of the CFFLNNDFE is clearly superior to the others for either time-invariant or time-variant channel models. The fast convergence of the CFFLNNDFE comes from the superiority of the SMNLMS algorithm and the combination structure.


Fig. 3.44 Convergence properties of the equalizers under SNR = 20 dB. (a) Time-invariant channel model 1. (b) Time-invariant channel model 2. (c) Time-variant channel model 1. (d) Time-variant channel model 2

(ii) BER Performance

In the first experiment, we fix β = 0.1 and run simulations for nine different SNR values ranging from SNR = 4 dB to SNR = 20 dB at 2 dB intervals (4:2:20). In Fig. 3.45, BER performance comparisons are presented. In each trial, 5000 BPSK samples are used for training, and the next 10,000 symbols are used for testing. The coefficient vectors of the equalizers are frozen after the training stage, and the test is then carried out. It is clearly seen that the CFFLNNDFE shows better performance than the others. In our second experiment, we fix the SNR at 20 dB and run simulations for eight different β values ranging from β = 0.04 to β = 0.32 with a step size of 0.04 (0.04:0.04:0.32). The BER for the different standard deviations β is depicted in Fig. 3.46. For the LMS-DFE, we can see that the BER performance is much worse than that of the other three nonlinear equalizers, and the FLNN results are almost the same as those of the RBF. Moreover, the results in the figure show that the CFFLNNDFE is better than the others in terms of BER for the various standard deviation values on the time-variant nonlinear channels.


Fig. 3.45 BER performance of the equalizers. (a) Time-invariant channel model 1. (b) Time-invariant channel model 2. (c) Time-variant channel model 1. (d) Time-variant channel model 2

Fig. 3.46 BER performance comparison with changing standard deviation. (a) BER performance comparison of time-variant nonlinear channel model 1. (b) BER performance comparison of time-variant nonlinear channel model 2


Fig. 3.47 Eye patterns of the four equalizers with 5000 symbols for time-variant nonlinear channel model 2 at SNR of 20 dB. (a) Noisy input. (b) CFFLNNDFE. (c) FLNN. (d) RBF. (e) LMSDFE

(iii) The Eye Patterns

It is well known that another indication of the effectiveness of the equalization process is provided by the eye patterns (the equalizer output values). Figure 3.47 depicts the eye patterns of the equalizer input and output signals for time-variant nonlinear channel model 2, where 5000 sample points are plotted in the figures. The input signals of the equalizers distorted by the noise are shown in Fig. 3.47a. The


outputs of the equalizers using the CFFLNNDFE, FLNN, RBF, and LMSDFE are given in Fig. 3.47b–e, respectively. It is observed from Fig. 3.47 that the output signals of the FLNN and RBF are slightly worse, whereas the output values of the proposed equalizer are well concentrated at the desired values ±1. Besides, the output of the LMSDFE is widely spread around +1 and −1. Therefore, channel equalization using the CFFLNNDFE is superior to the other three equalizers.

3.9 Summary

In this chapter, the FLANN filter for nonlinear systems has been introduced. The FLANN filter has stronger nonlinearity than the Volterra expansion of Chap. 2. It is a single-layer learning network without a hidden layer; hence, it also has a low computational burden. Two typical improvements of the FLANN filter have been described. Moreover, some robust FLANN filtering algorithms have been introduced and derived for the FLANN filter in the NANC application. In the next chapter, a similar nonlinear approximation method based on the spline adaptive filter will be presented.

References 1. Y. Tian and Z. Zhang, “Identification of Nonlinear Dynamic Systems Using Neural Networks,” Proc. Int. Symp. Test Meas., vol. 2, no. 2, pp. 997–1000, 2003. 2. G. L. Sicuranza and A. Carini, “A generalized FLANN filter for nonlinear active noise control,” IEEE Trans. Audio, Speech Lang. Process., vol. 19, no. 8, pp. 2412–2417, 2011. 3. K. Burse, R. N. Yadav, and S. C. Shrivastava, “Channel equalization using neural networks: A review,” IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., vol. 40, no. 3, pp. 352–357, 2010. 4. C. B. Borkowf, “Neural Networks: A Comprehensive Foundation (2nd Edition),” Technometrics, vol. 44, no. 2, pp. 194–195, 2002. 5. J. C. Patra, “Chebyshev neural network-based model for dual-junction solar cells,” IEEE Trans. Energy Convers., vol. 26, no. 1, pp. 132–139, 2011. 6. J. C. Patra, P. K. Meher, and G. Chakraborty, “Nonlinear channel equalization for wireless communication systems using Legendre neural networks,” Signal Processing, vol. 89, no. 11, pp. 2251–2262, 2009. 7. I. Shingareva and C. Lizárraga-Celaya, “Special Functions and Orthogonal Polynomials BT Maple and Mathematica: A Problem Solving Approach for Mathematics,” I. K. Shingareva and C. Lizárraga-Celaya, Eds. Vienna: Springer Vienna, 2009, pp. 261–268. 8. D. P. Das and G. Panda, “Active mitigation of nonlinear noise processes using a novel filtered-s LMS algorithm,” IEEE Trans. Speech Audio Process., vol. 12, no. 3, pp. 313–322, 2004. 9. R. M. A. Zahoor and I. M. Qureshi, “A modified least mean square algorithm using fractional derivative and its application to system identification,” Eur. J. Sci. Res., vol. 35, no. 1, pp. 14–21, 2009. 10. D. C. Le, J. Zhang, and Y. Pang, “A bilinear functional link artificial neural network filter for nonlinear active noise control and its stability condition,” Appl. Acoust., vol. 132, no. March 2017, pp. 19–25, 2018.


11. L. Luo, W. Zhu, and A. Xie, “A novel acoustic feedback compensation filter for nonlinear active noise control system,” Mech. Syst. Signal Process., vol. 158, p. 107675, 2021. 12. H. Zhao, X. Zeng, Z. He, T. Li, and W. Jin, “Nonlinear adaptive filter-based simplified bilinear model for multichannel active control of nonlinear noise processes,” Appl. Acoust., vol. 74, no. 12, pp. 1414–1421, 2013. 13. H. Zhao, X. Zeng, Z. He, and T. Li, “Adaptive RSOV filter using the FELMS algorithm for nonlinear active noise control systems,” Mech. Syst. Signal Process., vol. 34, no. 1–2, pp. 378–392, 2013. 14. R. Majhi, G. Panda, and G. Sahoo, “Development and performance evaluation of FLANN based model for forecasting of stock markets,” Expert Syst. Appl., vol. 36, no. 3, pp. 6800–6808, Apr. 2009. 15. G. L. Sicuranza and A. Carini, “Adaptive recursive FLANN filters for nonlinear active noise control,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 4312–4315. 16. H. Zhao, X. Zeng, Z. He, S. Yu, and B. Chen, “Improved functional link artificial neural network via convex combination for nonlinear active noise control,” Appl. Soft Comput. J., vol. 42, pp. 351–359, 2016. 17. M. Ferrer, A. Gonzalez, S. Member, and M. De Diego, “Convex Combination Filtered-X Algorithms for Active Noise Control Systems,” vol. 21, no. 1, pp. 156–167, 2013. 18. J. Kivinen, A. J. Smola, and R. C. Williamson, “Online learning with kernels,” IEEE Trans. Signal Process., 2004. 19. T. Deb, D. Ray, and N. V George, “A Reduced Complexity Random Fourier Filter Based Nonlinear Multichannel Narrowband Active Noise Control System,” IEEE Trans. Circuits Syst. II Express Briefs, vol. 68, no. 1, pp. 516–520, 2021. 20. X. Xu and W. Ren, “Random Fourier feature kernel recursive maximum mixture correntropy algorithm for online time series prediction,” ISA Trans., no. xxxx, 2021. 21. A. Rahimi and B. Recht, “Random features for large-scale kernel machines,” in Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference, 2009. 22. Y. Zhu, H. Zhao, X. He, Z. Shu, and B. Chen, “Cascaded Random Fourier Filter for Robust Nonlinear Active Noise Control,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 9290, no. c, 2021. 23. K. Pelekanakis and M. Chitre, “Adaptive sparse channel estimation under symmetric alphastable noise,” IEEE Trans. Wirel. Commun., vol. 13, no. 6, pp. 3183–3195, 2014. 24. H. Zhao, X. Zeng, Z. He, T. Li, and W. Jin, “Nonlinear adaptive filter-based simplified bilinear model for multichannel active control of nonlinear noise processes,” Appl. Acoust., vol. 74, no. 12, pp. 1414–1421, 2013. 25. L. Eriksson, M. Allie, and R. Greiner, “The selection and application of an IIR adaptive filter for use in active sound attenuation,” IEEE Trans. Acoust., vol. 35, no. 4, pp. 433–437, 1987. 26. M. Shao and C. L. Nikias, “Signal Processing with Fractional Lower Order Moments: Stable Processes and Their Applications,” Proc. IEEE, vol. 81, no. 7, pp. 986–1010, 1993. 27. L. Lu and H. Zhao, “Adaptive Volterra filter with continuous lp-norm using a logarithmic cost for nonlinear active noise control,” J. Sound Vib., vol. 364, pp. 14–29, 2016. 28. K. Yin, H. Zhao, and L. Lu, “Functional link artificial neural network filter based on the q-gradient for nonlinear active noise control,” J. Sound Vib., vol. 435, pp. 205–217, 2018. 29. N. V. George and G. Panda, “A robust filtered-s LMS algorithm for nonlinear active noise control,” Appl. Acoust., vol. 73, no. 8, pp. 
836–841, 2012. 30. N. C. Kurian, K. Patel, and N. V. George, “Robust active noise control: An information theoretic learning approach,” Appl. Acoust., vol. 117, pp. 180–184, 2017. 31. E. Roy, R. W. Stewart, and T. S. Durrani, “High-order system identification with an adaptive recursive second-order polynomial filter,” IEEE Signal Process. Lett., vol. 3, no. 10, pp. 276–279, 1996.


32. S. Wang, L. Dang, B. Chen, S. Duan, L. Wang, and C. K. Tse, “Random Fourier Filters Under Maximum Correntropy Criterion,” IEEE Trans. Circuits Syst. I Regul. Pap., vol. 65, no. 10, pp. 3390–3403, 2018. 33. H. Zhao and J. Zhang, “Functional link neural network cascaded with Chebyshev orthogonal polynomial for nonlinear channel equalization,” Signal Processing, vol. 88, no. 8, pp. 1946–1957, 2008. 34. H. Zhao, X. Zeng, X. Zhang, J. Zhang, Y. Liu, and T. Wei, “An adaptive decision feedback equalizer based on the combination of the FIR,” Digit. Signal Process., vol. 21, no. 6, pp. 679–689, 2011. 35. S. Haykin, “Adaptive Filter Theory (3rd Ed.) by Simon Haykin.pdf,” pp. 1–997, 2002. 36. J. C. Patra and R. N. Pal, “A functional link artificial neural network for adaptive channel equalization,” Signal Processing, 1995. 37. C. T. Yen, W. De Weng, and Y. T. Lin, “FPGA realization of a neural-network-based nonlinear channel equalizer,” IEEE Trans. Ind. Electron., 2004. 38. W. De Weng, C. S. Yang, and R. C. Lin, “A channel equalizer using reduced decision feedback Chebyshev functional link artificial neural networks,” Inf. Sci. (Ny)., 2007.

Chapter 4

Spline Adaptive Filter

4.1

Introduction

In Chaps. 2 and 3, we introduced the nonlinear model with the Volterra expansion in general [1] and the functional link artificial neural network (FLANN) [2], respectively. In practice, one of the most used structures in nonlinear filtering is the so-called block-oriented representation. In this chapter, we will further introduce this type of nonlinear filter. There are several basic types of block-oriented nonlinear structure, including the Wiener model [3], the Hammerstein model [4, 5], and the variants originating from these two classes in accordance with different topologies (i.e., parallel, feedback, and cascade) [6]. Specifically, the Wiener model consists of a cascade of a linear time-invariant (LTI) filter followed by a static nonlinear function, which is sometimes deemed a linear-nonlinear (LN) model, and the Hammerstein model comprises a static nonlinear function followed by an LTI filter, which is usually considered a nonlinear-linear (NL) model [7, 8]. In recent years, some scholars have combined the block-oriented architecture with spline functions and proposed a new Wiener model for LN blocks, called the Wiener spline adaptive filter (SAF) [9]. The spline adaptive filter is composed of a linear combiner followed by an adaptable lookup table (LUT) addressed by the linear combiner output and interpolated by a local low-order polynomial spline curve. Both the weights of the linear filter and the interpolating points of the LUT can be adapted by minimization of a specified cost function [10]. In addition, there are also Hammerstein spline filters, cascade spline filters, and the IIR spline adaptive filter (IIR-SAF) [11, 12]. The main abbreviations and symbols in this chapter are given in Table 4.1.


Table 4.1 Reverberation time T60 and filter length L used in experiments

T60 [ms]: Anechoic, 50, 100, 200, 350
N:        256, 512, 1024, 1280, 2048

Fig. 4.1 Structure of SAF

4.2 4.2.1

Spline Filter Model Spline Adaptive Filter

Figure 4.1 shows the spline adaptive filter, which is essentially a linear-nonlinear network, and the linear part is an FIR filter. The nonlinear network consists of an adaptive lookup table (LUT) and a spline interpolation network. As shown in the figure, when the instantaneous moment is n, the input of the system is x(n), the output of the linear network is s(n), and y(n) is the output of the spline filter. Therefore: sðnÞ = wT ðnÞxðnÞ

ð4:1Þ

where w(n) = [w0, w1 ,. . ., wN-1]T is the adaptive weight vector of the FIR filter. x(n) = [x(n), x(n-1),. . ., x(n-N+1)]T is the tap delay input signal vector, and N is the number of filter taps. s(n) and y(n) are related by a nonlinear function. In fact, y(n)

4.2

Spline Filter Model

165

Fig. 4.2 Schematic structure of the SAF. Block S1 computes the parameters u and i by (4.2) and (4.3), while S2 computes the SAF output through the spline patch determined by S1

Fig. 4.3 Example of qy,i control points interpolation using a CR-spline function with a fixed step for the x-axes control points Δx = qx,i - qx,i-1

depends on s(n) a function determined by the span index i and the local parameter u, where u 2 [0, 1]. In the simple case of uniform spacing of knots and referring to the top of Fig. 4.2, we constrain the control point abscissas to be equidistant and, most importantly, not adaptable [14]. Moreover, for the sake of efficiency, another constraint is imposed on the control points, forcing the sampling interval to be centered on the x-axis origin (see Fig. 4.3).

166

4

Spline Adaptive Filter

The calculation of the local parameter u is: uð nÞ =

  s ð nÞ s ð nÞ Δx Δx

ð4:2Þ

And the span index i is obtained by: 

 s ð nÞ Q-1 i= þ Δx 2

ð4:3Þ

where Q is the total number of control points, Δx is the gap between the control points, and b•c is the floor operator. The output of the spline adaptive filter is given by: yðnÞ = φi ðuÞ = uT Cqi,n

ð4:4Þ

where u = [u3(n), u2(n), u(n), 1]T and qi = [qi, qi+1, qi+2, qi+3]T is the control point vector and C is the spline basis matrix. Among the spline nonlinear filters, the CR-spline base matrix and the B-spline base matrix are the most widely used, which are given by: 0

1 -1 3 -3 1 B -5 4 -1C 1B 2 C C CR = B C 2@ -1 0 1 0 A 0 2 0 0 0 1 -1 3 -3 1 B 0C -6 3 1B 3 C CB = B C 6@ -3 3 0A 0 1 4 1 0

4.2.2

Basic Spline Filter Algorithm

In this section, we will introduce some basic algorithms based on spline filter, such as SAF-LMS [10], SAF-NLMS [14], SAF-SNLMS [11], and SAF-VSS-SNLMS [11]. And the simulation results of them will be placed in 4.5.1.

4.2

Spline Filter Model

4.2.2.1

167

SAF-LMS Algorithm

In this section, we will briefly introduce the most basic spline filter nonlinear adaptive algorithm, called SAF-LMS. The online learning rules for nonlinear spline adaptive filters can also be derived by minimizing n o the cost function, and the most typical definition is   e J wn , qi,n = E jeðnÞ2 j . In general, the cost function can be approximated by instantaneous error, and the expression is as follows:   J wn , qi,n = e2 ðnÞ

ð4:5Þ

where e(n) is the prior error signal, referring to Figs. 4.1 and 4.2, and formula (4.4), and we can see that its expression is: eðnÞ = dðnÞ - φi ðuÞ

ð4:6Þ

In order to minimize formula (4.5), we use the stochastic gradient adaptive method to find its derivative with respect to the weight vector w:   ∂J wn , qi,n ∂φ ðuÞ ∂u ∂sðnÞ = - 2eðnÞ i ∂wn ∂u ∂sðnÞ ∂wn

ð4:7Þ

where sðnÞ = wTn xn . From (4.4) the local derivative of the i-th span is ∂φ∂ui ðuÞ = T = ½3u2 ðnÞ, 2uðnÞ, 1, 0 . φ0i ðuÞ = u_ T Cqi,n , where u_ = ∂u ∂u 1 . Hence (4.7) becomes: And from expression (4.2), we have that ∂s∂uðnÞ = Δx   ∂J wn , qi,n 2 =eðnÞφi 0 ðuÞxn Δx ∂wn

ð4:8Þ

For the derivative computation of (4.5) with respect to the control points qi,n, we have:   ∂J wn , qi,n ∂ðd ðnÞ - φi ðuÞÞ2 ∂φi ðuÞ = ∂φi ðuÞ ∂qi,n ∂wn

ð4:9Þ

i ðuÞ where, from (4.4), we have that ∂φ = CT u, so we can write: ∂q i,n

  ∂J wn , qi,n = - 2eðnÞCT u ∂wn

ð4:10Þ

168

4

Spline Adaptive Filter

At this point, we can obtain the update equations for the weight vector and the control point, respectively: wn + 1 = wn + μw eðnÞφi 0 ðuÞxn

ð4:11Þ

qn + 1 = qn + μq eðnÞCT u

ð4:12Þ

where the parameters μw and μq represent the learning rates for the weights and for the control points, respectively. The algorithm convergence conditions are: 0 < μw ≤

2 φ i 0 2 ð uÞ kxn k2

ð4:13Þ

2 kCuk2

ð4:14Þ

0 < μq ≤

A summary of the SAF-LMS algorithm is as follows:

4.2.2.2

SAF-NLMS Algorithm

However, the updating process of the above SAF algorithms will be affected by the eigenvalue spread of the autocorrelation matrix of the input signal. The stability of adaptive filtering algorithm is an important indicator for verifying the algorithm [13]. So, people investigate a normalized variant for the SAF-LMS called normalized LMS algorithm based on the SAF model (SAF-NLMS) [14].

4.2

Spline Filter Model

169

Using the Lagrange multiplier method, the cost function for the SAF-NLMS can be defined as:   J qi,n + 1 =

2 1 1 e2 ðnÞ + qi,n + 1 - qi,n  T 2 2un un

ð4:15Þ

  where ð1=2Þ × eðnÞ=uTn un can be viewed as the Lagrange multiplier. Taking the derivative of (4.15) with respect to qi,n+1 and wn+1, respectively, and setting them to zeros, we can obtain two recursive equations of the tap weights and control points for the SAF-NLMS algorithm: wn + 1 = wn + μw

e ð nÞ 1 T u_ Cq x uTn un + ε Δx n i,n n

ð4:16Þ

e ð nÞ CT un uTn un + ε

ð4:17Þ

qi,n + 1 = qi,n + μq

where μw and μq are the step sizes for the linear network and nonlinear network, respectively, the small positive constant ɛ is used for avoiding zero division. The algorithm convergence conditions are: h i 2 ε + kun k2 0 < μw ≤  2 2 1 _T Δx un Cqi,n kxn k + δw ðnÞ h i 2 ε + kun k2 0 < μq ≤  2 CT un

4.2.2.3

ð4:18Þ

ð4:19Þ

SAF-SNLMS Algorithm

In the paper [11], Liu et al. proposed SAF-SNLMS algorithm based on SAF-NLMS algorithm. The updating equation of qi,n+1 in the SAF-SNLMS algorithm can be formulated by the following constrained optimization problem: min jep ðnÞj = jdðnÞ - yðn + 1Þj = jdðnÞ - uTn Cqi,n + 1 j  2 subject to qi,n + 1 - qi,n  ≤ β2 qi,n + 1

ð4:20Þ

where ep ðnÞ is the posteriori error, β2 is chosen to be small to ensure the steady update of qi, n does not change drastically, jj is the absolute value operation, and kk denotes the Euclidean norm of a vector.

170

4

Spline Adaptive Filter

Then, using the Lagrange multiplier method, the cost function can be expressed by h i 2   J qi,n + 1 = jep ðnÞj + ρ0 qi,n + 1 - qi,n  - β2

ð4:21Þ

where ρ0 denotes the Lagrange multiplier. Setting the derivative of the cost function J(qi,n+1) with respect to qi,n+1 equal to zero, we have: qi,n + 1 = qi,n +

  1 T C un sgn ep ðnÞ 2ρ0

ð4:22Þ

where sgn[] is the sign function. Substituting (4.22) into the constraint condition in (4.20), we obtain: β 1  = 2ρ0 CT un 

ð4:23Þ

T Note that CT is a constant matrix andkCTunk ≤ kC kCTk is  Tk  kunk, where  T  T   defined as the spectral norm of matrix C , C ≔ sup C un =kun k . Thus, (4.23)

un ≠ 0

can be rewritten as: β0 1 ≥ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2ρ0 uTn un + ε0

ð4:24Þ

where β0 = β/kCTk and ε0 are small positive constants used for avoiding zero division. Considering the lower bound of 1/(2ρ0) in (4.24), the updating equation of qi,n can be derived as:   sgn ep ðnÞ T qi,n + 1 = qi,n + μq pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi C un uTn un + ε0

ð4:25Þ

Similarly, the cost function associated with the weight vector of FIR filter wn can be formulated as: h i J ðwn + 1 Þ = jep ðnÞj + ρ0 kwn + 1 - wn k2 ‐β2 The updating equation of wn can expressed as:

ð4:26Þ

4.2

Spline Filter Model

171

  sgn ep ðnÞ 1 T wn + 1 = wn + μw pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi u_ n Cqi,n + 1 xn uTn un + ε0 Δx

ð4:27Þ

We replace the posteriori error ep(n) by using the a priori error e(n) approximately. The updating equations can be rewritten as:

4.2.2.4

sgn ½eðnÞ qi,n + 1 = qi,n + μq pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi CT un uTn un + ε0

ð4:28Þ

sgn ½eðnÞ 1 T wn + 1 = wn + μw pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi u_ n Cqi,n + 1 xn uTn un + ε0 Δx

ð4:29Þ

SAF-VSS-SNLMS Algorithm

In the paper [11], Liu et al. added variable step size scheme to SAF-NLMS algorithm and proposed SAF-VSS-SNLMS algorithm. In this work, the adjustments of the variable step sizes are controlled by the squared value of the impulsive-free error, i.e.: i h μw ðnÞ = αμw ðn - 1Þ + ð1 - αÞ min be20 ðnÞ, μw ðn - 1Þ h i μq ðnÞ = αμq ðn - 1Þ + ð1 - αÞ min be20 ðnÞ, μq ðn - 1Þ

ð4:30Þ ð4:31Þ

where α is the forgetting factor approaching one. be20 ðnÞ is the estimate of squared value of the impulsive-free error which can be obtained by: be20 ðnÞ = λbe20 ðn‐1Þ + c1 ð1 - λÞmedðγ n Þ

ð4:32Þ

where λ is another forgetting factor close to but smaller than one, c1 = 1.483 (1 + 5/(Nw - 1)) is a finite correction factor and Nw is the data window. γ n = [e2(n), e2(n - 1), ⋯, e2(n - Nw + 1)] and med() denotes the median operator. In the paper [11], the initial values of step sizes will be chosen as μw(0) = μq(0) = 0.05 for all the simulations. The SAF-VSS-SNLMS algorithm is as follows:

172

4.3 4.3.1

4

Spline Adaptive Filter

Robust Spline Filtering Algorithm SAF-MCC Algorithm

The nonlinear spline filter algorithm mentioned above achieves ideal performance in the Wiener system. However, when the desired signal is disturbed by non-Gaussian noise, especially in the presence of large outliers (observed values deviate significantly from a large amount of data), the performance of the above algorithm may deteriorate rapidly. The main reason is that the above algorithm is based on the mean square error (MSE) criterion, which only captures the second-order statistics of the data and relies heavily on the assumption of Gaussian distribution. However, in most practical situations, the Gaussian assumption does not hold [15]. Therefore, some scholars have proposed a new nonlinear adaptive filter, called the nonlinear spline adaptive filter under the maximum correntropy criterion

4.3

Robust Spline Filtering Algorithm

173

(SAF-MCC) [15]. Instead of using MSE, we use correntropy as a cost function to identify spline adaptive filters. Correntropy is a nonlinear similarity measure between two signals. The MCC aims at maximizing the similarity (measured by correntropy) between the model output and the desired response such that the adaptive model is as close as possible to the unknown system [16]. Given two random variables X and Y, the correntropy is [17]: Z V ðX, Y Þ = E ½κðX, Y Þ =

κðX, Y Þf XY ðx, yÞdxdy

ð4:33Þ

where E[•] denotes the expectation operator, κ(, ) is a shift-invariant Mercer kernel. The most widely used kernel in correntropy is the Gaussian kernel, given by: 1 κ σ ðx, yÞ = pffiffiffiffiffi exp σ 2π

-

e2 2σ 2

ð4:34Þ

where e = x - y and σ stands for the kernel bandwidth. In practical situations, the join distribution of X and Y is usually unknown, and only a finite number of data fðxðiÞ, yðiÞÞgNi= 1 are available. In these cases, we can use a sample mean estimator of the correntropy: K X b K,σ ðX, Y Þ = 1 V κ ðxðiÞ, yðiÞÞ K i=1 σ

ð4:35Þ

Correntropy can be used as a cost function for adaptive systems training. For example, under the maximum correntropy criterion (MCC), an adaptive model can be learned by maximizing the correntropy between the desired response and the model output. The optimization criterion in MCC training is: b K,σ ðX, Y Þ = max J MCC = max V

K 1 X κ ð eð i Þ Þ K i=1 σ

ð4:36Þ

where e(i) = x(i) - y(i) is an error sample. We can calculate the sensitivity of the MCC cost with respect to the error e(i), by taking the derivative of equation (4.36) (with Gaussian kernel and K = 1): b K,σ ðX, Y Þ ∂V ∂J MCC 1 = = - pffiffiffiffiffi  exp 3 ∂eðiÞ ∂eðiÞ σ 2π



- ðeðiÞÞ2 eðiÞ 2σ 2

ð4:37Þ

Derivative curves with different kernel widths are shown in Fig. 4.4. We can see the relationship between the derivative and the error. When the magnitude of the error is very large, the derivative becomes very small, especially if the kernel width

174

4

Spline Adaptive Filter

Fig. 4.4 Derivative curves of JMCC with respect to e(i) for different kernel widths

is smaller. Therefore, MCC training will be insensitive (hence robust) to impulsive noise, which usually causes large error. Using MCC instead of MSE as the cost function, the following cost function can be obtained:   1 J wn , qi,n = K

n X

1 1 κ σ ðdðjÞ, yðjÞÞ = pffiffiffiffiffi σ 2π K j=n-K +1

n X

exp

j=n-K +1

- e2 ð j Þ 2σ 2



ð4:38Þ where e( j) = d( j) - y( j). To obtain the optimal weight vector, we can use a gradient-based approach to maximize the above cost. For online adaptation, an instantaneous correntropy (with K = 1) can be used to derive a random gradient:  

∂J wn , qi,n - ðeðnÞÞ2 ∂φi ðuÞ ∂u ∂sðnÞ e ð nÞ = pffiffiffiffiffi  exp  2σ 2 ∂u ∂sðnÞ ∂wn ∂wn σ 3 2π

2 - ðeðnÞÞ e ð nÞ  ∂φ0i ðuÞxn = pffiffiffiffiffi  exp 3 2σ 2 σ 2π

ð4:39Þ

4.3

Robust Spline Filtering Algorithm

175

 

∂J wn , qi,n - ðeðnÞÞ2 1  eðnÞφi 0 ðuÞxn wn + 1 = wn + μw = w n + μw exp Δx 2σ 2 ∂wn ð4:40Þ Similarly, for the control points, we derive:  

∂J wn , qi,n - ðeðnÞÞ2 ∂φi ðuÞ e ð nÞ p ffiffiffiffiffi  =  exp 2σ 2 ∂qi,n ∂qi,n σ 3 2π

2 - ðeðnÞÞ e ð nÞ  CT u = pffiffiffiffiffi  exp 3 2σ 2 σ 2π  

∂J wn , qi,n - ðeðnÞÞ2  eðnÞCT u = qi,n + μq exp qi,n + 1 = qi,n + μq 2σ 2 ∂qi,n

ð4:41Þ

ð4:42Þ

SAF-MCC algorithm was used as reference for the above update equations. The MCC has very strong robustness to impulse noise, especially impulse non-Gaussian noise.in the presence of impulsive non-Gaussian noises. And the simulation results of the above algorithm will appear in 4.5.2.

4.3.2

Performance Analysis

Although the MCC algorithm may achieve an excellent performance in non-Gaussian environments, its performance analysis is so far still not addressed [18]. Therefore, we will introduce the steady-state performance of the spline nonlinear filter under the MCC criterion in detail in this section. For the spline nonlinear filter under the MCC criterion, we use energy conservation arguments [19] to analyze its performance. Next, we will introduce the theoretical mean square error (EMSE) for Gaussian noise and non-Gaussian noise. Through theoretical performance analysis, the steady-state performance of the system can be accurately predicted [20]. The steady-state excess mean square error (EMSE) is a significant measure of performance [21], which is defined as: ς = MSE - σ 2v

ð4:43Þ

where MSE denotes the mean square error. Figure 4.5 depicts the structure of SAF-MCC for performance analysis, which includes the adaptive part and the unknown spline parameters to be estimated (denoted with subscript 0). The a priori error of the whole system is defined as ε(n) = d(n) - y(n). When the spline control points are fixed, the a priori error for the linear filter is denoted with εw. In contrast, when the linear filter is fixed, the a priori error for the control points is denoted i-th

176

4

Spline Adaptive Filter

Fig. 4.5 SAF-MCC model for performance analysis

εq. The wo and qo,i are the optimal solution of linear filter weights and spline control points for the spline nonlinear system, respectively. In the steady state, the mean square values of the weight vector and the spline control points are: lim = Efwn g = wo

ð4:44aÞ

lim = E qi,n = qi,o

ð4:44bÞ

n→1 n→1

According to Fig. 4.5, the prior error is: ε ð nÞ = = = = =

d ð nÞ - y ð nÞ φo ðlðnÞÞ - φðsðnÞÞ φi ðul Þ - φi ðus Þ uTl Cqi,o - uTs Cqi,o ε w ð nÞ + εq ð nÞ

ð4:45Þ

where ul = [u3l, u2l, ul, 1]T and us = [u3s, u2s, us, 1]T, with ul and us denoting the local variable u for l(n) and s(n), respectively, and having the following relationship:

4.3

Robust Spline Filtering Algorithm

177

    l ð nÞ lðnÞ s ð nÞ s ð nÞ ul - us = + Δx Δx Δx Δx l ð nÞ s ð n Þ = Δx Δx ðwo - wn ÞT xðnÞ ≈ Δx ðwÞ T v x ð nÞ = n Δx

ð4:46Þ

where vðnwÞ = wo - wn denotes the weight error vector. The spline function φi(n) in (4.4) can be rewritten as: φi ðuÞ = u3 c1 qi,n + u2 c2 qi,n + uc3 qi,n + c4 qi,n ≈ uc3 qi,n + c4 qi,n

ð4:47Þ

where Ck denotes the k-th row of matrix C. And the a priori error εw(n) can be expressed as:   εw ðnÞ = uTl - uTs Cqi,o = uTl Cqi,o - uTs Cqi,o = ðul - us Þc3 qi,o c3 qi,o ðwÞT = xðnÞ v Δx n

ð4:48Þ

  εq ðnÞ = uTs C qi,o - qi,n = uTs CvðnqÞ = vðnqÞT CT us

ð4:49Þ

Similarly:

For convenience and simplification of analysis, the following assumptions are necessary: a1. The noise {v(n)} is independent, zero mean, and is independent of x(n), s(n), εq(n), εw(n), and ε(n). a2. The priori error ε(n) and the error e(n) are independent of φi(u), φ0i ðuÞ, kx(n)k2, and kCTuk2.  

a3. At noise state, the update term eðnÞ exp asymptotically uncorrelated with φi(u),

φ0i ðuÞ,

e2 ðnÞ 2σ 2

in (4.45) and (4.47) is

kx(n)k , and kCTuk2.

(A) Steady-state performance for the linear filter Subtracting wo from both sides of (4.11) yields:

2

178

4 ðwÞ vn + 1

= vðnwÞ

φ0 ð uÞ - μw i eðnÞ exp Δx



Spline Adaptive Filter

e 2 ð nÞ xðnÞ 2σ 2

ð4:50Þ

Take the norm on both sides of the above formula and take the expectation: 

 0   n 2 o φ ðuÞ e 2 ð nÞ  ðwÞ 2 E vn + 1  = E vðnwÞ  - 2μw E i ε eðnÞ exp ð n Þ w c3 qi,o 2σ 2 ( ) 

 2 φ0 2 ð u Þ e2 ð nÞ + μ2w E i 2 eðnÞ exp k x ð nÞ k 2 Δx 2σ 2 ð4:51Þ    n 2 o  ðwÞ 2 Assuming the spline adaptive filter is stable and E vn + 1  = E vðnwÞ  when n → 1. The formula (4.46) can be expressed as:

 φ0i ðuÞ e2 ðnÞ ε w ð nÞ eðnÞ exp 2Δx μw E c3 qi,o 2σ 2 ( ) 

2 2 ð n Þ e 2 2 = μw 2 E φ0i ðuÞ eðnÞ exp k x ð nÞ k 2σ 2 

2

ð4:52Þ

Applying assumptions A2–A3, the expectation on the left side of equation (4.51) can be expressed as:

 φ0i ðuÞ e2 ð nÞ E ε w ð nÞ eðnÞ exp c3 qi,n 2σ 2  0  

 φi ðuÞ e2 ðnÞ E eðnÞ exp ε w ð nÞ =E c3 qi,n 2σ 2 

ð4:53Þ

For the expectation on the right side of equation (4.52), we have to consider the following two situations:   2 1. Case 1:v(n) is Gaussian: Let gðeðnÞÞ = eðnÞ exp - e2σðn2Þ : Assuming that εq(n) ≈ 0, at the steady state for n → 1, the second expectation on the right of (4.48) can be calculated as:

4.3

Robust Spline Filtering Algorithm



179

 e 2 ð nÞ εw ðnÞ E eðnÞ exp 2σ 2 = E fgðeðnÞÞεw ðnÞg = E fgðεw ðnÞ + vðnÞÞεw ðnÞg

= E fg0 ðeðnÞÞgE ε2w ðnÞ

Z1





E ε2w ðnÞ e2 ðnÞ e 2 ð nÞ e2 ð nÞ pffiffiffiffiffi = × 1- exp deðnÞ exp 2σ 2 2σ 2 2σ 2e σ e 2π =E





ε2w ðnÞ

-1





σ3

3 σ 2 + E ε2w ðnÞ + σ 2v 2 ð4:54Þ

where σ e2 denotes the variance of the error e(n), which is expressed as σ e 2 = E ε2w ðnÞ + σ v 2 . Applying assumptions A2–A3, the expectation on the right of the (4.52) can be calculated as: ( E

) 2 e 2 ð nÞ 2 eðnÞ exp k x ð nÞ k 2σ 2 n o σ 3 E ε2 ðnÞ + σ 2  2 w v 02 = E φi ðuÞkxðnÞk 

3 2E ε2w ðnÞ + 2σ 2v + σ 2 2

2 φ0i ðuÞ





ð4:55Þ

Substituting (4.53) and (4.55) into (4.52) yields: 

 φ0i ðuÞ 2 σ3 E ε w ð nÞ 

3 c3 qi,n E ε2w ðnÞ + σ 2v + σ 2 2 n o σ 3  E ε 2 ð nÞ + σ 2  2 2 w v 2 02 = μw E φi ðuÞ kxðnÞk 

3 2E ε2w ðnÞ + 2σ 2v + σ 2 2

2Δx2 μw E

ð4:56Þ

Rearranging (4.56) and letting n → 1, we have: n o μw E φ0i 2 ðuÞ2 kxðnÞk2 n 0 o E ε2w ð1Þ = φ ðuÞ 2E c3iq Δx2 i,n  2  3



E εw ð1Þ + σ 2v σ 2 + E ε2w ð1Þ + σ 2v 2 ×

 3 2E ε2w ð1Þ + 2σ 2v + σ 2 2

ð4:57Þ

The steady-state EMSE can be obtained by solving the fixed-point equation (4.57).

180

4

Spline Adaptive Filter

2. Case 2: v(n) is non-Gaussian: In this case, we use the Taylor series expansion of the function to calculate the EMSE of the algorithm. According to (4.40), we have: 

 n o

φ0i ðuÞ 2 E fgðeðnÞÞεw ðnÞg = μ2w E φ0i ðuÞkxðnÞk2 E g2 ðeðnÞÞ ð4:58Þ 2Δx μw E c3 qi,n 2

We approximate g(e(n)) using a second-order Taylor series approximation of the function g(e(n)) around v(n) as: gðeðnÞÞ = gðεw ðnÞ + vðnÞÞ ≈ gðvðnÞÞ + g0 ðvðnÞÞεw ðnÞ +

1 00 g ðvðnÞÞε2w ðnÞ 2

ð4:59Þ

Omit higher-order terms:

v2 ð nÞ v 2 ð nÞ 1 2σ 2 2σ 2



v2 ðnÞ v3 ðnÞ 3vðnÞ g00 ðvðnÞÞ = exp 2σ 2 σ4 2σ 2 g0 ðvðnÞÞ = exp



-

ð4:60Þ ð4:61Þ

Using (4.60) and (4.61) and assumptions A1-A3, two of the expectations in (4.58) can be approximated by:

E fgðeðnÞÞεw ðnÞg ≈ E gðvðnÞÞεw ð nÞ + g0 ð vðnÞÞε2w ðnÞ ≈ E fg0 ðvðnÞÞgE ε2w ðnÞ



E g2 ðeðnÞÞ = E g2 ðεw ðnÞ + vðnnÞÞ o

≈ E g2 ðvðnÞÞ + E jg0 ðvðnÞÞj2 E ε2w ðnÞ



+ E jg0 ðvðnÞÞg0 0 ðvðnÞÞj E ε2w ðnÞ

ð4:62Þ

ð4:63Þ

Finally, substituting (4.62) and (4.63) into (4.58) yields (4.64):

Efφ0 2i ðuÞkxðnÞk2 g × E exp ð - v2 ðnÞ=σ 2 Þv2 ðnÞ μ w 2

Δx2 E fφ0i ðuÞ=c3 qi,n g

E εw ð1Þ = 2E exp ð - v2 ðnÞ=σ 2 Þv2 ðnÞð1 - v2 ðnÞ=σ 2 Þ n o

- μw E φ0 2i ðuÞkxðnÞk2 =Δx2 E φ0i ðuÞ=c3 qi,n ) ( exp ð - v2 ðnÞ=σ 2 Þv2 ðnÞ × ×E ð1 + 2v4 ðnÞ=σ 2 - 5v2 ðnÞ=σ 2 Þ

ð4:64Þ

4.3

Robust Spline Filtering Algorithm

181

(B) Steady-state performance for the spline control points Subtracting qi, o from both sides of (4.12) yields: ð qÞ vn + 1

= vðnqÞ

- μq eðnÞ exp

e 2 ð nÞ T C u 2σ 2

ð4:65Þ

Similarly, we can take the norm on both sides of the above formula and take the expectation. Using A2–A3 and assuming that the spline adaptive filter  assumptions   n 2 o  ðwÞ 2 is stable and E v holds at the steady state for n → 1,  = E vðwÞ  n+1

n

we have: 

 e 2 ð nÞ ε q ð nÞ 2μq E eðnÞ exp 2σ 2 (

2 ) n 2  T 2 o ð n Þ e C u = μq 2 E eðnÞ exp E 2σ 2

ð4:66Þ

n o To obtain E ε2q ðnÞ , we consider the following two cases: 1. Case 1: v(n) is Gaussian: Using a similar approach to deriving (4.55) and assuming that εw[n] ≈ 0, the expectation on the lift of (4.66) can be expressed as: 

 n o e 2 ð nÞ ε ð n Þ = E ε2q ðnÞ lim  E eðnÞ exp q 2 n→1 2σ

σ2

+E

n

σ3 ε2q ðnÞ

o

+ σ 2v

32 ð4:67Þ

The first expectation on the right of (4.66) is given by: ( E

eðnÞ exp

-

e ð nÞ 2σ 2 2

2 )

o  n  σ 3 E ε2q ðnÞ + σ 2v = n o 32 2E ε2q ðnÞ + 2σ 2v + σ 2

ð4:68Þ

n o Substituting (4.67) and (4.68) into (4.66), and letting n → 1 in E ε2q ðnÞ yields:

182

4

Spline Adaptive Filter

n o n 2 o E ε2q ð1Þ = μq E CT u o n o  n  32 E ε2q ð1Þ + σ 2v σ 2 + E ε2q ð1Þ + σ 2v × o  n 32 2 2E ε2q ð1Þ + 2σ 2v + σ 2

ð4:69Þ

The steady-state EMSE can be obtained by solving the fixed-point equation (4.69). 2. Case 2: v(n) is non-Gaussian: According to (4.53), we have: n 2 o

2E gðeðnÞÞεq ðnÞ = μq ðnÞE CT u E g2 ðeðnÞÞ

ð4:70Þ

Using a similar method to the EMSE calculation for the linear filter, we obtain the EMSE for the spline control points as given by (4.70):   n  o exp ð - v2 ðnÞ=σ 2 Þ T 2  ×E μq E C u n o × v 2 ð nÞ

E ε2q ð1Þ = 2 2 2 2E exp ð - v ðnÞ=σ Þv ðnÞð1 - v2 ðnÞ=σ 2 Þ ( ) n exp ð - v2 ðnÞ=σ 2 Þv2 ðnÞ 2 o T - μq E C u E × ð1 + 2v4 ðnÞ=σ 2 - 5v2 ðnÞ=σ 2 Þ

ð4:71Þ

(C) Steady state of EMSE of the whole SAF-MCC Using (4.43), (4.44a), and (4.44b), the EMSE can be calculated as:

ς = En ε2 ð1Þ  2 o = E εw ð nÞ + εq ð 1 Þ n o



= E ε2w ð1Þ + E ε2q ð1Þ + 2E εw ðnÞεq ð1Þ

ð4:72Þ

where E{εw(1)εq(1)} denotes the cross-EMSE. Since at steady-state ul ≈ us and qi,o ≈ qi,n, using (4.46) and (4.45), we see that E{εw(1)εq(1)} can be omitted. Therefore, the EMSE of the whole SAF-MCC algorithm is given by: n o

ς ≈ E ε2w ð1Þ + E ε2q ð1Þ

ð4:73Þ

4.4

Applications

4.4

183

Applications

With the development of spline filter, its application has been more and more extensive. In this section, we will introduce the applications based on spline filters, namely, active noise control and echo cancellation.

4.4.1

Active Noise Control Based on Spline Filter

Active noise control (ANC), which is based on the destructive superposition of sound waves, has successfully achieved enormous significance to reduce unwanted noise [22]. However, when the impulsive noise occurs in the original ANC systems, the performance of the ANC system would degrade [22]. In an endeavor to improve the noise cancellation achieved in a nonlinear ANC system, Vinal Patel et al. first propose a spline adaptive filter-based nonlinear ANC system [24]. On the basis of [24], a filtered-c generalized maximum correntropy criterion based on the nonlinear spline adaptive filtering. The GMCC criterion’s kernel function can make it less sensitive to abnormal data, so the FcGMCC algorithm outperforms the filtered-c least mean square (FcLMS) algorithm in the impulsive noise environment [25].

4.4.1.1

FcGMCC Algorithm

Figure 4.6 depicts the schematic diagram of the nonlinear ANC system based on the spline adaptive filter. The residual noise sensed by the error microphone is as follows: eðnÞ = d ðnÞ - yðnÞ  sN ðnÞ

ð4:74Þ

where  represents the linear convolution operator and sN(n) is the impulse response of the secondary path. To efficiently provide a robust solution for active impulsive noise control, the novel cost function based on the generalized maximum correntropy criterion is proposed. The generalized maximum correntropy criterion (GMCC) has been proven to be extremely efficient in dealing with mutations. The cost function based on GMCC criterion is as follows: J = Eα,β ½GðeðnÞÞ = γ α,β E ½ exp ð - λjeðnÞjα Þ where γ α,β is the normalization constant, and its definition is as follows:

ð4:75Þ

184

4

Spline Adaptive Filter

Fig. 4.6 Diagram of ANC system for GMCC [27]

γ α,β = α=ð2β  Γð1=αÞÞ

ð4:76Þ

where Γ() represents the gamma function, α is the shape parameter greater than 0, β is the scale coefficient, and λ = 1/βα represents the kernel parameter. In (4.75), when α = 2, the GMCC algorithm will degenerate into the MCC algorithm. From the gradient descent method, the weight update formula can be obtained as follows:   ∂J wn , qi,n ∂wn ∂½yðnÞ  sN ðnÞ = w n + μ w f ð e ð nÞ Þ  ∂wn i h 1 T = wn + μw f ðeðnÞÞ  u_ Cqi x ð nÞ  s N ð nÞ Δx = wn + μw f ðeðnÞÞ  ½xnt  sN ðnÞ = w n + μ w f ð e ð nÞ Þ  x0 ð nÞ

wn + 1 = wn - μw

ð4:77Þ

1 xðnÞ, and where f(e(n)) = αλγ α,β exp (-λje(n)jα)je(n)jα-1 sign (e(n)), xnt = u_ T Cqi Δx ′ b x (n) are the signal generated by xnt through the estimated value SðzÞ of the secondary channel S(z). Similarly, the weight update for control points is expressed as:

4.4

Applications

185

qi,n + 1 = qi,n - μq

∂J ðnÞ ∂qi,n

∂½yðnÞ  sN ðnÞ ∂qi,n   = qi,n + μq f ðeðnÞÞ  CT u  sN ðnÞ   = qi,n + μq f ðeðnÞÞ  CT u0 = qi,n + μq f ðeðnÞÞ 

ð4:78Þ

where u′ is the u  sN(n) and μq is the learning rate for updating the control points. Equations (4.77) and (4.78) are the update formulas of the novel robust filtered-c generalized maximum correntropy criterion (FcGMCC) algorithm.

4.4.1.2

Convergence Analysis

In this subsection, we provide the simple convergence analysis of FcLMGM algorithm. The error signal can be calculated by using the Taylor series expansion as follows: e ð n + 1Þ = e ð nÞ +

∂eðnÞ Δwn + η ∂wTn

ð4:79Þ

where η is the higher-order terms in the Taylor series expansion and Δwn = wn + 1 wn. Using (4.74) and (4.79), we have: ∂eðnÞ ∂y ∂u ∂α T =  sN ðnÞ = - x0 ðnÞ ∂u ∂α ∂wTn ∂wTn

ð4:80Þ

Δwn = μw f ðeðnÞÞx0 ðnÞ

ð4:81Þ

Δwn = μw f ðeðnÞÞx0 ðnÞ

ð4:82Þ

eðn + 1Þ = eðnÞ - μw kx0 ðnÞk f ðeðnÞÞ

ð4:83Þ

And

Using (4.81), we have:

2

Substituting (4.79) and (4.82) into (4.78) yields, we can get: In order to ensure the convergence of the algorithm, we have: jeðn + 1Þj ≤ jeðnÞ - μw kx0 ðnÞk f ðeðnÞÞj 2

So we have:

ð4:84Þ

186

4

Spline Adaptive Filter

j1 - αλγ α,β μw kx0 ðnÞk exp ð - λjeðnÞjα ÞjeðnÞjα - 2 j ≤ 1 2

ð4:85Þ

According to (4.86), we can get: 0 < μw ≤

2 αλγ α,β kx0 ðnÞk2 exp ð - λjeðnÞjα ÞjeðnÞjα - 2

ð4:86Þ

Similarly, we can get the range of μq, which makes the algorithm converge as follows: 0 < μq ≤

4.4.2

αλγ α,β

kCu0 k2

2 exp ð - λjeðnÞjα ÞjeðnÞjα - 2

ð4:87Þ

Echo Cancellation Based on Spline Filter

In recent years, there has been increasing interest in acoustic echo cancellation (AEC) issues due to teleconferencing and hands-free telephone systems. Unfortunately, the well-known linear algorithms proposed in the literature are too unrealistic in many practical situations because their performance is limited by the existence of nonlinearity. Therefore, some scholars have applied spline filter and proposed echo cancellation based on spline filter. This section will introduce echo cancellation based on Hammerstein filter and Wiener filter, respectively, and conduct simulation comparison of their performance [28]. The simulation results are shown in Sect. 4.5.5.

4.4.2.1

The Nonlinear Echo Canceler

An acoustic echo canceler is an adaptive system designed to reduce the echo generated by the sound produced by a speaker that can be picked up by microphones in the same room. The difficulty with acoustic echo cancellation is that the surrounding space changes the original sound, making the sound that reenters the microphone colorful. In addition, due to low-cost audio equipment, some nonlinear distortion can occur in the signal. In experiments, we simulate the effects of amplifier and speaker distortion by applying a nonlinear function (NL) to the input signal. The influence of environment is described by room impulse response (RIR)h(n) model. This nonlinear AEC (NAEC) model is referred to in the literature as the Hammerstein system, as shown in Fig. 4.7a. A nonlinear distortion model of amplifier and loudspeaker is presented by cascade of nonlinear function and RIR h(n). The latter model of NAEC is called the Wiener system, as shown in Fig. 4.7c).

4.4

Applications

187

Fig. 4.7 Two different implementations of a distorting echo path (b): (a) cascade of a nonlinear function (NL) and a room impulse response (RIR) (Hammerstein system) and (c) cascade of a RIR and a NL (Wiener system) [28]

Fig. 4.8 The Hammerstein NAEC architecture

The system is depicted in Fig. 4.8, where x(n) is the excitation signal, d(n) is the reference signal, and b(n) is a local background noise. In the Hammerstein NAEC architecture, s(n) = f(x(n)) is the distorted signal, y(n) = wTs is the output of the adaptive filter, that is the estimate of distorted echo signal, where w = [w1, w2, . . ., wN] are the N coefficients of the adaptive filter and s = [s(n), s(n-1), . . ., s(n-N +1)], while e(n) = d(n)-y(n) is the error signal. As shown in Fig. 4.8, the architecture consists of a cascade of nonlinear functions and a linear adaptive filter, the adaptive rules of which are described in the next section: Alternatively Fig. 4.9 shows the Wiener counterpart. In this new scheme, s(n) = wTx is the output of the linear adaptive filter, while y(n) = f(s(n)) is the distorted output signal. w = [w1, w2, . . ., wN] are the N coefficients of the adaptive

188

4

Spline Adaptive Filter

Fig. 4.9 The proposed Wiener NAEC architecture

filter and x = [x(n), x(n-1), . . ., x(n-N+1)], while e(n) = d(n)-y(n) is the error signal. In addition an additive environmental noise b(n) can be considered.

4.4.2.2

The Architectures Proposed in [28]

In this experiment, we use the basic least mean square (LMS) algorithm. Thus the cost function adopted is J(n) = je(n)j2 = jd(n)-y(n)j2. (A) Hammerstein System The Hammerstein model consists of a cascade of a static nonlinear function followed by an LTI filter, known as a nonlinear-linear (NL) model. Therefore, its updated rules are similar to those of the SAF-LMS algorithm based on Wiener filter in Sect. 4.2.2.1. Its updating formula is as follows: wn + 1 = wn + μw eðnÞsn

ð4:88Þ

qn + 1 = qn + μq eðnÞCT Ui,n wn

ð4:89Þ

where Ui,n 2 ℝ4×N = [ui,n, ui,n-1, . . ., ui, vectors ui,n-k

n-N+1]

is a matrix which collects N past

(B) Wiener System The update strategy of Wiener filter is shown in Sect. 4.2.2.1. Its updating formula is as follows:

4.5

Computer Simulation Examples

4.5

189

wn + 1 = wn + ηw eðnÞφi 0 ðuÞxn

ð4:90Þ

qn + 1 = qn + ηq eðnÞCT u

ð4:91Þ

Computer Simulation Examples

System identification is a mathematical model describing system behavior based on the time function of input and output. Its purpose is to estimate the model parameters inherent in the system by using adaptive filtering algorithm to get the desired output, as shown in Fig 4.5. In this Sects. 4.5.1, 4.5.2, and 4.5.3 are all simulated on the basis of system identification.

4.5.1

Basic Spline Filter Algorithm Simulation

In this section, we will introduce the performance comparison of the algorithms in Sect. 4.2.2 in different environments based on the Matlab simulation platform [5]. All simulations are based on system identification. We evaluate the performance of the above algorithms in the context of Wiener-type system identification. All the following results are obtained by averaging over 100 Monte Carlo trials. The performance is measured by the use of mean square error (MSE) defined as 2 10log 10[e(n)] . The input signal is generated by the process xðnÞ = ωxðn - 1Þ + pffiffiffiffiffiffiffiffiffiffiffiffiffi 2 1 - ω aðnÞ, where a(n) is the white Gaussian noise signal with zero-mean and unitary variance, and the parameter ω is selected in the range [0, 0.95], which can be interpreted as the degree of correlation for the adjacent samples. Real speech inputs are also applied. The FIR filter coefficients for the SAF are initialized as w-1 = [1, 0,. . ., 0] with length N = 5, while the spline model is initially set to a straight line with a unitary slope. For convenience, only B-spline basis is applied in the simulations; however, similar results can also be achieved using the CR-spline basis. The unknown Wiener spline model comprises an FIR filter wo=[0.6, -0.4, 0.25, -0.15, 0.1]T and a nonlinear spline function represented by a LUT qo with 23 control points, and Δx is set to 0.2, and qo is defined by qo = [-2.2,. . ., -0.8, -0.91, -0.4, -0.2, 0.05, 0.0, -0.4, 0.58, 1.0, 1.0, 1.2. . ., 2.2]. An independent White Gaussian background noise, v(n), is added to the output of the unknown system, with   30 dB signal to noise ratio (SNR), which is defined as dðnÞ. The impulsive SNR = 10 log 10 σ 2d =σ 2v , where σ 2d is the variance of noise-free e noise is considered as the contaminated Gaussian (CG) impulse or the symmetric α ‐ S noise. For the symmetric α ‐ S noise, its fractional-order  signal-to-noise a0 a0 e ratio (FSNR) can be defined as FSNR = 10 log 10 jdðnÞj =jη0 ðnÞj , where η0(n) denotes the symmetric α ‐ S noise, 0 < a0 < α0, α0 is the characteristic exponent of

190

4

Spline Adaptive Filter

Fig. 4.10 The variation of the step sizes for white Gaussian input in the absence of impulsive noise (SNR = 30 dB) [11]

the symmetric α ‐ S noise, α0 is set to 0.8, and a0 is selected to be 0.7 in simulations. The values of other parameters can be set as follows: μw = μq = 0.01, ε = 0.001, ε0 = 0.001, α = λ = 0.99, Nw = 11, μw(0) = μq(0) = 0.05, and be2o ð0Þ = σ 2x , where σ 2x denotes the variance of the input. Figure 4.10 shows the variation of the step sizes of the SAF-VSS-SNLMS for white Gaussian input, and the step size in the beginning is higher, which leads to faster convergence rate, and when the filter approaches its steady state, the step sizes become lower to ensure the small error. Figures 4.11 and 4.12 show the MSE learning curves of the SAF-LMS [3], SAF-NLMS [14], SAF-SNLMS [11], and SAF-VSS-SNLMS [11] in the absence of impulsive noise. The input signal is the white Gaussian sequence (ω is set to zero) in Fig. 4.11, and colored input (ω is set to 0.9) is used in Fig. 4.12. It clearly can be seen that the SAF-SNLMS algorithm suffers from the steadystate performance deterioration due to the sign operation of the error. However, the SAF-VSS-SNLMS nearly gets the steady-state performance comparable to that of the SAF-LMS and SAF-NLMS algorithms; besides, it obtains a better tracking ability than these two algorithms because of the variable step-size scheme. From the small figure on the top left corner of Fig. 4.11, we also can see that the SAF-VSSSNLMS obtains the fastest convergence rate in the beginning (about 1000 samples in

4.5

Computer Simulation Examples

191

Fig. 4.11 MSE curves for white Gaussian input in the absence of impulsive noise (SNR = 30 dB) [11]

Fig. 4.12 MSE curves for colored input in the absence of impulsive noise (SNR = 30 dB) [11]

192

4

Spline Adaptive Filter

Fig. 4.13 MSE curves for colored input in CG impulsive noise (SNR = 30 dB, t = 100,000, p = 0.01) [11]

the initial phase of filtering) of adaptation. Figure 4.10 shows the variation of the step sizes of the SAF-VSS-SNLMS for white Gaussian input, and the step sizes in the beginning are higher which lead to faster convergence rate, and when the filter approaches its steady state, the step sizes become lower to ensure the small error. Figures 4.13 and 4.14 indicate the learning curves of four algorithms in the case of CG impulsive noise; the input is the colored signal, and ω is set to 0.9. It is clearly in this case that the SAF-SNLMS algorithms outperform the other cited algorithms, obtaining the lower steady-state MSE and better tracking ability. In addition, the SAF-VSS-SNLMS achieves the best performance. Figures 4.15, 4.16, and 4.17 show the MSE learning curves of four algorithms in the symmetric α ‐ S noise environment at different FSNR. The other simulation parameters are the same of Fig. 4.13. As can be seen in cases of 0 dB in Fig. 4.15 and 20 dB FSNR in Fig. 4.16, the SAF-SNLMS algorithm does not acquire the satisfactory steady-state performance. However, due to the variable step-size solution, the SAF-VSS-SNLMS provides good tracking and steady-state performances. At high FSNR (-5 dB) in Fig. 4.17, the SAF-LMS and SAF-NLMS fail to track the unknown nonlinear system, but the SAF-SNLMS algorithms have the robust performance against the impulsive noise. Figure 4.19 shows the MSE learning curves of four algorithms in case of speech signal input which is shown in Fig. 4.18. The other simulation parameters are the

4.5

Computer Simulation Examples

193

Fig. 4.14 MSE curves for colored input in CG impulsive noise (SNR = 30 dB, t = 10,000, p = 0.1) [11]

Fig. 4.15 MSE curves for colored input in symmetric α ‐ S noise (SNR = 30 dB, symmetric α ‐ S noise FSNR = 0 dB) [11]

194

4

Spline Adaptive Filter

Fig. 4.16 MSE curves for colored input in symmetric α ‐ S noise (SNR = 30 dB, symmetric α ‐ S noise FSNR = 20 dB) [11]

Fig. 4.17 MSE curves for colored input in symmetric α ‐ S noise (SNR = 30 dB. Symmetric α ‐ S noise FSNR = -5 dB) [11]

4.5

Computer Simulation Examples

195

Fig. 4.18 Speech signal [11]

same with in Fig. 4.12. The impulsive noise is the CG noise. From Fig. 4.19, SAF-SNLMS algorithms perform better than other cited algorithms which demonstrate the effectiveness to the speech signal input.

4.5.2

SAF-MCC Algorithm Simulation

In this section, simulation results are presented to illustrate the performance of the SAF-MCC algorithm in Sect. 4.3.1. In the experiment, we identify an unknown Wiener system consisting of a linear component with a parameter vector w = [0.2, -0.1, 0.25, -0.15] and a nonlinear spline function implemented by a 21-point length LUT qo with an interval sampling Δx = 0.12, given by qo = [-1.20, -1.08, -0.96, -0.84, -0.72, . . ., -0.36, -0.24, -0.12, 0.00, 0.12, 0.24, 0.36, . . ., 0.72, 0.84, 0.96, 1.08, 1.20]. And CR spline is the only one considered in the paper and whose basis matrix CCR is the following matrix:

196

4

Spline Adaptive Filter

Fig. 4.19 MSE curves for speech input in CG impulsive noise (SNR = 30 dB, t = 100,000, p = 0.01) [11]

0

-1 B 2 1B C CR = B 2@ -1 0

3 -5

-3 4

0

1

2

0

1 1 -1C C C 0 A 0

The input signal x(n) is generated by the following equation: pffiffiffiffiffiffiffiffiffiffiffiffiffi xðnÞ = axðn - 1Þ + 1 - a2 ξðnÞ

ð4:92Þ

where ξ(n) is a zero mean white Gaussian noise with unitary variance and α is a parameter, which is used to determine the level of correlation between adjacent samples. In this simulation, we employ the alpha-stable distribution to generate the disturbance noise at the desired output. The characteristic function of the alphastable distribution is as follows: f ðt Þ = exp fjδt - γjtjα ½1 + jβ sgn ðt ÞSðt, αÞg where

ð4:93Þ

4.5

Computer Simulation Examples

8 απ > < tan 2 Sðt, αÞ = >2 : log jtj π

197

ifα ≠ 1 ifα = 1

ð4:94Þ

where a is the characteristic exponent, which is also called the stability index and satisfies a 2 (0, 2], and β 2 [-1, 1] is the symmetry parameter, γ > 0 is the dispersion parameter, and δ 2 ℝ is the location parameter. When β = 0, the alpha-stable distribution is called a symmetric alpha-stable (SαS) distribution. We set the parameters as α = 1.6, β = 0, γ = 0.05, δ = 0. The filter length is 4, and the input vector is x(n) = [x(n-1), x(n-2), ..., x(n-4)]. The simulation results are averaged over 50 independent Monte Carlo runs. A segment of 8000 samples is used as the training data and another 100 samples as testing data. The Gaussian kernel is used in MCC, and the kernel width is set at σ = 0.1 Figure 4.20 demonstrates the performance comparison between SAF-LMS and SAF-MCC. The learning rates are set at μw = 0.02, μq = 0.01, and μw = 0.14, μq = 0.05 for SAF-LMS and SAF-MCC, respectively. It is obvious that SAF-MCC performs better than the original SAF-LMS, with faster convergence rate and smaller test error. We also investigate the convergence curves of the algorithm with different interval sampling △x = [0.08, 0.12, 0.14, 0.16] and different kernel width σ = [0.05, 0.10, 0.50, 1.00]. The step sizes are chosen such that all the algorithms have the same initial convergence rate. The performances are shown in Figs. 4.21 and 4.22, respectively. From the simulation results, we can see that, when Δx = 0.12, the convergence performance will be the best. The width size of the kernel will affect the steady-state performance and convergence speed of the algorithm. So, how to choose the best value of σ is a challenging problem in the future.

Fig. 4.20 Convergence curves of SAF-LMS and SAF-MCC

198

4

Spline Adaptive Filter

Fig. 4.21 Convergence curves of SAF-MCC with different Δx value

Fig. 4.22 Convergence curves of SAF-MCC with different σ value

4.5.3

Performance Analysis Simulation

In this section, we will carry out simulation analysis on the algorithm mentioned in Sect. 4.3.2. (A) Performance Result In this section, the performance of the MCC algorithm will be simulated and analyzed [15]. The input signal x(n) is generated by (4.92), (4.93), and (4.94). Let α = 1.6, β = 0, γ = 0.05, δ = 0. For Fig. 4.23, we demonstrate the performance comparison between SAF-LMS and SAF-MCC. The learning rates are set at μw = 0.02, μq = 0.01, and μw = 0.14, μq = 0.05 for SAF-LMS and SAF-MCC, respectively. Gaussian kernel σ = 0.1. It is

4.5

Computer Simulation Examples

199

Fig. 4.23 Theoretical and simulated EMSEs with Gaussian noise. (a) EMSE versus step size, σ 2v = 10 - 3 , (b) EMSE versus SNR, μ = 0.006 [20]

obvious that SAF-MCC achieves much better convergence speed and smaller testing error than the SAF-LMS. For Figs. 4.24 and 4.25, we investigate the convergence curves of the algorithm with different interval sampling Δx = [0.08, 0.12, 0.14, 0.16] and different kernel width σ = [0.05, 0.10, 0.50, 1.00], respectively. The step sizes are chosen such that all the algorithms have the same initial convergence rate. We can see that the convergence performance will be the best when Δx = 0.12 in Fig. 4.24. If the kernel width is too large or too small, the performance becomes poor. However, the parameter is set manually in this simulation. Therefore, how to choose the best σ is a challenging problem and is a research topic in the future. (B) Verification of Analysis Results In 4.4 we introduced the steady-state performance analysis method of MCC. In this subsection, we conduct the simulation with two types of noise, Gaussian and non-Gaussian to prove the correctness of mean analysis of it. The unknown system is a Wiener system consisting of the linear component wo = [0.6, -0.4, 0.25, -0.15, 0.1, -0.05, 0.001] and a nonlinear spline function expressed by a LUT qo with 23 control points. The interval sampling Δx = 0.2, and qo is given by qo = [-2.2, -2.0, -1.8, . . ., -1.0, -0.8, -0.91, -0.40, 0.20, -0.05,

200

4

Spline Adaptive Filter

Fig. 4.24 Theoretical and simulated EMSEs with uniform noise. (a) EMSE versus step size, σ 2v = 3:33 × 10 - 4 , (b) EMSE versus SNR, μ = 0.008 [20]

Fig. 4.25 Theoretical and simulated EMSEs with step size with binary σ 2v = 4 × 10 - 4 [20]

0.0, -0.15, 0.58, 1.0, 1.0, 1.2, 1.4, . . ., 0.02]. The input signal is a Gaussian process with zero mean and unit variance. And the step sizes are selected using the 17 values in the range μw = μq = μ = {0.004, 0.005, . . ., 0.02} [18].

4.5

Computer Simulation Examples

201

Figure 4.23 checks the validity of the analytic results, where the noise is Gaussian noise. The theoretical and simulated steady-state EMSEs versus the step size are depicted in Fig. 4.23a, where σ 2v is set to 10-3. As can be seen, the theoretic calculation results have a good match with the simulated results. The theoretical and simulated steady-state EMSEs versus the signal-to-noise ratio (SNR) are depicted in Fig. 4.23b, where μ is set to 0.008. As can be seen, the simulated results still match very well with the theoretical results. Figure 4.24 considers the uniform noise which is a typical non-Gaussian noise, where the theoretical and simulated EMSEs versus the step size and SNR are plotted. It can be seen that the simulated EMSEs converge to the theoretical ones, which again indicate the effectiveness of the analysis. In addition, as the step size increases, the steady-state EMSE gradually increases, which is consistent with (4.64) and (4.71). Figure 4.25 depicts the theoretical and simulated EMSEs versus step size where the noise is binary noise. Comments about the results in Fig. 4.25 are similar to those concerning the results of Fig. 4.24.

4.5.4

Simulation of ANC

In this section, we verify the effectiveness of the algorithm in Sect. 4.4 by using the ANC application. The performance of algorithms is evaluated by the average noise reduction (ANR) factor: ANRðnÞ =

Ae ðnÞ Ad ðnÞ

ð4:95Þ

where Ae(n) = λAe(n - 1) + (1 - λ)je(n)j and Ad(n) = λAd(n - 1) + (1 - λ)jd(n)j with λ being the forgetting factor. The initial value of Ae(n) and Ad(n) is zero.

4.5.4.1

Performance of the FcGMCC Algorithm

The reference signal x(n) is modeled by the standard symmetric α-stable (SαS)α distribution: φSαS ðt Þ = exp f - jtjα g

ð4:96Þ

where α is a characteristic exponent which with a small value indicates a peaky and heavy tailed distribution. In this example, the α is set to 1.1.

202

4

Spline Adaptive Filter

The primary noise sensed by the error microphone is given by: dðnÞ = uðn - 2Þ + 0:8u2 ðn - 2Þ - 0:4u3 ðn - 1Þ

ð4:97Þ

where u(n) = x(n)  p(n) with p(n) being the impulse response of the transfer function [27]: PðZ Þ = z - 3 - 0:3z - 4 + 0:2z - 5

ð4:98Þ

The transfer function of the secondary path used is given by [27]: SN ðZ Þ = z - 2 - 0:5z - 3

ð4:99Þ

Figure 4.26 shows the ANR learning curves of the FcLMS [26], FcSNLMS [29], FcMCC, and FcGMCC algorithms where Fig. 4.28a, b choose α = 1.9 and α = 1.8, respectively. The various simulation parameters used in (a) are: N = 6, Q = 11, Δx = 0:4, μq = 1 × 10 - 5 , μw = 0:3, p = 1:3, β = 1:1 ðFcGMCCÞ N = 6, Q = 11, Δx = 0:4, μq = 1 × 10 - 5 , μw = 0:3 ðFcLMSÞ N = 6, Q = 11, Δx = 0:4, μq = 1 × 10 - 5 , μw = 0:3 ðFcSNLMSÞ N = 6, Q = 11, Δx = 0:4, μq = 1 × 10 - 5 , μw = 0:3, σ = 1 ðFcMCCÞ In (b), the various simulation parameters are: N = 6, Q = 11, Δx = 0:4, μq = 1 × 10 - 5 , μw = 0:1, p = 1:3, β = 1:1 ðFcGMCCÞ N = 6, Q = 11, Δx = 0:4, μq = 1 × 10 - 5 , μw = 0:1 ðFcLMSÞ N = 6, Q = 11, Δx = 0:4, μq = 1 × 10 - 5 , μw = 0:1 ðFcSNLMSÞ N = 6, Q = 11, Δx = 0:4, μq = 1 × 10 - 5 , μw = 0:1, σ = 1 ðFcMCCÞ While the basis matrix C is given by: 0 CB =

-1

B 1B 3 B 6@ -3 1

3

-3 1

1

-6 0

3 3

0C C C 0A

4

1

0

As can be seen from Fig. 4.26, the FcGMCC algorithm achieves better stability and lower noise reduction than FcLMS, FcSNLMS [29], and FcMCC algorithms. Additionally, a better noise removal effect can be achieved even when the noise source contains impulsive interference. Through simulation experiments, the good

4.5

Computer Simulation Examples

203

(a)

(b) Fig. 4.26 Comparison of ANR in NANC system with nonlinear secondary path for SaS primary noise: (a) α = 1.9 and (b) α = 1.8 [27]

204

4

Spline Adaptive Filter

performance of the proposed FcGMCC algorithm in nonlinear environment and non-Gaussian noise environment can be assured.

4.5.5

Simulation of Echo Cancellation

In this section, some experimental results are presented to prove the validity of the algorithm in 4.4.2. The results are compared with the standard linear echo canceler using LMS algorithm. Performance is evaluated in terms of echo return loss enhancement (ERLE), which is defined as:

E d 2 ð nÞ ERLE = 10 log 10 E f e 2 ð nÞ g

ð4:100Þ

The experimental tests were carried out in a simulated environment of T60 with different reverberation time. The impulse responses were evaluated using the Matlab toolbox Roomsim in a room of dimensions of 6 × 4 × 2.5 m and changing the wall absorption coefficients. The receiver is positioned at [1.56, 1.88, 1.1] m, while the source is located 1m in front of the microphone along the x-direction. In the first experiment, we apply a Gaussian white noise with unitary variance. The length of the signal is 50,000 samples. The learning rates are set to the following values: μw = 10-3 and ηw = 10-2, while μq = ηq = 10-1. B-spline basis is used, and the control points are equispaced of Δx = 0.1, and their number is set to 21. The initialization filter coefficient is w = [0, 0, . . . . , 0]T. The filter length N depends on the reverberation time used, and it is listed in Table 4.1. Figure 4.27 shows the ERLE comparison for the case of an anechoic environment. The results show that the method proposed in [28] is superior to the linear AEC method. In particular, the performance of Hammerstein system is better than that of Wiener system in ERLE index. Figure 4.28 shows an ERLE comparison using spline functions in an anechoic environment. It is clear from the figure that ERLE is lower than the previous case, and the convergence rate is slower. This can be explained by remembering the latter case, where we have to adjust an S-type function with 21 arguments instead of 2. Figure 4.29 depicts the error signal of a Hammerstein system using an S-shaped compensation nonlinear function. In addition, Fig. 4.30 describes the contour of the spline compensation nonlinear function at the end of convergence. We can see that this method can restore the original distortion function.

4.5

Computer Simulation Examples

205

ERLE comparison

40 35 30

ERLE [dB]

25 20 15 10 5 ERLE Hammerstein ERLE Wiener ERLE Linear

0 -5

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5 x 104

samples Fig. 4.27 ERLE comparison for the anechoic environment using sigmoid function

ERLE comparison

25

20

ERLE [dB]

15

10

5

0

-5

ERLE Hammerstein ERLE Wiener ERLE Linear 0

0.5

1

1.5

2

2.5

3

3.5

4

samples Fig. 4.28 ERLE comparison for the anechoic environment using spline function

4.5

5 x 104

206

4

Spline Adaptive Filter

Fig. 4.29 Error signal of the Hammerstein system with sigmoid function in an anechoic environment

Fig. 4.30 Profile of the estimated loudspeaker nonlinearity using a spline function for the anechoic environment

References

4.6

207

Summary

In this section, we mainly introduce the nonlinear spline filter. In the first section, we introduce the spline filter model. It is essentially a linear-nonlinear network, and the linear part is an FIR filter. Nonlinear networks consist of adaptive lookup tables (LUT) and spline interpolation networks. In addition, the SAF-LMS algorithm under MSE criterion is also introduced. However, the performance of SAF-LMS algorithm deteriorates rapidly when interfered with non-Gaussian noise, especially in the presence of large outliers (significant deviations from observed values). Therefore, we introduce a robust spline filtering algorithms in the second section, called spline adaptive filtering (SAF-MCC) algorithm based on MCC criterion. The performance of the algorithms is also simulated and analyzed. Simulation results show that SAF-MCC performs well in non-Gaussian environment. At the same time, the theoretical analysis of the steady-state performance of the spline nonlinear filter under the MCC criterion is introduced in detail in the third section. In the fourth section, the validity of the mean value analysis is proved by simulation analysis. Finally, we introduce the application of spline filter in active noise control field.

References 1. Moodi H, D Bustan. On Identification of Nonlinear Systems Using Volterra Kernels Expansion on Laguerre and Wavelet Function[C]// Control & Decision Conference. IEEE, 2010. 2. Le D C, Zhang J, Li D, et al. A generalized exponential functional link artificial neural networks filter with channel-reduced diagonal structure for nonlinear active noise control[J]. Applied Acoustics, 2018, 139:174–181 3. F. Lindsten, T. B. Schon, M. I. Jordanb, Bayesian semiparametric Wiener system identification. Automatica. 49, 2053–2063 (2013) 4. M. Rasouli, D. Westwick, W. Rosehart, Quasiconvexity analysis of the Hammerstein model. Automatica. 50, 277–281 (2014) 5. Scarpiniti M, Comminiello D, Parisi R, et al. Hammerstein uniform cubic spline adaptive filters: Learning and convergence properties[J]. Signal Processing, 2014, 100(JUL.):112–123 6. Scarpiniti, Michele, Parisi, et al. Novel Cascade Spline Architectures for the Identification of Nonlinear Systems[J]. IEEE transactions on circuits and systems, I. Regular papers: a publication of the IEEE Circuits and Systems Society, 2015. 7. Liu C, Peng C, Tang X, et al. Two variants of the IIR spline adaptive filter for combating impulsive noise[J]. Journal on Advances in Signal Processing, 2019, 2019(1). 8. Liu C, Zhang Z, Tang X. Sign Normalised Hammerstein Spline Adaptive Filtering Algorithm in an Impulsive Noise Environment[J]. Neural Processing Letters, 2019, 50(1):477–496 9. Yang Y, Yang B, Niu M. Spline adaptive filter with fractional-order adaptive strategy for nonlinear model identification of magnetostrictive actuator[J]. Nonlinear Dynamics, 2017, 90(1):1647–1659. 10. Scarpiniti M, Comminiello D, Parisi R, et al. Nonlinear spline adaptive filtering[J]. Signal Processing, 2013, 93(4):772–783. 11. Liu, Chang, Zhang, et al. Sign normalised spline adaptive filtering algorithms against impulsive noise[J]. Signal Processing: The Official Publication of the European Association for Signal Processing (EURASIP), 2018.

208

4

Spline Adaptive Filter

12. Uncini, Aurelio, Parisi, et al. Nonlinear system identification using IIR Spline Adaptive Filters [J]. Signal Processing: The Official Publication of the European Association for Signal Processing (EURASIP), 2015, 108:30–35 13. Chang L, Zhang Z, Tang X. Sign-Normalized IIR Spline Adaptive Filtering Algorithms for Impulsive Noise Environments[J]. Circuits Systems & Signal Processing, 2018. 14. Guan S, Li Z. Normalised Spline Adaptive Filtering Algorithm for Nonlinear System Identification[J]. Neural Processing Letters, 2017, 46(2):595–607. 15. Peng S, Wu Z, Zhang X, et al. Nonlinear spline adaptive filtering under maximum correntropy criterion[C]// Tencon IEEE Region 10 Conference. IEEE, 2016. 16. Zongze W, Siyuan P, Badong C, et al. Robust Hammerstein Adaptive Filtering under Maximum Correntropy Criterion[J]. Entropy, 2015, 17(10):7149–7166. 17. Singh A , José Carlos Príncipe. Using Correntropy as a cost function in linear adaptive filters [C]// International Joint Conference on Neural Networks. IEEE, 2009. 18. None. Steady-State Mean-Square Error Analysis for Adaptive Filtering under the Maximum Correntropy Criterion[J]. IEEE Signal Processing Letters, 2014, 21(7):880–884. 19. A. H. Sayed, Fundamentals of adaptive filtering, John Wiley & Sons,2003 20. Wang W, Zhao H, Zeng X, et al. Steady-State Performance Analysis of Nonlinear Spline Adaptive Filter Under Maximum Correntropy Criterion[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2020, 67(6):1154–1158. 21. Arenas-Garcia J, Figueiras-Vidal A R, Sayed A H. Steady state performance of convex combinations of adaptive filters[C]// Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP ‘05). IEEE International Conference on. IEEE, 2005. 22. Wu L, Qiu X, Guo Y. A generalized leaky FxLMS algorithm for tuning the waterbed effect of feedback active noise control systems[J]. Mechanical Systems & Signal Processing, 2018, 106 (jun.):13–23. 23. Patel V, George N V. Compensating acoustic feedback in feed-forward active noise control systems using spline adaptive filters[J]. Signal Processing, 2016, 120(MAR.):448–455. 24. George, Nithin V, Patel, et al. Nonlinear active noise control using spline adaptive filters [J]. Applied acoustics, 2015. 25. Y Gao, H Zhao, J Lou, Robust Spline Adaptive filtering algorithm based-GMCC for nonlinear active noise control, Submitted to Applied Acoustics. 26. Luo L, Sun J, Huang B. A novel feedback active noise control for broadband chaotic noise and random noise[J]. Applied Acoustics, 2017, 116(jan.):229–237. 27. Lu L, Zhao H. Adaptive Volterra filter with continuous lp-norm using a logarithmic cost for nonlinear active noise control[J]. Journal of Sound & Vibration, 2016, 364:14–29. 28. Scarpiniti M, Comminiello D, Parisi R, et al. Comparison of Hammerstein and Wiener systems for nonlinear acoustic echo cancelers in reverberant environments[C]// International Conference on Digital Signal Processing. IEEE, 2011. 29. Liu C, Zhang Z, Tang X. Sign normalised spline adaptive filtering algorithms against impulsive noise[J]. Signal Processing, 2018, 148: 234–240.

Chapter 5: Kernel Adaptive Filters

5.1 Introduction

The kernel method is a nonlinear, nonparametric modeling tool. The key idea is to transform the input data into a high-dimensional feature space via a reproducing kernel and then apply appropriate linear methods to the transformed data. The kernel method only needs the inner products appearing in the formulation, and these inner products are replaced by a kernel function. This methodology, also called the "kernel trick", has been widely used in many well-known algorithms, including the support vector machine (SVM), principal component analysis (PCA), and Fisher discriminant analysis. Moreover, the reproducing kernel Hilbert space (RKHS) plays a central role in providing linearity, convexity, and universal approximation capability. Kernel adaptive filters, which create a growing linear-in-the-parameters (LIP) model, are developed by implementing the well-established linear adaptive filters in kernel space. The KAF algorithms have a computational bottleneck: they require linear or superlinear time and space with respect to the sample number n. In addition, the size of the network grows with the number of training data, which makes it challenging to apply KAFs to non-stationary signal processing tasks. On the one hand, a fundamental question is whether it is necessary to memorize all past inputs; by removing redundant data, it is possible to keep a minimal set of centers that still covers the region where future inputs are likely to appear (imagine a kernel as a sphere in the input space with the kernel bandwidth as its radius). On the other hand, a sparse model (a network with as few nodes as possible) is desirable, because it reduces the complexity in terms of computation and memory and usually gives better generalization ability. There are many approaches to simplify the network, and we divide them into sparsification, quantization, and kernel approximation methods. In this chapter, we introduce some classic approaches in each class. The main abbreviations used in this chapter are listed in Table 5.1.



Table 5.1 Abbreviations in the chapter

Abbreviation   Complete spelling
KAF            Kernel adaptive filter
KLMS           Kernel least mean square
KRLS           Kernel recursive least square
KAPA           Kernel affine projection algorithm
RKHS           Reproducing kernel Hilbert space
LMS            Least mean square
RLS            Recursive least square
APA            Affine projection algorithm
RBF            Radial basis function
NC             Novelty criterion
ALD            Approximate linear dependency
VQ             Vector quantization
QKLMS          Quantized kernel least mean square
DQ             Density-dependent vector quantization
VQIT           Vector quantization using information theoretic learning
PRQ            Probability density rank based quantization
RFF            Random Fourier feature

5.2 Kernel Adaptive Filters

5.2.1 Reproducing Kernel Hilbert Space

A Hilbert space is an inner-product space that has an orthonormal basis {x_k}_{k=1}^∞. Let {x_k}_{k=1}^∞ be such a basis and H be the largest and most inclusive space of vectors. Any vector, not necessarily lying in the original inner-product space, can be represented as

x = \sum_{k=1}^{\infty} a_k x_k,    (5.1)

where x is spanned by the basis {x_k}_{k=1}^∞ and the a_k are the coefficients of the representation. Define a new vector

y_m = \sum_{k=1}^{m} a_k x_k.    (5.2)

We can calculate the Euclidean distance between the vector yn and the vector ym as follows


\| y_n - y_m \|^2 = \left\| \sum_{k=1}^{n} a_k x_k - \sum_{k=1}^{m} a_k x_k \right\|^2
                 = \left\| \sum_{k=m+1}^{n} a_k x_k \right\|^2
                 = \sum_{k=m+1}^{n} a_k^2.    (5.3)

Therefore, for the definition of x to be meaningful, the following conditions must hold:

1. \sum_{k=m+1}^{n} a_k^2 \to 0 as both n, m \to \infty;
2. \sum_{k=1}^{m} a_k^2 < \infty.

In other words, the sequence of partial sums {y_m} forms a Cauchy sequence, and a vector x can be expanded on the basis {x_k}_{k=1}^∞ if and only if x is a linear combination of the basis vectors and the associated coefficients {a_k}_{k=1}^∞ are square summable. Obviously, the space H is more "complete" than the original inner-product space. We may therefore state the following: an inner-product space H is complete if every Cauchy sequence taken from H converges to a limit in H; a complete inner-product space is called a Hilbert space.

A Mercer kernel is a continuous, symmetric, positive-definite function k : U × U → R, where U is the input domain, a subset of R^L, and L is the input dimension. The well-known Gaussian kernel is

k(x_1, x_2) = \exp(-a \| x_1 - x_2 \|^2).    (5.4)

Let H be any vector space of all real-valued functions of the input x that are generated by the kernel k(x, ·). Suppose two functions h(·) and g(·) are picked from the space H, represented respectively by

h = \sum_{i=1}^{l} a_i k(x_i, \cdot)    (5.5)

and

g = \sum_{j=1}^{m} b_j k(x_j, \cdot),    (5.6)

where a_i and b_j are the expansion coefficients and both x_i and x_j ∈ U for all i and j. The bilinear form defined as

\langle h, g \rangle = \sum_{i=1}^{l} \sum_{j=1}^{m} a_i k(x_i, x_j) b_j    (5.7)

satisfies the following properties:

1. Symmetry:
   \langle h, g \rangle = \langle g, h \rangle    (5.8)

2. Scaling and distributive property:
   \langle (c f + d g), h \rangle = c \langle f, h \rangle + d \langle g, h \rangle    (5.9)

3. Squared norm:
   \| f \|^2 = \langle f, f \rangle \ge 0    (5.10)

Accordingly, the bilinear term ⟨h, g⟩ is indeed an inner product. There is one additional property that follows directly. Specifically, setting g(·) = k(x, ·), we obtain

\langle h, k(x, \cdot) \rangle = \sum_{i=1}^{l} a_i k(x_i, x) = h(x).    (5.11)

This property is known as the reproducing property. The kernel k(x_i, x), representing a function of the two vectors x_i and x, is called a reproducing kernel of the vector space H if it satisfies the following two conditions:

• For every x_i ∈ U, k(x_i, x), as a function of the vector x, belongs to H.
• It satisfies the reproducing property.

These two conditions are indeed satisfied by the Mercer kernel, thereby endowing it with the designation "reproducing kernel." If the inner-product space H, in which the reproducing kernel is defined, is also complete, then it is called a reproducing kernel Hilbert space, for which we use the acronym RKHS hereafter.

The analytic power of the RKHS is expressed in an important result called the Mercer theorem. The Mercer theorem states that any reproducing kernel k(x_i, x) can be expanded as follows:

k(x_i, x) = \sum_{j=1}^{\infty} \lambda_j \varphi_j(x_i) \varphi_j(x),    (5.12)

where \lambda_j and \varphi_j(\cdot) denote the eigenvalues and the eigenfunctions, respectively. The eigenvalues are non-negative, so a mapping φ can be constructed as


\varphi : U \to F    (5.13)

\varphi(x) = \left[ \sqrt{\lambda_1}\, \varphi_1(x), \sqrt{\lambda_2}\, \varphi_2(x), \cdots \right]^T.    (5.14)

The dimensionality of F is determined by the number of strictly positive eigenvalues, which can be infinite in the Gaussian kernel case. In the machine learning literature, φ is usually treated as the feature mapping and φ(x) is the transformed feature vector lying in the feature space F (which is an inner-product space). Therefore, an important implication is

k(x_i, x) = \varphi(x_i)^T \varphi(x).    (5.15)
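The following short numerical sketch (our illustration, not from the original text) shows an empirical counterpart of Eqs. (5.12)–(5.15): eigendecomposing the Gram matrix of a small sample yields finite-dimensional feature vectors whose inner products reproduce the kernel values on that sample. The data, function names, and kernel parameter are assumptions chosen only for the demonstration.

```python
import numpy as np

def gaussian_kernel(x, y, a=1.0):
    # Gaussian Mercer kernel of Eq. (5.4): k(x, y) = exp(-a * ||x - y||^2)
    return np.exp(-a * np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))              # assumed small sample of 2-D inputs

# Gram matrix K[i, j] = k(x_i, x_j)
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

# Empirical analogue of the Mercer expansion: K = U diag(lam) U^T, so
# phi(x_i) ~ [sqrt(lam_1) u_1(x_i), sqrt(lam_2) u_2(x_i), ...]  (cf. Eq. (5.14))
lam, U = np.linalg.eigh(K)
lam = np.clip(lam, 0.0, None)                 # eigenvalues are non-negative in theory
Phi = U * np.sqrt(lam)                        # row i is the feature vector of x_i

# Inner products of the constructed features reproduce the kernel, Eq. (5.15)
print(np.allclose(Phi @ Phi.T, K, atol=1e-8))  # expected: True
```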

It is obvious that F is essentially the same as the RKHS induced by the kernel, obtained by identifying φ(x) = k(x, ·). Sometimes one can find an explicit expression for φ, but in most cases it is hard to express φ explicitly. Here, we use an example to illustrate the mapping φ. Define

k(x, y) = (1 + x^T y)^2,    (5.16)

where x = [x_1, x_2]^T and y = [y_1, y_2]^T with y_1 and y_2 being constant values, we have

k(x, y) = 1 + x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2 + 2 x_1 y_1 + 2 x_2 y_2.    (5.17)

Therefore, the mapping φ of the input vector x can be written as

\varphi(x) = \left[ 1, x_1^2, \sqrt{2} x_1 x_2, x_2^2, \sqrt{2} x_1, \sqrt{2} x_2 \right]^T.    (5.18)

It is easy to verify that

\varphi(x)^T \varphi(y) = k(x, y).    (5.19)

In general, the dimensionality of φ scales as O(L^p), where L is the dimension of the input vectors and p is the order of the polynomial kernel.
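As a quick check of Eqs. (5.16)–(5.19), the following sketch (ours, for illustration; the helper names and test points are arbitrary) builds the explicit mapping of Eq. (5.18) and verifies numerically that its inner product equals the polynomial kernel.

```python
import numpy as np

def poly_kernel(x, y):
    # Second-order polynomial kernel of Eq. (5.16): k(x, y) = (1 + x^T y)^2
    return (1.0 + x @ y) ** 2

def poly_feature_map(x):
    # Explicit mapping of Eq. (5.18) for 2-D inputs
    x1, x2 = x
    return np.array([1.0, x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2,
                     np.sqrt(2.0) * x1, np.sqrt(2.0) * x2])

x = np.array([0.5, -1.2])
y = np.array([2.0, 0.3])

# phi(x)^T phi(y) equals k(x, y), Eq. (5.19)
print(np.isclose(poly_feature_map(x) @ poly_feature_map(y), poly_kernel(x, y)))  # True
```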

5.2.2 Kernel Least Mean Square

A simple linear finite impulse response filter is the least mean square (LMS) algorithm, which uses the stochastic gradient to optimize its cost function. If the mapping between d and x is highly nonlinear, very poor performance can be expected from LMS. Therefore, we can use the kernel method to transform the input x into a high-dimensional feature space as φ(x).


Meanwhile, due to the increase in dimensionality, w^T φ(x) is a more powerful model. Denote φ(x_i) as φ_i for simplicity. Basically, the LMS algorithm operates by minimizing the instantaneous cost function

J(i) = \frac{1}{2} e_i^2.    (5.20)

Using the LMS algorithm in the kernel space yields

w_0 = 0
e_i = d_i - w_{i-1}^T \varphi_i    (5.21)
w_i = w_{i-1} + \eta e_i \varphi_i,

where w_i denotes the estimate of the weight vector in the feature space F. Since we usually cannot write φ explicitly (the feature space is typically very high-dimensional), we carry out the computation in an alternative way. Repeated application of the weight update in Eq. (5.21) through the iterations yields

w_i = w_{i-1} + \eta e_i \varphi_i
    = [w_{i-2} + \eta e_{i-1} \varphi_{i-1}] + \eta e_i \varphi_i
    = w_{i-2} + [\eta e_{i-1} \varphi_{i-1} + \eta e_i \varphi_i]
      \vdots
    = w_0 + \eta \sum_{j=1}^{i} e_j \varphi_j
    = \eta \sum_{j=1}^{i} e_j \varphi_j,    (5.22)

where i denotes the i-th training step. The weight estimate is thus expressed as a linear combination of all the previous and present (transformed) inputs, weighted by the training errors and the step-size η. We can therefore compute the output of the filter as

w_i^T \varphi(x) = \left[ \eta \sum_{j=1}^{i} e_j \varphi_j \right]^T \varphi(x) = \eta \sum_{j=1}^{i} e_j \left[ \varphi_j^T \varphi(x) \right].    (5.23)

We can efficiently compute the inner products in the feature space by a kernel function as follows


Fig. 5.1 Network topology of KLMS at iteration i

w_i^T \varphi(x) = \eta \sum_{j=1}^{i} e_j k(x_j, x).    (5.24)

Comparing the LMS recursion with the above iterations, we find that there is no longer an explicit weight vector in the model because of the kernel method. Instead, we have the sum of all past errors multiplied by the kernel evaluated at the previously received data, which is equivalent to the weights in Eq. (5.22). Therefore, the model no longer iterates the weights and does not need to compute inner products in the feature space explicitly. The new algorithm is named Kernel Least Mean Square (KLMS). It is the form of LMS in the RKHS, and it allocates a new kernel unit for each new training datum, with the input x_i as a new center and ηe_i as the corresponding coefficient. The algorithm is summarized in Algorithm 1, and the corresponding topology is shown in Fig. 5.1.
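A minimal Python sketch of the KLMS recursion in Eqs. (5.21)–(5.24) is given below, assuming a Gaussian kernel and no sparsification, so the center list grows by one unit per sample. The class name, parameters, and toy data are our assumptions, not the book's reference implementation.

```python
import numpy as np

class KLMS:
    """Minimal KLMS sketch: one new kernel unit per sample, no sparsification."""

    def __init__(self, step_size=0.5, bandwidth=1.0):
        self.eta = step_size      # step-size eta in Eq. (5.21)
        self.sigma = bandwidth    # Gaussian kernel bandwidth, cf. Eq. (5.25)
        self.centers = []         # stored inputs x_j
        self.coeffs = []          # stored coefficients eta * e_j, cf. Eq. (5.22)

    def _kernel(self, x, y):
        # Gaussian kernel k(x, y) = exp(-||x - y||^2 / sigma^2)
        return np.exp(-np.sum((x - y) ** 2) / self.sigma ** 2)

    def predict(self, x):
        # Filter output, Eq. (5.24): eta * sum_j e_j k(x_j, x)
        return sum(c * self._kernel(xj, x) for c, xj in zip(self.coeffs, self.centers))

    def update(self, x, d):
        # One iteration of Eq. (5.21): a priori error, then allocate a new unit
        x = np.asarray(x, dtype=float)
        e = d - self.predict(x)
        self.centers.append(x)
        self.coeffs.append(self.eta * e)
        return e

# Toy usage on an assumed nonlinear mapping (illustrative only)
rng = np.random.default_rng(1)
klms = KLMS(step_size=0.5, bandwidth=0.5)
for _ in range(500):
    x = rng.uniform(-1.0, 1.0, size=2)
    d = np.sin(3.0 * x[0]) * np.cos(2.0 * x[1]) + 0.01 * rng.standard_normal()
    klms.update(x, d)

x_test = np.array([0.2, -0.4])
print("KLMS output:", klms.predict(x_test),
      "  target:", np.sin(3.0 * 0.2) * np.cos(2.0 * -0.4))
```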

The KLMS and the radial basis function (RBF) network have a similar topology. The differences include: (1) the weight attached to each kernel unit is the corresponding training error (scaled by the step-size); (2) KLMS has a growing network, where a new unit is placed at each new input; (3) the kernel function is not limited to a radial basis function and can be any other Mercer kernel.


KLMS requires O(i) operations for each prediction and weight update. However, several aspects remain unspecified and deserve attention: first, how to select the kernel k(·, ·); second, how to select the step-size parameter η; and finally, how to cope with the growing memory and computation requirements in online operation.

5.2.2.1 Kernel Selection

The type of kernel function is very important: it defines the similarity measure on the data and ultimately affects performance. In the following, a brief discussion of kernel and parameter selection is provided. To apply the kernel method, we first need to pick a kernel function. In the existing work on nonparametric regression, it is known that any bell-shaped weight function (Gaussian function, tri-cube function, etc.) leads to equivalent asymptotic accuracy. Note, however, that weight functions are not necessarily reproducing kernels, and vice versa. The RKHS approach examines more closely the eigenfunctions of the kernel and its richness for approximation. It is known that the Gaussian kernel, whose Taylor series expansion has infinite order, creates a reproducing kernel Hilbert space with universal approximating capability, whereas a polynomial kernel of finite order does not. The Gaussian kernel is also widely used because of its excellent mathematical properties: a model function composed of Gaussian kernels is usually very smooth and has advantages in numerical calculation. By contrast, the approximating capability of the polynomial kernel of order p is limited to polynomial functions of degree less than or equal to p. Unless it is clear from the problem domain that the target function is a polynomial function or can be well approximated by one, the Gaussian kernel is usually the default choice. Studies in many approximation fields show that the Gaussian kernel has universal approximating capability, is numerically stable, and usually gives reasonable results. The well-known Gaussian kernel function is

k(x_i, x_j) = \exp\left( - \frac{\| x_i - x_j \|^2}{\sigma^2} \right),    (5.25)

where σ is the kernel bandwidth. To date, many methods for selecting the kernel size of the Gaussian kernel have been borrowed from statistics, nonparametric regression, and kernel density estimation. Available methods for selecting a suitable kernel bandwidth include cross-validation, nearest neighbors, and penalizing functions.


Cross-validation is simple and effective, but its computational cost becomes very high when the data set is large or when several hyperparameters need to be chosen. Penalizing functions [1] and plug-in methods [2] are also used to select the kernel bandwidth, but both consume large amounts of computation. Silverman's rule is widely accepted in kernel density estimation, although it is derived under a Gaussian assumption and is usually not appropriate for multimodal distributions [3]. Silverman's rule uses the mean integrated square error (MISE) between the estimated and the actual PDF to obtain the optimal bandwidth [3]:

\sigma_{opt} = \sigma_X \left\{ 4 N^{-1} (2d + 1)^{-1} \right\}^{\frac{1}{d+4}},    (5.26)

where d is the dimensionality of the data and \sigma_X^2 = d^{-1} \sum_{i=1}^{d} X_{ii}, with X_{ii} being the diagonal elements of the sample covariance matrix. In some situations, researchers use simple statistics, and the empirical kernel bandwidth is usually related to the mean or the variance of the data [4, 5]. Some adaptive kernel bandwidth selection methods have been applied to kernel density estimation [6, 7]. Other adaptive methods optimize the kernel bandwidth by gradient descent [8–10]; these are often used in online learning because of their cheap computation, but they have the disadvantage of slow convergence. Multi-kernel learning is also used to address the selection of the kernel bandwidth [11, 12]: it replaces a single Gaussian kernel with multiple Gaussian kernels of different bandwidths, and during training the bandwidth is effectively selected by putting different weights on the different kernels, so that the kernel with the more appropriate bandwidth receives the higher weight. However, multi-kernel learning is a suboptimal solution for finding the best kernel bandwidth and brings an unnecessary computational burden. Although a large number of methods can be used to select the kernel bandwidth, cross-validation remains the most commonly used approach in practical problems because it is very straightforward, even though it always requires heavy computation. From the perspective of functional analysis, the kernel function defines the inner products in the RKHS, which also serve as the similarity measure in the RKHS. Therefore, with different kernel bandwidths, the same input data may be mapped to very different functions. In practical problems, if the kernel size is too large, all the data in the RKHS look similar (the inner products are all close to 1) and the system reduces to linear regression; if the kernel size is too small, all the data look different (the inner products are all close to 0) and the system cannot infer the unseen samples between the training points.
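The following sketch (our illustration; variable names and data are assumed) computes the Silverman bandwidth of Eq. (5.26) and then demonstrates the behaviour just described: with an overly large bandwidth the Gaussian inner products all approach 1, and with an overly small one they approach 0.

```python
import numpy as np

def silverman_bandwidth(X):
    """Silverman's rule, Eq. (5.26): sigma_opt = sigma_X * {4 N^-1 (2d+1)^-1}^(1/(d+4))."""
    N, d = X.shape
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    sigma_x = np.sqrt(np.trace(cov) / d)   # sigma_X^2 = d^-1 * sum of diagonal elements
    return sigma_x * (4.0 / (N * (2.0 * d + 1.0))) ** (1.0 / (d + 4.0))

def gram_matrix(X, sigma):
    # Gaussian Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / sigma^2)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / sigma ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))          # assumed toy data set
print("Silverman bandwidth:", silverman_bandwidth(X))

# Too large a bandwidth: all inner products near 1; too small: all near 0.
for sigma in (50.0, 2.0, 0.05):
    K = gram_matrix(X, sigma)
    off_diag = K[~np.eye(len(X), dtype=bool)]
    print(f"sigma = {sigma:6.2f}: mean off-diagonal kernel value = {off_diag.mean():.3f}")
```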

5.2.2.2 Step-Size Selection

After choosing the kernel and its free parameter, we need to find a suitable step-size. Since KLMS is the form of the LMS algorithm in the RKHS, we can analyze the step-size in the same way. In particular, the step-size is a compromise between convergence time and misadjustment (i.e., increasing the step-size decreases the convergence time but increases the misadjustment).


Moreover, the step-size is upper bounded by the reciprocal of the largest eigenvalue of the autocorrelation matrix of the transformed data. Consider a data set {(x_i, d_i)}_{i=1}^{N} with i being the time index. Defining the data matrix in the feature space as Φ = [φ_1, φ_2, ⋯, φ_N], the autocorrelation matrix R_φ and the Gram matrix G_φ are

R_\varphi = \frac{1}{N} \Phi \Phi^T    (5.27)

G_\varphi = \frac{1}{N} \Phi^T \Phi    (5.28)

where G_φ is an N × N matrix with k(x_i, x_j) as its (i, j)-th element. The step-size is required to satisfy the following condition to keep the algorithm stable: η