Data-Driven and Model-Based Methods for Fault Detection and Diagnosis 0128191643, 9780128191644

Data-Driven and Model-Based Methods for Fault Detection and Diagnosis covers techniques that improve the quality of faul

English Pages 322 [315] Year 2020

Table of contents :
Contents
List of figures
List of tables
About the authors
Acknowledgments
List of acronyms
Nomenclature
Latin letters
Greek letters
1 Introduction
References
2 PCA and PLS-based generalized likelihood ratio for fault detection
2.1 PCA and PLS-based generalized likelihood ratio for fault detection
2.1.1 Introduction
2.1.2 Principal component analysis (PCA)
2.1.2.1 Modeling using PCA
2.1.2.2 How many principal components to use?
2.1.3 Fault detection using PCA method
2.1.4 Statistical hypothesis testing
2.1.4.1 Fault detection using hypothesis testing
2.1.4.2 Generalized likelihood ratio GLRT
2.1.5 Fault detection using a PCA-based GLRT
2.1.6 PCA-based GLRT and applications
2.1.6.1 Ozone monitoring using PCA-based GLRT
2.1.6.2 Description of the training ozone data
2.1.6.3 Ozone modeling using PCA
2.1.6.4 Monitoring the ozone concentrations
2.1.6.5 Process monitoring of a simulated continuously stirred tank reactor (CSTR)
2.1.6.6 Modeling the CSTR data using PCA
2.1.6.7 Simulation results
2.1.7 Conclusion
2.2 PLS-based generalized likelihood ratio for fault detection
2.2.1 Introduction
2.2.2 Partial Least Square (PLS) method
2.2.3 PLS-based GLRT for fault detection
2.2.4 PLS-based GLRT fault detection and applications
2.2.4.1 Fault detection of continuously stirred tank reactor process
2.2.4.2 Fault detection of Tennessee Eastman Process
2.2.5 Conclusions
References
3 Kernel PCA- and Kernel PLS-based generalized likelihood ratio tests for fault detection
3.1 Kernel PCA-based generalized likelihood ratio test for fault detection
3.1.1 Introduction
3.1.2 Kernel Principal Component Analysis (KPCA) description
3.1.3 Fault detection using KPCA method
3.1.4 Enhanced monitoring using kernel GLRT chart
3.1.5 Kernel GLRT fault detection chart with applications
3.1.5.1 Application 1: synthetic data
3.1.5.2 Application 2: nonisothermal CSTR process
3.1.6 Conclusion
3.2 Kernel PLS-based generalized likelihood ratio test for fault detection
3.2.1 Introduction
3.2.2 Kernel Partial Least Squares (KPLS) method
3.2.3 KPLS-based GLRT and application to fault detection in CSTR process
Case 1: faults in the concentration CA
Case 2: fault in the temperature T
Case 3: faults in the concentration CA and temperature T
3.2.4 Conclusion
References
4 Linear and nonlinear multiscale latent variable-based generalized likelihood ratio for fault detection
4.1 Linear multiscale latent variable-based generalized likelihood ratio for fault detection
4.1.1 Introduction
4.1.2 Multiscale PCA-based GLRT for fault detection
4.1.2.1 Modeling using multiscale PCA method
4.1.2.2 Fault detection using GLRT
4.1.2.3 MSPCA-based MW-GLRT and applications
4.1.3 Multiscale PLS-based GLRT for fault detection
4.1.3.1 Multiscale Partial Least Square (MSPLS) method
4.1.3.2 MSPLS-based GLRT fault detection technique and applications
4.1.4 Conclusions
4.2 Multiscale nonlinear latent variable-based generalized likelihood ratio test for fault detection
4.2.1 Introduction
4.2.2 Multiscale kernel PCA-based GLRT for fault detection
4.2.2.1 Multiscale kernel PCA description
4.2.2.2 Multiscale kernel GLRT fault detection chart with applications
4.2.3 Multiscale kernel PLS-based GLRT for fault detection
4.2.3.1 Multiscale Kernel Partial Least Square (KPLS) method
4.2.3.2 MSKPLS-based GLRT technique and applications
4.2.4 Conclusion
References
5 Linear and nonlinear interval latent variable approaches for fault detection
5.1 Interval latent variable approaches for fault detection
5.1.1 Introduction
5.1.2 Interval PCA-based GLRT for fault detection
5.1.2.1 Interval data description
5.1.2.2 Principal component analysis for interval-valued data
5.1.2.3 Interval-valued PCA model identification
5.1.2.4 Fault detection using complete information PCA-based GLRT
5.1.2.5 Complete information PCA-based GLRT and applications
5.1.2.6 Fault detection using midpoints radii PCA-based EWMA
5.1.2.7 Midpoints radii PCA-based EWMA and applications
5.1.3 Interval PLS-based GLRT for fault detection
5.1.3.1 Partial least squares for interval-valued data
5.1.3.2 Fault detection charts based on interval PLS
5.1.3.3 Fault detection using interval PLS-based GLRT
5.1.3.4 Interval PLS-based GLRT and applications
5.1.4 Conclusion
5.2 Interval nonlinear latent variable approaches for fault detection
5.2.1 Introduction
5.2.2 Interval kernel PCA-based GLRT for fault detection
5.2.2.1 Kernel PCA for interval-valued data (IKPCA)
5.2.2.2 Interval KPCA-based fault detection charts
5.2.2.3 Applications
5.2.3 Interval kernel PLS-based GLRT for fault detection
5.2.3.1 Kernel PLS for interval-valued data (IKPLS)
5.2.3.2 Interval KPLS-based fault detection charts
5.2.3.3 Interval KPLS-based GLRT and application
5.2.4 Conclusion
References
6 Model-based approaches for fault detection
6.1 Introduction
6.2 State estimation
6.2.1 State estimation problem formulation
6.2.2 State estimation techniques
6.2.2.1 Extended Kalman filter (EKF)
6.2.2.2 Unscented Kalman filter (UKF)
6.2.2.3 Particle filter (PF)
6.3 Fault detection-based state estimation approaches
6.3.1 Fault detection using multiscale EWMA chart
6.3.1.1 EWMA chart
6.3.1.2 Multiscale EWMA chart
6.3.2 Application to wastewater treatment plant
6.3.2.1 State estimation results
6.3.2.2 Fault detection results
6.4 Fault detection-based state estimation approach
6.4.1 Fault detection using optimized weighted SS-DEWMA chart
6.4.2 Optimized WSS-DEWMA and application to fault detection
6.4.2.1 Application 1: synthetic example
6.4.2.2 Application 2: Cad System in E. coli (CSEC)
6.5 Conclusions
References
7 Conclusions and perspectives
7.1 Conclusions
7.2 Perspectives and research proposals
7.2.1 Project 1: water distribution networks: modeling, sensor placement, leak and quality monitoring
7.2.2 Project 2: enhanced operation of wastewater treatment plants
7.2.3 Project 3: enhanced monitoring of photovoltaic systems
7.2.4 Project 4: enhanced data validation of an air quality monitoring networks
Appendix
Applications
Tennessee Eastman Process (TEP)
Distillation column
Air quality monitoring network
Continuously stirred tank reactor (CSTR)
References
Index

DATA-DRIVEN AND MODEL-BASED METHODS FOR FAULT DETECTION AND DIAGNOSIS

MAJDI MANSOURI
MOHAMED-FAOUZI HARKAT
HAZEM N. NOUNOU
MOHAMED N. NOUNOU

Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

Copyright © 2020 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-819164-4

For information on all Elsevier publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Susan Dennis
Acquisitions Editor: Anita Koch
Editorial Project Manager: Sara Valentino
Production Project Manager: Vignesh Tamil
Designer: Miles Hitchen
Typeset by VTeX

List of figures

Figure 2-1 Geometric Interpretation of PCA.
Figure 2-2 Principle of PCA projection.
Figure 2-3 Outliers detection using PCA.
Figure 2-4 A schematic diagram of the PCA-based GLRT fault detection method.
Figure 2-5 (A) Quarter-hourly ozone measurements and (B) ACF of ozone measurements.
Figure 2-6 Variance captured by each principal component.
Figure 2-7 PC2 versus PC1.
Figure 2-8 PCA model prediction of ozone concentrations for the first four network stations.
Figure 2-9 PCA model prediction of ozone concentrations for the last three network stations.
Figure 2-10 Monitoring a simple sensor fault using PCA-based Q.
Figure 2-11 Monitoring a simple sensor fault using PCA-based GLRT.
Figure 2-12 Monitoring a multiple faults using PCA-based Q.
Figure 2-13 Monitoring a multiple faults using PCA-based GLRT.
Figure 2-14 The variance captured by each principal component.
Figure 2-15 Histograms showing the normality of the residuals.
Figure 2-16 Monitoring a fault in CA using PCA-based Q.
Figure 2-17 Monitoring a fault in CA using PCA-based GLRT.
Figure 2-18 Monitoring a fault in T using PCA-based Q.
Figure 2-19 Monitoring a fault in T using PCA-based GLRT.
Figure 2-20 Monitoring of multiple faults in CA and T using PCA-based Q.
Figure 2-21 Monitoring of multiple faults in CA and T using PCA-based GLRT.
Figure 2-22 Correct detection rate vs false alarm rate for the PCA-based GLRT fault detection method and the conventional PCA method.
Figure 2-23 The time evolution of the generated data X.
Figure 2-24 Monitoring fault in temperature using PLS-based Q chart.
Figure 2-25 Monitoring fault in temperature using PLS-based GLRT chart.
Figure 2-26 Monitoring TEP IDV 2 fault using PLS-based Q chart.
Figure 2-27 Monitoring TEP IDV 2 fault using PLS-based GLRT chart.
Figure 3-1 Schematic illustration of KGLRT algorithm.
Figure 3-2 Evolutions of SPE, KGLRT, and EWMA-KGLRT in faulty case.
Figure 3-3 Evolutions of SPE, KGLRT, and EWMA-KGLRT in faulty case.

Figure 3-4 The time evolution of the SPE (A), KGLRT (B) and EWMA-KGLRT (C) statistics on a semi-logarithmic scale in the presence of a single fault in T.
Figure 3-5 The time evolution of the SPE (A), KGLRT (B) and EWMA-KGLRT (C) statistics on a semi-logarithmic scale in the presence of simultaneous faults in CA and T.
Figure 3-6 KPLS diagram for nonlinear regression.
Figure 3-7 The time evolution of PLS-based Q statistic on a semilogarithmic scale in the presence of a fault in CA.
Figure 3-8 The time evolution of KPLS-based Q statistic on a semilogarithmic scale in the presence of a fault in CA.
Figure 3-9 The time evolution of PLS-based GLRT statistic on a semilogarithmic scale in the presence of a fault in CA.
Figure 3-10 The time evolution of KPLS-based GLRT statistic on a semilogarithmic scale in the presence of a fault in CA.
Figure 3-11 The time evolution of PLS-based Q statistic on a semilogarithmic scale in the presence of a fault in T.
Figure 3-12 The time evolution of KPLS-based Q statistic on a semilogarithmic scale in the presence of a fault in T.
Figure 3-13 The time evolution of PLS-based GLRT statistic on a semilogarithmic scale in the presence of a fault in T.
Figure 3-14 The time evolution of KPLS-based GLRT statistic on a semilogarithmic scale in the presence of a fault in T.
Figure 3-15 The time evolution of PLS-based Q statistic on a semilogarithmic scale in the presence of simultaneous faults in CA and T.
Figure 3-16 The time evolution of KPLS-based Q statistic on a semilogarithmic scale in the presence of simultaneous faults in CA and T.
Figure 3-17 The time evolution of PLS-based GLRT statistic on a semilogarithmic scale in the presence of simultaneous faults in CA and T.
Figure 3-18 The time evolution of KPLS-based GLRT statistic on a semilogarithmic scale in the presence of simultaneous faults in CA and T.
Figure 4-1 A schematic diagram of data representation at multiple scales [1].
Figure 4-2 Schematic illustration of MSPCA model.
Figure 4-3 Schematic illustration of proposed MSPCA-based MW-GLRT algorithm.

Figure 4-4 The effect of the window length on the performance of the moving-window GLRT (the fault size equals twice the residuals standard deviation). (A) Missed detection rate, (B) false alarm rate and (C) average run length (ARL) vs. window length.
Figure 4-5 Monitoring a fault of magnitude unity using PCA-based T² chart.
Figure 4-6 Monitoring a fault of magnitude unity using MSPCA-based T² chart.
Figure 4-7 Monitoring a fault of magnitude unity using PCA-based Q chart.
Figure 4-8 Monitoring a fault of magnitude unity using MSPCA-based Q chart.
Figure 4-9 Monitoring a fault of magnitude unity using MSPCA-based GLRT chart.
Figure 4-10 Monitoring a fault of magnitude unity using MSPCA-based MW-GLRT chart (WL = 4).
Figure 4-11 Monitoring a fault of magnitude unity using MSPCA-based MW-GLRT chart (WL = 8).
Figure 4-12 Monitoring a fault of magnitude 1σ using PCA-based T² chart.
Figure 4-13 Monitoring a fault of magnitude 1σ using MSPCA-based T² chart.
Figure 4-14 Monitoring a fault of magnitude 1σ using PCA-based Q chart.
Figure 4-15 Monitoring a fault of magnitude 1σ using MSPCA-based Q chart.
Figure 4-16 Monitoring a fault of magnitude 1σ using MSPCA-based GLRT chart.
Figure 4-17 Monitoring a fault of magnitude 1σ using MSPCA-based MW-GLRT chart (WL = 4).
Figure 4-18 Monitoring a fault of magnitude 1σ using MSPCA-based MW-GLRT chart (WL = 8).
Figure 4-19 Monitoring TEP fault 12 using PCA-based T² chart.
Figure 4-20 Monitoring TEP fault 12 using MSPCA-based T² chart.
Figure 4-21 Monitoring TEP fault 12 using PCA-based Q chart.
Figure 4-22 Monitoring TEP fault 12 using MSPCA-based Q chart.
Figure 4-23 Monitoring TEP fault 12 using MSPCA-based GLRT chart.
Figure 4-24 Monitoring TEP fault 12 using MSPCA-based MW-GLRT chart (WL = 4).
Figure 4-25 Monitoring TEP fault 12 using MSPCA-based MW-GLRT chart (WL = 8).

Figure 4-26 Monitoring a fault of magnitude unity using the MSPCA-based MW-GLRT (WL = 2) chart.
Figure 4-27 Monitoring a fault of magnitude unity using the MSPCA-based MW-GLRT (WL = 4) chart.
Figure 4-28 Monitoring a fault of magnitude unity using the MSPCA-based MW-GLRT (WL = 8) chart.
Figure 4-29 Monitoring a fault of magnitude unity using the MSPCA-based EWMA-GLRT chart.
Figure 4-30 Monitoring a fault of magnitude 1σ using the MSPCA-based MW-GLRT (WL = 2) chart.
Figure 4-31 Monitoring a fault of magnitude 1σ using the MSPCA-based MW-GLRT (WL = 4) chart.
Figure 4-32 Monitoring a fault of magnitude 1σ using the MSPCA-based MW-GLRT (WL = 8) chart.
Figure 4-33 Monitoring a fault of magnitude 1σ using the MSPCA-based EWMA-GLRT chart.
Figure 4-34 Representation of MSPLS fault detection model.
Figure 4-35 Monitoring multiple faults in temperature using PLS-based Q chart.
Figure 4-36 Monitoring multiple faults in temperature using MSPLS-based Q chart.
Figure 4-37 Monitoring multiple faults in temperature using PLS-based GLRT chart.
Figure 4-38 Monitoring multiple faults in temperature using MSPLS-based GLRT chart.
Figure 4-39 Monitoring TEP IDV 2 fault using PLS-based Q chart.
Figure 4-40 Monitoring TEP IDV 2 fault using MSPLS-based Q chart.
Figure 4-41 Monitoring TEP IDV 2 fault using PLS-based GLRT chart.
Figure 4-42 Monitoring TEP IDV 2 fault using MSPLS-based GLRT chart.
Figure 4-43 Representation of MSKPCA fault detection model.
Figure 4-44 Evolutions of SPE, KGLRT, and MS-KGLRT in faulty case.
Figure 4-45 Evolutions of SPE, KGLRT, and MS-KGLRT in faulty case.
Figure 4-46 Evolutions of SPE, KGLRT, and MS-KGLRT in faulty case.
Figure 4-47 Monitoring faults (A) IDV1, (B) IDV2, (C) IDV3 and (D) IDV4 using SPE, KGLRT, and MS-KGLRT charts.
Figure 4-48 Monitoring faults (A) IDV5, (B) IDV6, (C) IDV7 and (D) IDV8 using SPE, KGLRT, and MS-KGLRT charts.
Figure 4-49 Monitoring faults (A) IDV9, (B) IDV10, (C) IDV11 and (D) IDV12 using SPE, KGLRT, and MS-KGLRT charts.
Figure 4-50 Monitoring faults (A) IDV13, (B) IDV14, (C) IDV15 and (D) IDV16 using SPE, KGLRT, and MS-KGLRT charts.

Figure 4-51 Monitoring faults (A) IDV17, (B) IDV18, (C) IDV19, (D) IDV20 and (E) IDV21 using SPE, KGLRT, and MS-KGLRT charts.
Figure 4-52 Representation of MSKPLS fault detection model.
Figure 4-53 The time evolution of the generated data X.
Figure 4-54 Time evolution of detection using PLS-based GLRT method.
Figure 4-55 Time evolution of detection using KPLS-based GLRT method.
Figure 4-56 Time evolution of detection using MSPLS-based GLRT method.
Figure 4-57 Time evolution of detection using MSKPLS-based GLRT method.
Figure 4-58 Time evolution of detection using KPLS-based GLRT method.
Figure 4-59 Time evolution of detection using MSKPLS-based GLRT method.
Figure 5-1 Time evolution of simulated interval-valued data.
Figure 5-2 Measurements and estimations of interval-valued data using CIPCA model.
Figure 5-3 Time evolution of univariate interval GLR.
Figure 5-4 Time evolution of univariate interval weighted GLR.
Figure 5-5 Time evolution of multivariate GLR.
Figure 5-6 Time evolution of multivariate interval weighted GLR.
Figure 5-7 Distillation column interval-valued measurements.
Figure 5-8 Evolution of the VIRE with respect to the number of principal components.
Figure 5-9 Measurements and estimations.
Figure 5-10 Time evolution of univariate interval GLR.
Figure 5-11 Time evolution of univariate interval weighted GLR.
Figure 5-12 Time evolution of multivariate interval GLR.
Figure 5-13 Time evolution of multivariate interval weighted GLR.
Figure 5-14 Ozone concentrations for single-valued and interval-valued data.
Figure 5-15 Evolution of VIRE with respect to the number of principal components.
Figure 5-16 Measurements and estimations of O3 station 1.
Figure 5-17 Measurements and estimations of O3 station 3.
Figure 5-18 Time evolution of univariate interval GLR with a fault on x7.
Figure 5-19 Time evolution of univariate interval weighted GLR with a fault on x7.
Figure 5-20 Time evolution of multivariate interval GLR with a fault on x7.
Figure 5-21 Time evolution of multivariate interval weighted GLR with a fault on x7.
Figure 5-22 Time evolution of simulated data.

Figure 5-23 The time evolution of the MRPCA-based SPE statistic in the presence of faults in x2 and x3.
Figure 5-24 The time evolution of the MRPCA-based EWMA statistic in the presence of faults in x2 and x3.
Figure 5-25 The time evolution of the MRPCA-based GLRT statistic in the presence of faults in x2 and x3.
Figure 5-26 The time evolution of the MRPCA-based EWMA statistic in the presence of faults in x2 and x3.
Figure 5-27 Measurements and estimations of O3 station 1.
Figure 5-28 Measurements and estimations of O3 station 3.
Figure 5-29 The time evolution of the MRPCA-based SPE statistic in the presence of faults in O3.
Figure 5-30 The time evolution of the MRPCA-based EWMA statistic in the presence of faults in x2 and x3.
Figure 5-31 The time evolution of the MRPCA-based GLRT statistic in the presence of faults in O3.
Figure 5-32 The time evolution of the MRPCA-based EWMA statistic in the presence of faults in O3.
Figure 5-33 Time evolution of interval-valued simulated variables.
Figure 5-34 Scatter plots of predicted and observed training data y1.
Figure 5-35 Scatter plots of predicted and observed training data y2.
Figure 5-36 Evolution of Q1,x in both fault-free and faulty cases.
Figure 5-37 Evolution of Q1,y in both fault-free and faulty cases.
Figure 5-38 Evolution of Q2,x in both fault-free and faulty cases.
Figure 5-39 Evolution of Q2,y in both fault-free and faulty cases.
Figure 5-40 Evolution of Q3,x in both fault-free and faulty cases.
Figure 5-41 Evolution of Q3,y in both fault-free and faulty cases.
Figure 5-42 Evolution of Q4,x in both fault-free and faulty cases.
Figure 5-43 Evolution of Q4,y in both fault-free and faulty cases.
Figure 5-44 Evolution of I Gx in both fault-free and faulty cases.
Figure 5-45 Evolution of I Gy in both fault-free and faulty cases.
Figure 5-46 Distillation column interval-valued measurements.
Figure 5-47 Evolution of Q1,x and Q1,y with a fault on variable x2.
Figure 5-48 Evolution of Q2,x and Q2,y with a fault on variable x2.
Figure 5-49 Evolution of Q3,x and Q3,y with a fault on variable x2.
Figure 5-50 Evolution of Q4,x and Q4,y with a fault on variable x2.
Figure 5-51 Evolution of IGLRx and IGLRy with a fault on variable x2.
Figure 5-52 Evolution of Q4,x and Q4,y with a fault on variable y2.
Figure 5-53 Evolution of IGLRx and IGLRy with a fault on variable y2.
Figure 5-54 3-D scatter plot of the generated interval-valued data.

Figure 5-55 Time evolution of indices T² and SPE in KPCA model with a fault on variable x1.
Figure 5-56 Time evolution of indices T² and SPE in IKPCACR model with a fault on variable x2.
Figure 5-57 Time evolution of x3.
Figure 5-58 Time evolution of indices T² and SPE in KPCA model with IDV-1 fault.
Figure 5-59 Time evolution of indices T² and SPE in IKPCA model with IDV-1 fault.
Figure 5-60 Time evolution of indices T² and SPE in IKPCA model with IDV-1 fault.
Figure 5-61 Time evolution of indices IGLRTUL in IKPCA model with IDV-1 fault.
Figure 5-62 Time evolution of indices IGLRTCR in IKPCA model with IDV-1 fault.
Figure 5-63 Time evolution of univariate GLRT index based on IKPLSUL model.
Figure 5-64 Time evolution of multivariate GLRT index based on IKPLSUL model.
Figure 5-65 Time evolution of univariate GLRT index based IKPLSCR model.
Figure 5-66 Time evolution of multivariate GLRT index based on IKPLSCR model.
Figure 6-1 Plots of samples of normal and faulty signals.
Figure 6-2 General flow-chart of the multiobjective optimization process.
Figure 6-3 Multiscale EWMA strategy.
Figure 6-4 PF-based MS-EWMA fault detection strategy.
Figure 6-5 State estimation of the variables (A) XDCO, (B) SO and (C) XBH using UKF and PF.
Figure 6-6 State estimation of the variables (A) SNH, (B) SNO and (C) XBA using UKF and PF.
Figure 6-7 Monitoring a bias fault in SO using (A) Shewhart, (B) EWMA, and (C) MS-EWMA methods.
Figure 6-8 Monitoring a drift fault in SO using (A) Shewhart, (B) EWMA, and (C) MS-EWMA methods.
Figure 6-9 Monitoring a fault in X using (A) EWMA, (B) SS-DEWMA, and (C) OWSS-DEWMA charts.
Figure 6-10 Qualitative model of the CSEC (simplified).
Figure 6-11 Estimation of state variables using various state estimation techniques.
Figure 6-12 Monitoring a multiple faults in cadaverine Cadav using (A) EWMA, (B) SS-DEWMA and (C) OWSS-DEWMA charts.

Figure 6-13 Monitoring a fault in Cadav using (A) EWMA, (B) SS-DEWMA and (C) OWSS-DEWMA charts.
Figure 6-14 Monitoring a multiple faults in Cadav and Lys using (A) EWMA, (B) SS-DEWMA and (C) OWSS-DEWMA charts.
Figure 1 Tennessee Eastman process.
Figure 2 Basic distillation column controlled with LV-configuration.
Figure 3 Ozone concentrations for the first three stations.
Figure 4 Ozone concentrations for the first station, single-valued and interval-valued representations.

List of tables

Table 2-1 Summary of MDRs (%), FARs (%) and ARL1.
Table 2-2 Missed detection rates (MDR %), False alarm rates (FAR %), and ARL1 values for TEP data.
Table 3-1 FAR %, MDR %, and ARL1 for the presented fault detection charts (simulated example).
Table 3-2 FAR % and MDR % for the presented fault detection charts (CSTR process) for a single fault.
Table 3-3 FAR % and MDR % for the presented fault detection charts (CSTR process) for multiple faults.
Table 4-1 Summary of missed detection (%), false alarms (%), and ARL1 for simulated data using PCA and MSPCA models.
Table 4-2 Summary of missed detection rates (%) for TEP data using PCA and MSPCA.
Table 4-3 Summary of missed detection rates (%) for TEP data using MSPCA-based GLRT and MW-GLRT.
Table 4-4 Summary of false alarm rates (%) for TEP data using PCA and MSPCA.
Table 4-5 Summary of false alarm rates (%) for TEP data using MSPCA-based GLRT and MW-GLRT.
Table 4-6 Summary of ARL1 for TEP data using PCA and MSPCA.
Table 4-7 Summary of ARL1 for TEP data using MSPCA-based GLRT and MW-GLRT.
Table 4-8 Summary of missed detection (%), false alarms (%), and ARL1 for simulated data using MSPCA.
Table 4-9 Summary of MDR (%), FAR (%), and ARL1.
Table 4-10 Missed detection rates (%) for TEP data.
Table 4-11 False alarm rates (%) for TEP data.
Table 4-12 ARL1 values for TEP data.
Table 4-13 FAR %, MDR %, and ARL1 for the presented fault detection charts (simulated example).
Table 4-14 FAR %, MDR %, and ARL1 for the presented fault detection charts (CSTR process).
Table 4-15 Summary of MDR, FAR, and ARL1 values for TEP data.
Table 4-16 Summary of missed detection rate (%), false alarm rate (%), and ARL1.
Table 4-17 Summary of missed detection rate (%), false alarm rate (%), and ARL1 (CSTR process).

Table 5-1 VIRE of different interval-valued for CIPCA model.
Table 5-2 FAR, MDR, and ARL1.
Table 5-3 FAR, MDR rates, and ARL1 for the distillation column.
Table 5-4 FAR, MDR, and ARL1 for air quality data.
Table 5-5 MSE using CPCA, CIPCA, and MRPCA models.
Table 5-6 Summary of missed detection (MDR %), false alarms (FAR %) and ARL1.
Table 5-7 Summary of missed detection (MDR %), false alarms (FAR %), and ARL1.
Table 5-8 FAR % and MDR % for the presented fault detection charts (simulated example).
Table 5-9 FAR % and MDR % for the presented fault detection charts (scenario 1 of distillation column).
Table 5-10 FAR % and MDR % for the presented fault detection charts (scenario 2 of distillation column).
Table 5-11 Summary of MDR and FAR values for the simulation example.
Table 5-12 Missed detection rate (MDR %) and False Alarm Rate (FAR %) values for TEP data sets.
Table 5-13 Selected monitored variables in the TE process.
Table 5-14 Selected output measured variables in the TE process.
Table 5-15 Missed detection rate (MDR %) and false alarm rate (FAR %) values for TEP data sets using IKPCA-based GLRT approach.
Table 5-16 Missed detection ratio (MDR %)-based Q and GLRT for the 21 faults of the TEP.
Table 6-1 Comparison of the MSE for the UKF and PF techniques.
Table 6-2 Summary of MDR (%) and FAR (%) for bias fault.
Table 6-3 Summary of MDR (%) and FAR (%) for drift fault.
Table 6-4 Summary of MDR (%) and FAR (%) for different values of s.
Table 6-5 Summary of MDR (%) and FAR (%) for different values of a.
Table 6-6 MDR (%) and FAR (%) evaluation.
Table 6-7 CSEC parameters.
Table 6-8 RMSE of estimated states using EKF, UKF, and PF methods.
Table 6-9 MDR (%) and FAR (%) evaluation.
Table 6-10 MDR (%) and FAR (%) evaluation.
Table 6-11 MDR (%) and FAR (%) evaluation.
Table 1 Manipulated variables.
Table 2 Measured variables.
Table 3 Process faults of TEP.
Table 4 Distillation column process variables.

About the authors

Majdi Mansouri
Dr. Majdi Mansouri received his engineering degree in Electrical Engineering in 2006 from the Higher School of Communication of Tunisia (SUPCOM), Tunisia. He received his master's degree in Electrical Engineering from the School of Electronics, Informatics and Radiocommunications in Bordeaux (ENSEIRB), France, in 2008, and his PhD degree in Electrical Engineering from the University of Technology of Troyes (UTT), France, in 2011. In December 2019, he received the HDR degree (Accreditation to Supervise Research) in Applied Mathematics and Statistics for Electrical Engineering from the University of Orleans, France. He joined the Electrical Engineering Program at Texas A&M University at Qatar in 2011, where he is currently an Associate Research Scientist. He has over ten years of research and practical experience in systems engineering and signal processing. His work focuses on the use of applied mathematics and statistics concepts to develop statistical data-driven and model-driven techniques and algorithms for modeling, estimation, fault detection, fault classification, monitoring, and diagnosis, which aim to improve process operations and enhance data validation. Dr. Mansouri is the author of more than 150 refereed journal and conference publications and book chapters, and has worked on several projects as lead principal investigator (LPI) and principal investigator (PI). Dr. Mansouri is a member of IEEE.

Mohamed-Faouzi Harkat
Dr. Mohamed-Faouzi Harkat received his Eng. degree in Automatic Control from Annaba University, Algeria, in 1996, and his Ph.D. degree from the Institut National Polytechnique de Lorraine (INPL), France, in 2003. He is now a Professor in the Department of Electronics at Annaba University, Algeria. His research interests include fault diagnosis, process modeling and monitoring, multivariate statistical approaches, and neural networks. Dr. Harkat is the author of more than 100 refereed journal and conference publications and book chapters.

Hazem Numan Nounou
Dr. Hazem N. Nounou (SM’08) is a professor in the Electrical and Computer Engineering Program and the Assistant Dean for Academic and Student Services at Texas A&M University at Qatar. In 2015–2017, he was the holder of the Itochu Professorship. He received the B.S. degree (Magna Cum Laude) from Texas A&M University, College Station, in 1995, and the M.S. and Ph.D. degrees from Ohio State University, Columbus, in 1997 and 2000, respectively, all in electrical engineering. In 2001, he was a Development Engineer for PDF Solutions, a consulting firm for the semiconductor industry, in San Jose, CA. Then, in 2001, he joined the Department of Electrical Engineering at King Fahd University of Petroleum and Minerals in Dhahran, Saudi Arabia, as an Assistant Professor. In 2002, he moved to the Department of Electrical Engineering, United Arab Emirates University, Al-Ain, UAE. In 2007, he joined the Electrical and Computer Engineering Program at Texas A&M University at Qatar, Doha, Qatar, where he is currently a professor. He has published more than 200 refereed journal and conference papers and book chapters. He served as Associate Editor in technical committees of several international journals and conferences. His research interests include data-based control, monitoring and fault detection, intelligent and adaptive control, control of time-delay systems, system biology, and system identification and estimation. Dr. Nounou is a senior member of IEEE.

Mohamed Numan Nounou
Dr. Mohamed Nounou is a professor of Chemical Engineering at Texas A&M University at Qatar. He received the B.S. degree (Magna Cum Laude) from Texas A&M University, College Station, in 1995, and the M.S. and Ph.D. degrees from the Ohio State University, Columbus, in 1997 and 2000, respectively, all in chemical engineering. From 2000 to 2002 he was with PDF Solutions, a consulting company for the semiconductor industry, in San Jose, CA. In 2002, he joined the Department of Chemical and Petroleum Engineering at the United Arab Emirates University. In 2006, he joined the Chemical Engineering Program at Texas A&M University at Qatar, where he is currently a professor. He has received over $5M in research funding and has published more than 190 refereed journal and conference papers and book chapters. He also served as Associate Editor in technical committees of several international journals and conferences. His research interests include process modeling, monitoring, estimation, system biology, and intelligent control. He is a senior member of the American Institute of Chemical Engineers (AIChE) and a senior member of the Institute of Electrical and Electronics Engineers (IEEE).

Acknowledgments

This work was made possible by NPRP grant NPRP9-330-2-140 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.


List of acronyms

GLRT  Generalized likelihood ratio test
KGLRT  Kernel generalized likelihood ratio test
EWMA  Exponentially weighted moving average
CUSUM  Cumulative sum
SPE  Squared prediction error
LVR  Latent variable regression
MSE  Mean square error
PCA  Principal component analysis
PLS  Partial least square
MSPLS  Multiscale partial least square
MSPCA  Multiscale principal component analysis
IPCA  Interval principal component analysis
KPCA  Kernel principal component analysis
KPLS  Kernel partial least square
MSKPLS  Multiscale kernel partial least square
MSKPCA  Multiscale kernel principal component analysis
IKPCA  Interval kernel principal component analysis
CPCA  Centers principal component analysis
CPLS  Centers partial least square
MRPCA  Midpoints-radii principal component analysis
MRPLS  Midpoints-radii partial least square
CIPCA  Complete information principal component analysis
CIPLS  Complete information partial least square
SNR  Signal-to-noise ratio
CSTR  Continuous stirred tank reactor
AQMN  Air quality monitoring network
TEP  Tennessee Eastman process
FAR  False alarm rate
MDR  Missed detection rate
ARL1  Average run length
WL  Window length
MW  Moving window


Nomenclature

Latin letters
$X \in \mathbb{R}^{N \times m}$  Input data matrix
$x \in \mathbb{R}^{m}$  Input vector
$y \in \mathbb{R}^{p}$  Output vector
$t \in \mathbb{R}$  Latent variable
$P \in \mathbb{R}$  Eigenvector matrix
$p$  Eigenvector
$I$  Identity matrix
$E$  Residual matrix
$e$  Residual vector
$m$  Number of inputs
$n$  Number of samples
[symbol missing in source]  Number of retained principal components
$T^2$  Hotelling statistic
$Q$  Squared prediction error statistic
$T$  Generalized likelihood ratio test statistic
$w$  Window length

Greek letters
[symbol missing in source]  Eigenvalue matrix
[symbol missing in source]  Model error
$\lambda$  Eigenvalue
$\sigma$  Standard deviation
$\phi(\cdot)$  Nonlinear mapping
$\psi$  Wavelet basis function
$\phi$  Orthonormal scaling function


CHAPTER 1

Introduction

Process monitoring is essential for the proper and safe operation of various industrial processes (such as chemical and environmental processes), and it has become more important than ever. Proper operation of complex chemical processes, such as those in the oil and gas industries, requires careful monitoring of certain key process variables to enhance the productivity of these processes and, more importantly, to avoid disasters in case of failure [1]. Many serious accidents have occurred in the past few decades in chemical and petrochemical plants all over the world. These accidents include the Union Carbide accident [2,3], the Piper Alpha accident [4,5], and the Al-Ahmedi (Kuwait) accident [6]. The Union Carbide accident occurred in Bhopal, India, in 1984, where a major toxic gas leak resulted in over 3000 fatalities and injured 400,000 others in the surrounding neighborhoods [2,3]. The 1988 accident on Piper Alpha (an oil production platform operated by Occidental Petroleum in the North Sea) involved an explosion that killed 167 men, leaving only 61 survivors [4,5]. The accident in Mina Al-Ahmedi in 2000, on the other hand, was due to a failure in a condensate line in a refinery plant, which caused the death of 5 people and injured 50 others [6]. These accidents show that tight monitoring of chemical and petrochemical processes is essential for the safe and profitable operation of these plants.

Monitoring atmospheric air pollution levels is also extremely important for the safety of humans and marine life, especially in areas with large fuel production or consumption and large climate fluctuations [7]. For example, the heat wave in France in the summer of 2003 was linked to exceptional ozone pollution that affected the entire European community [8]. The consequences of this heat wave demonstrated the importance of having reliable warning systems to detect unexpected pollution levels and any unforeseeable events [8].
Proper monitoring of air pollutants provides useful information that can help people take the needed precautions to avoid undesirable consequences. During the past few decades, a lot of effort has been made to improve air quality.

Data-Driven and Model-Based Methods for Fault Detection and Diagnosis https://doi.org/10.1016/B978-0-12-819164-4.00010-8
Copyright © 2020 Elsevier Inc. All rights reserved.

Fault detection is often used for process monitoring. Possible faults can be due to malfunctioning sensors (called sensor faults) or to abnormal changes in the process. Sensor faults are usually characterized by sudden (or quick) changes in a small number of process variables. Process faults, on the other hand, are abnormal changes caused by deviations in the process itself. These faults are usually characterized by slow drifts across several variables. The need for monitoring techniques that can accurately and quickly detect abnormal situations (sensor or process faults) has attracted considerable attention from researchers and engineers. Over the past few decades, several monitoring techniques have been developed [6,9–11].

Generally, fault detection techniques can be classified into two main categories: data-based (model-free) techniques and model-based techniques. Model-based monitoring methods rely on comparing the process measurements with knowledge obtained from a mathematical process model, which is usually derived using some fundamental understanding of the process under fault-free conditions. The residuals, which are the differences between the measurements and the model predictions, can be used as an indicator of the presence or absence of faults [12,13]. When the monitored process is under normal operating conditions (no faults exist), the residuals are zero, or close to zero in the presence of modeling uncertainties and measurement noise. However, when a fault occurs, the residuals deviate significantly from zero, indicating the presence of a new condition that is clearly distinguishable from the normal fault-free operating mode [12,13]. Model-based monitoring approaches include observer-based methods [14,15], parity space approaches [16–19], and interval approaches [20]. Of course, the effectiveness of these model-based monitoring methods depends on the accuracy of the models used.

The effective performance of various practical systems requires proper process operations, such as modeling and monitoring. In these operations, it is often assumed that the state variables of the process model are measurable and that the model parameters are available.
In many cases, however, obtaining such measurements or determining the model parameters can be costly, difficult, or sometimes impossible. To deal with this problem, state and/or parameter estimators are often utilized. Several estimation techniques, such as the extended Kalman filter, the unscented Kalman filter, and more recently the sequential Monte Carlo method, have been developed and utilized in many applications.

The classical Kalman filter (KF) was developed in the 1960s [21] and has been widely applied in various engineering and science areas, including communications, control, machine learning, neuroscience, and many others. When the model describing the system is assumed to be linear and Gaussian, the KF provides an optimal solution [22–25]. The KF has also been formulated in the context of Takagi–Sugeno fuzzy systems, which can be described by a convex set of multiple linear models [26–28]. The KF is known to be computationally efficient; however, it is limited by its linear and Gaussian modeling assumptions, which do not hold universally. To relax these assumptions, the extended Kalman filter (EKF) [22,23,29–31] and the unscented Kalman filter (UKF) [22,23,32–34] have been developed. In extended Kalman filtering, the model describing the system is linearized at every time sample (which means that the model is assumed to be differentiable). Therefore, for highly nonlinear models, the EKF does not usually provide satisfactory performance. The UKF, on the other hand, instead of linearizing the model to approximate the mean and covariance matrix of the state vector, uses the unscented transformation to approximate these moments. In the unscented transformation, a set of samples (called sigma points) is selected and propagated through the nonlinear model to improve the approximation of these moments and thus the accuracy of state estimation.

Other state estimation techniques use a Bayesian framework to estimate the state and/or parameter vector [35]. The Bayesian framework relies on computing the probability distribution of the unobserved state given a sequence of the observed data in addition to the state evolution model. Consider an observed data set y, which is generated from a model defined by a set of unknown parameters z [36]. The beliefs about the data are completely expressed via the parametric probabilistic observation model P(y|z). Learning about the uncertainty or randomness of a process amounts to constructing a distribution P(z|y), called the posterior distribution, which quantifies our belief about the system after obtaining the measurements. According to the Bayes rule, the posterior distribution can be expressed as

P(z|y) ∝ P(y|z)P(z),

where P(y|z) is the conditional distribution of the data given the model parameter vector z, which is called the likelihood function, and P(z) is the prior distribution, which quantifies our belief about z before obtaining the measurements. Thus the Bayes rule specifies how our prior belief, quantified by the prior distribution, is updated according to the measured data y.

Unfortunately, for most nonlinear systems and non-Gaussian noise observations, a closed-form analytic expression of the posterior distribution of the state is intractable [37]. To overcome this drawback, a nonparametric Monte Carlo sampling-based method called the sequential Monte Carlo (SMC) method (also known as particle filtering (PF)) [38–40] has recently gained popularity. SMC methods approximate the posterior probability distribution by a set of weighted samples, called particles. Since real-world problems usually involve high-dimensional random variables with complex uncertainty, the nonparametric and sample-based estimation of uncertainty (provided by the PF) has become quite popular for capturing and representing the complex distribution P(z|y) in nonlinear and non-Gaussian models [41].

Statistical charts, mainly the Shewhart chart [42], the exponentially weighted moving average (EWMA) chart [43,44], the cumulative sum (CUSUM) chart [45], and the generalized likelihood ratio test (GLRT) chart [46,47], have also been used to improve model-based fault detection (FD) capabilities. Whereas the Shewhart chart considers solely the present data sample to evaluate performance, the CUSUM and EWMA charts consider a weighted sum of past observations. The CUSUM chart gives the same weight to all past observations, whereas the EWMA chart gives more importance to the more recent observations [48–50]. The CUSUM and EWMA charts perform almost equally well in detecting small mean shifts, but the EWMA chart is somewhat easier to set up and operate. Moreover, since the EWMA statistic is a weighted average of all previous and present observations, it is less sensitive to departures from the normality assumption [48,51]. However, the classical EWMA chart is not suitable for dealing with a wide range of fault sizes, since it has to be tuned for each fault size. An improved chart called the max-double EWMA (Max-DEWMA) chart has shown better detection performance than the classical EWMA chart in detecting minor and moderate shifts in the mean and/or variance [52]. The Max-DEWMA control chart takes the larger of the absolute values of two EWMA statistics, one monitoring the mean and the other the variance.
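The difference between the Shewhart, CUSUM, and EWMA statistics described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the noise-free step signal, and the tuning values (3-sigma Shewhart limits, CUSUM slack 0.5 and decision interval 5, EWMA smoothing 0.2 with limit width 3) are common textbook choices picked for this example, not values taken from this book.

```python
def shewhart_alarms(x, mu0, sigma, k=3.0):
    """Shewhart chart: looks only at the present sample."""
    return [abs(v - mu0) > k * sigma for v in x]

def cusum_alarms(x, mu0, sigma, slack=0.5, h=5.0):
    """Two-sided tabular CUSUM: accumulates past deviations with equal weight.
    slack and h are expressed in units of sigma (assumed tuning values)."""
    hi = lo = 0.0
    alarms = []
    for v in x:
        z = (v - mu0) / sigma
        hi = max(0.0, hi + z - slack)   # accumulates upward deviations
        lo = max(0.0, lo - z - slack)   # accumulates downward deviations
        alarms.append(hi > h or lo > h)
    return alarms

def ewma_alarms(x, mu0, sigma, lam=0.2, width=3.0):
    """EWMA chart: recent samples receive exponentially larger weights."""
    z = mu0
    alarms = []
    for t, v in enumerate(x, start=1):
        z = lam * v + (1.0 - lam) * z
        # Variance of the EWMA statistic after t samples.
        var = sigma ** 2 * lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        alarms.append(abs(z - mu0) > width * var ** 0.5)
    return alarms

def first_alarm(alarms):
    """Index of the first alarm, or None if the chart never signals."""
    for i, flag in enumerate(alarms):
        if flag:
            return i
    return None

# Hypothetical noise-free signal: a 1.5-sigma mean shift starting at sample 50.
signal = [0.0] * 50 + [1.5] * 50
shewhart_first = first_alarm(shewhart_alarms(signal, 0.0, 1.0))
cusum_first = first_alarm(cusum_alarms(signal, 0.0, 1.0))
ewma_first = first_alarm(ewma_alarms(signal, 0.0, 1.0))
print(shewhart_first, cusum_first, ewma_first)
```

On this idealized signal the shift stays inside the 3-sigma Shewhart limits, so only the two charts with memory signal, a few samples after the shift begins. With noisy data the detection delays would vary from run to run.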
It has been shown that the Max-DEWMA chart outperforms the Max-EWMA chart in detecting shifts in the mean and/or variance. The authors of [53,54] developed an enhanced single chart, named the sum of squares DEWMA (SS-DEWMA) chart, which aims at detecting shifts of all sizes in the mean and/or variance. It has been shown that the SS-DEWMA chart outperforms the Max-DEWMA chart in detecting shifts in the mean and/or variance, and that both outperform the classical EWMA chart [53,54].

Unfortunately, it is sometimes very difficult to derive accurate models of the monitored systems, especially for complex processes, as in the case of many chemical and environmental processes. For example, modeling the ozone level is very challenging because of the complexity of the ozone formation mechanisms in the troposphere and the uncertainty about the meteorological conditions. Modeling chemical and petrochemical processes is also challenging because of their complexity and, sometimes, the lack of understanding of these processes. In these cases, data-based monitoring techniques are more commonly used. Data-based monitoring methods rely on the availability of historical data obtained from the monitored fault-free process [10]. These data are first used to build an empirical model, which is then used to detect faults in future data. Data-based monitoring methods include the latent variable regression methods, such as partial least square (PLS) regression, principal component analysis (PCA), independent component analysis (ICA), and canonical variate analysis (CVA) [10,55], as well as neural networks [56], fuzzy systems [57], pattern recognition methods [58], and support vector machine (SVM) based methods [59,60]. SVM-based fault detection methods can be applied to nonlinear systems and offer advantages over conventional nonlinear optimization-based techniques. Data-based monitoring methods, especially those that utilize PCA or its extensions, have been widely used in a very wide range of industries, for example, air quality monitoring [61], the chemical industry [62,63], water treatment [64,65], pharmacology [66], biology and biotechnology [67], agriculture [68], health [69], semiconductors [70], and many others.

In this scientific synthesis, several objectives are pursued. First, the advantages of the hypothesis testing fault detection methods are exploited in the cases where process models are not available by developing linear and nonlinear latent variable regression (LVR) based hypothesis testing fault detection methods.
LVR techniques (such as PCA and PLS) are used to achieve further improvements and widen the applicability of the developed methods in practice. Also, kernel LVR methods (including kernel PCA and kernel PLS) are used to deal with process nonlinearities. Second, to account for uncertainty in the measured data, two approaches are followed. In one approach, multiscale representations of data are utilized to reduce the false alarm rate by enhancing the noise-feature separation in the data and decorrelating autocorrelated errors in the measurements. In the other approach, interval LVR (ILVR), which quantifies uncertainty in the data by defining intervals in which the data may fall, is adopted. Integrating hypothesis testing and ILVR helps develop fault detection techniques that are more sensitive to the presence of faults.
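To make the idea of data-based monitoring with PCA more concrete, the following Python sketch fits a one-component PCA model to fault-free two-variable data and evaluates the Hotelling T² and squared prediction error (Q) statistics listed in the nomenclature. It is a simplified illustration, not the procedure developed in later chapters: the data are synthetic, the variables are centered but not scaled, the helper names are hypothetical, and the detection limit is simply the largest Q value seen in training (an assumption made for brevity; statistical control limits are normally used).

```python
import math

def fit_pca_2d(data):
    """Fit a one-component PCA model to two-variable data.
    Returns the mean, the unit-norm first principal direction, and its eigenvalue."""
    n = len(data)
    m1 = sum(p[0] for p in data) / n
    m2 = sum(p[1] for p in data) / n
    a = sum((p[0] - m1) ** 2 for p in data) / (n - 1)              # var(x1)
    c = sum((p[1] - m2) ** 2 for p in data) / (n - 1)              # var(x2)
    b = sum((p[0] - m1) * (p[1] - m2) for p in data) / (n - 1)     # cov(x1, x2)
    # Largest eigenvalue of the symmetric covariance matrix [[a, b], [b, c]].
    lam = 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b ** 2)
    # Matching eigenvector (b, lam - a); this form assumes b != 0.
    v1, v2 = b, lam - a
    norm = math.hypot(v1, v2)
    return (m1, m2), (v1 / norm, v2 / norm), lam

def pca_statistics(sample, mean, pc, lam):
    """Return (T2, Q) for one sample under the fitted one-component model."""
    x1, x2 = sample[0] - mean[0], sample[1] - mean[1]
    score = x1 * pc[0] + x2 * pc[1]                    # projection on the retained component
    e1, e2 = x1 - score * pc[0], x2 - score * pc[1]    # residual not explained by the model
    return score ** 2 / lam, e1 ** 2 + e2 ** 2

# Hypothetical fault-free training data: x2 is roughly 2 * x1.
train = [(float(t), 2.0 * t + (0.1 if i % 2 == 0 else -0.1))
         for i, t in enumerate(range(-5, 6))]
mean, pc, lam = fit_pca_2d(train)

# Assumed empirical limit: the largest Q observed on the training data.
q_limit = max(pca_statistics(p, mean, pc, lam)[1] for p in train)

t2_ok, q_ok = pca_statistics((3.0, 6.0), mean, pc, lam)     # consistent with training
t2_bad, q_bad = pca_statistics((3.0, 9.0), mean, pc, lam)   # second sensor biased by +3
print(q_ok <= q_limit, q_bad > q_limit)
```

The biased sample breaks the correlation structure learned from the fault-free data, so its residual (and hence Q) grows, while a sample that follows the learned relationship stays below the limit.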


To deal with scenarios where the process model is available and has a predefined structure (obtained using material and energy balances), an improved model-based fault detection technique that aims at enhancing the monitoring of industrial systems is developed. The objectives are twofold. First, a state estimation technique that can accurately estimate the state variables in such systems is developed; second, new fault detection charts based on the exponentially weighted moving average (EWMA) are proposed. Finally, the developed fault detection methods are applied to enhance the monitoring of various chemical, biological, and environmental processes.

This scientific synthesis is organized in seven chapters. Chapter 2 presents detailed descriptions of the developed linear LVR-based generalized likelihood ratio test (GLRT) methods for fault detection. Chapter 3 extends the linear LVR-based GLRT approaches to deal with nonlinearities in the systems by developing nonlinear LVR-based (including kernel PCA and kernel PLS) GLRT fault detection techniques. In Chapter 4, multiscale LVR-based fault detection approaches are presented, which improve the monitoring abilities of the classical LVR-based techniques. Chapter 5 presents interval LVR-based approaches for monitoring uncertain systems, followed by their applications to chemical and environmental processes. Then, Chapter 6 presents the model-driven fault detection techniques based on the particle filter (PF) and the exponentially weighted moving average (EWMA), together with their applications to monitoring biological processes. Finally, a summary of the main conclusions of this work and directions for future work are presented in Chapter 7.
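As a rough illustration of the particle filtering idea outlined earlier in this chapter, where the posterior P(z|y) is approximated by weighted samples, the following Python sketch implements a basic bootstrap particle filter. Everything specific in it is an assumption made for the example: the scalar linear-Gaussian model, the noise levels, the number of particles, and the function name `bootstrap_pf`. It is not the estimator developed in Chapter 6.

```python
import math
import random

def bootstrap_pf(observations, n_particles=500, a=0.9, q=0.5, r=0.5, seed=0):
    """Bootstrap particle filter for the assumed toy model
        z_k = a * z_{k-1} + w_k,   w_k ~ N(0, q^2)
        y_k = z_k + v_k,           v_k ~ N(0, r^2)
    Returns the posterior-mean state estimate at every time step."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # samples from the prior
    estimates = []
    for y in observations:
        # Prediction: propagate each particle through the state evolution model.
        particles = [a * p + rng.gauss(0.0, q) for p in particles]
        # Update: weight each particle by the likelihood P(y | z).
        weights = [math.exp(-0.5 * ((y - p) / r) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resampling: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Simulate the same toy model to obtain a "true" state and noisy measurements.
sim = random.Random(1)
truth, obs = [], []
z = 0.0
for _ in range(50):
    z = 0.9 * z + sim.gauss(0.0, 0.5)
    truth.append(z)
    obs.append(z + sim.gauss(0.0, 0.5))

est = bootstrap_pf(obs)
mae = sum(abs(e - t) for e, t in zip(est, truth)) / len(truth)
print(round(mae, 3))
```

Resampling at every step keeps the sketch short. For a linear-Gaussian model like this one the Kalman filter would be the natural choice; the particle filter becomes useful when the model is nonlinear or the noise is non-Gaussian.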

References

[1] F. Khan, S. Abbasi, Major accidents in process industries and an analysis of causes and consequences, Journal of Loss Prevention in the Process Industries 12 (5) (1999) 361–378.
[2] V. Dhara, The Union Carbide disaster in Bhopal: a review of health effects, Archives of Environmental Health 27 (5) (2002) 391–404.
[3] B. Bowonder, The Bhopal accident, Technological Forecasting and Social Change 32 (2) (1987) 169–182.
[4] L. Cullen, The Public Inquiry into the Piper Alpha Disaster, HMSO, 1990.
[5] M. Pate-Cornell, Learning from the Piper Alpha accident: a postmortem analysis of technical and organizational factors, Risk Analysis 13 (2) (1993) 215–232.
[6] V. Venkatasubramanian, R. Rengaswamy, S. Kavuri, K. Yin, A review of process fault detection and diagnosis part I: quantitative model-based methods, Computers and Chemical Engineering 27 (2003) 293–311.
[7] M. Al-Maslamani, Assessment of Atmospheric Emissions Due to Anthropogenic Activities in the State of Qatar, PhD thesis, Institute for the Environment, Brunel University, 2011.
[8] M. Poumadere, C. Mays, S.L. Mer, R. Blong, The 2003 heatwave in France: dangerous climate change here and now, Risk Analysis 25 (6) (2005) 1483–1494.
[9] Y. Qingsong, Model-Based and Data Driven Fault Diagnosis Methods With Applications to Process Monitoring, PhD thesis, Case Western Reserve University, May 2004.
[10] V. Venkatasubramanian, R. Rengaswamy, S. Kavuri, K. Yin, A review of process fault detection and diagnosis part III: process history based methods, Computers and Chemical Engineering 27 (2003) 327–346.
[11] K. Chaitanya, Data-Based Modeling: Application in Process Identification, Monitoring and Fault Detection, PhD thesis, 2011.
[12] M. Kinnaert, Fault diagnosis based on analytical models for linear and nonlinear systems – a tutorial, in: Proceedings of the 15th International Workshop on Principles of Diagnosis, 2003, pp. 37–50.
[13] M. Nyberg, C.M. Nyberg, Model Based Fault Diagnosis: Methods, Theory, and Automotive Engine Applications, PhD thesis, 1999.
[14] R. Clark, D. Fosth, V. Walton, Detecting instrument malfunctions in control systems, IEEE Transactions on Aerospace and Electronic Systems AES-11 (4) (1975) 465–473.
[15] R.J. Patton, P. Frank, R. Clarke (Eds.), Fault Diagnosis in Dynamic Systems: Theory and Application, Prentice-Hall, Inc., 1989.
[16] M. Staroswiecki, Redondance analytique, in: Automatique et statistiques pour le diagnostic, Hermes Science Europe, 2001.
[17] X. Ding, P. Frank, Fault detection via factorization approach, System Control Letters 14 (5) (1990) 431–436.
[18] R.J. Patton, J. Chen, A review of parity space approaches to fault diagnosis, in: Proceedings of SAFEPROCESS’91, 1991, pp. 239–255.
[19] E. Chow, A. Willsky, Analytical redundancy and the design of robust failure detection systems, IEEE Transactions on Automatic Control 29 (7) (Jul 1984) 603–614.
[20] K. Benothman, D. Maquin, J. Ragot, M. Benrejeb, Diagnosis of uncertain linear systems: an interval approach, International Journal of Sciences and Techniques of Automatic Control Computer Engineering 1 (2) (2007) 136–154.
[21] R.E. Kalman, A new approach to linear filtering and prediction problem, Transactions of ASME, Ser. D, Journal of Basic Engineering 82 (1960) 34–45.
[22] D. Simon, Optimal State Estimation: Kalman, H∞, and Nonlinear Approaches, John Wiley and Sons, 2006.
[23] M. Grewal, A. Andrews, Kalman Filtering: Theory and Practice Using MATLAB, John Wiley and Sons, 2008.
[24] V. Aidala, Parameter estimation via the Kalman filter, IEEE Transactions on Automatic Control 22 (3) (1977) 471–472.
[25] L. Matthies, T. Kanade, R. Szeliski, Kalman filter-based algorithms for estimating depth from image sequences, International Journal of Computer Vision 3 (3) (1989) 209–238.
[26] G. Chen, Q. Xie, L. Shieh, Fuzzy Kalman filtering, Journal of Information Science 109 (1998) 197–209.
[27] D. Simon, Kalman filtering of fuzzy discrete time dynamic systems, Applied Soft Computing 3 (2003) 191–207.
[28] H. Nounou, M. Nounou, Multiscale fuzzy Kalman filtering, Engineering Applications of Artificial Intelligence 19 (2006) 439–450.

[29] S. Julier, J. Uhlmann, New extension of the Kalman filter to nonlinear systems, Proceedings of SPIE 3 (1) (1997) 182–193.
[30] L. Ljung, Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems, IEEE Transactions on Automatic Control 24 (1) (1979) 36–50.
[31] Y. Kim, S. Sul, M. Park, Speed sensorless vector control of induction motor using extended Kalman filter, IEEE Transactions on Industrial Applications 30 (5) (1994) 1225–1233.
[32] E. Wan, R.V.D. Merwe, The unscented Kalman filter for nonlinear estimation, in: Adaptive Systems for Signal Processing, Communications, and Control Symposium, 2000, pp. 153–158.
[33] R.V.D. Merwe, E. Wan, The square-root unscented Kalman filter for state and parameter-estimation, IEEE International Conference on Acoustics, Speech, and Signal Processing 6 (2001) 3461–3464.
[34] S. Sarkka, On unscented Kalman filtering for state estimation of continuous-time nonlinear systems, IEEE Transactions on Automatic Control 52 (9) (2007) 1631–1641.
[35] M. Beal, Variational Algorithms for Approximate Bayesian Inference, PhD dissertation, Gatsby Computational Neuroscience Unit, University College London, 2003.
[36] V. Šmídl, A. Quinn, The Variational Bayes Method in Signal Processing, Springer-Verlag, Inc., New York, 2005.
[37] J. Kotecha, P. Djuric, Gaussian particle filtering, IEEE Transactions on Signal Processing 51 (10) (2003) 2592–2601.
[38] G. Storvik, Particle filters for state-space models with the presence of unknown static parameters, IEEE Transactions on Signal Processing 50 (2) (2002) 281–289.
[39] A. Doucet, V. Tadić, Parameter estimation in general state-space models using particle methods, Annals of the Institute of Statistical Mathematics 55 (2) (2003) 409–422.
[40] G. Poyiadjis, A. Doucet, S. Singh, Maximum likelihood parameter estimation in general state-space models using particle methods, in: Proceedings of the American Stat. Assoc., 2005.
[41] M. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Transactions on Signal Processing 50 (2) (2002) 174–188.
[42] M. Hart, R. Hart, Shewhart control charts for individuals with time-ordered data, Frontiers in Statistical Quality Control 4 (1992) 123.
[43] G.J. Ross, N.M. Adams, D.K. Tasoulis, D.J. Hand, Exponentially weighted moving average charts for detecting concept drift, Pattern Recognition Letters 33 (2012) 191.
[44] M. Mansouri, A. Al-khazraji, M. Hajji, M.F. Harkat, H. Nounou, M. Nounou, Wavelet optimized EWMA for fault detection and application to photovoltaic systems, Solar Energy 167 (2018) 125–136.
[45] E.S. Page, Continuous inspection schemes, Biometrika (1954) 100–115.
[46] C. Botre, M. Mansouri, M. Nounou, H. Nounou, M.N. Karim, Kernel PLS-based GLRT method for fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 43 (2016) 212–224.
[47] M. Mansouri, M. Nounou, H. Nounou, K. Nazmul, Kernel PCA-based GLRT for nonlinear fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 26 (1) (2016) 129–139.
[48] D.C. Montgomery, R. Gerth, Introduction to statistical quality control, IIE Transactions 30 (6) (1998) 571.

[49] P. Castagliola, G. Celano, S. Fichera, Monitoring process variability using EWMA, in: Springer Handbook of Engineering Statistics, Springer, 2006, pp. 291–325.
[50] P.E. Maravelakis, P. Castagliola, An EWMA chart for monitoring the process standard deviation when parameters are estimated, Computational Statistics & Data Analysis 53 (7) (2009) 2653–2664.
[51] L. Shu, W. Jiang, S. Wu, A one-sided EWMA control chart for monitoring process means, Communications in Statistics, Simulation and Computation 36 (4) (2007) 901–920.
[52] M.B. Khoo, S. Teh, Z. Wu, Monitoring process mean and variability with one double EWMA chart, Communications in Statistics Theory and Methods 39 (20) (2010) 3678–3694.
[53] S.Y. Teh, M.B. Khoo, Z. Wu, A sum of squares double exponentially weighted moving average chart, Computers & Industrial Engineering 61 (4) (2011) 1173–1188.
[54] T.S. Yin, M.B. Khoo, L.C. Kit, Comparing the performances of the optimal SS-DEWMA and Max-DEWMA control charts, Journal of Statistical Modeling and Analytics 1 (2) (2010) 1–9.
[55] L. Chiang, E. Russell, R. Braatz, Fault Detection and Diagnosis in Industrial Systems, Springer, London, 2001.
[56] P. Subbaraj, B. Kannapiran, Artificial neural network approach for fault detection in pneumatic valve in cooler water spray system, International Journal of Computer Applications 9 (7) (2010) 43–52.
[57] A. Dexter, M. Benouarets, Generic approach to identifying faults in HVAC plants, ASHRAE Transactions 102 (1) (1996) 550–556.
[58] K. Mohammadi, R. Asgary, Pattern recognition and fault detection in MEMS, Computer Recognition Systems Advances in Soft Computing 30 (2005) 877–884.
[59] D. Dehestani, F. Eftekhari, Y. Guo, S. Ling, S. Su, H. Nguyen, Online support vector machine application for model based fault detection and isolation of HVAC system, International Journal of Machine Learning and Computing 1 (1) (2011) 66.
[60] C. Batur, L. Zhou, C.-C. Chan, Support vector machines for fault detection, in: Proceedings of the IEEE Conference on Decision and Control, Las Vegas, NV, 2002, pp. 1355–1356.
[61] M. Harkat, G. Mourot, J. Ragot, An improved PCA scheme for sensor FDI: application to an air quality monitoring network, Journal of Process Control 16 (6) (2006) 625–634.
[62] B. Wise, N. Gallagher, The process chemometrics approach to process monitoring and fault detection, Journal of Process Control 6 (1996) 329–348.
[63] A. Simoglou, E. Martin, A. Morris, Multivariate statistical process control in chemicals manufacturing, in: IFAC Conference SAFEPROCESS, Hull, UK, 1997, pp. 21–27.
[64] J. George, Z. Chen, P. Shaw, Fault detection of drinking water treatment process using PCA and Hotelling’s T² chart, World Academy of Science, Engineering and Technology 50 (2009) 970–975.
[65] Y. Tharrault, Diagnostic de fonctionnement par analyse en composantes principales: Application à une station de traitement des eaux usées, PhD dissertation, National Polytechnic Institute of Lorraine, 2008.
[66] C. Nascimento, J. Martins, Pharmacophoric profile: design of new potential drugs with PCA analysis, in: P. Sanguansat (Ed.), Principal Component Analysis – Multidisciplinary Applications, 2012.


[67] F. Reverter, E. Vegas, J. Oller, Kernel methods for dimensionality reduction applied to the «omics» data, in: P. Sanguansat (Ed.), Principal Component Analysis – Multidisciplinary Applications, 2012. [68] D. Magyar, G. Oros, Application of the principal component analysis to disclose factors influencing on the composition of fungal consortia deteriorating remained fruit stalks on sour cherry trees, in: P. Sanguansat (Ed.), Principal Component Analysis – Multidisciplinary Applications, Intech, 2012. [69] E. Belasco, B. Philips, G. Gong, The health care access index as a determinant of delayed cancer detection through principal component analysis, in: P. Sanguansat (Ed.), Principal Component Analysis – Multidisciplinary Applications, Intech, 2012. [70] J. Yu, Fault detection using principal components-based gaussian mixture model for semiconductor manufacturing processes, IEEE Transactions on Semiconductor Manufacturing 24 (3) (2011) 471–486.

CHAPTER 2

PCA and PLS-based generalized likelihood ratio for fault detection

Contents
2.1. PCA and PLS-based generalized likelihood ratio for fault detection
2.1.1 Introduction
2.1.2 Principal component analysis (PCA)
2.1.2.1 Modeling using PCA
2.1.2.2 How many principal components to use?
2.1.3 Fault detection using PCA method
2.1.4 Statistical hypothesis testing
2.1.4.1 Fault detection using hypothesis testing
2.1.4.2 Generalized likelihood ratio GLRT
2.1.5 Fault detection using a PCA-based GLRT
2.1.6 PCA-based GLRT and applications
2.1.6.1 Ozone monitoring using PCA-based GLRT
2.1.6.2 Description of the training ozone data
2.1.6.3 Ozone modeling using PCA
2.1.6.4 Monitoring the ozone concentrations
2.1.6.5 Process monitoring of a simulated continuously stirred tank reactor (CSTR)
2.1.6.6 Modeling the CSTR data using PCA
2.1.6.7 Simulation results
2.1.7 Conclusion
2.2. PLS-based generalized likelihood ratio for fault detection
2.2.1 Introduction
2.2.2 Partial Least Square (PLS) method
2.2.3 PLS-based GLRT for fault detection
2.2.4 PLS-based GLRT fault detection and applications
2.2.4.1 Fault detection of continuously stirred tank reactor process
2.2.4.2 Fault detection of Tennessee Eastman Process
2.2.5 Conclusions
References

https://doi.org/10.1016/B978-0-12-819164-4.00011-X
Copyright © 2020 Elsevier Inc. All rights reserved.

2.1 PCA and PLS-based generalized likelihood ratio for fault detection

2.1.1 Introduction

Process monitoring is becoming increasingly important for maintaining reliable and safe process operation [1,2]. Among the most important applications of process safety are those related to environmental and chemical processes. A critical fault in a chemical or petrochemical process may not only degrade process performance or lower product quality, but can also result in catastrophes that lead to fatal accidents and substantial economic losses. Detecting anomalies in chemical processes is therefore vital for their safe and proper operation. Likewise, abnormal atmospheric pollution levels harm public health, animals, plants, and the climate and damage natural resources, so monitoring air quality is also crucial for the safety of humans and the environment. The main aim of this chapter is thus to develop enhanced fault detection methods that can improve air quality monitoring and the operation of chemical processes.

When a model of the monitored process is available, model-based monitoring methods rely on comparing the measured process variables with the information obtained from the model. Unfortunately, accurate models may not be available, especially for complex chemical and environmental processes. In the absence of a process model, latent variable models, such as principal component analysis (PCA) and partial least squares (PLS), have been successfully used to monitor processes with highly correlated process variables [3,4]. When a process model is available, on the other hand, statistical hypothesis testing methods, such as the generalized likelihood ratio test (GLRT), have shown good fault detection abilities. In this chapter, we develop an enhanced fault detection method based on statistical hypothesis testing and then use it to improve the monitoring of various chemical and environmental processes.

Hypothesis testing has proven to be an effective fault detection approach when process models are available [5–11]. In this chapter, we extend the advantages of hypothesis testing to cases where process models are not available by using PCA as a modeling framework for hypothesis testing-based fault detection. The developed PCA-based GLRT fault detection algorithm provides optimal properties by maximizing the fault detection probability for a particular false alarm rate [12].


The developed method uses PCA as a modeling framework and hypothesis testing for fault detection. Its performance is demonstrated and compared with that of the conventional PCA method in two applications, one using real ozone data and the other using continuously stirred tank reactor (CSTR) data.

The rest of the chapter is organized as follows. Section 2.1.2 describes the principal component analysis method. Section 2.1.3 presents the application of PCA to fault detection. The GLRT chart, which is used for detecting faults, is presented in Section 2.1.4. Section 2.1.5 presents the developed PCA-based GLRT fault detection technique. Section 2.1.6 validates the developed algorithm through its applications. Finally, Section 2.1.7 concludes the chapter.

2.1.2 Principal component analysis (PCA)

PCA is one of the best-known multivariate statistical modeling techniques and is widely used in various disciplines, such as data compression, face recognition, filtering, image analysis, and fault detection [13–16,12]. PCA is a linear orthogonal projection technique that projects measurements represented in a multidimensional space of dimension $m$ (where $m$ is the number of observed variables) onto a lower-dimensional space (the principal component space of dimension $\ell < m$) by maximizing the variances of the projections. Several extensions of PCA have also been developed, including recursive PCA (RPCA) [17], multiscale PCA (MSPCA) [18], moving window PCA (MWPCA) [19], multiway PCA [20], dynamic PCA (DPCA) [21], and nonlinear PCA (NLPCA) [22,23]. Process monitoring using PCA and its extensions has been applied in many areas, for example, air quality monitoring [24], chemical processes [25,26], water treatment [27,28], semiconductors [29], and many others.

2.1.2.1 Modeling using PCA

Principal component analysis (PCA) is a widely used multivariate statistical method that transforms the original variables into a set of new orthogonal variables such that most of the information is contained in the first few components with the largest variance [30–32]. Let $x(k) \in \mathbb{R}^m$ denote a sample measurement of a vector of $m$ sensors at time $k$. Assuming that there are $N$ samples for each sensor, a data matrix $X = [x(1)\ x(2)\ \ldots\ x(N)]^T \in \mathbb{R}^{N \times m}$ is composed with each row representing a sample $x(k)^T$. PCA can be performed through the eigenvalue decomposition of the covariance matrix of $X$. First, standardize the matrix $X$ to zero mean and unit variance, then perform the eigenvalue decomposition of the covariance matrix,

\[ \Sigma = P \Lambda P^T, \tag{2.1} \]

where $\Sigma$ is the covariance matrix of $X$, $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ is the diagonal eigenvalue matrix ($\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ are the eigenvalues), and $P = (p_1, p_2, \ldots, p_m)$ is the eigenvector matrix ($p_i$, $i = 1, 2, \ldots, m$, are the normalized and mutually orthogonal eigenvectors associated with the eigenvalues). The eigenvalue, eigenvector, and principal component matrices can be partitioned as

\[ \Lambda = \begin{bmatrix} \Lambda_\ell & 0 \\ 0 & \Lambda_{m-\ell} \end{bmatrix}, \qquad P = \begin{bmatrix} P_\ell & P_{m-\ell} \end{bmatrix}, \tag{2.2} \]

\[ T = \begin{bmatrix} T_\ell & T_{m-\ell} \end{bmatrix}, \tag{2.3} \]

where $\ell$ is the number of principal components retained in the PCA model. The matrix $X$ is then transformed into uncorrelated variables $T$ through

\[ T = X P, \tag{2.4} \]

where $T = [t(1), \ldots, t(N)]^T$ contains the principal components, which are orthogonal to each other. Taking into account the $\ell$ largest eigenvalues and their corresponding eigenvectors, the matrix $X$ is decomposed as

\[ X = T_\ell P_\ell^T + E, \tag{2.5} \]

where $T_\ell = X P_\ell$, and $E$ is the residual matrix. A sample vector $x(k) \in \mathbb{R}^m$ can be projected onto the principal and residual subspaces:

\[ \hat{x}(k) = P_\ell\, t_\ell(k) = C_\ell\, x(k), \tag{2.6} \]

where $\hat{x}(k)$ is the estimate of $x(k)$, $C_\ell = P_\ell P_\ell^T$, and

\[ t_\ell(k) = P_\ell^T x(k) \in \mathbb{R}^{\ell} \tag{2.7} \]

is the vector of the first $\ell$ scores of the latent variables. The vector of the last $m - \ell$ scores, which represents the projection of the measurement data onto the residual subspace, is given by

\[ t_{m-\ell}(k) = P_{m-\ell}^T x(k) \in \mathbb{R}^{m-\ell}. \tag{2.8} \]

The residual vector can also be expressed in the measurement space $\mathbb{R}^m$ and is given by

\[ e(k) = x(k) - \hat{x}(k) = (I - C_\ell)\, x(k) = P_{m-\ell} P_{m-\ell}^T x(k) \in \mathbb{R}^m. \tag{2.9} \]

In summary, the PCA model is determined by an eigendecomposition of the covariance matrix $\Sigma$ and the selection of the number $\ell$ of components to be retained.
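The modeling steps of Eqs. (2.1)–(2.9) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `fit_pca` and `project`, the synthetic data, and the choice of two retained components are all assumptions made for the example.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA model by eigendecomposition of the covariance of standardized data."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                       # zero mean, unit variance
    cov = np.cov(Z, rowvar=False)              # covariance matrix Sigma (m x m)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder so lambda_1 >= ... >= lambda_m
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return {"mean": mean, "std": std, "eigvals": eigvals,
            "P": eigvecs, "P_l": eigvecs[:, :n_components]}

def project(model, x):
    """Split a raw sample into its estimate (Eq. 2.6) and residual (Eq. 2.9)."""
    z = (x - model["mean"]) / model["std"]
    C = model["P_l"] @ model["P_l"].T          # C_l = P_l P_l^T
    x_hat = C @ z
    return x_hat, z - x_hat

# Synthetic data (assumed for illustration): 5 sensors driven by 2 latent factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(500, 5))
model = fit_pca(X, n_components=2)
x_hat, e = project(model, X[0])
print("residual norm:", np.linalg.norm(e))
```

Because the columns of $P$ are orthonormal, the estimate and the residual returned by `project` are orthogonal to each other, which is the property that the detection indices in Section 2.1.3 build on.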

2.1.2.2 How many principal components to use?

The accuracy of the PCA model depends on a good estimate of the number of retained principal components (PCs), $\ell$. If the number of PCs is underestimated, important features in the data are left out, which degrades the prediction accuracy of the PCA model. If, on the other hand, it is overestimated, more noise is retained, which masks important information in the data. An accurate estimate of the number of retained PCs is therefore essential. Indeed, in most practical cases (noisy measurements) the small eigenvalues indicate the existence of linear or quasilinear relations among the process variables. However, the distinction between significant and insignificant eigenvalues may not be obvious because of modeling errors (disturbances and nonlinearities) and noise. Most methods for determining the number of principal components are rather subjective in the general practice of PCA [33]. Other methods rely on criteria used in system identification to determine the system order (Akaike information criterion, minimum description length, ...) and emphasize the approximation of the data matrix $X$.

Various techniques have been developed to estimate the number of PCs. For example, the scree plot [34,35] is a graphical technique in which the eigenvalues are plotted in descending order, and the cutoff is determined by looking for a large gap or an “elbow” in the graph [34,12]. Another technique is the cumulative percent variance (CPV), in which the smallest number of PCs that captures a certain percentage of the total variance (e.g., 90%) is chosen [34,12]. The CPV is defined as follows:

\[ \mathrm{CPV}(\ell) = \frac{\sum_{i=1}^{\ell} \lambda_i}{\operatorname{trace}(\Sigma)} \times 100. \tag{2.10} \]

Cross-validation is another popular criterion for choosing the number of PCs [34,36]. It is based on minimizing a quantity called PRESS, which represents the sum of squared errors between the measured and the approximated data. Other approaches include parallel analysis, sequential tests, resampling, and profile likelihood [35,37,34]. Qin and Dunia [33] proposed to determine $\ell$ by minimizing the variance of the reconstruction error in the residual subspace. This variable reconstruction consists of estimating a variable from the other plant variables using the PCA model, that is, using the redundancy relations between this variable and the others. The reconstruction accuracy is thus related to the capacity of the PCA model to reveal the redundancy relations among the variables, that is, to the number of principal components.
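As a small illustration of the CPV rule in Eq. (2.10), the sketch below picks the smallest number of components whose cumulative variance reaches a chosen threshold. The function names and the eigenvalues are hypothetical; only the formula comes from the text.

```python
def cumulative_percent_variance(eigenvalues):
    """Return CPV(l) for l = 1..m, given eigenvalues sorted in descending order."""
    total = sum(eigenvalues)                   # equals trace(Sigma)
    cpv, running = [], 0.0
    for lam in eigenvalues:
        running += lam
        cpv.append(100.0 * running / total)
    return cpv

def select_components(eigenvalues, threshold=90.0):
    """Smallest l whose CPV reaches the threshold (given in percent)."""
    cpv = cumulative_percent_variance(eigenvalues)
    for l, value in enumerate(cpv, start=1):
        if value >= threshold:
            return l
    return len(eigenvalues)

# Hypothetical eigenvalues of a standardized 4-variable data set (they sum to m = 4).
example_eigenvalues = [2.8, 0.9, 0.2, 0.1]
cpv_values = cumulative_percent_variance(example_eigenvalues)
n_retained = select_components(example_eigenvalues, threshold=90.0)
print(cpv_values, n_retained)
```

With these made-up eigenvalues the first component alone explains about 70% of the variance, so the 90% rule retains two components.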

2.1.3 Fault detection using PCA method

As Eq. (2.5) shows, a vector $x$ can be represented using PCA as the sum of two orthogonal vectors, an approximated vector $\hat{x}$ and a residual vector $e$, which correspond to the projections onto the principal component subspace $S_p$ and the residual subspace $S_r$, respectively (see Fig. 2.1). The matrices $P_\ell P_\ell^T$ and $I - P_\ell P_\ell^T$ are the orthogonal projectors onto the principal component and residual subspaces, respectively, and the vectors $\hat{x} = P_\ell P_\ell^T x$ and $e = (I - P_\ell P_\ell^T) x$ are orthogonal, that is, $\hat{x}^T e = 0$. The residual vector $e$ is usually small in magnitude in a fault-free situation, but it can increase considerably in the presence of a fault. In fault detection using PCA, a PCA model is first constructed from fault-free data representing the normal operation of the process, and the model is then used to detect faults using one of the detection indices, such as the $T^2$ or $Q$ statistics.

Figure 2.1 Geometric Interpretation of PCA.

Let us observe the characteristics of outliers (Figs. 2.2 and 2.3). A system with two variables ($x_1$ and $x_2$) is considered. The PCA model, that is, the principal space $P_1$ and the residual space $P_2$, is found from the gray observations. Now two outliers are considered: the green observation (light gray in print version) and the red observation (dark gray in print version). The red observation has the same projection onto the principal space $P_1$ as the normal observations, whereas the projection of the green observation is different. In the residual space $P_2$, the projection of the green observation is identical to that of the normal observations, whereas the projection of the red observation is different. Therefore green outliers can be detected using the principal space, and red outliers can be detected using the residual space.

Figure 2.2 Principle of PCA projection.

Figure 2.3 Outliers detection using PCA.


The $T^2$ statistic quantifies the variations in the PCs at different time instances and is defined as [38,12]

\[ T^2 = x^T P_\ell \Lambda_\ell^{-1} P_\ell^T x, \tag{2.11} \]

where $\Lambda_\ell = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_\ell)$ is a diagonal matrix containing the eigenvalues associated with the $\ell$ retained principal components. For new data, a fault is declared when the value of $T^2$ exceeds the threshold $T^2_{\ell,N,\alpha}$ [38,12]. As pointed out in [39], an increase in $T^2$ alone might be due to normal changes in the process (i.e., just a shift in the process operating conditions) and may thus cause false alarms. In addition, the authors showed in a previous study [40] that the $T^2$ statistic can result in false negatives (no detection) because the latent space is sometimes insensitive to small process upsets, since each latent variable is a combination of all process variables.

On the other hand, the $Q$ statistic, defined as [13]

\[ Q = \|e\|^2 = \|(I - P_\ell P_\ell^T)\, x\|^2, \tag{2.12} \]

measures the projection of a data sample onto the residual subspace, which provides an overall measure of how well the data sample fits the PCA model. When a vector of new data is available, the $Q$ statistic is computed and compared to the threshold $Q_\alpha$ given in [41]. If the confidence limit is violated, a fault is declared. The $Q$ statistic is usually preferred over $T^2$ in fault detection because it is more sensitive to faults of smaller magnitude. However, the $Q$ statistic is also more sensitive to modeling errors, and its performance depends strongly on the choice of the number $\ell$ of retained principal components [42,34]. In the examples presented later in Sections 2.1.6.1 and 2.1.6.5, the $Q$ statistic is used as a benchmark for process monitoring using PCA.

To overcome these limitations of the $T^2$ and $Q$ statistics, we have developed an alternative fault detection approach in which PCA is used as a modeling framework for fault detection using hypothesis testing. More details about hypothesis testing and how it can be utilized in fault detection are presented in the next section.
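The two detection indices of Eqs. (2.11) and (2.12) can be computed directly from the retained eigenvectors and eigenvalues. The following sketch uses a hypothetical two-variable model with a single retained component; it mirrors the geometric picture above, where one kind of outlier moves along the principal direction and the other moves away from it. The thresholds $T^2_{\ell,N,\alpha}$ and $Q_\alpha$ are not computed here.

```python
import numpy as np

def t2_statistic(x, P_l, eigvals_l):
    """T^2 of Eq. (2.11): x^T P_l Lambda_l^{-1} P_l^T x for a standardized sample x."""
    t = P_l.T @ x                              # scores on the retained components
    return float(np.sum(t ** 2 / eigvals_l))

def q_statistic(x, P_l):
    """Q of Eq. (2.12): squared norm of the residual (I - P_l P_l^T) x."""
    e = x - P_l @ (P_l.T @ x)
    return float(e @ e)

# Hypothetical model: one retained direction along (1, 1)/sqrt(2) with eigenvalue 1.8.
P_l = np.array([[1.0], [1.0]]) / np.sqrt(2.0)
eigvals_l = np.array([1.8])

x_along = np.array([2.0, 2.0])     # deviation inside the principal subspace
x_across = np.array([1.0, -1.0])   # deviation inside the residual subspace

print(t2_statistic(x_along, P_l, eigvals_l), q_statistic(x_along, P_l))
print(t2_statistic(x_across, P_l, eigvals_l), q_statistic(x_across, P_l))
```

The first sample changes only $T^2$ and the second changes only $Q$, which is why the two indices are treated as complementary.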

2.1.4 Statistical hypothesis testing

This section presents the basic principles of hypothesis testing and how it can be utilized to detect faults.

2.1.4.1 Fault detection using hypothesis testing

Fault detection can be formulated as a binary hypothesis testing problem, since the main objective is to make a yes/no decision about the presence or absence of a fault. The solution to this problem must trade off two incorrect decisions: a false alarm (i.e., false rejection of the null hypothesis $H_0$) and a missed detection (i.e., failure to accept the alternative hypothesis $H_1$ when it is true).

Assume that $Y$ is a random variable with distribution $P_\theta$ belonging to a parametric family of distributions $\mathcal{P} = \{P_\theta\}$, where $\theta \in \Theta$ are possible faults. Two types of hypotheses can then be considered, a simple hypothesis and a composite hypothesis. “The simple hypothesis is characterized by a specified value of the parameter vector $\theta = \theta_i$, and the null and alternative hypotheses $H_0$ and $H_1$ correspond to the absence and existence of fault(s), respectively. The composite hypothesis, on the other hand, assumes a set of parameters $\theta \in \Theta_i$, indicating the possible existence of faults with different magnitudes.” Here we assume that $\Theta_0 \cap \Theta_1 = \emptyset$, which means that any measured data sample is either faulty or fault-free. Composite hypotheses are more practical than simple ones because they can account for the possibility of multiple faults with different magnitudes.

“A statistical test between hypotheses is a measurable mapping $\delta : \mathbb{R}^N \to \{H_0, H_1\}$ from the observations space onto a set of hypotheses. The quality of a test $\delta$ can be characterized by two measures, the probability of a false alarm and the power function. The probability of a false alarm $\alpha = \sup_{\theta \in \Theta_0} P_\theta(\delta \neq H_0)$ is the probability of deciding $H_1$ when $H_0$ is true [6], and the power function $\beta_\delta(\theta) = P_\theta(\delta = H_1)$ is the probability of deciding $H_1$ when $H_1$ is true.” The false alarm probability $\alpha$ should of course be as small as possible, and the power function $\beta_\delta(\theta)$ should be as large as possible for every $\theta \in \Theta_1$. The best statistical test (called the uniformly most powerful (UMP) test) is the one with the highest power among all possible tests for a given false alarm probability [12,7,6,8].

When $\theta$ is a vector, the challenge becomes finding an optimal statistical test over a sufficiently rich set of alternatives. “Unfortunately, UMP tests rarely exist, except when the parameter $\theta$ is scalar, the family of distributions $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$ has a monotone likelihood ratio, and the test is one-sided, namely $H_0 = \{\theta \le \theta_0\}$ and $H_1 = \{\theta > \theta_1\}$ with $\theta_1 > \theta_0$ [6,7]. To deal with this problem of a vector parameter $\theta$, Wald [9] proposed the uniformly best constant powerful (UBCP) test in which he imposed an additional constraint on the class of considered tests, namely, a constant power function $\beta_\theta$ over a family of surfaces defined over the parameter space $\Theta$.” Another way to solve the composite hypothesis testing problem is to use the GLRT, which is described next.
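To make the trade-off between the false alarm probability and the power function concrete, the sketch below considers the simplest setting mentioned above: a scalar Gaussian mean with known standard deviation and a one-sided test that decides $H_1$ when the observation exceeds a threshold $h$. The setting, the function names, and the numerical values are assumptions chosen for illustration; only the definitions of $\alpha$ and $\beta_\delta(\theta)$ come from the text.

```python
from statistics import NormalDist

def threshold_for_false_alarm(alpha, sigma=1.0):
    """Threshold h with P_0(y > h) = alpha for y ~ N(0, sigma^2)."""
    return NormalDist(0.0, sigma).inv_cdf(1.0 - alpha)

def power(theta, h, sigma=1.0):
    """beta(theta) = P_theta(y > h) for y ~ N(theta, sigma^2)."""
    return 1.0 - NormalDist(theta, sigma).cdf(h)

alpha = 0.05
h = threshold_for_false_alarm(alpha)
false_alarm = power(0.0, h)                    # power evaluated at theta = 0 equals alpha
fault_sizes = (0.5, 1.0, 2.0, 3.0)             # hypothetical fault magnitudes
powers = [power(theta, h) for theta in fault_sizes]

for theta, beta in zip(fault_sizes, powers):
    print(f"theta = {theta}: power = {beta:.3f}")
```

Lowering `alpha` raises the threshold and therefore lowers the power for every fault size, which is the trade-off the test design has to balance.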

2.1.4.2 Generalized likelihood ratio GLRT

The GLRT is an important statistical method that can be used to solve composite hypothesis testing problems by maximizing the likelihood ratio function over all possible faults [8,7,10–12]. Consider the fault detection problem where a measured vector $E \in \mathbb{R}^N$ follows one of two Gaussian distributions, $\mathcal{N}(0, \sigma^2 I_N)$ or $\mathcal{N}(\theta \neq 0, \sigma^2 I_N)$, where $\theta$ is the mean vector (which is also the value of the fault), and $\sigma^2 > 0$ is the variance, which is assumed to be known. The hypothesis testing problem seeks to decide between two hypotheses, the null hypothesis and the alternative hypothesis:

\[
\begin{aligned}
H_0 &= \{E \sim \mathcal{N}(0, \sigma^2 I_N)\} && \text{(null hypothesis)};\\
H_1 &= \{E \sim \mathcal{N}(\theta, \sigma^2 I_N)\} && \text{(alternative hypothesis)}.
\end{aligned}
\tag{2.13}
\]

The GLRT replaces the unknown parameter $\theta$ by its maximum likelihood estimate, which is obtained by maximizing the generalized likelihood ratio $G(E)$ as follows:

\[
\begin{aligned}
G(E) &= 2 \log \frac{\sup_{\theta \in \mathbb{R}^N} f_\theta(E)}{f_{\theta=0}(E)} \\
&= 2 \log \left\{ \sup_{\theta} \exp\left(-\frac{\|E - \theta\|_2^2}{2\sigma^2}\right) \Big/ \exp\left(-\frac{\|E\|_2^2}{2\sigma^2}\right) \right\} \\
&= \frac{1}{\sigma^2} \left\{ -\min_{\theta} \|E - \theta\|_2^2 + \|E\|_2^2 \right\} \\
&= \frac{1}{\sigma^2} \left\{ -\|E - \hat{\theta}\|_2^2 + \|E\|_2^2 \right\} = \frac{1}{\sigma^2} \|E\|_2^2,
\end{aligned}
\tag{2.14}
\]

where $\hat{\theta} = \arg\min_{\theta} \|E - \theta\|_2^2 = E$ is the maximum likelihood estimate of $\theta$, $\|\cdot\|_2$ is the Euclidean norm, and

\[ f_\theta(E) = \frac{1}{(2\pi)^{N/2} \sigma^N} \exp\left(-\frac{1}{2\sigma^2} \|E - \theta\|_2^2\right) \]

is the probability density function of $E$. Note that in this derivation the likelihood function is maximized by maximizing its natural logarithm, which is valid because the logarithm is monotonic. The GLRT thus decides between the hypotheses $H_0$ and $H_1$ as follows:

\[
\delta(E) =
\begin{cases}
H_0 & \text{if } G(E) < G_\alpha, \\
H_1 & \text{otherwise}.
\end{cases}
\tag{2.15}
\]

Knowing the distribution of the decision function $G(E)$ under the null hypothesis $H_0$ therefore allows the design of a statistical test with a desired false alarm rate $\alpha$, where the threshold $G_\alpha$ is selected to satisfy the following false alarm probability:

\[ P_0(G(E) \ge G_\alpha) = \alpha, \tag{2.16} \]

where $P_0(A)$ is the probability of an event $A$ when $E$ is distributed according to the null hypothesis $H_0$. Since $E$ is assumed to be normally distributed (see Eq. (2.13)), the statistic $G$ follows a chi-square ($\chi^2$) distribution with $m - \ell$ degrees of freedom. This chi-square distribution is central under $H_0$ and noncentral under $H_1$, with a noncentrality parameter equal to $\kappa_\theta = \frac{1}{\sigma^2} \|\theta\|_2^2$. The power function of $\delta$ can be computed as follows:

\[ \beta_\delta = P_\theta(\delta(E) = H_1). \tag{2.17} \]

In the next section the GLRT will be integrated with PCA to extend its fault detection abilities to the case where a process model is not available.
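The decision rule of Eqs. (2.14)–(2.16) reduces to comparing a scaled squared norm with a chi-square quantile. The sketch below uses only the Python standard library; because the standard library has no chi-square quantile function, the threshold is obtained with the Wilson–Hilferty approximation, which is a substitution made here for illustration and not part of the method described in the text. The function names, the residual values, and the choice of five degrees of freedom are likewise assumptions.

```python
from statistics import NormalDist

def glrt_statistic(e, sigma2):
    """G(E) = ||E||_2^2 / sigma^2, the final expression of Eq. (2.14)."""
    return sum(v * v for v in e) / sigma2

def chi2_threshold(alpha, dof):
    """Approximate (1 - alpha) quantile of a central chi-square distribution.

    ASSUMPTION: Wilson-Hilferty approximation; an exact quantile routine
    could be used instead where one is available.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3

def glrt_decision(e, sigma2, alpha, dof):
    """Decision rule of Eq. (2.15): H0 if G(E) < G_alpha, H1 otherwise."""
    return "H1" if glrt_statistic(e, sigma2) >= chi2_threshold(alpha, dof) else "H0"

# Hypothetical residual vectors with sigma^2 = 1 and an illustrative 5 degrees of freedom.
residual_ok = [0.5, -1.0, 0.3, 0.8, -0.4]
residual_faulty = [3.5, -1.0, 0.3, 0.8, -0.4]   # same vector with a bias of 3 on the first entry

print(glrt_decision(residual_ok, 1.0, alpha=0.05, dof=5))
print(glrt_decision(residual_faulty, 1.0, alpha=0.05, dof=5))
```

In this example the fault-free vector gives $G = 2.14$ and the biased one gives $G = 14.14$, against an approximate threshold of about 11 for $\alpha = 0.05$.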

2.1.5 Fault detection using a PCA-based GLRT

Several fault detection techniques require accurate process models. These models, however, may not always be available because of the complexity and high dimensionality of the monitored systems. To deal with this problem, PCA can be used to represent a matrix of process data as the sum of two orthogonal parts (an approximated data matrix and a residual data matrix), as shown in Eq. (2.5). This PCA-based representation of the data provides a model that can be used in model-based fault detection. Model-based fault detection usually consists of two main steps: the generation of residuals using a process model, and the evaluation of these residuals using a statistical test.

The objective of the PCA-based GLRT fault detection method presented in this section is to obtain a fault detection technique that is insensitive to modeling errors and measurement noise but sensitive to the presence of faults, which can be achieved by integrating PCA and GLRT hypothesis testing. In this method the GLRT provides a fault detection framework that can handle noise and uncertainties in the data by using hypothesis testing to decide between two hypotheses, $H_0$ (the absence of faults) and $H_1$ (the existence of faults). Since the PCA model expresses a data matrix as the sum of an approximated matrix and a residual matrix, the residuals obtained using PCA can be evaluated using the GLRT to detect faults. To do this, a PCA model is first estimated from a set of fault-free training data obtained under normal operating conditions, and the PCA model is then used to generate the residuals, which are evaluated using the GLRT as shown in Fig. 2.4.

Figure 2.4 A schematic diagram of the PCA-based GLRT fault detection method.

The PCA-based GLRT fault detection technique thus seeks to detect the presence of an additive fault vector $\theta$, which is not explained by the PCA model (2.6), with the highest detection probability for a given probability of false alarms. The detection of this additive fault vector can be formulated as a hypothesis testing problem with two possible hypotheses: a null hypothesis $H_0$ (where a measured vector $x$ is fault-free) and an alternative hypothesis $H_1$ (where $x$ contains a fault, and thus $x$ is no longer explained by the PCA model estimated from the fault-free data).

Let the measured, approximated, and residual matrices $X$, $\hat{X}$, and $E$ be defined as $X = [x_1\ x_2\ \ldots\ x_m]$, $\hat{X} = [\hat{x}_1\ \hat{x}_2\ \ldots\ \hat{x}_m]$, and $E = [e_1\ e_2\ \ldots\ e_m]$, where $x_j$, $\hat{x}_j$, and $e_j$ are the $j$th columns of the matrices $X$, $\hat{X}$, and $E$, respectively. The sample vector for normal operating conditions is denoted by $x^*$, which is unknown when a fault has occurred. In the presence of a process fault $\theta$ the sample vector $x$ is represented by the following expression:

\[ x = x^* + \theta. \tag{2.18} \]

In the absence of a fault the residual vector can be expressed as

\[ e = x^* - \hat{x}, \tag{2.19} \]

whereas in the presence of a fault $\theta$ the residual vector can be expressed as

\[ e = x - \hat{x} = x^* - \hat{x} + \theta. \tag{2.20} \]

In Eq. (2.19) the residual vector is assumed to follow a normal distribution. The fault detection problem (which seeks to detect the presence of an additive bias vector $\theta$ in the residual vector $e$) can thus be treated as a hypothesis testing problem with two hypotheses: a null hypothesis $H_0$, where $e$ is fault-free, and an alternative hypothesis $H_1$, where $e$ contains a fault. This hypothesis testing problem can be formulated as follows:

\[
\begin{aligned}
H_0 &= \{e \sim \mathcal{N}(0, \sigma^2 I_m)\} && \text{(null hypothesis)};\\
H_1 &= \{e \sim \mathcal{N}(\theta, \sigma^2 I_m)\} && \text{(alternative hypothesis)}.
\end{aligned}
\tag{2.21}
\]

Using the theory developed in Section 2.1.4.2, we can now develop a PCA-based GLRT fault detection algorithm that detects faults in the residuals (obtained from the PCA model) using hypothesis testing. The developed fault detection algorithm is presented in Algorithm 1.

The advantages of the developed PCA-based GLRT fault detection algorithm will be demonstrated using two examples. In the first example (presented in Section 2.1.6.1), it is used to monitor the ozone level using real data obtained from different ozone surveillance network stations in the region of Upper Normandy, France. In the second example (presented in Section 2.1.6.5), it is used to monitor the operation of a continuously stirred tank reactor (CSTR).

2.1.6 PCA-based GLRT and applications

2.1.6.1 Ozone monitoring using PCA-based GLRT

In this section, the developed PCA-based GLRT fault detection algorithm is used to detect abnormalities in ozone measurements caused by air pollution or by incoherences between the different network sensors in the Upper Normandy region in France. The performance of the developed method is compared to that of the conventional PCA.

Algorithm 1 PCA-based GLRT fault detection algorithm.
Input: Training fault-free data set $X$ under normal operating conditions (NOC); a fixed false alarm probability $\alpha$.
1. Data preprocessing: standardize the data.
2. Build the PCA model:
   a) compute the covariance matrix $\Sigma$ and its eigendecomposition $\Sigma = P \Lambda P^T$,
   b) determine the number $\ell$ of principal components to be used and the corresponding matrix $P_\ell$.
3. Decompose $X$ as $X = \hat{X} + E$ and compute the residuals $E = X (I - P_\ell P_\ell^T)$.
4. Compute the GLR chart $G(E)$ as shown in Eq. (2.14).
5. Compute the decision threshold $G_\alpha$ for the GLRT.
6. Test the new data vector $x$:
   a) scale the new data vector $x$,
   b) generate a residual vector $e$ using the PCA model,
   c) compute the GLRT chart $G(e)$.
7. Check for faults: if $G(e) > G_\alpha$, declare a fault.
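The steps of Algorithm 1 can be sketched end to end as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: the function names, the synthetic six-sensor data, and the fault size are hypothetical; the variance $\sigma^2$ is replaced by a plug-in estimate from the training residuals; and the threshold $G_\alpha$ is taken as the empirical $(1 - \alpha)$ quantile of the training statistic rather than from the chi-square distribution.

```python
import numpy as np

def train_pca_glrt(X, n_components, alpha):
    """Training phase on fault-free data X (N samples x m variables)."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std                                   # data standardization
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    keep = np.argsort(eigvals)[::-1][:n_components]        # indices of the largest eigenvalues
    P_l = eigvecs[:, keep]                                 # retained eigenvectors
    E = Z - Z @ P_l @ P_l.T                                # residuals E = X (I - P_l P_l^T)
    sigma2 = float(np.mean(E ** 2))                        # ASSUMPTION: plug-in variance estimate
    G_train = np.sum(E ** 2, axis=1) / sigma2              # GLR statistic of each training sample
    threshold = float(np.quantile(G_train, 1.0 - alpha))   # ASSUMPTION: empirical G_alpha
    return {"mean": mean, "std": std, "P_l": P_l, "sigma2": sigma2,
            "G_train": G_train, "threshold": threshold}

def monitor(model, x):
    """Monitoring phase: scale x, generate the residual, and test G(e) > G_alpha."""
    z = (x - model["mean"]) / model["std"]
    e = z - model["P_l"] @ (model["P_l"].T @ z)
    G = float(e @ e) / model["sigma2"]
    return G, G > model["threshold"]

# Synthetic fault-free data (hypothetical): 6 sensors driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8, 0.6, 0.0, 0.3, 0.5],
                   [0.0, 0.4, 0.7, 1.0, 0.9, 0.6]])
X_train = rng.normal(size=(400, 2)) @ mixing + 0.1 * rng.normal(size=(400, 6))
model = train_pca_glrt(X_train, n_components=2, alpha=0.01)

x_normal = np.array([0.5, -0.3]) @ mixing      # noise-free sample consistent with the model
x_faulty = x_normal.copy()
x_faulty[2] += 2.0                             # additive bias (theta) on the third sensor

G_normal, alarm_normal = monitor(model, x_normal)
G_faulty, alarm_faulty = monitor(model, x_faulty)
print(G_normal, alarm_normal)
print(G_faulty, alarm_faulty)
```

The empirical threshold keeps the training false alarm rate close to $\alpha$ without relying on the Gaussian assumption, at the cost of needing enough fault-free samples to estimate an upper quantile reliably.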

2.1.6.2 Description of the training ozone data

The ozone data used in this study were measured in the Upper Normandy region at seven different monitoring stations. The ozone concentrations are measured every fifteen minutes to minimize any spatial or temporal sampling problems. Measured fault-free ozone data (collected between August 11 and August 19, 2006, for a total of 773 observations) were used to develop a PCA reference model. Plots of these ozone measurements and their corresponding autocorrelation functions (ACF) are shown in Figs. 2.5A and 2.5B, respectively. Only the ozone measurements from three stations (SRC, QUI, and ND2) are plotted for better readability of the figures. These data behave similarly to the data obtained from the other network stations. Fig. 2.5B shows a clear periodicity of the ACF every 24 hours (the period between adjacent peaks in the ACF). This periodic behavior is related to the diurnal cycle of ozone, which is primarily caused by the diurnal temperature cycle that affects the ozone levels. Fig. 2.5B also shows a similarity between the autocorrelation functions of the ozone measurements from the three network stations. These high-dimensional cross-correlated ozone measurements will be modeled using PCA as shown in the next section.

Figure 2.5 (A) Quarter-hourly ozone measurements and (B) ACF of ozone measurements.

2.1.6.3 Ozone modeling using PCA

As described in Algorithm 1, the PCA-based GLRT fault detection method requires constructing a PCA model from fault-free data. Therefore, the fault-free ozone training data described earlier were used to construct a PCA reference model to be used in fault detection. The fault-free ozone data were arranged as a matrix $X$ having 773 rows (samples) and 7 columns (ozone measurements from seven monitoring stations). These data are first scaled (to have zero mean and unit variance) and then used to construct the PCA model.

Figure 2.6 Variance captured by each principal component.

Figure 2.7 PC2 versus PC1.

In PCA, most of the important variations in the data are usually captured in a few principal components corresponding to the largest eigenvalues. In this work, the cumulative percent variance (CPV) method is used to determine the number of retained principal components. Using a CPV threshold value of 90%, only the first two principal components (which capture 86.88% and 4.34% of the total variations in the data as shown in Fig. 2.6) are retained. Fig. 2.7, which shows a bivariate plot of the two principal components, also clearly shows that these two principal components are uncorrelated.

To illustrate the quality of the constructed PCA model, Figs. 2.8 and 2.9 compare the ozone concentrations predicted by the PCA model (constructed using the scaled data) with the scaled ozone measurements. Figs. 2.8 and 2.9 show that the PCA model provides satisfactory predictions of the ozone levels considering the complexity of the ozone formation process. However, some modeling errors can be observed for some variables, such as those obtained from stations ND2, TAN, and QUI. These modeling errors have implications for fault detection, as will be shown later.
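The choice of two components follows directly from the two percentages quoted above; written out with the CPV of Eq. (2.10), the arithmetic is

\[
\mathrm{CPV}(1) = 86.88\% < 90\%, \qquad \mathrm{CPV}(2) = 86.88\% + 4.34\% = 91.22\% \ge 90\%,
\]

so $\ell = 2$ is the smallest number of components that reaches the 90% threshold.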

2.1.6.4 Monitoring the ozone concentrations In this section, we will assess the fault detection abilities of PCA and the developed PCA-based GLRT fault detection algorithm using the Upper Normandy ozone data described earlier. Two case studies will be performed. However, before discussing these case studies, let us first distinguish between two types of anomalies (or atypical peaks) that are usually encountered in ozone measurements, true and false anomalies.


Figure 2.8 PCA model prediction of ozone concentrations for the first four network stations.


Figure 2.9 PCA model prediction of ozone concentrations for the last three network stations.

1- True anomalies correspond to peaks in the ozone levels due to the production of photochemical ozone. The formation of a true ozone peak requires certain conditions, such as high humidity and temperature, which promote the formation of ozone, and low wind speeds, which allow high pollution levels to accumulate. These peaks are usually large and last several hours (due to the long reaction times needed for the gradual formation of photochemical ozone). Anomalies of this type therefore usually exhibit bell-shaped curves.


2- False anomalies are usually observed during the summer, when the ozone concentration increases abruptly (to between 150 µg/m³ and 600 µg/m³) for short periods of time (around one hour). These abnormal changes are sharply peaked and differ from those observed in the case of photochemical ozone. Anomalies of this type can be due to a) malfunctioning sensor(s), b) the movement of ozone produced elsewhere in the region, c) the intrusion of stratospheric ozone into the lower troposphere, and other causes [43, 34].

As mentioned earlier, this section presents two case studies. The first case study involves a false anomaly, which corresponds to a simple sensor fault in station QUI. The second case study involves multiple true anomalies, where faults occur simultaneously in more than one station. Note that all anomalies in these case studies are real anomalies that occurred in practice.

Case study 1: Sensor fault detection – a simple false anomaly To compare the abilities of the two fault detection methods (PCA and PCA-based GLRT), a PCA model was first developed using the fault-free data and was then used to detect possible simple sensor faults in unseen testing data. The testing data set consists of 361 data samples (measured on August 3–7, 2006), which are completely independent of the training data. In this case, only a single fault (i.e., in one variable) is considered. These testing data contain abnormal ozone measurements in station QUI between sample numbers 289 and 290, with a maximum intensity level of 350 µg/m³ occurring on August 6 at 05:00 pm, according to information obtained from the Upper Normandy air monitoring association. Fig. 2.10, which plots the value of the Q statistic for the testing data, shows that this simple fault is detected because the statistic exceeds the threshold value.
The dotted line represents the detection threshold Qα , which is calculated to be 1.439 using the training fault-free data and a 95% confidence level. However, Fig. 2.10 also shows that the Q statistic resulted in several false alarms, which are indicated by the red (dark gray in print version) crosses. These false alarms are partly due to the modeling errors observed in Figs. 2.8 and 2.9. The application of the developed PCA-based GLRT fault detection method, on the other hand, resulted in detecting the fault with a smaller number of false alarms compared to the conventional PCA method as illustrated in Fig. 2.11. The GLRT threshold value is found to


Figure 2.10 Monitoring a simple sensor fault using PCA-based Q.

Figure 2.11 Monitoring a simple sensor fault using PCA-based GLRT.

be h(α) = 1.145 for a false alarm probability α of 5%. The false alarms observed with the PCA-based GLRT are mainly due to errors in the estimated linear PCA model. These results show that both techniques are affected by the quality of the estimated model, although the PCA-based GLRT is less affected, since it resulted in fewer false alarms. These results also indicate that building a nonlinear PCA model may improve the detection abilities of both methods, especially for the PCA-based GLRT.

Case study 2: Abnormal ozone detection – multiple true anomalies In this case study, the fault detection abilities of PCA and the PCA-based GLRT are assessed in the presence of multiple true faults. The testing ozone data, which were measured on September 5–7, 2006, consist of 200 data samples. In this data set, faults occurred simultaneously in stations ND2 and TAN from sample numbers 21 to 34 (on September 5, from 07:45 am to 11:00 am) and in stations LIL, ND2, and GRV from sample numbers 119 to 146 (on September 6, from 08:45 am to 03:00 pm). Fig. 2.12 shows that the Q statistic is capable of detecting all faults but at the


Figure 2.12 Monitoring multiple faults using PCA-based Q.

Figure 2.13 Monitoring multiple faults using PCA-based GLRT.

expense of a lot more false alarms (which are indicated by the red (dark gray in print version) crosses) than in the simple fault case (case study 1). The results of the GLRT, however, which are shown in Fig. 2.13, clearly indicate that the developed PCA-based GLRT fault detection algorithm can detect all faults with a much smaller number of false alarms compared to the conventional PCA method. This case study clearly shows the advantages of the PCA-based GLRT over PCA, especially in the case of multiple faults. In summary, the results of this ozone monitoring example show that the developed PCA-based GLRT fault detection method outperforms the conventional PCA approach by detecting all faults with a smaller number of false alarms, especially for the multiple fault case. The results also show that the quality of the estimated PCA model has a significant impact on the fault detection ability of both methods. Therefore, building a nonlinear PCA model (to better represent the nonlinear nature of the ozone formation process) is expected to reduce the rate of false alarms even further [23].
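The Q-statistic monitoring used as the baseline in the case studies above can be sketched as follows. This is a minimal illustration, assuming NumPy; the helper names (`fit_pca`, `q_statistic`, `empirical_threshold`) are hypothetical, and the threshold is taken here as an empirical quantile of the training Q values, which is an assumption and may differ from the way the threshold Qα reported in the text was obtained.

```python
import numpy as np

def fit_pca(X_train, n_pcs):
    """Return scaling parameters and the loading matrix P built from fault-free data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Xs = (X_train - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    P = eigvecs[:, order[:n_pcs]]              # retained loading vectors (m x n_pcs)
    return mean, std, P

def q_statistic(X, mean, std, P):
    """Squared norm of the part of each scaled sample not explained by the PCA model."""
    Xs = (X - mean) / std
    residual = Xs - Xs @ P @ P.T
    return np.sum(residual**2, axis=1)

def empirical_threshold(q_train, alpha=0.05):
    """Assumption: use the (1 - alpha) empirical quantile of the fault-free Q values."""
    return np.quantile(q_train, 1.0 - alpha)
```

A sample is then flagged as faulty whenever its Q value exceeds the threshold; samples flagged outside the known faulty periods count as false alarms.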


Figure 2.14 The variance captured by each principal component.

2.1.6.5 Process monitoring of a simulated continuously stirred tank reactor (CSTR) In this section, we illustrate the performance of the developed PCA-based GLRT fault detection method through its application to monitoring a controlled nonisothermal continuously stirred tank reactor (CSTR), in which the irreversible first-order reaction (A → B) takes place [44]. To better represent practical process data, the simulated data of the two state variables (concentration and temperature), which are assumed to be noise-free, are contaminated with zero-mean Gaussian noise having standard deviations of σc = 0.005 and σT = 0.02.

2.1.6.6 Modeling the CSTR data using PCA The CSTR model described earlier is used to generate 1750 data samples for constructing the PCA model. The training process data include four variables: FC, F, CA, and T. Therefore a matrix having 1750 rows and 4 columns is used to construct the PCA model after scaling the variables. Using the CPV method and a threshold value of 90%, the number of retained PCs is found to be three (which capture 51.99%, 26.40%, and 17.73% of the total variation in the data), as shown in Fig. 2.14. Before applying the PCA-based GLRT, we need to check whether the residuals of the four variables follow Gaussian distributions to make sure that the data are well represented by a linear PCA model. The residuals here are the columns of the error matrix E, which contains the variation not captured by the PCA model. The normality of the residuals can be checked by visually inspecting the histograms of these four residual vectors, which are shown in


Figure 2.15 Histograms showing the normality of the residuals.

Fig. 2.15. These histograms indicate that the normality assumption appears to be a reasonable one. In this example, the performance of the developed PCA-based GLRT fault detection method is assessed and compared with that of PCA through three case studies representing three different types of faults. In the first case study, the sensor measuring the concentration CA of A is assumed to be damaged. In the second case study, a similar fault is assumed for the sensor measuring the reactor temperature T. In the third case study, multiple faults are assumed to occur simultaneously in the sensors measuring the concentration and temperature inside the reactor.
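The visual histogram check of the residuals mentioned above can be complemented by simple numerical summaries. The sketch below is an illustration only, assuming NumPy and a residual matrix `E` with one column per variable; using sample skewness and excess kurtosis is an assumption made here and is not the procedure reported in the text, which relies on visual inspection.

```python
import numpy as np

def residual_moments(E):
    """Column-wise sample skewness and excess kurtosis of a residual matrix E.

    For Gaussian residuals both quantities should be close to zero.
    """
    Z = (E - E.mean(axis=0)) / E.std(axis=0)
    skewness = np.mean(Z**3, axis=0)
    excess_kurtosis = np.mean(Z**4, axis=0) - 3.0
    return skewness, excess_kurtosis

def residual_histograms(E, bins=20):
    """Histogram counts and bin edges for each residual column (for plotting)."""
    return [np.histogram(E[:, j], bins=bins) for j in range(E.shape[1])]
```

Values far from zero for either moment would suggest that a linear PCA model with Gaussian residuals is a poor description of the corresponding variable.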

2.1.6.7 Simulation results Case study 1: a fault in the concentration sensor – a simple fault The testing data used to compare the various fault detection methods, which consist of 350 samples, are generated using the CSTR model described earlier. To simulate a simple fault in the variable CA , an additive


Figure 2.16 Monitoring a fault in CA using PCA-based Q.

Figure 2.17 Monitoring a fault in CA using PCA-based GLRT.

fault having a magnitude of 20% of the total variation in CA is introduced between samples 200 and 222. The results using the Q statistic (Fig. 2.16) show that it successfully detects this simple fault but with several false alarms, which are indicated by the red (dark gray in print version) crosses. The PCA-based GLRT fault detection method (Fig. 2.17), on the other hand, detects this fault without any false alarms.

Case study 2: a fault in the temperature sensor – a simple fault In this case study, a fault in the reactor temperature, represented by a constant bias with an amplitude equal to 5% of the total variation in T, is introduced starting at sample number 100 and lasting until the end of the testing data. The results of the Q statistic and the PCA-based GLRT (Figs. 2.18 and 2.19, respectively) show that both techniques can detect this additive fault, but again with some false alarms for the Q statistic.

Case study 3: simultaneous faults in the concentration and temperature sensors – multiple faults


Figure 2.18 Monitoring a fault in T using PCA-based Q.

Figure 2.19 Monitoring a fault in T using PCA-based GLRT.

In this case study, simultaneous faults are introduced in both the concentration and the temperature (each represented by a bias with a magnitude equal to 5% of the variation in the corresponding variable) between samples 100 and 122. The results using the Q statistic and the PCA-based GLRT for this multiple fault case study are shown in Figs. 2.20 and 2.21, respectively. These results show that the Q statistic, even though it detects these multiple faults, produces some false alarms. The PCA-based GLRT, however, detects these faults without any false alarms. To compare the performance of the developed PCA-based GLRT with that of the conventional PCA fault detection method, we compare the Receiver Operating Characteristic (ROC) curves of the two approaches in Fig. 2.22. The ROC curves, which plot the correct detection rate β for different values of the false alarm probability α, provide a measure for comparing the detection accuracy of the two tests and their sensitivities to variations in the detection thresholds. Fig. 2.22 shows that there is a trade-off between a high detection rate and a low false alarm probability. The intersections of


Figure 2.20 Monitoring of multiple faults in CA and T using PCA-based Q.

Figure 2.21 Monitoring of multiple faults in CA and T using PCA-based GLRT.

both plots (PCA and GLRT) with the line having a slope of −1 show that the GLRT provides a higher detection rate at a lower false alarm probability than the conventional PCA fault detection method. This clearly shows the advantages of the developed PCA-based GLRT over PCA.
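The ROC comparison described above can be reproduced in outline by sweeping the detection threshold and recording the false alarm probability α and the correct detection rate β at each value. The sketch below is a minimal illustration, assuming NumPy, a vector of detection statistic values, and a known label for each sample; the function name `roc_points` is hypothetical.

```python
import numpy as np

def roc_points(stat, is_faulty, thresholds):
    """Return (alpha, beta) arrays: false alarm and correct detection rates per threshold."""
    stat = np.asarray(stat, dtype=float)
    is_faulty = np.asarray(is_faulty, dtype=bool)
    alphas, betas = [], []
    for h in thresholds:
        alarm = stat > h
        alphas.append(np.mean(alarm[~is_faulty]))  # alarms raised on fault-free samples
        betas.append(np.mean(alarm[is_faulty]))    # alarms raised on faulty samples
    return np.array(alphas), np.array(betas)
```

Plotting β against α for the Q statistic and for the GLRT statistic on the same axes gives curves of the kind compared in Fig. 2.22.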

2.1.7 Conclusion Principal component analysis (PCA) has been widely used in fault detection. In this work, we used PCA as a modeling framework for a new fault detection algorithm which exploits the advantages of the generalized likelihood ratio test (GLRT) (which assumes the availability of a process model) for improved fault detection when a process model is not available. In this developed PCA-based GLRT fault detection method, PCA is used to express a data matrix as the sum of two matrices, approximate and residual. Then the GLRT (which is a hypothesis testing method) is applied on the residual matrix to detect faults when the data do not fit the PCA model. The developed fault detection algorithm possesses optimal properties in the sense that it maximizes the detection probability for a given false alarm


Figure 2.22 Correct detection rate vs false alarm rate for the PCA-based GLRT fault detection method and the conventional PCA method.

rate. The performance of the developed algorithm is illustrated through two examples, one using real ozone monitoring data and the other using simulated continuously stirred tank reactor (CSTR) data. The results of these examples clearly show the effectiveness of the developed PCA-based GLRT fault detection method.

2.2 PLS-based generalized likelihood ratio for fault detection

2.2.1 Introduction The demand for fault detection (FD) in chemical processes is growing significantly, both to ensure safety and to maintain product quality at the desired level [45–47]. Fault detection is necessary to verify that the system continues to operate under normal conditions and thus to ensure safety [48,49]. Detecting faults in the process helps to limit process disturbances and to keep the process safe and reliable [50]. A fault can be defined as a nonpermitted deviation of a step, process, or data item from acceptable behavior, which leads to system failure and the inability to meet the intended objectives [51,52]. A detection system is therefore needed to keep the process safe and reliable. In Section 2.1, we presented a principal component analysis (PCA)-based generalized likelihood ratio test (GLRT) fault detection method, which showed improved detection of abnormal ozone levels and enhanced monitoring of a continuously stirred tank reactor (CSTR) over


the conventional methods. The developed PCA-based GLRT fault detection method relies on linear PCA (an input-space modeling technique) as a modeling framework for fault detection. However, most environmental and chemical processes, such as distillation columns, are usually described by input–output models (which relate two sets of variables, inputs and outputs). Therefore the objective of this chapter is to extend the advantages of the developed linear PCA-based GLRT fault detection method to input–output models. To accomplish this objective, we develop a hypothesis-testing fault detection technique based on input–output latent variable regression models, of which partial least squares (PLS) is an example. The proposed technique widens the applicability of hypothesis-testing-based fault detection to various practical chemical and environmental processes. The current chapter is organized as follows. In Section 2.2.2, we describe the partial least squares approach. In Section 2.2.3, we present the developed PLS-based GLRT fault detection technique. In Section 2.2.4, we demonstrate the performance of the proposed technique using chemical processes. Finally, we present the conclusions in Section 2.2.5.

2.2.2 Partial Least Square (PLS) method PLS is an input–output regression model that decomposes both the online measurement matrix $X \in \mathbb{R}^{N \times m}$ and the quality matrix $Y \in \mathbb{R}^{N \times p}$ while computing loading vectors and having a linear relationship between the input and output score matrices. Thus the PLS model can be described as a combination of an outer model (which decomposes the X and Y matrices) and an inner model (a linear relationship between the score matrices). The PLS model is given as [4]

$$X = T P^T + E = \sum_{i=1}^{\ell} t_i p_i^T + E, \qquad (2.22)$$

$$Y = U Q^T + F = \sum_{j=1}^{\ell} u_j q_j^T + F, \qquad (2.23)$$

where $E \in \mathbb{R}^{N \times m}$ and $F \in \mathbb{R}^{N \times p}$ are the residuals of X and Y, respectively, obtained from the PLS model, $T = [t_1, t_2, \ldots, t_\ell] \in \mathbb{R}^{N \times \ell}$ is the input score matrix, $U = [u_1, u_2, \ldots, u_\ell] \in \mathbb{R}^{N \times \ell}$ is the output score matrix, P and Q are the loading vectors of X and Y, respectively, and $\ell$ is the number of retained latent variables. The objective of the PLS algorithm is finding the solution of the following optimization problem [53]:

$$\max \; w_i^T X_i^T Y_i q_i \quad \text{s.t.} \quad \|w_i\| = 1, \; \|q_i\| = 1, \qquad (2.24)$$

where $w_i$, $q_i$ are weight vectors that yield $t_i = X_i w_i$ and $u_i = Y_i q_i$, respectively. The weight vectors are computed in an iterative manner using the nonlinear iterative partial least squares (NIPALS) algorithm. More details about the PLS algorithm can be found in [53,15]. Once the score vectors t and u are extracted, the loading vectors p and q from Eqs. (2.22) and (2.23) can be computed by regressing X on t and Y on u, respectively. The modified NIPALS algorithm [53], which is used to compute the score matrix and loading vectors of X and Y, is described in Algorithm 2.

Algorithm 2 X, Y – PLS-NIPALS algorithm [53,15].
Center the columns of X, Y to zero mean and scale them to unit variance.
Set i = 1, $X_1 = X$.
1. Set $u_i$ equal to any column of Y.
2. $w_i = X_i^T u_i / (u_i^T u_i)$.
3. $t_i = X_i w_i$.
4. $q_i = Y^T t_i / (t_i^T t_i)$.
5. $u_i = Y q_i$. If $t_i$ converges, then go to Step 6, else return to Step 2.
6. $p_i = X_i^T t_i / (t_i^T t_i)$.
7. $X_{i+1} = X_i - t_i p_i^T$.
Set i = i + 1 and return to Step 1. Terminate if $i > \ell$.
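The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation, assuming NumPy; the function name `pls_nipals`, the tolerance, and the iteration cap are hypothetical choices. One deviation is labeled in the code: the weight vector is normalized to unit length after Step 2, which is assumed here to enforce the constraint in Eq. (2.24) and is not listed as an explicit step in Algorithm 2.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """Return scores T, weights W, X-loadings P and Y-loadings Q (one column per component)."""
    # Center and scale the columns of X and Y.
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    N, m = X.shape
    p = Y.shape[1]
    T = np.zeros((N, n_components))
    W = np.zeros((m, n_components))
    P = np.zeros((m, n_components))
    Q = np.zeros((p, n_components))
    Xi = X.copy()
    for i in range(n_components):
        u = Y[:, 0].copy()                      # Step 1: start from a column of Y
        t_old = np.zeros(N)
        for _ in range(max_iter):
            w = Xi.T @ u / (u @ u)              # Step 2
            w /= np.linalg.norm(w)              # assumption: enforce ||w|| = 1
            t = Xi @ w                          # Step 3
            q = Y.T @ t / (t @ t)               # Step 4
            u = Y @ q                           # Step 5
            if np.linalg.norm(t - t_old) < tol * np.linalg.norm(t):
                break                           # t has converged
            t_old = t
        p_i = Xi.T @ t / (t @ t)                # Step 6
        Xi = Xi - np.outer(t, p_i)              # Step 7: deflate X
        T[:, i], W[:, i], P[:, i], Q[:, i] = t, w, p_i, q
    return T, W, P, Q
```

Because X is deflated after each component, the score vectors returned in `T` are mutually orthogonal, which is a convenient sanity check on any implementation of this scheme.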

The PLS model can also be used as a regression model to predict the quality matrix Y from the online measurement matrix X. The relationship between Y and X is given as

$$Y = XB + F, \qquad (2.25)$$

where F is the model residue, and B is the regression coefficient given as

$$B = W (P^T W)^{-1} C^T, \qquad (2.26)$$

where W is the weight, P is the loading vector regressed on X, and C is the loading vector regressed on Y:

$$W = X^T U, \qquad (2.27)$$

$$P = X^T T (T^T T)^{-1}, \qquad (2.28)$$

$$C = Y^T T (T^T T)^{-1}. \qquad (2.29)$$

Thus the regression coefficient B is given as

$$B = X^T U (T^T X X^T U)^{-1} T^T Y. \qquad (2.30)$$

We denote by P, W, and B the estimated parameters of the PLS regression model. Given new scaled sample vectors $x(k) \in \mathbb{R}^m$ and $y(k) \in \mathbb{R}^p$, their estimations $\hat{x}(k)$ and $\hat{y}(k)$ can be expressed, respectively, as

$$\hat{x}(k) = C x(k), \qquad (2.31)$$

$$\hat{y}(k) = B x(k), \qquad (2.32)$$

where $C = P R^T$ and $R = W (P^T W)^{-1}$.

2.2.3 PLS-based GLRT for fault detection Let $x(k) \in \mathbb{R}^m$ and $y(k) \in \mathbb{R}^p$ be new measurement vectors at time k. The output estimate given by the PLS model, $\hat{y}(k)$, is expressed as

$$\hat{y}(k) = B x(k). \qquad (2.33)$$

The estimation error is then

$$\hat{e}(k) = y(k) - \hat{y}(k), \qquad (2.34)$$

and the GLR test statistic is expressed as

$$G(\hat{e}) = \frac{1}{\sigma^2} \|\hat{e}(k)\|_2^2, \qquad (2.35)$$

where $\sigma^2$ is the variance of the residual $\hat{e}$. The GLR test statistic follows a chi-square distribution [54]. Let a and b be, respectively, the mean and variance of the GLR statistic; its control limit $G_\alpha$ can be computed from its approximate distribution as [30,54,55]

$$G_\alpha = g \chi^2_{h,\alpha}, \qquad (2.36)$$

where $g = b/(2a)$ and $h = 2a^2/b$. A fault is detected if

$$G(\hat{e}) > G_\alpha. \qquad (2.37)$$
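The statistic in Eq. (2.35) and the control limit in Eq. (2.36) can be sketched as follows. This is an illustration under stated assumptions, using NumPy and the Python standard library only: the chi-square quantile is computed with the Wilson-Hilferty approximation (swapped in here to avoid an external statistics package), $\chi^2_{h,\alpha}$ is read as the $(1-\alpha)$ quantile of a chi-square distribution with h degrees of freedom, and all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def chi2_quantile_wh(h, prob):
    """Wilson-Hilferty approximation to the chi-square quantile with h degrees of freedom."""
    z = NormalDist().inv_cdf(prob)
    return h * (1.0 - 2.0 / (9.0 * h) + z * np.sqrt(2.0 / (9.0 * h))) ** 3

def glr_statistic(E, sigma2):
    """Eq. (2.35): squared 2-norm of each residual row e(k), divided by the residual variance."""
    return np.sum(E**2, axis=1) / sigma2

def control_limit(G_train, alpha=0.05):
    """Eq. (2.36): G_alpha = g * chi2_{h,alpha}, with g = b/(2a) and h = 2a^2/b."""
    a = G_train.mean()           # mean of the fault-free GLR statistic
    b = G_train.var(ddof=1)      # variance of the fault-free GLR statistic
    g = b / (2.0 * a)
    h = 2.0 * a**2 / b
    return g * chi2_quantile_wh(h, 1.0 - alpha)
```

In use, `G_train` would be computed from fault-free residuals, and a new sample would be declared faulty when its statistic exceeds the returned limit, as in Eq. (2.37).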

Next, we present an application of the developed PLS-based GLRT technique to fault detection of chemical processes.

2.2.4 PLS-based GLRT fault detection and applications In this section, we evaluate the fault detection performance of the PLS-based GLRT in terms of three detection indicators:
• Missed Detection Rate (MDR, %): the percentage of faulty observations that go undetected;
• False Alarm Rate (FAR, %): the percentage of observations in the fault-free region that are wrongly declared faulty;
• Average Run Length (ARL1): the number of observations taken to detect the fault after the fault is introduced.
To demonstrate its advantages, the developed technique is assessed and compared with conventional fault detection techniques using two applications, one using simulated continuously stirred tank reactor data and the other using the Tennessee Eastman process.
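The three indicators above can be computed from a sequence of alarm decisions as in the following sketch. It assumes, for illustration only, that the fault is present from index `fault_start` until the end of the record, that a detection at the first faulty observation counts as a run length of 1, and that an undetected fault yields no run length; the function name `detection_metrics` is hypothetical.

```python
def detection_metrics(alarms, fault_start):
    """Return (MDR %, FAR %, ARL1) for a boolean alarm sequence.

    alarms[k] is True when the statistic exceeds its control limit at sample k.
    The fault is assumed to be present from index fault_start to the end.
    """
    fault_free = list(alarms[:fault_start])
    faulty = list(alarms[fault_start:])
    # False alarms: alarms raised while no fault is present.
    far = 100.0 * sum(fault_free) / len(fault_free)
    # Missed detections: faulty samples that raise no alarm.
    mdr = 100.0 * (len(faulty) - sum(faulty)) / len(faulty)
    # Run length: position of the first alarm after the fault starts (1-based).
    first = next((k for k, a in enumerate(faulty) if a), None)
    arl1 = None if first is None else first + 1
    return mdr, far, arl1
```

For example, with four fault-free samples containing one alarm and six faulty samples whose first alarm occurs at the third one, the function reports a false alarm rate of 25% and a run length of 3.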

2.2.4.1 Fault detection of continuously stirred tank reactor process The input matrix X consists of the cooling water flow rate, the reactant flow rate, and the temperature and concentration at the exit of the CSTR, X = [FC F T CA]. The training data (Xtrain) and testing data (Xtest) are generated by introducing stepwise changes in the concentration and temperature controllers. Fig. 2.23 shows the response of the four state variables of the simulated CSTR model. The PLS model is trained using the fault-free data shown in Fig. 2.23. To represent practical process measurements, zero-mean Gaussian noise, N(0, 0.005²) and N(0, 0.002²), is added to the concentration and temperature of the outlet stream, respectively. The PLS model is built from the fault-free training data set of the nonisothermal CSTR model; Xtrain is the input training matrix with a total of 500 observations and 4 process variables.


Figure 2.23 The time evolution of the generated data X.

Figure 2.24 Monitoring fault in temperature using PLS-based Q chart.

Table 2.1 Summary of MDRs (%), FARs (%), and ARL1.
Chart             MDR (%)   FAR (%)   ARL1
PLS-based Q       7.92      2.00      1
PLS-based GLRT    11.85     1.00      1

Next, the performance of the PLS-based GLRT fault detection method is illustrated and compared with that of the conventional PLS-based Q method. In this case study, the sensor measuring the reactor temperature T is assumed to be faulty, with a simple fault of magnitude equal to 3σ. Figs. 2.24 and 2.25 compare the fault detection results of the two charts, and detailed results are given in Table 2.1. The PLS-based GLRT (Fig. 2.25) shows improved fault detection performance compared with the PLS-based Q statistic (Fig. 2.24).


Figure 2.25 Monitoring fault in temperature using PLS-based GLRT chart.

Figure 2.26 Monitoring TEP IDV 2 fault using PLS-based Q chart.

Next, the developed PLS-based GLRT algorithm is illustrated through its application to the Tennessee Eastman Process (TEP).

2.2.4.2 Fault detection of Tennessee Eastman Process Table 2.2 shows the missed detection rate, false alarm rate, and ARL1 for all 21 fault test data sets. The PLS-based GLRT statistic shows a good FD improvement over the PLS-based Q statistic, with a lower missed detection rate, false alarm rate, and ARL1 for most of the faults. Figs. 2.26 and 2.27 show the fault detection performance of the PLS-based Q and PLS-based GLRT techniques, respectively, for the IDV 2 fault test data; the PLS-based GLRT statistic shows better fault detection than the PLS-based Q statistic.

2.2.5 Conclusions Many chemical processes, such as distillation columns, are usually described by input–output models. For such models, partial least squares (PLS) has been widely used to extract

Table 2.2 Missed detection rates (MDR %), false alarm rates (FAR %), and ARL1 values for TEP data.
            PLS-based Q                  PLS-based GLRT
Fault       MDR %     FAR %    ARL1      MDR %     FAR %    ARL1
IDV(1)      7         4.24     7         3         0        9
IDV(2)      7.875     12.24    6         7         0        17
IDV(3)      99.52     0        364       99.87     0        355
IDV(4)      100       0        –         100       0        –
IDV(5)      87.45     0        18        85        0        17
IDV(6)      2.54      2.23     4         2         0        17
IDV(7)      85.33     0        35        88        0        30
IDV(8)      64.23     12.23    81        60.01     0        80
IDV(9)      99.31     12.42    12        95.41     5.41     9
IDV(10)     100       1.43     –         83.12     0        65
IDV(11)     100       5.35     –         100       0        –
IDV(12)     55.34     30       56        50.23     0        33
IDV(13)     41.23     5.24     55        33.23     11.34    51
IDV(14)     100       0        –         99        0        775
IDV(15)     100       0        –         100       0        –
IDV(16)     89.23     0.44     193       84        3.42     193
IDV(17)     20.13     2.23     31        11.34     0.89     21
IDV(18)     15.31     14.73    13        11.893    9.31     9
IDV(19)     82.31     17.31    1         83.42     9.41     12
IDV(20)     53.23     0        65        40.23     0        68
IDV(21)     56.41     22.41    168       47.42     17.14    210

Figure 2.27 Monitoring TEP IDV 2 fault using PLS-based GLRT chart.

relationships between two sets of variables, inputs and outputs. Thus, in this chapter, we proposed a statistical fault detection method based on the PLS-based GLRT. In fact, the input–output PLS model has been shown to be suitable to obtain


accurate principal components of a set of data. In addition, the GLRT is a composite hypothesis testing method known to have better fault detection performance than the conventional PLS-based T² and Q statistics. The fault detection problem is addressed by first modeling the data using the PLS algorithm and then detecting the faults using the GLRT. The detection stage relies on the evaluation of detection indices, which are signals that reveal the presence of faults. These indices are obtained by analyzing the difference between the process measurements and their estimates obtained using the PLS technique. The fault detection performance of the PLS-based GLRT is illustrated using simulated continuously stirred tank reactor (CSTR) data. The results demonstrate the effectiveness of the PLS-based GLRT method over the conventional PLS method for the detection of single and multiple sensor faults.

References [1] I. Hwang, S. Kim, Y. Kim, C. Seah, A survey of fault detection, isolation, and reconfiguration methods, IEEE Transactions on Control Systems Technology 18 (3) (2010) 636–653. [2] S. Qin, Survey on data-driven industrial process monitoring and diagnosis, Annual Reviews in Control 36 (2) (2012) 220–234. [3] J.E. Jackson, A User’s Guide to Principal Components, vol. 587, John Wiley & Sons, 2005. [4] T. Kourti, J. MacGregor, Process analysis, monitoring and diagnosis using multivariate projection methods: a tutorial, Chemometrics and Intelligent Laboratory Systems 28 (3) (1995) 3–21. [5] M. Basseville, I.V. Nikiforov, Detection of Abrupt Changes: Theory and Application, 1993. [6] A. Borovkov, Mathematical Statistics, Gordon and Breach Sciences Publishers, Amsterdam, 1998. [7] E. Lehmann, Testing Statistical Hypotheses, Chapman and Hall, New York, 1996. [8] T. Ferguson, Mathematical Statistics: A Decision Theoretic Approach, Academic Press, New York, London, 1967. [9] A. Wald, Tests of statistical hypotheses concerning several parameters when the number of observations is large, Transactions of the American Mathematical Society 54 (1943) 426–482. [10] T. Severini, Likelihood Methods in Statistics, Oxford University Press, Oxford, 2000. [11] Y. Pawitan, in: All Likelihood: Statistical Modeling and Inference Using Likelihood, Oxford University Press, Oxford, 2001. [12] F. Harrou, M. Nounou, H. Nounou, M. Madakyaru, Statistical fault detection using PCA-based GLR hypothesis testing, Journal of Loss Prevention in the Process Industries 26 (1) (2013) 129–139. [13] S. Qin, Statistical process monitoring: basics and beyond, Journal of Chemometrics 17 (8/9) (2003) 480–502.


[14] A. Herve, J. Lynne, Principal component analysis, Wiley Interdisciplinary Reviews: Computational Statistics 2 (2010) 433–459. [15] W. Li, H. Yue, S. Valle, S. Qin, Recursive PCA for adaptive process monitoring, Journal of Process Control 10 (5) (2000) 471–486. [16] W. Liang, L. Zhang, A wave change analysis (WCA) method for pipeline leak detection using gaussian mixture model, Journal of Loss Prevention in the Process Industries 25 (1) (2012) 60–69. [17] L. Weihua, H.H. Yue, S. Valle-Cervantes, S. Qin, Recursive PCA for adaptive process monitoring, Journal of Process Control 10 (2000) 471–486. [18] B. Bakshi, Multiscale PCA with application to multivariate statistical process monitoring, AIChE Journal 44 (1998) 1596–1610. [19] J.-C. Jeng, Adaptive process monitoring using efficient recursive PCA and moving window PCA algorithms, Journal of the Taiwan Institute of Chemical Engineers 41 (2010) 475–481. [20] P. Nomikos, J. MacGregor, Monitoring batch processes using multiway principal component analysis, AIChE Journal 40 (8) (1994) 475–481. [21] W. Ku, R. Storer, C. Georgakis, Disturbance detection and isolation by dynamic principal component analysis, Chemometrics and Intelligent Laboratory Systems 30 (1995) 179–196. [22] M. Kramer, Nonlinear principal component analysis using autoassociative neural networks, AICHE Journal 37 (1991) 23–243. [23] M. Harkat, S. Djelel, N. Doghmane, M. Benouaret, Sensor fault detection, isolation and reconstruction using nonlinear principal component analysis, International Journal of Automation and Computing 04 (2) (2007) 149–155. [24] M. Harkat, G. Mourot, J. Ragot, An improved PCA scheme for sensor FDI: application to an air quality monitoring network, Journal of Process Control 16 (6) (2006) 625–634. [25] B. Wise, N. Gallagher, The process chemometrics approach to process monitoring and fault detection, Journal of Process Control 6 (1996) 329–348. [26] A. Simoglou, E. Martin, A. 
Morris, Multivariate statistical process control in chemicals manufacturing, in: IFAC Conference SAFEPROCESS, Hull, UK, 1997, pp. 21–27. [27] J. George, Z. Chen, P. Shaw, Fault detection of drinking water treatment process using PCA and Hotelling’s T 2 chart, International Journal of Computer and Information Engineering 50 (2009) 970–975. [28] Y. Tharrault, Diagnostic de fonctionnement par analyse en composantes principales: Application à une station de traitement des eaux usées, PhD dissertation, National Polytechnic Institute of Lorraine, 2008. [29] J. Yu, Fault detection using principal components-based Gaussian mixture model for semiconductor manufacturing processes, IEEE Transactions on Semiconductor Manufacturing 24 (3) (2011) 471–486. [30] J.E. Jackson, G.S. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (3) (1979) 341–349. [31] I.T. Jolliffe, A note on the use of principal components in regression, Applied Statistics (1982) 300–303. [32] S. Joe Qin, Statistical process monitoring: basics and beyond, Journal of Chemometrics 17 (8–9) (2003) 480–502. [33] S.J. Qin, R. Dunia, Determining the number of principal components for best reconstruction, Journal of Process Control 10 (2) (2000) 245–250.


[34] F. Harrou, M. Nounou, H. Nounou, Statistical detection of abnormal ozone levels using principal component analysis, International Journal of Engineering & Technology 12 (6) (2012) 54–59. [35] M. Zhu, A. Ghodsi, Automatic dimensionality selection from the scree plot via the use of profile likelihood, Computational Statistics & Data Analysis 51 (2006) 918–930. [36] G. Diana, C. Tommasi, Cross-validation methods in principal component analysis: a comparison, Statistical Methods & Applications 11 (1) (2002) 71–82. [37] I. Jolliffe, Principal Component Analysis, second edition, Springer, Berlin, 2002. [38] H. Hotelling, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology 24 (1933) 417–441. [39] S. Wang, Y. Chen, Sensor validation and reconstruction for building central chilling systems based on principal component analysis, Energy Conversion and Management 45 (5) (2004). [40] J. Romagnoli, A. Palazoglu, Introduction to Process Control, CRC Press, United States of America, 2006. [41] J. Jackson, G. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (1979) 341–349. [42] A. Benaicha, M. Guerfel, N. Boughila, K. Benothman, New PCA-based methodology for sensor fault detection and localization, in: MOSIM’10, Hammamet, Tunisia, Hammamet, Tunisia, May 10–12, 2010, pp. 10–12. [43] I. Zdanevitch, Etude d’épisodes inexpliqués d’ozone, Rapport LCSQA, convension 41/2000, INERIS, Paris, 2001. [44] M. Mansouri, M. Nounou, H. Nounou, K. Nazmul, Kernel PCA-based glrt for nonlinear fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 26 (1) (2016) 129–139. [45] C. Agudelo, F.M. Anglada, E.Q. Cucarella, E.G. 
Moreno, Integration of techniques for early fault detection and diagnosis for improving process safety: application to a fluid catalytic cracking refinery process, Journal of Loss Prevention in the Process Industries 26 (4) (2013) 660–665. [46] I. Shulman, K. Lohr, A. Derdiarian, J. Picukaric, Monitoring transfusionist practices: a strategy for improving transfusion safety, Transfusion 34 (1) (1994) 11–15. [47] D.A. Crowl, J.F. Louvar, Chemical Process Safety: Fundamentals With Applications, Pearson Education, 2001. [48] W. Tan, N. Nor, M.A. Bakar, Z. Ahmad, S. Sata, Optimum parameters for fault detection and diagnosis system of batch reaction using multiple neural networks, Journal of Loss Prevention in the Process Industries 25 (1) (2012) 138–141. [49] K. Havelund, G. Ro¸su, Efficient monitoring of safety properties, International Journal on Software Tools for Technology Transfer 6 (2) (2004) 158–173. [50] A. Benkouider, J. Buvat, J. Cosmao, A. Saboni, Fault detection in semi-batch reactor using the EKF and statistical method, Journal of Loss Prevention in the Process Industries 22 (2) (2009) 153–161. [51] R. Isermann, Process fault detection based on modeling and estimation methods a survey, Automatica 20 (4) (1984) 387–404. [52] S. Datta, S. Sarkar, A review on different pipeline fault detection methods, Journal of Loss Prevention in the Process Industries 41 (2016) 97–106. [53] B.S. Dayal, J.F. MacGregor, Recursive exponentially weighted PLS and its applications to adaptive control and prediction, Journal of Process Control 7 (3) (1997) 169–179.


[54] G.E. Box, et al., Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification, The Annals of Mathematical Statistics 25 (2) (1954) 290–302. [55] M. Mansouri, M. Nounou, H. Nounou, N. Karim, Kernel PCA-based GLRT for nonlinear fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 40 (2016) 334–347.

CHAPTER 3

Kernel PCA- and Kernel PLS-based generalized likelihood ratio tests for fault detection

Contents
3.1. Kernel PCA-based generalized likelihood ratio test for fault detection
    3.1.1 Introduction
    3.1.2 Kernel Principal Component Analysis (KPCA) description
    3.1.3 Fault detection using KPCA method
    3.1.4 Enhanced monitoring using kernel GLRT chart
    3.1.5 Kernel GLRT fault detection chart with applications
        3.1.5.1 Application 1: synthetic data
        3.1.5.2 Application 2: nonisothermal CSTR process
    3.1.6 Conclusion
3.2. Kernel PLS-based generalized likelihood ratio test for fault detection
    3.2.1 Introduction
    3.2.2 Kernel Partial Least Squares (KPLS) method
    3.2.3 KPLS-based GLRT and application to fault detection in CSTR process
        Case 1: faults in the concentration CA
        Case 2: fault in the temperature T
        Case 3: faults in the concentration CA and temperature T
    3.2.4 Conclusion
References

https://doi.org/10.1016/B978-0-12-819164-4.00012-1
Copyright © 2020 Elsevier Inc. All rights reserved.

3.1 Kernel PCA-based generalized likelihood ratio test for fault detection

3.1.1 Introduction

PCA is the most widely used multivariate data-driven method in process monitoring and has been applied in many settings [1–5]. PCA modeling assumes that the process behaves linearly. However, most practical chemical processes are nonlinear, and linear PCA may not perform properly on them. To address this issue, kernel PCA (KPCA) has been proposed [6]. KPCA models rely on transforming the data into a higher-dimensional space in which the data become linear [7], which makes the kernel-based approach an attractive choice for modeling nonlinear processes. Moreover, monitoring techniques based on KPCA and its extensions have been applied in several settings and have shown good results in terms of analysis, modeling, and fault detection [8–11].

Conceptually, KPCA consists of two steps: (i) mapping the data from its original space into the feature space and (ii) applying linear PCA in the feature space. Its main advantage is that, unlike neural-network-based PCA approaches, kernel PCA does not involve any nonlinear optimization procedure [12,13]. In addition, the calculation procedure used in linear PCA can be directly used in KPCA. Like linear PCA, the KPCA approach generally uses the T² and Q statistics, computed in the feature space, for fault detection.

In the current work, we propose to enhance the fault detection ability of KPCA-based charts by applying the generalized likelihood ratio test (GLRT) in the feature space. It has been shown that the linear GLRT chart provides better detection ability than the well-known Shewhart and exponentially weighted moving average (EWMA) charts. Therefore a kernel GLRT (KGLRT) algorithm based on the leading eigenvectors of the sample covariance matrix and on the inner product of data in the feature space [14,15] is developed. Thus, in this chapter, we propose to merge KPCA with the GLRT chart to enhance the fault detection performance and improve the monitoring of chemical processes. To evaluate the proposed fault detection algorithm, we use one synthetic example and a simulated continuous stirred tank reactor (CSTR) process. The detection effectiveness of the developed technique is evaluated using three fault detection criteria: the missed detection rate (MDR), the false alarm rate (FAR), and the out-of-control average run length ARL1.

The rest of this section is organized as follows. In Section 3.1.2, we describe the kernel principal component analysis method. In Section 3.1.3, we present the application of KPCA to fault detection. In Section 3.1.4, we present the kernel generalized likelihood ratio fault detection chart. In Section 3.1.5, we validate the developed algorithm through its applications. Finally, Section 3.1.6 concludes the section.

3.1.2 Kernel Principal Component Analysis (KPCA) description

This section presents the KPCA method, which is among the most popular methods for dimensionality reduction and analysis [6]. KPCA extends linear PCA to deal with nonlinear behavior. It transforms the nonlinear behavior of the input data space into a linear behavior in a new high-dimensional feature space and performs PCA in that space. Let a set of normalized training data

$$X = \begin{bmatrix} x_1 & x_2 & \dots & x_N \end{bmatrix}^T \in \mathbb{R}^{N \times m} \tag{3.1}$$

be given, where m is the number of process variables, and N is the number of measurements. A nonlinear mapping $\phi : x_i \in \mathbb{R}^m \to \phi_i = \phi(x_i) \in \mathbb{R}^h$ maps the training data set into a high-dimensional feature space H, where $h \gg m$ is the dimension of the feature space. An important property of the feature space is that the dot product of two vectors $\phi(x_i)$ and $\phi(x_j)$, $i, j = 1, \dots, N$, can be determined as

$$\phi(x_i)^T \phi(x_j) = k(x_i, x_j), \tag{3.2}$$

where k is called the kernel function. Many kernel functions have been proposed in the literature. The Gaussian kernel (radial basis function, RBF) is the most widely used kernel function:

$$k(x_i, x_j) = \exp\left(\frac{-\|x_i - x_j\|^2}{\sigma^2}\right), \tag{3.3}$$
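As an illustration of Eqs. (3.2) and (3.3), the kernel matrix of a small data set can be computed as in the following minimal sketch. The function name `gaussian_kernel` and the toy data are assumptions made for the example; they are not part of the original text.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Pairwise Gaussian kernel k(a, b) = exp(-||a - b||^2 / sigma^2), Eq. (3.3).

    A is (n_a, m), B is (n_b, m); the result is (n_a, n_b).
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq / sigma**2)

# Toy data (hypothetical): three samples of two variables
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(X, X, sigma=1.0)
```

With this kernel every diagonal entry equals one, since k(x, x) = exp(0) = 1; this property is used later when the kernel GLRT statistic is simplified.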

where σ is the width of the Gaussian function. Assuming that the vectors in the feature space are scaled to zero mean and unit variance, the mapped data are arranged as $X = \begin{bmatrix} \phi(x_1) & \phi(x_2) & \dots & \phi(x_N) \end{bmatrix}^T$ (from here on, X denotes this matrix of mapped data). The covariance matrix C of the data set in the feature space is defined as follows:

$$(N - 1)\,C = X^T X = \sum_{i=1}^{N} \phi_i \phi_i^T. \tag{3.4}$$

KPCA in the feature space is equivalent to solving the following eigenvector equation:

$$X^T X v = \sum_{i=1}^{N} \phi_i \phi_i^T v = \lambda v. \tag{3.5}$$

The mapping function $\phi_i$ is not explicitly defined. The so-called kernel trick premultiplies Eq. (3.5) by X:

$$X X^T X v = \lambda X v. \tag{3.6}$$


With the use of the kernel trick, we can define

$$K = X X^T = \begin{bmatrix} \phi_1^T \phi_1 & \dots & \phi_1^T \phi_N \\ \vdots & \ddots & \vdots \\ \phi_N^T \phi_1 & \dots & \phi_N^T \phi_N \end{bmatrix} = \begin{bmatrix} k(x_1, x_1) & \dots & k(x_1, x_N) \\ \vdots & \ddots & \vdots \\ k(x_N, x_1) & \dots & k(x_N, x_N) \end{bmatrix} \tag{3.7}$$

and denote

$$\alpha = X v. \tag{3.8}$$

Then Eq. (3.6) can be expressed as

$$K \alpha = \lambda \alpha. \tag{3.9}$$

This equation shows that λ and α are an eigenvalue and eigenvector of K, respectively. To solve for v from Eq. (3.8), we multiply this equation by $X^T$, and using Eq. (3.5), we have

$$X^T \alpha = X^T X v = \lambda v, \tag{3.10}$$

which shows that v is given by

$$v = \lambda^{-1} X^T \alpha. \tag{3.11}$$

Therefore an eigendecomposition is first performed using Eq. (3.9) to obtain $\lambda_i$ and $\alpha_i$, and then Eq. (3.11) is used to compute $v_i$. To ensure the normality of the vectors $v_1, v_2, \dots, v_N$, that is, $v_i^T v_i = 1$, using Eqs. (3.8) and (3.10), we derive

$$\alpha_i^T \alpha_i = v_i^T X^T X v_i = v_i^T \lambda_i v_i = \lambda_i. \tag{3.12}$$

Therefore $\alpha_i$ needs to have a norm of $\sqrt{\lambda_i}$. Let $\alpha_i^*$ be the unit-norm eigenvector corresponding to $\lambda_i$:

$$\alpha_i = \sqrt{\lambda_i}\, \alpha_i^*, \quad i = 1, \dots, N. \tag{3.13}$$

The matrix of the $\ell$ retained principal loadings of the KPCA model in the feature space is denoted $P = [v_1, \dots, v_\ell]$. From Eq. (3.11), P can be expressed as

$$P = \left[ \tfrac{1}{\lambda_1} X^T \alpha_1, \dots, \tfrac{1}{\lambda_\ell} X^T \alpha_\ell \right] = \left[ X^T \alpha_1^* \lambda_1^{-1/2}, \dots, X^T \alpha_\ell^* \lambda_\ell^{-1/2} \right] = X^T P^* \Lambda^{-1/2}, \tag{3.14}$$

where $P^* = [\alpha_1^*, \dots, \alpha_\ell^*]$ and $\Lambda = \mathrm{diag}\{\lambda_1, \dots, \lambda_\ell\}$ contain the $\ell$ principal eigenvectors and eigenvalues of K, respectively. The choice of the number $\ell$ of PCs has been the subject of many studies; some of them are described in [16,17]. For a given measurement x and its mapped vector $\phi = \phi(x)$, the scores are calculated as

$$t = P^T \phi = \Lambda^{-1/2} P^{*T} X \phi = \Lambda^{-1/2} P^{*T} k(x), \tag{3.15}$$

$$\tilde{t} = \tilde{P}^T \phi = [v_{\ell+1}, \dots, v_N]^T \phi, \tag{3.16}$$

where

$$k(x) = X \phi = [\phi_1, \dots, \phi_N]^T \phi = [\phi_1^T \phi, \dots, \phi_N^T \phi]^T = [k(x_1, x), \dots, k(x_N, x)]^T. \tag{3.17}$$
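The chain from Eq. (3.9) to Eq. (3.15) can be checked numerically. The sketch below is an illustration under an explicit assumption: it uses the linear kernel k(x, y) = xᵀy, for which φ(x) = x is known, so that the loadings of Eq. (3.14) can be formed explicitly and compared with the kernel-only scores of Eq. (3.15). The function names `kpca_fit` and `kpca_scores` and the random data are hypothetical.

```python
import numpy as np

def kpca_fit(K, n_pc):
    """Eigendecompose the kernel matrix (Eq. 3.9) and keep the n_pc largest pairs.

    Returns the eigenvalues (diagonal of Lambda) and the unit-norm
    eigenvectors alpha_i^* as the columns of P_star.
    """
    lam, vec = np.linalg.eigh(K)              # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1][:n_pc]
    return lam[order], vec[:, order]

def kpca_scores(k_x, lam, P_star):
    """Scores t = Lambda^{-1/2} P*^T k(x), Eq. (3.15)."""
    return (P_star.T @ k_x) / np.sqrt(lam)

# Check with a linear kernel, where phi(x) = x (assumption made for this check only)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = X @ X.T                                   # K = X X^T, Eq. (3.7)
lam, P_star = kpca_fit(K, n_pc=2)

x_new = rng.normal(size=3)
t_kernel = kpca_scores(X @ x_new, lam, P_star)  # uses only k(x) = X phi, Eq. (3.17)

P = X.T @ P_star / np.sqrt(lam)               # explicit loadings, Eq. (3.14)
t_explicit = P.T @ x_new                      # t = P^T phi
```

In this setting the two score vectors coincide and the columns of P are orthonormal, which is exactly what the normalization in Eqs. (3.12) and (3.13) is meant to guarantee. With a Gaussian kernel the explicit loadings are not available, and only the kernel form is used.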

3.1.3 Fault detection using KPCA method

KPCA-based fault detection is performed using Hotelling's T² and squared prediction error (SPE) statistics [18,19]. Hotelling's T² index is calculated as $T^2 = t^T \Lambda^{-1} t$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_\ell)$ is the covariance of the scores t in the feature space. From Eq. (3.17), T² is calculated using kernel functions as [19,20,9]

$$T^2 = k(x)^T P \Lambda^{-1} P^T k(x). \tag{3.18}$$

The control limit associated with this monitoring index is given by [19,9]

$$\tau_\alpha^{T^2} = \frac{\ell (N - 1)(N + 1)}{N (N - \ell)} F_\alpha(\ell, N - \ell), \tag{3.19}$$

where $F_\alpha(\ell, N - \ell)$ is an F-distribution with $\ell$ and $N - \ell$ degrees of freedom and a level of significance α. The SPE index is defined in the feature space as [19,9]

$$\begin{aligned} \mathrm{SPE} &= \tilde{t}^T \tilde{t} \\ &= \phi(x)^T \tilde{P} \tilde{P}^T \phi(x) \\ &= \phi(x)^T (I - P P^T) \phi(x) \\ &= \phi(x)^T \phi(x) - \phi(x)^T P P^T \phi(x) \\ &= k(x, x) - k^T(x)\, C\, k(x), \end{aligned} \tag{3.20}$$

where I is the identity matrix, $\tilde{P} \tilde{P}^T = I - P P^T$, and $C = P P^T$. The control limit of the SPE index is determined from the χ²-distribution and is defined as

$$\tau_\alpha^{spe} = g \chi^2_{h, \alpha}, \tag{3.21}$$

where $g = \frac{b}{2a}$ and $h = \frac{2a^2}{b}$, where a and b are the mean and variance of the SPE index, respectively.
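The two indices and the parameters of the SPE limit can be sketched as follows. This is an illustration rather than the authors' implementation: the SPE is evaluated from the fourth line of Eq. (3.20) using φ(x)ᵀPPᵀφ(x) = tᵀt, and all function names and numbers are assumptions made for the example.

```python
import numpy as np

def t2_statistic(t, lam):
    """Hotelling's T^2 = t^T Lambda^{-1} t for one score vector t."""
    t = np.asarray(t, dtype=float)
    return float(np.sum(t**2 / np.asarray(lam, dtype=float)))

def spe_statistic(k_xx, t):
    """SPE = phi^T phi - phi^T P P^T phi = k(x, x) - t^T t (Eq. 3.20)."""
    t = np.asarray(t, dtype=float)
    return float(k_xx - t @ t)

def spe_limit_parameters(spe_train):
    """Parameters g and h of the g * chi2_h approximation in Eq. (3.21).

    a and b are the sample mean and variance of the SPE index on
    fault-free training data.
    """
    a = np.mean(spe_train)
    b = np.var(spe_train)
    return b / (2.0 * a), 2.0 * a**2 / b

# Hypothetical values, chosen so the arithmetic is easy to follow
t2_example = t2_statistic([2.0, 1.0], lam=[4.0, 1.0])   # 4/4 + 1/1
spe_example = spe_statistic(1.0, [0.6, 0.0])            # 1 - 0.36
g, h = spe_limit_parameters(np.array([1.0, 2.0, 3.0, 4.0]))
```

The choice of g and h matches the first two moments of the SPE index: the product g·h equals the mean a, and 2g²h equals the variance b. If SciPy is available, the quantile in Eq. (3.21) could be obtained with `scipy.stats.chi2.ppf(1 - alpha, h)`, assuming α denotes the upper-tail probability.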

3.1.4 Enhanced monitoring using kernel GLRT chart

The GLRT approach developed in [21] is based on the linear subspace model developed in [14]. The basic idea starts with one N-dimensional vector x, for which the two hypotheses H0 and H1 can be expressed as

$$H_0 : x = w, \qquad H_1 : x = P\theta + w, \tag{3.22}$$

where w is white noise with multivariate Gaussian distribution $N(0, \sigma^2 I)$, P is an orthogonal matrix, $P^T P = I$, where I is the identity matrix, and θ is a parameter vector. For the received vector x, the likelihood ratio test (LRT) decides between the two hypotheses H0 and H1 by

$$\rho = \frac{f_1(x \mid H_1)}{f_0(x \mid H_0)} \gtrless \gamma, \tag{3.23}$$

where γ is the threshold value of the test, and $f_1(x \mid H_1)$ and $f_0(x \mid H_0)$ are the conditional probability densities, which follow Gaussian distributions:

$$\begin{aligned} H_0 &: f_0(x \mid H_0) \sim N(0, \sigma_0^2 I) = \frac{1}{(2\pi\sigma_0^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_0^2} \|w_0\|^2\right), \\ H_1 &: f_1(x \mid H_1) \sim N(P\theta, \sigma_1^2 I) = \frac{1}{(2\pi\sigma_1^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_1^2} \|w_1\|^2\right). \end{aligned} \tag{3.24}$$

In general, the parameters θ, σ0, and σ1 are unknown. In the GLRT, they are replaced by their maximum likelihood estimates $\hat{\theta}$, $\hat{\sigma}_0$, and $\hat{\sigma}_1$. The maximum likelihood estimate of θ is equivalent to the least squares estimate, which gives the residuals

$$\hat{w}_0 = x, \qquad \hat{w}_1 = x - P\hat{\theta} = (I - C)x. \tag{3.25}$$

The estimates $\hat{\sigma}_0$ and $\hat{\sigma}_1$ are given as

$$\hat{\sigma}_0^2 = \frac{1}{N} \|\hat{w}_0\|^2, \qquad \hat{\sigma}_1^2 = \frac{1}{N} \|\hat{w}_1\|^2. \tag{3.26}$$

Substituting the maximum likelihood estimates of the parameters into Eq. (3.23), the two exponential factors cancel and the ratio reduces to $(\hat{\sigma}_0^2 / \hat{\sigma}_1^2)^{N/2}$. Taking the (N/2)th root, the GLRT is expressed as [14]

$$G = \frac{\|\hat{w}_0\|^2}{\|\hat{w}_1\|^2} = \frac{x^T I x}{x^T (I - C) x} = \frac{x^T x}{x^T (I - P P^T) x}, \tag{3.27}$$

where I is the identity projection operator, and $C = P P^T$ is the projection onto the principal subspace,

$$C = P (P^T P)^{-1} P^T = P P^T. \tag{3.28}$$
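Eq. (3.27) can be evaluated directly once a matrix P with orthonormal columns is available. The following minimal sketch uses a hypothetical one-dimensional subspace in three dimensions; the function name `glrt_statistic` and the numbers are assumptions made for the example.

```python
import numpy as np

def glrt_statistic(x, P):
    """G = x^T x / x^T (I - P P^T) x, Eq. (3.27), for P with orthonormal columns."""
    x = np.asarray(x, dtype=float)
    residual = x - P @ (P.T @ x)          # (I - P P^T) x without forming the projector
    return float(x @ x) / float(residual @ residual)

# Hypothetical subspace spanned by the first coordinate axis
P = np.array([[1.0], [0.0], [0.0]])
g_example = glrt_statistic([3.0, 0.0, 4.0], P)
```

Here xᵀx = 25 and the residual (0, 0, 4) has squared norm 16, so G = 25/16. The statistic is never smaller than one, and it grows as a larger share of x falls inside the subspace spanned by P.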

The detection result is evaluated by comparing G with a threshold value $G_\alpha$. Accordingly, if $H_0^\phi$ and $H_1^\phi$ also obey Gaussian distributions,

$$H_0^\phi : x = w_\phi, \qquad H_1^\phi : x = P\theta_\phi + w_\phi, \tag{3.29}$$

then the GLRT can be extended to the feature space of φ(x):

$$KG = \frac{\|\hat{w}_{0,\phi}\|^2}{\|\hat{w}_{1,\phi}\|^2} = \frac{\phi(x)^T I_\phi\, \phi(x)}{\phi(x)^T (I_\phi - C_\phi)\, \phi(x)}, \tag{3.30}$$

where $I_\phi$ is the identity projection operator in the feature space, and the columns of $P_\phi$ span the linear subspace on which the signal in the feature space lies. Each column of $P_\phi$ is an eigenvector corresponding to a nonzero eigenvalue of

$$\Sigma_{\phi(x)} = \frac{1}{N} \sum_{i=1}^{N} \phi(x_i) \phi(x_i)^T. \tag{3.31}$$

Likewise, $C_\phi$ is the projection operator onto the signal subspace,

$$C_\phi = P_\phi (P_\phi^T P_\phi)^{-1} P_\phi^T = P_\phi P_\phi^T. \tag{3.32}$$

Here, assuming that $I_\phi$ can perfectly project φ(x) as φ(x) in the feature space [15],

$$\phi(x)^T I_\phi\, \phi(x) = \phi(x)^T \phi(x). \tag{3.33}$$

Based on the derivation of kernel PCA, the eigenvectors corresponding to nonzero eigenvalues of the sample covariance matrix $\Sigma_{\phi(x)}$ are given by $X^T \alpha$, where the columns of $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_\ell)$ are eigenvectors corresponding to nonzero eigenvalues of K, and $X = (\phi(x_1), \phi(x_2), \dots, \phi(x_N))^T$ is the matrix of mapped training data of Section 3.1.2. Accordingly, $\phi(x)^T C_\phi\, \phi(x)$ can be represented as

$$\phi(x)^T P_\phi P_\phi^T \phi(x) = \phi(x)^T X^T \alpha \alpha^T X \phi(x) = k^T \alpha \alpha^T k, \tag{3.34}$$

where

$$k = \begin{pmatrix} k(x, x_1) \\ k(x, x_2) \\ \vdots \\ k(x, x_N) \end{pmatrix}. \tag{3.35}$$

The derivation of Eq. (3.30) is based on the assumption that the hypotheses $H_0^\phi$ and $H_1^\phi$ obey Gaussian distributions. The authors of [14] have claimed, although without strict proof, that if k is a Gaussian kernel, then $H_0^\phi$ and $H_1^\phi$ are still distributed as Gaussians. A Gaussian kernel is employed for the kernel GLRT approach, and thus $\phi(x)^T \phi(x) = k(x, x) = 1$. Substituting Eqs. (3.33) and (3.34) into Eq. (3.30), we have

$$KG = \frac{1}{1 - k^T \alpha \alpha^T k}. \tag{3.36}$$

The centering of k in the feature space [14] is

$$\bar{k} = k - \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \left( \tfrac{1}{N}, \tfrac{1}{N}, \dots, \tfrac{1}{N} \right) k. \tag{3.37}$$
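A minimal sketch of Eqs. (3.35) and (3.36) is given below. It rests on two assumptions that are not spelled out in the text: the eigenvectors of K are rescaled by $\lambda_i^{-1/2}$ so that the implied loadings $X^T \alpha$ have unit norm (which makes $P_\phi P_\phi^T$ a projector), and the centering of Eq. (3.37) is omitted so that φ(x)ᵀφ(x) = 1 holds exactly. All function names and the random data are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Pairwise k(a, b) = exp(-||a - b||^2 / sigma^2), Eq. (3.3)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma**2)

def kglrt_fit(X_train, sigma, n_pc):
    """Return coefficients alpha, scaled so that the loadings X^T alpha have unit norm."""
    K = gaussian_kernel(X_train, X_train, sigma)
    lam, vec = np.linalg.eigh(K)                 # ascending eigenvalues
    idx = np.argsort(lam)[::-1][:n_pc]           # keep the n_pc largest
    return vec[:, idx] / np.sqrt(lam[idx])

def kglrt_statistic(x, X_train, alpha, sigma):
    """KG = 1 / (1 - k^T alpha alpha^T k), Eq. (3.36), with k from Eq. (3.35)."""
    k = gaussian_kernel(X_train, x[None, :], sigma)[:, 0]
    proj = alpha.T @ k                           # coordinates of phi(x) in the subspace
    return 1.0 / (1.0 - proj @ proj)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(40, 2))               # hypothetical fault-free data
alpha = kglrt_fit(X_train, sigma=2.0, n_pc=3)
kg_values = np.array([kglrt_statistic(x, X_train, alpha, 2.0) for x in X_train])
```

Under these assumptions the denominator is the squared norm of the part of φ(x) that lies outside the retained subspace, so every value in `kg_values` is at least one. A threshold $G_\alpha$ could then be taken, for instance, as an empirical quantile of `kg_values`; this is one possible choice and not necessarily the one used by the authors.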

First, we outline the flow chart of the KGLRT algorithm for fault detection with Gaussian kernels as shown in Fig. 3.1. Then we describe the main steps for the algorithm as summarized in Algorithm 1.

Figure 3.1 Schematic illustration of KGLRT algorithm.

Algorithm 1 KGLRT algorithm.
Inputs: N × m data matrix X.
1. Acquire the data set under normal operation, $\{x_1, x_2, \dots, x_N\}$, $x_i \in \mathbb{R}^m$.
2. Determine the normalized data matrix X.
3. Determine the kernel matrix K and the centered kernel matrix $\bar{K}$.
4. Determine the number of principal components (PCs), the eigenvalues, and the eigenvectors of the KPCA model.
5. Normalize the received m-dimensional vector $x_i$ by $x_i = x_i / \|x_i\|_2$.
6. Compute the kernel vector $k = k(x, x_i)$ by Eq. (3.35).
7. Compute the value of KG defined in Eq. (3.36).
8. Determine a threshold value $G_\alpha$ for a desired false alarm rate.
9. Detect the presence or absence of a fault in $x_i$ by checking whether $KG > G_\alpha$ or not.

3.1.5 Kernel GLRT fault detection chart with applications

3.1.5.1 Application 1: synthetic data

To illustrate the proposed kernel generalized likelihood ratio test (GLRT), we consider the following nonlinear simulation example based on 6 variables in X with N = 1000 samples. The monitored variables are described at different instants k by the following relations:

$$\begin{cases} x_1(k) = v_1(k), & v_1(k) \sim N(0, 0.01), \\ x_2(k) = v_2(k), & v_2(k) \sim N(0, 0.01), \\ x_3(k) = \sin(x_1(k)) + v_3(k), & v_3(k) \sim N(0, 0.01), \\ x_4(k) = x_1(k)^2 - 3x_1(k) + 4 + v_4(k), & v_4(k) \sim N(0, 0.01), \\ x_5(k) = x_2(k)^2 + \cos(x_2(k)^2) + 1 + v_5(k), & v_5(k) \sim N(0, 0.01), \\ x_6(k) = x_3(k)^2 + x_3(k)x_4(k) + x_1(k) + v_6(k), & v_6(k) \sim N(0, 0.05), \end{cases} \tag{3.38}$$

where $v_j(k)$, $j = 1, \dots, 6$, represent Gaussian noise with small variance added to the measurements. The KPCA model is identified with $\ell = 2$ retained principal components. The X data are split into training and testing data sets of 500 observations each in order to carry out fault detection. After a process model has been successfully identified, we can proceed with fault detection. First, one fault of magnitude 3σ is simulated on variable x5 between samples 400 and 500. Second, two faults of magnitude 3σ are introduced in variable x5 between samples 400 and 500 and in variable x4 between samples 300 and 350. To evaluate the efficiency of the proposed fault detection technique, two metrics are used: the false alarm rate (FAR) and the missed detection rate (MDR). For the first scenario, Fig. 3.2 shows the time evolution of the fault detection charts SPE, KGLRT, and EWMA-KGLRT, computed based on residuals in the feature space. For the second scenario with two faults, the proposed technique still provides better detection ability than the KGLRT and SPE charts (see Fig. 3.3).

Figure 3.2 Evolutions of SPE, KGLRT, and EWMA-KGLRT in faulty case.
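The simulation setup can be reproduced along the following lines. This is a sketch under stated assumptions: the second argument of N(·, ·) in Eq. (3.38) is read as a variance, the fault is modeled as an additive bias of three training standard deviations, and FAR and MDR are computed as percentages of fault-free and faulty samples, respectively. The function names, the seed, and the indexing convention (samples counted from 1) are choices made for the example.

```python
import numpy as np

def simulate(n, rng):
    """Generate n samples of the six variables of Eq. (3.38)."""
    std = np.sqrt([0.01] * 5 + [0.05])        # noise variances read as N(0, variance)
    v = rng.normal(0.0, std, size=(n, 6))
    x1 = v[:, 0]
    x2 = v[:, 1]
    x3 = np.sin(x1) + v[:, 2]
    x4 = x1**2 - 3.0 * x1 + 4.0 + v[:, 3]
    x5 = x2**2 + np.cos(x2**2) + 1.0 + v[:, 4]
    x6 = x3**2 + x3 * x4 + x1 + v[:, 5]
    return np.column_stack([x1, x2, x3, x4, x5, x6])

def far_mdr(alarms, faulty):
    """False alarm rate and missed detection rate, both in percent."""
    alarms = np.asarray(alarms, dtype=bool)
    faulty = np.asarray(faulty, dtype=bool)
    far = 100.0 * np.mean(alarms[~faulty])    # alarms raised on fault-free samples
    mdr = 100.0 * np.mean(~alarms[faulty])    # faulty samples without an alarm
    return far, mdr

rng = np.random.default_rng(42)
X = simulate(1000, rng)
X_train, X_test = X[:500], X[500:].copy()

# First scenario: bias of 3 sigma on x5 between test samples 400 and 500
faulty = np.zeros(500, dtype=bool)
faulty[399:500] = True
X_test[faulty, 4] += 3.0 * X_train[:, 4].std()

far_demo, mdr_demo = far_mdr([0, 1, 1, 0], [0, 0, 1, 1])
```

In the small demonstration at the end, one of the two fault-free samples raises an alarm and one of the two faulty samples is missed, so both rates equal 50 %.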

Figure 3.3 Evolutions of SPE, KGLRT, and EWMA-KGLRT in faulty case.

Table 3.1 FAR %, MDR %, and ARL1 for the presented fault detection charts (simulated example).

Charts        Case 1                        Case 2
              FAR %     MDR %     ARL1      FAR %     MDR %     ARL1
EWMA-KGLRT    4.76      0         1         2.01      0         1
KGLRT         6.0150    63.36     1         7.38      57.92     3
SPE           2.75      78.21     11        3.69      67.32     3

Both the SPE and KGLRT charts fail to detect most of the fault (as seen in Figs. 3.2 and 3.3). However, the performance is further improved through the application of the EWMA-KGLRT chart. The performances of the different fault detection charts are summarized in Table 3.1.

3.1.5.2 Application 2: nonisothermal CSTR process

Next, the performance of the developed EWMA-KGLRT fault detection method is illustrated and compared to the SPE and KGLRT charts. In the case study the sensor measuring the temperature inside the reactor is assumed to be faulty with a bias fault. To illustrate the performance of the proposed approach, a sensor fault is introduced on variable x3 (temperature T) between samples 400 and 500. The magnitude of the fault represents 5σ of this variable. Fig. 3.4 shows that the EWMA-KGLRT chart improves on the KGLRT and SPE charts in terms of FAR, MDR, and ARL1 values due to the advantages of the EWMA representation (see also Fig. 3.5). The missed detection rates (%) and false alarm rates (%) obtained using the three techniques for single and multiple faults are summarized in Tables 3.2 and 3.3, respectively.

Figure 3.4 The time evolution of the SPE (A), KGLRT (B) and EWMA-KGLRT (C) statistics on a semi-logarithmic scale in the presence of a single fault in T.

Figure 3.5 The time evolution of the SPE (A), KGLRT (B) and EWMA-KGLRT (C) statistics on a semi-logarithmic scale in the presence of simultaneous faults in CA and T.

Table 3.2 FAR % and MDR % for the presented fault detection charts (CSTR process) for a single fault.

Charts        MDR %     FAR %
EWMA-KGLRT    0         1.00
KGLRT         0.99      5.35
SPE           20.39     0.33

Table 3.3 FAR % and MDR % for the presented fault detection charts (CSTR process) for multiple faults.

Charts        MDR %     FAR %
EWMA-KGLRT    0         18.39
KGLRT         0.49      21.07
SPE           16.41     18.6

3.1.6 Conclusion

In this chapter, the kernel principal component analysis (KPCA)-based generalized likelihood ratio test (GLRT) is used for nonlinear fault detection. The fault detection problem was addressed so that the data are first modeled using the KPCA method, and then the faults are detected using the GLRT. The KPCA method is investigated as a modeling algorithm in the task of fault detection. The idea is to improve the GLRT performance by modeling the data with KPCA. The KPCA method has been proposed as a nonlinear extension of PCA and provides good performance compared with the linear versions. The KPCA-based GLRT fault detection performance is assessed and compared to that of the conventional KPCA through two examples, synthetic data and simulated continuously stirred tank reactor (CSTR) data. The results demonstrate the effectiveness of the KPCA-based GLRT technique over the conventional KPCA, through its two charts T² and Q, for the detection of single and multiple sensor faults.

3.2 Kernel PLS-based generalized likelihood ratio test for fault detection

3.2.1 Introduction

Multivariate statistical methods are very effective for fault detection and diagnosis in the chemical industry [22,23]. Partial least squares (PLS) and principal component analysis (PCA) are two basic types of multivariate methods. PCA is among the most popular statistical methods used for modeling and fault detection problems [24–27]; however, it provides linear combinations of variables that demonstrate major trends in a data set. In the previous section, we have successfully applied the kernel PCA (KPCA)-based generalized likelihood ratio test (GLRT) to nonlinear fault detection of a chemical system. However, KPCA is an input space model and cannot take outcome measures into account, whereas many chemical processes, such as distillation columns, are usually described by input–output models. PLS is an input–output model and can be used to detect faults in both the process and its variables. It can also be used as a linear regression tool to predict the output variables from the process variables. Hotelling's T² and Q statistics are common statistical fault detection (FD) charts that are applied with PLS for process monitoring. However, the use of the conventional PLS [28] through its two charts, Hotelling's T² and Q, could lead to missed detections and a high false alarm rate. Moreover, chemical and refinery processes are complex, and most of their variables are nonlinear in nature, so fault detection of these processes by linear PLS would lead to many missed diagnoses and unreliable results.

In the literature, several nonlinear versions of PLS have been developed. A nonlinear iterative partial least squares (NIPALS) algorithm, developed in [29], is used to model PLS. The authors of [30] have presented a complicated artificial neural network to model nonlinear PLS, whereas the authors of [31] have proposed to use a quadratic function to relate the scores in the PLS algorithm. In this chapter, we propose to use kernel partial least squares (KPLS) as a modeling framework. KPLS is among the most well-known nonlinear statistical methods [32]; it performs a nonlinear form of PLS. It is an input–output model, which reduces the dimensionality of the process and output variables to extract scores and principal components, and can be used to detect faults in both the process and its variables. KPLS gives good general properties of nonlinear PLS when an appropriate kernel function is selected [32]; the radial basis function, the polynomial function, and the sigmoid function are three commonly used kernel functions. KPLS can also be used as a regression tool to predict the product variables from nonlinear process variables. The KPLS approach works similarly to PLS: it reduces the dimensionality of the nonlinear process variables by projecting them into a space of lower dimension.

Hence the objective of this chapter is to address the problem of nonlinear fault detection so that the data are first modeled using the KPLS algorithm, and then the faults are detected using the generalized likelihood ratio test (GLRT). KPLS is used to create the model and find nonlinear combinations of parameters that describe the major trends in a data set, and the GLRT is used to detect the faults; both are utilized to improve the fault detection process. The GLRT is a hypothesis-testing-based alternative to the conventional charts and has been shown to provide good detection abilities for specified false alarm rates [33–35]. The fault detection performance of the KPLS-based GLRT is illustrated through simulated continuously stirred tank reactor (CSTR) data. The results demonstrate the effectiveness of the KPLS-based GLRT method over the linear PLS-based GLRT and conventional KPLS methods for the detection of single and multiple sensor faults, as assessed using the false alarm and missed detection rates.

The rest of this section is organized as follows. In Section 3.2.2, we present the developed kernel PLS-based GLRT fault detection technique. In Section 3.2.3, the performance of the proposed technique is demonstrated using chemical processes. Finally, the conclusions are presented in Section 3.2.4.

3.2.2 Kernel Partial Least Squares (KPLS) method

PLS is a popular input–output latent variable regression method, which has been widely used in modeling and monitoring [36,37,21]. PLS identifies only linear structure in a given data set and seeks to relate two sets of variables, inputs and outputs represented by two matrices $X \in \mathbb{R}^{N \times m}$ and $Y \in \mathbb{R}^{N \times p}$, by determining the score matrices for each data set and then linearly relating these score matrices. The scores and loading vectors of the PLS model are related as follows [36]:

$$\begin{aligned} X &= T P^T + E = \sum_{i=1}^{\ell} t_i p_i^T + E, \\ Y &= U Q^T + F = \sum_{j=1}^{\ell} u_j q_j^T + F, \end{aligned} \tag{3.39}$$

where $E \in \mathbb{R}^{N \times m}$ and $F \in \mathbb{R}^{N \times p}$ are the PLS model input and output residual matrices, respectively, $T = [t_1, t_2, \dots, t_\ell] \in \mathbb{R}^{N \times \ell}$ and $U = [u_1, u_2, \dots, u_\ell] \in \mathbb{R}^{N \times \ell}$ are the input and output score matrices, respectively, and $P = [p_1, p_2, \dots, p_\ell] \in \mathbb{R}^{m \times \ell}$ and $Q = [q_1, q_2, \dots, q_\ell] \in \mathbb{R}^{p \times \ell}$ are the input and output loading matrices, respectively. Note that the X and Y matrices are first standardized to have zero mean and unit variance before constructing the PLS model. However, the use of linear PLS [38] could lead to prediction and modeling errors in the case of nonlinear processes. To address this issue and to extend the technique to nonlinear input–output models, many extensions have been proposed to describe the nonlinearities [39,40,38,29]. PLS has been extended to nonlinear regression through the use of kernels, giving the kernel PLS (KPLS) [36]. The KPLS method uses PLS to factorize the kernel matrix directly. It assumes that the regressor data X is mapped by a mapping function φ to a higher-dimensional inner product space H. The key idea of KPLS is first to map the input space into a feature space H via a nonlinear mapping φ and then to perform a linear PLS in H. The measured inputs are projected into the feature space using the mapping function

$$\phi : x \to \phi(x) \in \mathbb{R}^h, \tag{3.40}$$

where $h \gg m$ is the dimension of the feature space. It is worth noting that φ(·) is not explicitly defined. Let K be the Gram matrix of the data X, such that the kernel $k(x_i, x_j)$ between two vectors is equal to the inner product $\langle \phi(x_i), \phi(x_j) \rangle_H$, that is,

$$k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle = \phi(x_i)^T \phi(x_j). \tag{3.41}$$

For a new observation x, φ = φ(x) denotes its mapped vector in the feature space, and k(x) is the vector of kernel functions evaluated at the pairs $(x, x_k)$ for $k = 1, \dots, N$. The kernel matrix K for the input is calculated as

$$K = \begin{bmatrix} k(x_1, x_1) & \dots & k(x_1, x_N) \\ \vdots & \ddots & \vdots \\ k(x_N, x_1) & \dots & k(x_N, x_N) \end{bmatrix}. \tag{3.42}$$

Similar to linear PLS, we assume a zero-mean KPLS model. The mapped data in the feature space H must be centered as

$$\bar{K} = \left(I_N - \tfrac{1}{N} 1_N 1_N^T\right) K \left(I_N - \tfrac{1}{N} 1_N 1_N^T\right), \tag{3.43}$$

where $1_N$ is the (N × 1) vector with elements equal to one, and $I_N$ represents the N-dimensional identity matrix. Based on the matrix of mapped input data in the feature space, the NIPALS-PLS algorithm is modified, and the KPLS is presented in Algorithm 2 [36] (see also Fig. 3.6).
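The centering of Eq. (3.43) is a two-sided multiplication by the matrix $I_N - \tfrac{1}{N} 1_N 1_N^T$. The sketch below checks it with a linear kernel, for which centering the kernel matrix must agree with centering the data columns first; the function name `center_kernel` and the random data are assumptions made for the example.

```python
import numpy as np

def center_kernel(K):
    """Centered kernel matrix (I - 11^T/N) K (I - 11^T/N), Eq. (3.43)."""
    N = K.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    return J @ K @ J

# Check with a linear kernel (assumption for the check only): phi(x) = x
rng = np.random.default_rng(3)
X = rng.normal(loc=2.0, size=(10, 4))
K_centered = center_kernel(X @ X.T)
X_c = X - X.mean(axis=0)                  # explicit centering of the mapped data
```

For this kernel, `K_centered` equals `X_c @ X_c.T`, and the rows and columns of the centered matrix sum to zero. With a Gaussian kernel the mapped data cannot be centered explicitly, which is why the operation is carried out on K.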

Algorithm 2 Kernel PLS algorithm.
1. Set $i = 1$, $K_1 = K$, and $Y_1 = Y$.
2. Initialize the score vector $u_i$ (N × 1) of $Y_i$ as the maximum-variance column of $Y_i$.
3. Compute the score vector $t_i$ (N × 1) of the mapped inputs as $t_i = K_i u_i / \|K_i u_i\|$, so that $\|t_i\| = 1$.
4. Regress the columns of $Y_i$ on $t_i$: $c_i = Y_i^T t_i$, where $c_i$ is a weighting vector.
5. Calculate the new score vector: $u_i = Y_i c_i / \|Y_i c_i\|$, so that $\|u_i\| = 1$.
6. Repeat steps (3) to (5) until the convergence of $t_i$.
7. Deflate the matrices: $K_{i+1} = (I - t_i t_i^T) K_i (I - t_i t_i^T)$, $Y_{i+1} = Y_i - t_i t_i^T Y_i$.
8. Save the data in the matrices: $T \leftarrow t_i$, $U \leftarrow u_i$.
9. Set $i = i + 1$, and return to step (2). Stop when $i > \ell$, with $\ell$ being the selected number of latent variables.

Figure 3.6 KPLS diagram for nonlinear regression.
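The steps of Algorithm 2 can be sketched in a few lines. This is an illustration and not the authors' code: the weighting vector in step 4 is computed as $Y_i^T t_i$, the stopping tolerance and iteration cap are arbitrary choices, and the fitted outputs use the standard KPLS expression $KU(T^T K U)^{-1} T^T Y$. The function names are hypothetical.

```python
import numpy as np

def kpls(K, Y, n_lv, tol=1e-10, max_iter=500):
    """Kernel PLS scores T and U following Algorithm 2 (K is the centered kernel matrix)."""
    N = K.shape[0]
    Ki = np.array(K, dtype=float)
    Yi = np.array(Y, dtype=float)
    T = np.zeros((N, n_lv))
    U = np.zeros((N, n_lv))
    for i in range(n_lv):
        u = Yi[:, np.argmax(Yi.var(axis=0))].copy()    # step 2
        t_old = np.zeros(N)
        for _ in range(max_iter):
            t = Ki @ u
            t /= np.linalg.norm(t)                     # step 3
            c = Yi.T @ t                               # step 4
            u = Yi @ c
            u /= np.linalg.norm(u)                     # step 5
            if np.linalg.norm(t - t_old) < tol:        # step 6
                break
            t_old = t
        T[:, i], U[:, i] = t, u                        # step 8
        D = np.eye(N) - np.outer(t, t)
        Ki = D @ Ki @ D                                # step 7
        Yi = Yi - np.outer(t, t @ Yi)
    return T, U

def kpls_fitted(K, Y, T, U):
    """Fitted training outputs K U (T^T K U)^{-1} T^T Y."""
    return K @ U @ np.linalg.solve(T.T @ K @ U, T.T @ Y)
```

A useful sanity check on any implementation is that the columns of T come out orthonormal, because each deflation in step 7 removes the direction of the score vector that was just extracted.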

On this basis, the regression coefficients matrix B can be obtained from

$$B = \Phi^T U \left(T^T K U\right)^{-1} T^T Y, \tag{3.44}$$

where Φ denotes the matrix of mapped input data in the feature space. The prediction of the output variables is given by

$$\hat{Y} = \Phi B = K U \left(T^T K U\right)^{-1} T^T Y. \tag{3.45}$$

For a new observation x of the input variables, the output is estimated by

$$\hat{y} = B^T \phi(x) = Y^T T \left[ U \left(T^T K U\right)^{-1} \right]^T k(x). \tag{3.46}$$

Then the residual is given by

$$e = y - \hat{y}. \tag{3.47}$$

Thus the GLRT chart can be computed using the formulation presented in Eq. (2.13),

$$G = \frac{1}{\sigma^2} \|e\|^2. \tag{3.48}$$

Let $G_\alpha$ be its corresponding threshold. A fault is detected if the GLRT statistic exceeds its threshold,

$$G > G_\alpha. \tag{3.49}$$
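For a scalar output, Eqs. (3.47) to (3.49) amount to squaring the residual, scaling it by the noise variance, and comparing the result with a threshold. In the sketch below the variance is estimated from fault-free training residuals and the threshold is taken as an empirical quantile of the training statistic; both choices are assumptions made for the example, since the text refers to Chapter 2 for the exact formulation. The function names and the simulated residuals are hypothetical.

```python
import numpy as np

def residual_glrt(e, sigma2):
    """G = e^2 / sigma^2 for each scalar residual, Eq. (3.48)."""
    return np.asarray(e, dtype=float) ** 2 / sigma2

def detect_faults(e_train, e_test, alpha=0.05):
    """Flag test residuals with G > G_alpha, Eq. (3.49).

    sigma^2 is estimated from fault-free residuals, and G_alpha is the
    empirical (1 - alpha) quantile of the training statistic (an assumption).
    """
    sigma2 = np.var(e_train)
    g_alpha = np.quantile(residual_glrt(e_train, sigma2), 1.0 - alpha)
    return residual_glrt(e_test, sigma2) > g_alpha

rng = np.random.default_rng(7)
e_train = rng.normal(scale=0.1, size=2000)     # hypothetical fault-free residuals
alarms = detect_faults(e_train, np.array([0.0, 1.0]))
```

In this example the zero residual stays below the threshold, whereas a residual of 1.0, which is ten standard deviations of the simulated noise, is flagged.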

The KPLS-based GLRT algorithm is summarized in Algorithm 3. Algorithm 3 KPLS-GLRT algorithm. Inputs: N × m input data matrices X, Y . 1. Data matrix standardization to zero mean and unit variance, X and Y. 2. Determine the kernel matrix K. 3. Centralization of K matrix as given by Eq. (3.43), 4. Determine the number of principal components (PCs) to be kept in the PLS model. 5. Compute the regression coefficients matrices B as given by Eq. (3.44). 6. Compute the estimation of output matrix Yˆ . 7. Compute the estimation error as in Eq. (3.47) and then compute the GLRT chart G and its corresponding threshold Gα , 8. For new testing data, 9. Detect the presence or absence of fault in [xi ] by checking G > Gα or not.

Next, in Section 3.2.3 the developed KPLS-based GLRT algorithm is illustrated through its application on a controlled continuous stirred tank

KPCA- and KPLS-based GLRT for fault detection

69

Figure 3.7 The time evolution of PLS-based Q statistic on a semilogarithmic scale in the presence of a fault in CA .

reactor (CSTR), in which a nonisothermal irreversible first-order reaction A → B takes place.

3.2.3 KPLS-based GLRT and application to fault detection in CSTR process Next, the performance of the developed KPLS-based GLRT fault detection method is illustrated and compared to KPLS through its chart Q. The comparison is assessed through three different cases studies representing three different types of faults. In the first case study the sensor measuring the concentration CA of A is assumed to be faulty. In the second case study a similar fault is introduced in the temperature of the reactor T. In the third case study, multiple faults are assumed to occur simultaneously in the concentration and temperature inside the reactor.

Case 1: faults in the concentration CA

The nonisothermal CSTR model is used to generate the training data, which have 500 samples and 4 variables. An additive fault is introduced in samples 400 to 450. Figs. 3.7–3.10 show the FD results of the PLS and KPLS-based Q and GLRT techniques. The PLS and KPLS-based Q techniques detect the fault in samples 400 to 450, but with significant false alarm rates (Figs. 3.7 and 3.8). The PLS and KPLS-based GLRT techniques (Figs. 3.9 and 3.10) give better FD results than the conventional PLS and KPLS-based Q charts.


Figure 3.8 The time evolution of KPLS-based Q statistic on a semilogarithmic scale in the presence of a fault in CA .

Figure 3.9 The time evolution of PLS-based GLRT statistic on a semilogarithmic scale in the presence of a fault in CA .

Case 2: fault in the temperature T

In this case an additive fault is introduced in the temperature over samples 400 to 450. Figs. 3.11 to 3.14 show the fault detection results of the PLS and KPLS-based Q and GLRT techniques. Figs. 3.11 and 3.12 show that the PLS and KPLS-based Q techniques detect the fault in the temperature, but with high false alarm rates. The KPLS-based GLRT technique shows a good detection rate for the temperature fault with no false alarms (see Fig. 3.14), a clear improvement over the linear PLS-based GLRT technique, which detects the fault but with a high missed detection rate (Fig. 3.13). The limitation of linear PLS is that it assumes that the relationships between variables are linear; it is therefore not suitable in nonlinear cases and may not always be the most appropriate method of analysis. In addition, the advantage of the developed PLS and KPLS-based GLRT fault detection algorithms over the conventional PLS and KPLS-based Q techniques stems from the GLRT statistic, which maximizes the fault detection probability for a particular false alarm rate.

Figure 3.10 The time evolution of KPLS-based GLRT statistic on a semilogarithmic scale in the presence of a fault in CA.

Figure 3.11 The time evolution of PLS-based Q statistic on a semilogarithmic scale in the presence of a fault in T.

Figure 3.12 The time evolution of KPLS-based Q statistic on a semilogarithmic scale in the presence of a fault in T.

Figure 3.13 The time evolution of PLS-based GLRT statistic on a semilogarithmic scale in the presence of a fault in T.

Figure 3.14 The time evolution of KPLS-based GLRT statistic on a semilogarithmic scale in the presence of a fault in T.

Case 3: faults in the concentration CA and temperature T

In the final case study, simultaneous faults are introduced in the concentration and temperature; the additive faults are introduced over samples 250 to 350 and 400 to 450, respectively. Figs. 3.15–3.18 show process monitoring results using the PLS and KPLS-based T², Q, and GLRT techniques. Figs. 3.15 and 3.16 show that the PLS and KPLS-based Q techniques cannot detect the multiple faults effectively, whereas both the PLS and KPLS-based GLRT techniques show good fault detection abilities (Figs. 3.17 and 3.18), with some missed detections when using the linear PLS-based GLRT (Fig. 3.17).

Figure 3.15 The time evolution of PLS-based Q statistic on a semilogarithmic scale in the presence of simultaneous faults in CA and T.

Figure 3.16 The time evolution of KPLS-based Q statistic on a semilogarithmic scale in the presence of simultaneous faults in CA and T.

3.2.4 Conclusion

In this chapter, the kernel PLS (KPLS)-based generalized likelihood ratio test (GLRT) is used for nonlinear fault detection. The fault detection problem is addressed in two stages: the data are first modeled using the KPLS method, and the faults are then detected using the GLRT. The KPLS approach works similarly to PLS; it reduces the dimensionality of the nonlinear process variables by projecting them into a space of lower dimension. The idea is to improve the GLRT performance by modeling the data with KPLS. The KPLS method was proposed as a nonlinear extension of PLS and performs better than the linear version on nonlinear data. The KPLS-based GLRT fault detection performance is assessed and compared to that of the conventional KPLS using simulated continuously stirred tank reactor (CSTR) data. The results demonstrate the effectiveness of the KPLS-based GLRT method over the linear PLS-based GLRT method, and both of them perform well compared with the conventional PLS and KPLS methods for detection of single and multiple sensor faults.

Figure 3.17 The time evolution of PLS-based GLRT statistic on a semilogarithmic scale in the presence of simultaneous faults in CA and T.

Figure 3.18 The time evolution of KPLS-based GLRT statistic on a semilogarithmic scale in the presence of simultaneous faults in CA and T.


References
[1] J.V. Kresta, J.F. MacGregor, T.E. Marlin, Multivariate statistical monitoring of process operating performance, The Canadian Journal of Chemical Engineering 69 (1) (1991) 35–47.
[2] J.F. MacGregor, T. Kourti, Statistical process control of multivariate processes, Control Engineering Practice 3 (3) (1995) 403–414.
[3] T. Kourti, J.F. MacGregor, Process analysis, monitoring and diagnosis using multivariate projection methods: a tutorial, Chemometrics and Intelligent Laboratory Systems 28 (3) (1995) 3–21.
[4] E.L. Russell, L.H. Chiang, R.D. Braatz, Data-Driven Methods for Fault Detection and Diagnosis in Chemical Processes, Springer-Verlag, London, 2000.
[5] J.E. Jackson, A User's Guide to Principal Components, vol. 587, John Wiley & Sons, 2005.
[6] B. Schölkopf, A. Smola, K. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10 (5) (1998) 1299–1319.
[7] M. Mansouri, M. Nounou, H. Nounou, N. Karim, Kernel PCA-based GLRT for nonlinear fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 40 (2016) 334–347.
[8] A. Widodo, B.-S. Yang, Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors, Expert Systems With Applications 33 (1) (2007) 241–250.
[9] C.F. Alcala, S.J. Qin, Reconstruction-based contribution for process monitoring with kernel principal component analysis, Industrial & Engineering Chemistry Research 49 (17) (2010) 7849–7857.
[10] X. Deng, X. Tian, Nonlinear process fault pattern recognition using statistics kernel PCA similarity factor, Neurocomputing 121 (2013) 298–308.
[11] J. Fan, Y. Wang, Fault detection and diagnosis of non-linear non-Gaussian dynamic processes using kernel dynamic independent component analysis, Information Sciences 259 (2014) 369–379.
[12] M. Kramer, Nonlinear principal component analysis using autoassociative neural networks, AIChE Journal 37 (1991) 233–243.
[13] D. Dong, T. McAvoy, Nonlinear principal component analysis based on principal curves and neural networks, Computers & Chemical Engineering 20 (1) (1996) 65–78.
[14] L.L. Scharf, B. Friedlander, Matched subspace detectors, IEEE Transactions on Signal Processing 42 (8) (1994) 2146–2157.
[15] S. Hou, R.C. Qiu, Spectrum sensing for cognitive radio using kernel-based learning, arXiv preprint, arXiv:1105.2978, 2011.
[16] S. Valle, W. Li, S.J. Qin, Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods, Industrial & Engineering Chemistry Research 38 (11) (1999) 4389–4401.
[17] M. Tamura, S. Tsujita, A study on the number of principal components and sensitivity of fault detection using PCA, Computers & Chemical Engineering 31 (9) (2007) 1035–1046.
[18] J.-M. Lee, C. Yoo, S.W. Choi, P.A. Vanrolleghem, I.-B. Lee, Nonlinear process monitoring using kernel principal component analysis, Chemical Engineering Science 59 (1) (2004) 223–234.


[19] S.W. Choi, C. Lee, J.-M. Lee, J.H. Park, I.-B. Lee, Fault detection and identification of nonlinear processes based on kernel PCA, Chemometrics and Intelligent Laboratory Systems 75 (1) (2005) 55–67.
[20] P. Cui, J. Li, G. Wang, Improved kernel principal component analysis for fault detection, Expert Systems With Applications 34 (2) (2008) 1210–1219.
[21] C. Botre, M. Mansouri, M. Nounou, H. Nounou, M.N. Karim, Kernel PLS-based GLRT method for fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 43 (2016) 212–224.
[22] I. Hwang, S. Kim, Y. Kim, C. Seah, A survey of fault detection, isolation, and reconfiguration methods, IEEE Transactions on Control Systems Technology 18 (3) (2010) 636–653.
[23] S. Qin, Survey on data-driven industrial process monitoring and diagnosis, Annual Reviews in Control 36 (2) (2012) 220–234.
[24] J. Yu, Fault detection using principal components-based Gaussian mixture model for semiconductor manufacturing processes, IEEE Transactions on Semiconductor Manufacturing 24 (3) (2011) 471–486.
[25] A. Herve, J. Lynne, Principal component analysis, Wiley Interdisciplinary Reviews: Computational Statistics 2 (2010) 433–459.
[26] S. Wang, Y. Chen, Sensor validation and reconstruction for building central chilling systems based on principal component analysis, Energy Conversion and Management 45 (5) (2004).
[27] G. Diana, C. Tommasi, Cross-validation methods in principal component analysis: a comparison, Statistical Methods & Applications 11 (1) (2002) 71–82.
[28] H. Lodhi, Y. Yamanishi, Nonlinear partial least squares: an overview, in: Chemoinformatics and Advanced Machine Learning Perspectives: Complex Computational Methods and Collaborative Techniques, ACCM, IGI Global, 2011, pp. 169–189.
[29] S. Wold, Nonlinear partial least squares modelling II. Spline inner relation, Chemometrics and Intelligent Laboratory Systems 14 (1–3) (1992) 71–84.
[30] E.C. Malthouse, A. Tamhane, R. Mah, Nonlinear partial least squares, Computers & Chemical Engineering 21 (8) (1997) 875–890.
[31] S. Wold, H. Ruhe, H. Wold, W.D. Dunn III, The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses, SIAM Journal on Scientific and Statistical Computing 5 (3) (1984) 735–743.
[32] R. Rosipal, L. Trejo, Kernel partial least squares regression in reproducing kernel Hilbert space, Journal of Machine Learning Research 2 (2001) 97–123.
[33] F. Gustafsson, The marginalized likelihood ratio test for detecting abrupt changes, IEEE Transactions on Automatic Control 41 (1) (1996) 66–78.
[34] A.S. Willsky, E. Chow, S. Gershwin, C. Greene, P. Houpt, A. Kurkjian, Dynamic model-based techniques for the detection of incidents on freeways, IEEE Transactions on Automatic Control 25 (3) (1980) 347–360.
[35] J.R. Dawdle, A. Willsky, S.W. Gully, Nonlinear generalized likelihood ratio algorithms for maneuver detection and estimation, in: American Control Conference, 1982, IEEE, 1982, pp. 985–987.
[36] R. Rosipal, L.J. Trejo, Kernel partial least squares regression in reproducing kernel Hilbert space, Journal of Machine Learning Research 2 (2001) 97–123.
[37] Y. Zhang, Z. Hu, Multivariate process monitoring and analysis based on multi-scale KPLS, Chemical Engineering Research and Design 89 (12) (2011) 2667–2678.


[38] S. Wold, N. Kettaneh-Wold, B. Skagerberg, Nonlinear PLS modeling, Chemometrics and Intelligent Laboratory Systems 7 (1–2) (1989) 53–65.
[39] G. Baffi, E.B. Martin, A. Morris, Non-linear projection to latent structures revisited: the quadratic PLS algorithm, Computers & Chemical Engineering 23 (3) (1999) 395–411.
[40] A. Berglund, S. Wold, INLR, implicit non-linear latent variable regression, Journal of Chemometrics 11 (2) (1997) 141–156.

CHAPTER 4

Linear and nonlinear multiscale latent variable-based generalized likelihood ratio for fault detection

Contents
4.1. Linear multiscale latent variable-based generalized likelihood ratio for fault detection
4.1.1 Introduction
4.1.2 Multiscale PCA-based GLRT for fault detection
4.1.2.1 Modeling using multiscale PCA method
4.1.2.2 Fault detection using GLRT
4.1.2.3 MSPCA-based MW-GLRT and applications
4.1.3 Multiscale PLS-based GLRT for fault detection
4.1.3.1 Multiscale Partial Least Square (MSPLS) method
4.1.3.2 MSPLS-based GLRT fault detection technique and applications
4.1.4 Conclusions
4.2. Multiscale nonlinear latent variable-based generalized likelihood ratio test for fault detection
4.2.1 Introduction
4.2.2 Multiscale kernel PCA-based GLRT for fault detection
4.2.2.1 Multiscale kernel PCA description
4.2.2.2 Multiscale kernel GLRT fault detection chart with applications
4.2.3 Multiscale kernel PLS-based GLRT for fault detection
4.2.3.1 Multiscale Kernel Partial Least Square (KPLS) method
4.2.3.2 MSKPLS-based GLRT technique and applications
4.2.4 Conclusion
References

Data-Driven and Model-Based Methods for Fault Detection and Diagnosis, https://doi.org/10.1016/B978-0-12-819164-4.00013-3
Copyright © 2020 Elsevier Inc. All rights reserved.

4.1 Linear multiscale latent variable-based generalized likelihood ratio for fault detection

4.1.1 Introduction

Measurement errors (noise) in the data and model uncertainties degrade the quality of fault detection (FD) techniques by increasing the rate of false alarms. The objective of this chapter is therefore to enhance the quality of fault detection by suppressing the effect of these errors using a wavelet-based multiscale representation of the data, which is a powerful feature extraction tool. Multiscale representation of data has been used to improve the FD abilities of latent variable (LV) methods such as partial least squares (PLS) and principal component analysis (PCA). Combining the advantages of multiscale representation with those of hypothesis testing should therefore provide further improvements in FD. To that end, multiscale latent variable (MSLV)-based GLRT methods are proposed for fault detection.

The advantages of MSLV-based GLRT methods are threefold: (i) the dynamical multiscale representation is used to extract accurate deterministic features and decorrelate autocorrelated measurements; (ii) the MSLV model evaluates the LV of the wavelet coefficients at each scale, and because of its multiscale nature, MSLV is appropriate for modeling data that contain contributions from events whose behavior changes over time and frequency; (iii) the GLRT is a composite hypothesis testing method and is known to have better fault detection performance than the conventional LV-based T² and Q statistics.

The fault detection performances of the MSLV-based GLRT methods, including the MSPCA and MSPLS-based GLRT, are illustrated through two examples, one using synthetic data and the other using simulated Tennessee Eastman Process (TEP) data. The results demonstrate the effectiveness of the multiscale approaches over the conventional methods.

The rest of the chapter is organized as follows. In Section 4.1.2, we present the proposed multiscale PCA-based GLRT fault detection technique followed by its application to process monitoring. In Section 4.1.3, we present the developed multiscale PLS-based GLRT for fault detection of chemical processes.

4.1.2 Multiscale PCA-based GLRT for fault detection

4.1.2.1 Modeling using multiscale PCA method

Measured process data are usually contaminated with errors that mask the important features in the data and limit the effectiveness of any fault detection method in which these data are used. Moreover, chemical and environmental process data (as in the case of most practical data) usually possess multiscale characteristics, meaning that they contain features and noise that occur at varying contributions over time and frequency. For example, a sharp change in the data spans a wide range in the frequency domain but a narrow range in the time domain, whereas a slow change spans a wide range in the time domain but a narrow range in the frequency domain [1]. Also, unlike white noise that spans the entire frequency domain, correlated noises (which are commonly encountered in practical chemical and environmental data) have varying frequency spans [1]. Unfortunately, most fault detection methods operate at a single scale (because they are applied using time-domain data) and thus do not account for these multiscale characteristics.

Figure 4.1 A schematic diagram of data representation at multiple scales [1].

Wavelet-based multiscale representation of data has been shown to be a powerful data analysis, modeling, and feature extraction tool due to its ability to provide efficient separation of deterministic and stochastic features [1–4]. A signal can be represented at multiple resolutions by decomposing it on a family of wavelets and scaling functions. The scaled signals are determined by projecting the original signal on a set of orthonormal scaling functions of the form [1]

$$\phi_{jk}(t) = \sqrt{2^{-j}}\,\phi(2^{-j}t - k) \tag{4.1}$$

or equivalently by filtering the signal using a low-pass filter of length r, h = [h1, h2, ..., hr], derived from the scaling functions. On the other hand, the signals in Fig. 4.1, which are called the detail signals, capture the details between any scaled signal and the scaled signal at a finer scale [1]. These detail signals are determined by projecting the original signal on a set of wavelet basis functions of the form [1]

$$\psi_{jk}(t) = \sqrt{2^{-j}}\,\psi(2^{-j}t - k) \tag{4.2}$$

or equivalently by filtering the scaled signal at a finer scale using a high-pass filter of length r, g = [g1, g2, ..., gr], derived from the wavelet basis functions [1]. Therefore the original signal can be represented as the sum of all detail signals at all scales and the scaled signal at the coarsest scale as follows [1]:

$$x(t) = \sum_{k=1}^{n2^{-J}} a_{Jk}\,\phi_{Jk}(t) + \sum_{j=1}^{J}\sum_{k=1}^{n2^{-j}} d_{jk}\,\psi_{jk}(t), \tag{4.3}$$

where j, k, J, and n are the dilation parameter, translation parameter, number of scales (or decomposition depth), and the length of the original signal, respectively, and aJk and djk are the scaling and detail (wavelet) coefficients [1,5–7].

Various researchers have utilized wavelet-based multiscale representation of data to improve process monitoring. For example, the authors in [8] used a multiscale representation to prefilter the raw data and then used the filtered data in process monitoring using PCA. The authors showed that multiscale data prefiltering results in more accurate PCA models and thus improved process monitoring. The author in [9], on the other hand, used a multiscale representation to develop a multiscale PCA (MSPCA) modeling algorithm and then used it to improve process monitoring. The idea behind the MSPCA fault detection method is to construct multiple PCA models using the wavelet coefficients (detail signals) at different scales and then use these models in process monitoring. The author shows that the developed MSPCA-based process monitoring algorithm is more sensitive to anomalies than the conventional PCA approach, because a wavelet representation is an efficient noise-feature separation tool and the wavelet coefficients are less autocorrelated than the actual process data.

Thus the objective of this section is to utilize a wavelet-based multiscale representation of data to further enhance the effectiveness of the latent variable-based hypothesis testing fault detection methods proposed in the first task of this project. Two approaches will be investigated in this regard. First, a multiscale PCA model will be constructed using the wavelet coefficients at different scales, and then hypothesis testing will be applied using this model to improve the fault detection abilities of the MSPCA-based GLRT fault detection method even further. Fig. 4.2 illustrates the MSPCA algorithm developed by Bakshi ([10]), and its key steps are highlighted in Algorithm 1.

Figure 4.2 Schematic illustration of MSPCA model.

Algorithm 1 MSPCA algorithm.
Input: N × m data matrix X, confidence interval α.
1. For each column (i.e., process variable) in the data matrix, compute the wavelet decomposition,
2. For each block (matrix) of scaled and detail coefficients at each scale, compute the covariance matrix along with the number of principal components, as well as PCA loadings and scores of those wavelet coefficients,
3. Once the appropriate number of loadings is selected, select the wavelet coefficients larger than a certain threshold,
4. For all scales together, carry out PCA by including only scales with significant events during reconstruction.
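Step 1 of Algorithm 1 and the decomposition in Eq. (4.3) can be illustrated with the simplest orthonormal wavelet. The sketch below is a minimal example under stated assumptions: it uses the Haar filters h = [1, 1]/√2 and g = [1, −1]/√2 (the text does not say which wavelet family is used), it requires the signal length to be divisible by 2^J, and the function names are hypothetical.

```python
import numpy as np

H_FILTER = np.array([1.0, 1.0]) / np.sqrt(2.0)    # Haar low-pass filter h
G_FILTER = np.array([1.0, -1.0]) / np.sqrt(2.0)   # Haar high-pass filter g

def haar_decompose(x, depth):
    """Return (a_J, [d_1, ..., d_J]): coarsest scaled signal and detail signals."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(depth):
        pairs = a.reshape(-1, 2)            # non-overlapping pairs = dyadic downsampling
        details.append(pairs @ G_FILTER)    # detail coefficients d_jk at this scale
        a = pairs @ H_FILTER                # scaled coefficients passed to the next scale
    return a, details

def haar_reconstruct(a, details):
    """Invert haar_decompose: sum of coarsest scaled signal and all details."""
    a = np.asarray(a, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a

# A noisy step signal of length n = 16 decomposed to depth J = 3.
rng = np.random.default_rng(0)
x = np.r_[np.zeros(8), np.ones(8)] + 0.1 * rng.standard_normal(16)
a3, d = haar_decompose(x, depth=3)
print([c.size for c in d], a3.size)               # n*2^-j coefficients per scale
print(np.allclose(haar_reconstruct(a3, d), x))    # perfect reconstruction
```

In MSPCA the same decomposition would be applied to each column of X, and the coefficients at a given scale would be stacked into one matrix per scale before PCA is carried out.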

4.1.2.2 Fault detection using GLRT

Although the GLRT chart does show improved performance, it is important to note that the statistic is computed using only the current observation. It has been shown in the literature [11], across many charts, that techniques that draw on more of the process history markedly improve fault detection performance. This motivates extending the GLRT chart to one that incorporates a moving window; the formulation of this statistic is described next.


The moving-window GLRT statistic can be computed as follows:

$$MG = 2\log\frac{f_{\theta}(Y)}{f_{\theta=0}(Y)}, \tag{4.4}$$

where Y = [y(i − (w − 1)) y(i − (w − 2)) · · · y(i − 1) y(i)], and i and w are the observation number and the length of the moving window, respectively. As for the GLRT chart, establishing the threshold for the MW-GLRT chart requires the distribution of the derived MW-GLRT statistic. The moment-generating function can be used to derive this distribution. The conventional GLRT statistic follows a chi-square distribution with one degree of freedom, and its moment-generating function is of the following form:

$$M_{y_i}(x) = (1 - 2x)^{-r/2}, \tag{4.5}$$

where r represents the degree of freedom of a given chi-square distribution. The moment-generating function for the MW-GLRT statistic can then be computed as follows [11]:

$$\begin{aligned}
M_{Y_i}(x) &= \prod_{i=1}^{w} M_{y_i}(x) \\
&= (1 - 2x)^{-r_1/2}\,(1 - 2x)^{-r_2/2} \cdots (1 - 2x)^{-r_w/2} \\
&= (1 - 2x)^{-(r_1 + r_2 + \cdots + r_w)/2} \\
&= (1 - 2x)^{-w/2},
\end{aligned} \tag{4.6}$$

where the last step uses r_1 = r_2 = · · · = r_w = 1.

The derived expression shows that the MW-GLRT statistic follows a chi-square distribution with degrees of freedom equal to the window length (WL) w [11]. Therefore this section focuses on extending MSPCA and developing an MSPCA-based MW-GLRT technique to improve the fault detection performance. The idea behind the MSPCA-based MW-GLRT fault detection algorithm is to combine the advantages of MSPCA with those of the MW-GLRT fault detection chart. The resulting procedure is presented in Algorithm 2 (see also Fig. 4.3). The MSPCA-based MW-GLRT detects faults in the residuals obtained from the MSPCA model: the MW-GLRT is applied to each residual vector E.
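The chi-square result above suggests a direct way to compute the moving-window chart and its threshold. The following sketch is one possible reading, not the authors' implementation: it assumes the MW-GLRT statistic is the sum of the w most recent single-sample GLRT values e²/σ² (which is what the product of moment-generating functions in Eq. (4.6) describes for independent samples), and it approximates the chi-square quantile with the Wilson-Hilferty formula so that only the standard library and NumPy are needed. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation of the chi-square quantile (an approximation,
    used here only to avoid extra dependencies)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * np.sqrt(c)) ** 3

def mw_glrt(residuals, sigma2, w):
    """Moving-window GLRT: sum of e^2/sigma^2 over the last w observations.
    The first w - 1 entries are NaN because the window is not yet full."""
    g = np.asarray(residuals, dtype=float) ** 2 / sigma2   # chi-square(1) under no fault
    csum = np.concatenate(([0.0], np.cumsum(g)))
    mg = np.full(g.size, np.nan)
    mg[w - 1:] = csum[w:] - csum[:-w]
    return mg

# Hypothetical residual sequence with a unit mean shift from sample 300 onward.
rng = np.random.default_rng(0)
e = rng.standard_normal(400)
e[300:] += 1.0

w = 8
threshold = chi2_quantile(0.95, w)      # chi-square with w degrees of freedom
alarms = mw_glrt(e, sigma2=1.0, w=w) > threshold
print("threshold:", round(float(threshold), 3))
print("detection rate in faulty region:", alarms[300:].mean())
```

With SciPy available, `scipy.stats.chi2.ppf(0.95, w)` would give the exact quantile in place of the approximation.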


Algorithm 2 MSPCA-based MW-GLRT fault detection algorithm.
Input: N × m data matrix X, confidence interval α.

Training Data
1. For each column (i.e., process variable) in the data matrix, compute the wavelet decomposition.
2. Compute the mean and standard deviation of all process variables and standardize the data matrix.
3. Each variable is decomposed into wavelet coefficients:
   - A matrix of wavelet coefficients at each scale is formed,
   - PCA is carried out on each of these scales,
   - Using either the T² or Q, threshold limits for each of the scales are computed to threshold and retain wavelet coefficients at each scale.
4. The data matrix is reconstructed using retained wavelet scales and coefficients.
5. PCA is carried out on the reconstructed training data matrix to obtain an approximate data matrix and residuals.

Testing Data
1. The data matrix is standardized using the mean and standard deviation of the fault-free variables computed in the training data set.
2. Each variable is decomposed into wavelet coefficients:
   - A matrix of wavelet coefficients at each scale is formed,
   - PCA is carried out on each of these scales,
   - Compute either the T² or Q statistic and compare these to the threshold values computed during the training stage for each scale.
3. The data matrix is reconstructed using only the retained wavelet coefficients.
4. PCA is carried out on the reconstructed testing data matrix to obtain an approximate data matrix and residuals:
   - The MG (MW-GLRT) statistic is computed using the residuals obtained from the MSPCA model.
   - If MG(y) < Gα, then the process is operating under normal operating conditions. Else, a fault is declared.
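The standardization and residual-generation steps of Algorithm 2 (training steps 2 and 5, testing steps 1 and 4) can be sketched as below. This is a deliberately reduced illustration: the wavelet decomposition, per-scale thresholding, and reconstruction are omitted, so the code shows plain PCA residuals rather than MSPCA residuals. The data generator, the choice of two principal components, and all function names are assumptions made for the example.

```python
import numpy as np

def fit_pca(X_train, n_pc):
    """Standardize fault-free data and return (mean, std, loading matrix P)."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    Z = (X_train - mu) / sd
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mu, sd, Vt[:n_pc].T

def pca_residuals(X, mu, sd, P):
    """Standardize X with the TRAINING statistics and return E = Z - Z P P^T."""
    Z = (X - mu) / sd
    return Z - Z @ P @ P.T

rng = np.random.default_rng(0)

def sample(n):
    """Hypothetical four-variable data driven by two latent sources plus noise."""
    t = rng.standard_normal((n, 2))
    clean = np.column_stack([t[:, 0], t[:, 1], t[:, 0] + t[:, 1], t[:, 0] - t[:, 1]])
    return clean + 0.2 * rng.standard_normal((n, 4))

X_train = sample(1000)
X_test = sample(1000)
X_test[500:] += 1.0                      # step fault in all variables

mu, sd, P = fit_pca(X_train, n_pc=2)
E = pca_residuals(X_test, mu, sd, P)     # residuals that would feed the MW-GLRT
q = np.sum(E ** 2, axis=1)               # squared residual norm per sample
print("mean squared residual, normal vs faulty:", q[:500].mean(), q[500:].mean())
```

Standardizing the test data with the training mean and standard deviation matters here: re-estimating them on faulty data would absorb part of the mean shift that the chart is supposed to detect.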


Figure 4.3 Schematic illustration of proposed MSPCA-based MW-GLRT algorithm.

Next, the performance of the developed MSPCA-based MW-GLRT technique is assessed and compared to the conventional PCA and MSPCA methods through two examples, one using synthetic data and the other using simulated TEP data.

4.1.2.3 MSPCA-based MW-GLRT and applications

The fault detection performance of the developed MSPCA-based MW-GLRT algorithm is evaluated using three fault detection criteria: the missed detection (MD) rate, the false alarm (FA) rate, and the out-of-control average run length ARL1 [12]. The missed detection rate is the percentage of observations in the faulty region that go undetected, whereas the false alarm rate is the percentage of observations in the nonfaulty region that are incorrectly declared faulty. ARL1 is the number of observations it takes for a particular technique to detect a fault in the faulty region after it has been introduced, that is, the speed of detection.
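The three criteria can be computed from a vector of alarm flags and a mask marking the faulty region. The sketch below is a minimal example; the function name is hypothetical, and the counting convention for ARL1 (an alarm on the first faulty observation gives ARL1 = 1) is an assumption, since the text does not fix it.

```python
import numpy as np

def detection_metrics(alarms, fault_mask):
    """Return (missed detection rate %, false alarm rate %, ARL1).

    alarms     : boolean array, True where the chart exceeds its threshold
    fault_mask : boolean array, True inside the faulty region
    ARL1 is None when the fault is never detected.
    """
    alarms = np.asarray(alarms, dtype=bool)
    fault_mask = np.asarray(fault_mask, dtype=bool)
    md_rate = 100.0 * np.mean(~alarms[fault_mask])
    fa_rate = 100.0 * np.mean(alarms[~fault_mask])
    fault_idx = np.flatnonzero(fault_mask)
    hits = np.flatnonzero(alarms & fault_mask)
    arl1 = int(hits[0] - fault_idx[0] + 1) if hits.size else None
    return md_rate, fa_rate, arl1

# Ten observations; the fault starts at index 4.
alarms = [0, 1, 0, 0, 0, 0, 1, 1, 1, 1]
fault = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
md, fa, arl1 = detection_metrics(alarms, fault)
print(md, fa, arl1)   # 2 of 6 faulty samples missed, 1 of 4 normal samples flagged, ARL1 = 3
```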

Improved performance using a moving-window GLRT

The GLRT fault detection method has been shown to provide improved fault detection abilities over the conventional T² and Q statistics, as illustrated in the subsections that follow (pages 87 and 94 of the print edition). Computing the statistic used in the GLRT, however, requires computing the norm of the evaluated residuals at every time instant. The authors of [11] have indicated that the performance of the GLRT can be improved further by applying the GLRT statistic in a moving window, which improves the average run length (ARL) over the conventional GLRT.


It is well known, however, that increasing the moving window length indefinitely may not provide the sought results. Computing the norm in a moving window is analogous to applying a "mean filter", where the filtered data are computed by averaging the data in a moving window. In data filtering, a mean filter of infinite length corresponds to a very low frequency threshold, which would eliminate all features in the data; this is why a mean filter of finite length is normally used in practice [10]. Likewise, increasing the window length of the moving-window GLRT indefinitely may not provide the expected performance, even though it gives better ARL values.

To investigate this issue, the effect of the window length on the performance of the GLRT is studied in the following simulation. The performance is assessed using three criteria: missed detection rate, false alarm rate, and ARL. The data are generated from a normal distribution with zero mean and unit variance, and a fault of magnitude equal to twice the standard deviation of the noise is then introduced, to be detected using the moving-window GLRT statistic at different window lengths. The results of this study are illustrated in Fig. 4.4. They show that at larger window lengths the missed detection rate and ARL decrease (which means faster and more effective detection), but the false alarm rate increases. There is thus a trade-off between better detection and false alarms, so the window length cannot be increased indefinitely and an optimum window length should be used.

The advantage of using a moving-window GLRT statistic over a conventional GLRT statistic can be explained as follows. Computing the norm used in the GLRT statistic over multiple samples (when the window length is larger than one) provides a filtered, smoother estimate of the GLRT statistic than that obtained using the conventional GLRT method.
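A window-length study of this kind can be reproduced in outline with a short script. The sketch below is not the simulation behind Fig. 4.4: the sample size, the fault location, the random seed, the 95% level, and the use of a Wilson-Hilferty approximation for the chi-square threshold are all assumptions, and the statistic is taken to be the windowed sum of y²/σ². It prints the missed detection and false alarm percentages for several window lengths so that the trade-off can be inspected.

```python
import numpy as np
from statistics import NormalDist

def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation of the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * np.sqrt(c)) ** 3

def mw_glrt(y, sigma2, w):
    """Sum of y^2/sigma^2 over the last w samples (NaN until the window fills)."""
    g = np.asarray(y, dtype=float) ** 2 / sigma2
    csum = np.concatenate(([0.0], np.cumsum(g)))
    mg = np.full(g.size, np.nan)
    mg[w - 1:] = csum[w:] - csum[:-w]
    return mg

rng = np.random.default_rng(1)
n, start = 2000, 1000
y = rng.standard_normal(n)      # zero mean, unit variance
y[start:] += 2.0                # fault of twice the noise standard deviation

results = {}
for w in (1, 2, 4, 8, 16):
    alarms = mw_glrt(y, 1.0, w) > chi2_quantile(0.95, w)
    missed = 100.0 * np.mean(~alarms[start:])
    false_alarms = 100.0 * np.mean(alarms[:start])
    results[w] = (missed, false_alarms)
    print(f"w={w:2d}  missed={missed:5.1f}%  false alarms={false_alarms:4.1f}%")
```

Averaging the rates over many random realizations, rather than one, would give smoother curves of the kind shown in Fig. 4.4.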

Figure 4.4 The effect of the window length on the performance of the moving-window GLRT (the fault size equals twice the residuals standard deviation). (A) Missed detection rate, (B) false alarm rate and (C) average run length (ARL) vs. window length.

Figure 4.5 Monitoring a fault of magnitude unity using PCA-based T² chart.

Simulated synthetic data

The simulated synthetic example replicates and extends the illustrative example carried out in the original MSPCA paper [10]. The first two variables are uncorrelated Gaussian measurements of zero mean and unit variance. The remaining two variables are generated by adding and subtracting the first two variables, respectively, as shown in the following equations [10]:

$$\begin{cases}
\tilde{x}_1(t) = N(0, 1), \\
\tilde{x}_2(t) = N(0, 1), \\
\tilde{x}_3(t) = \tilde{x}_1(t) + \tilde{x}_2(t), \\
\tilde{x}_4(t) = \tilde{x}_1(t) - \tilde{x}_2(t).
\end{cases} \tag{4.7}$$

The data matrix X̃ (of four variables) is then contaminated by white noise, which is an uncorrelated Gaussian error of zero mean and standard deviation 0.2, to give the measured data matrix X as follows [10]:

$$X(t) = \tilde{X}(t) + 0.2\,N(0, 1). \tag{4.8}$$
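The data generation in Eqs. (4.7) and (4.8) can be sketched as follows. The function name, the seed handling, and the 0-based reading of the fault interval (a shift applied from array index 1500 to the end of the record) are assumptions made for illustration.

```python
import numpy as np

def make_synthetic(n=2048, noise_sd=0.2, shift=0.0, fault_start=1500, seed=0):
    """Generate the four-variable data set of Eqs. (4.7)-(4.8).

    shift > 0 adds a step change in the mean of all four variables
    from index fault_start to the end of the record.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = rng.standard_normal(n)
    clean = np.column_stack([x1, x2, x1 + x2, x1 - x2])       # Eq. (4.7)
    X = clean + noise_sd * rng.standard_normal(clean.shape)   # Eq. (4.8)
    X[fault_start:] += shift
    return X

X_normal = make_synthetic(seed=0)               # normal operating data
X_faulty = make_synthetic(shift=1.0, seed=1)    # mean shift of magnitude unity
print(X_normal.shape, X_faulty[1500:].mean(axis=0).round(2))
```

Because the third and fourth variables are exact linear combinations of the first two before noise is added, two principal components capture nearly all of the variation in this data set.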

Normal operating condition consisted of 2048 equally spaced observations. Abnormal operation (also of 2048 observations) consists of a step change in the mean in all four variables between samples between observations 1500 and 2048 (the faulty region is highlighted in light blue (light gray in print version) in all charts). The performance of MSPCA-based MW-GLRT chart is illustrated and compared to the conventional PCA and MSPCA methods for two different cases. Case 1 This case considers a mean shift of magnitude unity, which is large enough for reasonable detection by PCA and MSPCA. The fault detection results of PCA and MSPCA methods are shown in Figs. 4.5, 4.6, 4.7, and 4.8, respectively. As shown in Fig. 4.5, the PCA-based T 2 chart is unable to detect most of the fault, whereas the MSPCA-based T 2 chart (Fig. 4.6) has a lower missed detection rate. The same trend can be shown using PCA and MSPCA-based Q charts as well (Figs. 4.7 and 4.8). Figs. 4.9, 4.10, and 4.11 show the fault detection results using the developed MSPCA-based GLRT for Case 1. We can see from Figs. 4.9,

90

Data-Driven and Model-Based Methods for Fault Detection and Diagnosis

Figure 4.6 Monitoring a fault of magnitude unity using MSPCA-based T 2 chart.

Figure 4.7 Monitoring a fault of magnitude unity using PCA-based Q chart.

Figure 4.8 Monitoring a fault of magnitude unity using MSPCA-based Q chart.

4.10, and 4.11 that the MSPCA-based MW-GLRT charts (Figs. 4.10 and 4.11) provide better detection than the conventional MSPCA-based GLRT chart (Fig. 4.9), as they detect almost 100% of the fault.


Figure 4.9 Monitoring a fault of magnitude unity using MSPCA-based GLRT chart.

Figure 4.10 Monitoring a fault of magnitude unity using MSPCA-based MW-GLRT chart (WL = 4).

Figure 4.11 Monitoring a fault of magnitude unity using MSPCA-based MW-GLRT chart (WL = 8).


Figure 4.12 Monitoring a fault of magnitude 1σ using PCA-based T² chart.

Figure 4.13 Monitoring a fault of magnitude 1σ using MSPCA-based T² chart.

Figure 4.14 Monitoring a fault of magnitude 1σ using PCA-based Q chart.

Case 2

This case considers a mean shift of one standard deviation (1σ) in all variables. Because the shift is only as large as the standard deviation of each variable, most conventional techniques fail to detect it. The fault detection performances of the PCA and MSPCA methods are shown in Figs. 4.12–4.18.


Figure 4.15 Monitoring a fault of magnitude 1σ using MSPCA-based Q chart.

Figure 4.16 Monitoring a fault of magnitude 1σ using MSPCA-based GLRT chart.

Figure 4.17 Monitoring a fault of magnitude 1σ using MSPCA-based MW-GLRT chart (WL = 4).

Both the PCA-based T² and Q charts fail to detect most of the fault (as seen in Figs. 4.12 and 4.14). The MSPCA-based Q chart (Fig. 4.15) does show slightly improved performance over the conventional PCA-based Q chart (Fig. 4.14). However, the performance is further improved through the application of the MSPCA-based GLRT and MW-GLRT charts. The


Figure 4.18 Monitoring a fault of magnitude 1σ using MSPCA-based MW-GLRT chart (WL = 8).

missed detection rate achieved by the MW-GLRT with a window size of 8 (Fig. 4.18) is low compared to all other charts.

To ensure that meaningful conclusions can be drawn, a Monte Carlo simulation of 5000 realizations was conducted. The fault detection results are summarized in Table 4.1. These results show that the MSPCA-based MW-GLRT chart significantly reduces both the missed detection rate and ARL1 values. However, it does show a slightly elevated false alarm rate, which increases with the window length. This can be attributed to two reasons: the Gibbs phenomenon (the production of spurious artifacts near discontinuities during reconstruction) and the equal weight given to all observations in a moving window. Nevertheless, it can be argued that the reduction in missed detection rate and ARL1 outweighs the slight increase in the false alarm rate, and therefore its implementation should be pursued.

Next, the developed MSPCA-based MW-GLRT fault detection algorithm is illustrated through its application to the Tennessee Eastman Process (TEP).

Tennessee Eastman Process

To further validate the developed MSPCA-based MW-GLRT fault detection methods, they need to be tested on data representative of a process plant. The TEP is a well-defined simulation of a chemical process commonly used in process monitoring research [13–17]. For each of the given testing data sets, the fault was introduced at observation 224 and continued until the end of the data set (the faulty region is highlighted in light blue (light gray in print version) in the figures illustrating the simulation). The missed detection rates (%), false alarm rates


Table 4.1 Summary of missed detection (%), false alarms (%), and ARL1 for simulated data using PCA and MSPCA models.

Chart                  Case 1                        Case 2
                       MDR (%)   FAR (%)   ARL1      MDR (%)   FAR (%)   ARL1
T²                     86.15     3.66      15.00     79.59     3.26      5.00
Q                      59.38     0.46      2.00      49.72     0.66      5.00
MS-T²                  27.32     4.66      23.00     2.18      5.13      6.00
MS-Q                   1.09      4.73      1.00      1.45      5.00      1.00
MS-GLRT                37.15     3.13      2.00      26.41     2.86      5.00
MS-MW-GLRT (WL = 4)    0.36      4.60      1.00      0.18      3.73      1.00
MS-MW-GLRT (WL = 8)    0.18      6.40      1.00      0.18      4.46      1.00

(%), and ARL1 values obtained for all faults are summarized in Tables 4.2 to 4.7.

The results obtained using TEP data are consistent with those obtained from the simulated synthetic example. The MSPCA-based MW-GLRT technique provides significantly lower missed detection rates (Table 4.3) and ARL1 values (Table 4.7) than most of the other techniques for a number of the faults. However, it does show a slightly elevated false alarm rate for a few of the faults (Table 4.5), which can again be attributed to the Gibbs phenomenon and to the fact that the moving window assigns equal weight to all observations in the window. Even so, the MSPCA-based MW-GLRT technique provides significantly lower false alarm rates than the MSPCA-based T² and Q charts (Tables 4.4 and 4.5).

These results are further illustrated in Figs. 4.19–4.25 for TEP fault 12. The MSPCA-based T² and Q charts (Figs. 4.20 and 4.22) outperform the conventional PCA-based T² and Q charts (Figs. 4.19 and 4.21), with lower missed detection rates and ARL1 values. The MSPCA-based GLRT and MW-GLRT charts (Figs. 4.23–4.25) provide a further reduction in missed detection rates and ARL1 values compared to the T² and Q charts of conventional PCA and MSPCA.

The fault detection results of TE process monitoring show that the developed MSPCA-based MW-GLRT provides good results compared to the conventional methods in terms of missed detection rates and ARL1 values. This is because in the MW-GLRT method the detection chart equals the norm of the residuals in that window, which means that we need to select a


Table 4.2 Summary of missed detection rates (%) for TEP data using PCA and MSPCA.

Fault     PCA-based T²   PCA-based Q   MSPCA-based T²   MSPCA-based Q
IDV(1)    1.00           0.12          42.25            0.00
IDV(2)    1.62           2.75          1.75             0.00
IDV(3)    93.75          90.37         80.50            18.12
IDV(4)    92.75          93.12         83.75            24.12
IDV(5)    77.75          78.87         61.75            21.75
IDV(6)    0.00           0.00          0.00             0.00
IDV(7)    76.12          74.12         40.12            10.00
IDV(8)    15.87          22.50         0.25             2.00
IDV(9)    94.00          93.25         79.75            16.75
IDV(10)   85.87          52.62         32.25            2.00
IDV(11)   57.50          89.87         54.25            40.62
IDV(12)   32.25          19.37         0.50             0.37
IDV(13)   14.25          6.25          5.00             2.00
IDV(14)   0.62           9.50          0.00             12.00
IDV(15)   94.12          89.00         63.00            29.00
IDV(16)   88.00          64.78         41.12            2.87
IDV(17)   17.50          14.00         4.12             0.87
IDV(18)   8.25           7.87          2.12             0.00
IDV(19)   79.37          88.75         72.62            34.50
IDV(20)   79.50          53.75         36.75            2.25
IDV(21)   50.12          36.50         86.50            7.75

proper moving window length, similar to choosing the length of a mean filter in data filtering.

We have previously demonstrated the effectiveness of the multiscale PCA-based moving-window (MW) generalized likelihood ratio test (GLRT) technique over the classical PCA and MSPCA-based GLRT methods. Since the results of the MSPCA-based MW-GLRT technique show the importance of selecting the moving window length carefully, we suggest extending the MW-GLRT technique to one that assigns exponential weights to the residuals in the moving window (instead of equal weights), as this may further improve fault detection performance by reducing the false alarm rate. The developed detection method, called EWMA-GLRT, provides improved properties, such as smaller missed detection and false alarm rates and a smaller average run length. The idea behind the developed EWMA-GLRT is to compute a new GLRT statistic that integrates current and previous data information in a decreasing exponential fashion, giving


Table 4.3 Summary of missed detection rates (%) for TEP data using MSPCA-based GLRT and MW-GLRT.

Fault     GLRT     MW-GLRT (WL = 4)   MW-GLRT (WL = 8)
IDV(1)    1.25     0.00               0.00
IDV(2)    9.00     2.62               0.50
IDV(3)    99.25    88.37              80.75
IDV(4)    99.87    91.37              87.37
IDV(5)    79.75    70.75              67.00
IDV(6)    0.50     0.12               0.00
IDV(7)    64.37    54.62              45.25
IDV(8)    14.00    5.37               2.37
IDV(9)    98.75    94.25              90.25
IDV(10)   89.62    32.12              26.0
IDV(11)   98.37    92.12              87.75
IDV(12)   9.87     1.50               0.25
IDV(13)   4.87     4.12               3.50
IDV(14)   14.75    0.00               0.00
IDV(15)   98.62    80.62              73.50
IDV(16)   74.25    48.50              36.87
IDV(17)   16.37    5.00               1.50
IDV(18)   10.12    8.75               6.75
IDV(19)   96.50    90.50              85.00
IDV(20)   41.87    24.87              19.00
IDV(21)   76.87    33.50              27.00

more weight to the more recent data. This provides a more accurate estimate of the GLRT statistic and a stronger memory, which enables better decision making with respect to fault detection. Therefore, next, an MSPCA-based EWMA-GLRT method is developed and used to improve fault detection. The idea behind the MSPCA-based EWMA-GLRT fault detection algorithm is to combine the advantages of the proposed EWMA-GLRT fault detection chart with those of the PCA model and multiscale representation. It is applied to the simulated example presented in the section on page 87. Figs. 4.26 to 4.33 compare the detection efficiency of the developed EWMA-GLRT with that of the moving-window GLRT charts (with different window lengths) for the two case studies (section on page 87). The results demonstrate the effectiveness of the proposed MSPCA-based EWMA-GLRT method over the moving-window GLRT methods. The detection performance is also assessed and evaluated in terms


Table 4.4 Summary of false alarm rates (%) for TEP data using PCA and MSPCA.

Fault     PCA-based T²   PCA-based Q   MSPCA-based T²   MSPCA-based Q
IDV(1)    0.00           1.33          53.12            19.64
IDV(2)    1.33           0.89          37.50            34.37
IDV(3)    3.57           5.35          37.50            66.51
IDV(4)    1.78           3.57          9.82             66.51
IDV(5)    0.00           1.33          40.17            53.12
IDV(6)    63.39          27.32         77.67            54.91
IDV(7)    0.44           0.00          46.42            50.89
IDV(8)    0.00           0.44          12.50            29.46
IDV(9)    5.80           4.91          48.21            64.28
IDV(10)   0.44           0.00          20.53            40.62
IDV(11)   2.23           2.23          19.64            39.28
IDV(12)   0.00           0.00          10.71            0.00
IDV(13)   0.00           0.00          2.23             2.67
IDV(14)   0.00           0.44          0.00             16.96
IDV(15)   2.67           3.57          15.17            41.07
IDV(16)   0.89           4.46          43.75            60.71
IDV(17)   0.89           3.12          14.28            33.03
IDV(18)   15.62          9.37          25.89            49.55
IDV(19)   0.44           0.89          2.67             50.44
IDV(20)   0.44           0.44          16.07            25.89
IDV(21)   4.46           14.73         24.10            74.55

of false alarm rates (FAR), missed detection rates (MDR), and ARL1 values (Table 4.8). The developed EWMA-GLRT chart provides fast and effective detection while maintaining a low false alarm rate. In the EWMA-GLRT-based method, the detection statistic equals the weighted norm of the residuals, which is equivalent to applying an EWMA filter to the squares of the residuals. As noted above, the statistic integrates current and previous data information in a decreasing exponential fashion, giving more weight to the more recent data. This helps provide a more accurate estimate of the GLRT statistic and a stronger memory, which enables better decision making with respect to fault detection.
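The difference between equal and exponential weighting of the residuals can be made concrete with a small sketch. It assumes a scalar residual sequence, takes the moving-window statistic as the sum of squared residuals in the window, and takes the EWMA statistic as a first-order EWMA filter on the squared residuals. The function names, the smoothing parameter `lam`, the zero initial value, and the omission of any variance normalization are illustrative assumptions, not details given in the text.

```python
def mw_glrt(residuals, window):
    """Moving-window statistic: sum of squared residuals over the last
    `window` samples, all weighted equally (assumed form)."""
    out = []
    for t in range(len(residuals)):
        start = max(0, t - window + 1)
        out.append(sum(r * r for r in residuals[start:t + 1]))
    return out


def ewma_glrt(residuals, lam=0.3):
    """EWMA statistic: exponentially weighted average of squared residuals,
    so recent samples count more than old ones (lam is an assumed value)."""
    out, g = [], 0.0
    for r in residuals:
        g = lam * r * r + (1.0 - lam) * g
        out.append(g)
    return out
```

With the moving window, a sample drops out of the statistic abruptly once it leaves the window; with the EWMA, its influence decays geometrically at rate `1 - lam`, which removes the need to pick a hard window length but introduces the choice of `lam`.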

4.1.3 Multiscale PLS-based GLRT for fault detection

4.1.3.1 Multiscale Partial Least Square (MSPLS) method

In previous works [18,19], the authors developed linear and nonlinear PCA- and PLS-based GLRT fault detection methods that showed


Table 4.5 Summary of false alarm rates (%) for TEP data using MSPCA-based GLRT and MW-GLRT.

Fault     GLRT    MW-GLRT (WL = 4)   MW-GLRT (WL = 8)
IDV(1)    0.44    1.33               2.23
IDV(2)    0.00    0.00               0.00
IDV(3)    0.00    0.00               0.00
IDV(4)    0.00    0.00               0.44
IDV(5)    0.00    0.44               1.78
IDV(6)    0.00    0.00               0.89
IDV(7)    0.00    1.33               6.25
IDV(8)    0.44    2.67               7.14
IDV(9)    4.01    18.75              21.87
IDV(10)   0.00    0.89               3.12
IDV(11)   0.00    0.00               1.33
IDV(12)   0.00    0.00               1.78
IDV(13)   0.00    0.00               0.00
IDV(14)   0.00    0.44               2.23
IDV(15)   0.44    0.00               0.00
IDV(16)   0.00    11.16              21.87
IDV(17)   0.44    0.89               0.89
IDV(18)   0.00    0.00               0.00
IDV(19)   0.00    0.00               0.00
IDV(20)   0.00    0.00               0.00
IDV(21)   0.44    7.14               14.28

improved detection of anomalies in chemical reactors over conventional methods. The developed linear and nonlinear PCA- and PLS-based GLRT fault detection algorithms provide optimal properties by maximizing the fault detection probability for a particular false alarm rate. Since process measurements are usually contaminated with errors, the objective of this section is to extend the developed fault detection methods to deal with uncertainty in the measurements. To do that, we use a wavelet-based multiscale representation of data. Multiscale representation is a powerful data analysis and feature extraction tool, which has been shown to improve the fault detection ability of input and input–output models [10]. In this section, we exploit these advantages to enhance fault detection, to take outcome measures into account, and to widen applicability in practice through input–output models, by developing hypothesis-testing fault detection techniques based on these latent variable modeling


Table 4.6 Summary of ARL1 for TEP data using PCA and MSPCA.

Fault     PCA-based T²   PCA-based Q   MSPCA-based T²   MSPCA-based Q
IDV(1)    9              2             1                1
IDV(2)    13             6             5                1
IDV(3)    32             24            29               1
IDV(4)    1              1             1                1
IDV(5)    1              1             1                1
IDV(6)    1              1             1                1
IDV(7)    1              1             1                1
IDV(8)    29             29            1                1
IDV(9)    1              3             1                1
IDV(10)   8              29            15               1
IDV(11)   7              22            9                1
IDV(12)   3              4             5                1
IDV(13)   50             39            33               17
IDV(14)   1              3             1                3
IDV(15)   78             67            25               1
IDV(16)   24             19            1                1
IDV(17)   2              20            17               1
IDV(18)   1              5             3                1
IDV(19)   11             13            1                1
IDV(20)   68             79            9                1
IDV(21)   21             41            31               1

methods. In the developed approach, we construct a PLS model using the wavelet coefficients at different scales and then apply the GLRT for fault detection. The fault detection performance of the developed MSPLS-based GLRT technique is illustrated using simulated continuously stirred tank reactor (CSTR) data and the Tennessee Eastman Process (TEP), in terms of missed detection rate, false alarm rate, and ARL1 values. The CSTR and TEP provide realistic continuous process plant problems on which to validate process monitoring and fault detection performance. The fault detection results demonstrate the effectiveness of the MSPLS-based GLRT method over the PLS-based GLRT technique, and both outperform the conventional PLS and MSPLS techniques.

Teppola and Minkkinen [20] were the first to propose multiscale PLS, combining a multiscale representation with a PLS model, and they used this approach mainly to remove the long-term drift


Table 4.7 Summary of ARL1 for TEP data using MSPCA-based GLRT and MW-GLRT.

Fault     GLRT    MW-GLRT (WL = 4)   MW-GLRT (WL = 8)
IDV(1)    3       1                  1
IDV(2)    16      9                  5
IDV(3)    9       24                 20
IDV(4)    161     59                 55
IDV(5)    3       1                  1
IDV(6)    5       2                  1
IDV(7)    1       1                  1
IDV(8)    19      16                 13
IDV(9)    1       1                  1
IDV(10)   36      8                  5
IDV(11)   74      189                186
IDV(12)   4       1                  1
IDV(13)   37      34                 29
IDV(14)   3       1                  1
IDV(15)   69      89                 76
IDV(16)   19      16                 2
IDV(17)   20      17                 13
IDV(18)   80      13                 9
IDV(19)   69      10                 7
IDV(20)   72      67                 5
IDV(21)   242     232                29

Figure 4.19 Monitoring TEP fault 12 using PCA-based T² chart.

in the data set. They proposed to extend the work of Bakshi [10] on multiscale PCA to input–output models (such as PLS). For input–output models, PLS has been widely


Figure 4.20 Monitoring TEP fault 12 using MSPCA-based T² chart.

Figure 4.21 Monitoring TEP fault 12 using PCA-based Q chart.

Figure 4.22 Monitoring TEP fault 12 using MSPCA-based Q chart.

used to extract relationships between two sets of variables, inputs and outputs. PLS is typically used to predict key variables that are difficult or expensive to measure (called outputs) from other process variables that are easier to measure (called inputs). Process residuals used by input model-based fault detection techniques often contain high levels of noise and may be highly correlated and non-Gaussian, and


Figure 4.23 Monitoring TEP fault 12 using MSPCA-based GLRT chart.

Figure 4.24 Monitoring TEP fault 12 using MSPCA-based MW-GLRT chart (WL = 4).

Figure 4.25 Monitoring TEP fault 12 using MSPCA-based MW-GLRT chart (WL = 8).

these characteristics may degrade the performance of such techniques. Thus the wavelet-based multiscale representation of data is used to further enhance the effectiveness of PLS. The multiscale PLS algorithm is shown in Algorithm 3. Next, we present the developed MSPLS-based GLRT fault detection technique.


Figure 4.26 Monitoring a fault of magnitude unity using the MSPCA-based MW-GLRT (WL = 2) chart.

Figure 4.27 Monitoring a fault of magnitude unity using the MSPCA-based MW-GLRT (WL = 4) chart.

Figure 4.28 Monitoring a fault of magnitude unity using the MSPCA-based MW-GLRT (WL = 8) chart.

MSPLS-based GLRT fault detection algorithm

In the developed MSPLS-based GLRT approach, a PLS model will be constructed using the wavelet coefficients at different scales, and then the GLRT


Figure 4.29 Monitoring a fault of magnitude unity using the MSPCA-based EWMA-GLRT chart.

Figure 4.30 Monitoring a fault of magnitude 1σ using the MSPCA-based MW-GLRT (WL = 2) chart.

Figure 4.31 Monitoring a fault of magnitude 1σ using the MSPCA-based MW-GLRT (WL = 4) chart.

will be applied using this model to improve the fault detection abilities of the PLS-based GLRT fault detection method. The multiscale PLS-based GLRT fault detection technique aims to detect the additive fault with the


Figure 4.32 Monitoring a fault of magnitude 1σ using the MSPCA-based MW-GLRT (WL = 8) chart.

Figure 4.33 Monitoring a fault of magnitude 1σ using the MSPCA-based EWMA-GLRT chart.

Table 4.8 Summary of missed detection (%), false alarms (%), and ARL1 for simulated data using MSPCA.

Chart/Fault          Case 1                        Case 2
Detection Metric     MDR (%)   FAR (%)   ARL1      MDR (%)   FAR (%)   ARL1
MW-GLRT (WL = 2)     34.60     2.33      1         25.86     1.53      1
MW-GLRT (WL = 4)     0.18      2.93      1         0.36      2.46      1
MW-GLRT (WL = 8)     0         4.80      1.00      0.18      3.06      1
EWMA-GLRT            0         1.46      1         0         1         1

maximum detection probability for a given false alarm rate. This is accomplished through the fault detection scheme illustrated in Fig. 4.34. The developed MSPLS-based GLRT fault detection technique is summarized in Algorithm 4.


Algorithm 3 MSPLS algorithm.
Input: N × m input data matrix X and N × p output data matrix Y.
1. Decompose X and Y into a coarse approximation scale and detail scales.
2. Apply the modified PLS algorithm at each scale to compute the loading vectors and score vectors of the X and Y matrices.
3. At each scale for X and Y, neglect the coefficients that are not significant (lower than the threshold values).
4. Apply the modified PLS algorithm to the reconstructed X and Y (after thresholding).
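The decompose, threshold, and reconstruct skeleton of Algorithm 3 (steps 1 and 3, followed by reconstruction) can be sketched for a single variable as follows. This is only an illustration under stated assumptions: the Haar wavelet, the fixed hard threshold `thr`, and all function names are choices made for this example, whereas the algorithm above selects coefficients using model-based thresholds at each scale. The PLS modeling steps (2 and 4) are not shown. The signal length must be divisible by 2 raised to the number of levels.

```python
import math


def haar_decompose(signal, levels):
    """Return (approximation, [detail_level_1, ..., detail_level_L])."""
    approx, details = list(signal), []
    s = math.sqrt(2.0)
    for _ in range(levels):
        a = [(approx[i] + approx[i + 1]) / s for i in range(0, len(approx), 2)]
        d = [(approx[i] - approx[i + 1]) / s for i in range(0, len(approx), 2)]
        approx = a
        details.append(d)
    return approx, details


def hard_threshold(details, thr):
    """Zero every detail coefficient whose magnitude does not exceed thr."""
    return [[c if abs(c) > thr else 0.0 for c in d] for d in details]


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest scale."""
    s = math.sqrt(2.0)
    approx = list(approx)
    for d in reversed(details):
        nxt = []
        for a, c in zip(approx, d):
            nxt.extend([(a + c) / s, (a - c) / s])
        approx = nxt
    return approx


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a, d = haar_decompose(signal, levels=2)
filtered = haar_reconstruct(a, hard_threshold(d, thr=1.5))
```

Keeping all detail coefficients reproduces the original signal, while discarding all of them leaves only block averages, which is why the thresholding stage behaves like a data filter.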

Figure 4.34 Representation of MSPLS fault detection model.

Next, the application of the developed MSPLS-based GLRT technique for fault detection of chemical processes is presented.

4.1.3.2 MSPLS-based GLRT fault detection technique and applications

In this subsection, the fault detection performance of the MSPLS-based GLRT is evaluated in terms of three detection indicators:
• Missed Detection Rate (MDR %): the percentage of faulty observations that go undetected;
• False Alarm Rate (FAR %): the percentage of observations in the fault-free region that are wrongly declared faulty;
• Average Run Length (ARL1): the number of observations taken to detect the fault after the fault is introduced.
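The three indicators above can be computed from a monitoring statistic and its threshold as in the following sketch. It assumes that the fault, once introduced, persists to the end of the record, that an alarm is raised when the statistic exceeds the threshold, and that a fault flagged at its very first faulty sample has a run length of 1. The function name and argument names are hypothetical.

```python
def detection_metrics(statistic, threshold, fault_start):
    """Return (MDR %, FAR %, run length) for one test record.

    fault_start is the zero-based index of the first faulty sample.
    The run length is None when the fault is never detected.
    """
    alarms = [s > threshold for s in statistic]
    normal, faulty = alarms[:fault_start], alarms[fault_start:]
    far = 100.0 * sum(normal) / len(normal)
    mdr = 100.0 * (len(faulty) - sum(faulty)) / len(faulty)
    run_length = next((i + 1 for i, a in enumerate(faulty) if a), None)
    return mdr, far, run_length


mdr, far, arl1 = detection_metrics([0, 2, 0, 0, 0, 3, 3, 0], threshold=1, fault_start=4)
```

For a single record this gives one run length; averaging it over repeated realizations (for example, in a Monte Carlo study) gives the average run length.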


Algorithm 4 MSPLS-based GLRT fault detection algorithm.
Input: N × m input data matrix X and N × p output data matrix Y, confidence interval α.

Training data
1. Standardize the X and Y matrices to zero mean and unit variance.
2. Select the best decomposition length, the same for both X and Y matrices.
3. Decompose X and Y, and store the coarse approximation coefficients and detail coefficients.
4. Apply the modified PLS algorithm at each scale.
5. Remove insignificant approximation and detail coefficients from the coefficient matrix; the Q statistic is used to calculate the threshold value.
6. Reconstruct the data matrices X and Y from the retained coefficients.
7. Compute the GLRT statistic G(y) from the residuals obtained from PLS.
8. Compute the GLRT statistic threshold Gα.

Testing data
1. Standardize the new X and Y data matrices using the mean and standard deviation from the training data set (step 1).
2. Decompose the X and Y data matrices using the best decomposition level.
3. Apply the modified PLS algorithm at each scale.
4. Compute the Q statistic with PLS at each scale and compare it with the threshold value Qα.
5. Reconstruct the test data matrices from the retained coefficients.
6. Apply the modified PLS algorithm to the new reconstructed data sets and compute the model residuals.
7. Compute the GLRT statistic G(y) from the residuals obtained from PLS.
8. If G(y) < Gα, then the process is operating under normal operating conditions. Otherwise, a fault is declared.
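The bookkeeping around Algorithm 4 (standardizing test data with training statistics, fixing a threshold from fault-free data, and applying the decision rule in the last step) can be sketched as follows. How Gα is obtained is not spelled out in this section, so the empirical (1 − α) quantile of the training statistic used here is an assumption, as are all function names. The wavelet and PLS steps are omitted.

```python
import math
import statistics


def fit_scaler(train_column):
    """Mean and sample standard deviation of one training variable."""
    return statistics.mean(train_column), statistics.stdev(train_column)


def standardize(column, mean, std):
    """Scale data with statistics taken from the training set only."""
    return [(x - mean) / std for x in column]


def empirical_threshold(train_stat, alpha=0.05):
    """Assumed threshold rule: the (1 - alpha) empirical quantile of the
    statistic computed on fault-free training data."""
    ordered = sorted(train_stat)
    k = max(0, math.ceil((1.0 - alpha) * len(ordered)) - 1)
    return ordered[k]


def declare_faults(test_stat, g_alpha):
    """Decision rule of the final step: normal if G(y) < G_alpha, else fault."""
    return [g >= g_alpha for g in test_stat]
```

Reusing the training mean and standard deviation on the test data matters: re-estimating them from faulty test data would partly absorb a mean shift and hide the fault.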


Figure 4.35 Monitoring multiple faults in temperature using PLS-based Q chart.

To demonstrate the advantages of the developed technique, its performance is assessed and compared to conventional fault detection techniques using two applications, one using simulated continuous stirred tank reactor data and the other using Tennessee Eastman process.

Fault detection in simulated CSTR data

The input matrix X consists of the cooling water flow rate, reactant flow rate, temperature, and concentration at the exit of the CSTR, X = [FC F T CA]. The training data (Xtrain) and testing data (Xtest) are generated by introducing stepwise changes in the concentration and temperature controllers. Zero-mean Gaussian noises with standard deviations 0.005 and 0.002, N(0, 0.005²) and N(0, 0.002²), were added to the concentration and temperature of the outlet stream, respectively, to represent practical process measurements. The MSPLS model is built from the fault-free training data set of the nonisothermal CSTR model. Xtrain is the input training matrix with 500 observations in total and four process variables.

Next, the performance of the developed MSPLS-based GLRT fault detection method is illustrated and compared to the conventional PLS-based GLRT, PLS, and MSPLS methods. In the case study, the sensor measuring the temperature of the reactor (T) is assumed to be faulty with a simple fault of magnitude 3σ. Figs. 4.35 and 4.36 show the FD comparison between the PLS-based Q statistic and the MSPLS-based Q statistic. The MSPLS-based Q statistic shows improved FD results over the PLS-based Q statistic, with a lower missed detection rate. Detailed results are shown in Table 4.9. The PLS-based GLRT (Fig. 4.37) and MSPLS-based GLRT (Fig. 4.38) show improved fault detection performance compared to the PLS- and MSPLS-based Q statistics, with the MSPLS-based GLRT giving the lowest missed detection and false alarm rates (Table 4.9).


Figure 4.36 Monitoring multiple faults in temperature using MSPLS-based Q chart.

Table 4.9 Summary of MDR (%), FAR (%), and ARL1.

Chart/Fault Detection Metric   MDR (%)   FAR (%)   ARL1
PLS-based Q                    15.84     5.75      1
MSPLS-based Q                  7.92      2.00      1
PLS-based GLRT                 11.85     1.00      1
MSPLS-based GLRT               6.91      0         1

Figure 4.37 Monitoring multiple faults in temperature using PLS-based GLRT chart.

Next, the developed MSPLS-based GLRT algorithm is illustrated through its application on a Tennessee Eastman Process.

Fault detection in TE process

Tables 4.10–4.12 show the missed detection rate, false alarm rate, and ARL1, respectively, for all 21 fault test data sets. The MSPLS-based Q statistic shows a slight FD improvement over the PLS-based Q statistic, and the MSPLS-based GLRT shows better fault detection performance, with lower missed detection rate (Table 4.10), false alarm rate (Table 4.11), and ARL1 (Table 4.12), compared to the conventional PLS-based Q, MSPLS-based Q,


Figure 4.38 Monitoring multiple faults in temperature using MSPLS-based GLRT chart.

Table 4.10 Missed detection rates (%) for TEP data.

Fault     PLS-based Q   MSPLS-based Q   PLS-based GLRT   MSPLS-based GLRT
IDV(1)    7             0.3             3                0
IDV(2)    7.87          1.37            7                1
IDV(3)    99.52         99.3            99.87            92.25
IDV(4)    100           93.25           100              97.42
IDV(5)    87.45         76              85               73.5
IDV(6)    2.54          0               2                0
IDV(7)    85.33         68.34           88               65.5
IDV(8)    64.23         1.52            60.01            0.87
IDV(9)    99.31         93.125          95.41            87.41
IDV(10)   100           52.98           83.12            49.82
IDV(11)   100           56.63           100              52.23
IDV(12)   55.34         9.92            50.23            7.45
IDV(13)   41.23         7.5             33.23            5.13
IDV(14)   100           0               99               0
IDV(15)   100           91.625          100              91.03
IDV(16)   89.23         51.78           84               80
IDV(17)   20.13         2.5             11.34            2.25
IDV(18)   15.31         10              11.893           8.68
IDV(19)   82.31         79.42           83.42            74.31
IDV(20)   53.23         57.25           40.23            35.23
IDV(21)   56.41         33.23           47.42            41.23

and PLS-based GLRT techniques for most of the faults. Figs. 4.39 and 4.40 show the fault detection performance of the PLS- and MSPLS-based Q techniques for the IDV 2 fault test data. The MSPLS-based Q statistic shows better fault detection than the PLS-based Q statistic, and both are outperformed by the PLS-based GLRT (Fig. 4.41) and the MSPLS-based GLRT (Fig. 4.42).

Table 4.11 False alarm rates (%) for TEP data.

Fault     PLS-based Q   MSPLS-based Q   PLS-based GLRT   MSPLS-based GLRT
IDV(1)    4.24          15.18           0                0
IDV(2)    12.24         14.71           0                0.89
IDV(3)    0             6.25            0                3.34
IDV(4)    0             4.73            0                0.83
IDV(5)    0             3.43            0                0
IDV(6)    2.23          1.16            0                0
IDV(7)    0             1.63            0                0
IDV(8)    12.23         0               0                0
IDV(9)    12.42         4.01            5.41             7.3
IDV(10)   1.43          5.34            0                0
IDV(11)   5.35          6.25            0                0
IDV(12)   30            2.24            0                5.53
IDV(13)   5.24          3.12            11.34            0
IDV(14)   0             0               0                2.34
IDV(15)   0             5.80            0                1.31
IDV(16)   0.44          30.12           3.42             9.64
IDV(17)   2.23          3.12            0.89             0
IDV(18)   14.73         14.01           9.31             1.78
IDV(19)   17.31         4.46            9.41             5.42
IDV(20)   0             4.46            0                0
IDV(21)   22.41         26.23           17.14            7.53

4.1.4 Conclusions

In this chapter, we proposed a generalized likelihood ratio test based on multiscale PCA (MSPCA) and multiscale PLS (MSPLS) methods for fault detection. This was accomplished by using the MSPCA and MSPLS methods for modeling purposes and then detecting faults by applying the GLRT chart to the residuals obtained from the models. The performances of the developed techniques are assessed and compared to conventional techniques using two illustrative examples: simulated synthetic data and the Tennessee Eastman Process. The results demonstrate the effectiveness of the MSPCA- and MSPLS-based GLRT techniques over the PCA and PLS methods in terms of lower missed detection rates and ARL1 values.

Table 4.12 ARL1 values for TEP data.

Fault     PLS-based Q   MSPLS-based Q   PLS-based GLRT   MSPLS-based GLRT
IDV(1)    7             1               9                1
IDV(2)    6             12              17               9
IDV(3)    364           5               355              4
IDV(4)    −             1               −                21
IDV(5)    18            1               17               1
IDV(6)    4             1               17               1
IDV(7)    35            1               30               1
IDV(8)    81            22              80               5
IDV(9)    12            1               9                8
IDV(10)   −             6               65               25
IDV(11)   −             6               −                7
IDV(12)   56            4               33               5
IDV(13)   55            8               51               23
IDV(14)   −             1               775              1
IDV(15)   −             210             −                341
IDV(16)   193           1               193              3
IDV(17)   31            15              21               10
IDV(18)   13            24              9                25
IDV(19)   1             8               12               9
IDV(20)   65            17              68               71
IDV(21)   168           55              210              84

Figure 4.39 Monitoring TEP IDV 2 fault using PLS-based Q chart.

4.2 Multiscale nonlinear latent variable-based generalized likelihood ratio test for fault detection

4.2.1 Introduction

In section 4.1, we demonstrated the effectiveness of the multiscale linear latent variable (LV)-based generalized likelihood ratio test (GLRT) technique


Figure 4.40 Monitoring TEP IDV 2 fault using MSPLS-based Q chart.

Figure 4.41 Monitoring TEP IDV 2 fault using PLS-based GLRT chart.

over the classical LV-based GLRT methods. The developed fault detection algorithms provided optimal properties by maximizing the detection probability for a particular false alarm rate with different window lengths. However, most real systems are nonlinear, and linear LV methods cannot adequately handle this nonlinearity. Also, the results in Chapter 3 showed the better detection performance of the nonlinear KPCA- and KPLS-based GLRT approaches compared with linear techniques. Thus the first objective of this chapter is to propose an enhanced nonlinear modeling technique using multiscale kernel LV (MSKLV) (including MSKPCA and MSKPLS) that exploits the advantages of both the multiscale representation of the data and the nonlinear KLVR model. Second, we extend the moving-window GLRT technique (developed in section 4.1) to one that applies exponential weights to the residuals in the moving window (instead of equal weights) using an exponentially weighted moving average (EWMA), as this may further improve fault detection performance by reducing the false alarm rate. The developed


Figure 4.42 Monitoring TEP IDV 2 fault using MSPLS-based GLRT chart.

detection method, called EWMA-GLRT, provides improved properties, such as smaller missed detection and false alarm rates and a smaller average run length. The idea behind the developed EWMA-GLRT is to compute a new GLRT statistic that integrates current and previous data information in a decreasing exponential fashion, giving more weight to the more recent data. This provides a more accurate estimate of the GLRT statistic and a stronger memory, which enables better decision making with respect to fault detection. Therefore, in this chapter, an MSKLV-based EWMA-GLRT method is developed and used in practice to improve fault detection in chemical processes. The idea behind the MSKLV-based EWMA-GLRT fault detection algorithm is to combine the advantages of the proposed EWMA-GLRT fault detection chart with those of the MSKLV model. The third objective of this chapter is to evaluate the proposed MSKLV-based EWMA-GLRT fault detection algorithm using one synthetic example and two well-known chemical processes (the continuous stirred tank reactor process, CSTR, and the Tennessee Eastman process, TEP). The detection effectiveness of the developed technique is evaluated using three fault detection criteria: the missed detection rate (MDR), the false alarm rate (FAR), and the out-of-control ARL1.

The rest of the chapter is organized as follows. In section 4.2.2, we present the proposed multiscale kernel PCA-based GLRT fault detection technique, followed by its application to process monitoring. In section 4.2.3, we present the developed multiscale kernel PLS-based GLRT for fault detection of chemical processes.


Figure 4.43 Representation of MSKPCA fault detection model.

4.2.2 Multiscale kernel PCA-based GLRT for fault detection

4.2.2.1 Multiscale kernel PCA description

The linear multiscale PCA was first developed in [20] by combining a multiscale representation with the linear PCA model. It has been applied mainly to remove the long-term drift in the data set. The nonlinear extension (multiscale KPCA) was proposed by Zhang and Hu [21]. The wavelet-based multiscale representation of data can help enhance the effectiveness of KPCA. In this work, we propose to use a multiscale KPCA algorithm that works similarly to the MSPCA proposed in [10]. In the KPCA case, however, the training data (both online and quality variables) are decomposed in the feature space by the discrete wavelet transform (DWT), a KPCA model with a statistical threshold is applied at each individual scale, and only the important scale coefficients are selected to reconstruct the data (see Fig. 4.43). For the fault detection decision, the KPCA model is applied on the global scale. The statistical thresholding at each individual scale acts like a data filtering stage and increases the efficiency of the KPCA technique. The MSKPCA algorithm is shown in Algorithm 5, and its representation model is shown in Fig. 4.43.

4.2.2.2 Multiscale kernel GLRT fault detection chart with applications

Application 1: synthetic data

Algorithm 5 MSKPCA algorithm.
1. Decompose X into coarse approximate scale and detail scales.
2. KPCA algorithm is applied at each scale to compute loading vectors and score vector of matrix X.
3. At each scale for X, the nonsignificant coefficients (lower than threshold values) are neglected.
4. KPCA algorithm is carried out on reconstructed X (after threshold).

To illustrate the proposed multiscale generalized likelihood ratio test (MSGLRT), we consider the following nonlinear simulation example based on six variables in X with N = 1000 samples. The monitored variables are described at different instants k by the following relations:

\[
\begin{cases}
x_1(k) = v_1(k), & v_1(k) \sim \mathcal{N}(0, 0.01), \\
x_2(k) = v_2(k), & v_2(k) \sim \mathcal{N}(0, 0.01), \\
x_3(k) = \sin(x_1(k)) + v_3(k), & v_3(k) \sim \mathcal{N}(0, 0.01), \\
x_4(k) = x_1(k)^2 - 3x_1(k) + 4 + v_4(k), & v_4(k) \sim \mathcal{N}(0, 0.01), \\
x_5(k) = x_2(k)^2 + \cos(x_2(k)^2) + 1 + v_5(k), & v_5(k) \sim \mathcal{N}(0, 0.01), \\
x_6(k) = x_3(k)^2 + x_3(k)x_4(k) + x_1(k) + v_6(k), & v_6(k) \sim \mathcal{N}(0, 0.05),
\end{cases}
\tag{4.9}
\]
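A minimal Python sketch of how data following Eq. (4.9) could be generated is given below. It assumes that the second argument of N(0, ·) is a variance, so that 0.01 corresponds to a standard deviation of 0.1, and it models a fault as a constant additive bias on one variable. The function names, the seed, and the bias model are illustrative assumptions.

```python
import math
import random


def simulate_synthetic(n_samples=1000, seed=0):
    """Generate rows [x1, ..., x6] following Eq. (4.9)."""
    rng = random.Random(seed)
    sd = math.sqrt(0.01)    # N(0, 0.01) read as variance 0.01 (assumption)
    sd6 = math.sqrt(0.05)   # noise on x6
    data = []
    for _ in range(n_samples):
        x1 = rng.gauss(0.0, sd)
        x2 = rng.gauss(0.0, sd)
        x3 = math.sin(x1) + rng.gauss(0.0, sd)
        x4 = x1 ** 2 - 3.0 * x1 + 4.0 + rng.gauss(0.0, sd)
        x5 = x2 ** 2 + math.cos(x2 ** 2) + 1.0 + rng.gauss(0.0, sd)
        x6 = x3 ** 2 + x3 * x4 + x1 + rng.gauss(0.0, sd6)
        data.append([x1, x2, x3, x4, x5, x6])
    return data


def add_bias(data, column, start, stop, magnitude):
    """Return a copy of data with a constant bias on one column for rows start..stop-1."""
    faulty = [row[:] for row in data]
    for k in range(start, stop):
        faulty[k][column] += magnitude
    return faulty
```

For example, `add_bias(test_rows, 4, 400, 500, 3 * sigma5)` would place a bias of three standard deviations on x5, where `sigma5` is a standard deviation estimated from fault-free data (a hypothetical name).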

where $v_j(k)$, $j = 1, \ldots, 6$, represent Gaussian noise with small variance added to the measurements. The KPCA model is identified with two retained principal components. The X data are split into training and testing data sets of 500 observations each to carry out fault detection. After a process model has been successfully identified, we can proceed with fault detection. In the first scenario, one fault of magnitude 3σ is simulated on variable $x_5$ between samples 400 and 500. In the second scenario, two faults of magnitude 3σ are introduced: one in variable $x_5$ between samples 400 and 500 and one in variable $x_4$ between samples 300 and 350. To evaluate the efficiency of the proposed fault detection technique, two metrics are used: the false alarm rate (FAR) and the missed detection rate (MDR). For the first scenario, Fig. 4.44 shows the time evolution of the SPE, KGLRT, and MS-KGLRT fault detection charts, computed from the residuals in the feature space. For the second scenario with two faults, the proposed technique still provides better detection abilities than the KGLRT and SPE charts (see Fig. 4.45).
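The detection criteria reported in the tables of this section can be computed from a chart's alarm sequence as in the following Python sketch. The definitions used here are the usual ones and are stated as assumptions: FAR is the percentage of fault-free samples that raise an alarm, MDR is the percentage of faulty samples that raise no alarm, and ARL1 is the number of samples from fault onset up to and including the first alarm. The function name is hypothetical.

```python
def detection_metrics(alarms, fault_flags):
    """Return (FAR %, MDR %, ARL1) from per-sample booleans.

    alarms[k] is True when the chart exceeds its threshold at sample k;
    fault_flags[k] is True when a fault is actually present at sample k.
    ARL1 is None when no alarm is raised after the fault onset.
    """
    normal = [a for a, f in zip(alarms, fault_flags) if not f]
    faulty = [a for a, f in zip(alarms, fault_flags) if f]
    far = 100.0 * sum(1 for a in normal if a) / len(normal) if normal else 0.0
    mdr = 100.0 * sum(1 for a in faulty if not a) / len(faulty) if faulty else 0.0
    arl1 = None
    if any(fault_flags):
        onset = list(fault_flags).index(True)
        for k in range(onset, len(alarms)):
            if alarms[k]:
                arl1 = k - onset + 1   # an alarm at the onset sample gives ARL1 = 1
                break
    return far, mdr, arl1
```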


Figure 4.44 Evolutions of SPE, KGLRT, and MS-KGLRT in faulty case.

Figure 4.45 Evolutions of SPE, KGLRT, and MS-KGLRT in faulty case.

Table 4.13 FAR %, MDR %, and ARL1 for the presented fault detection charts (simulated example).

Charts      Case 1                      Case 2
            FAR %    MDR %    ARL1      FAR %    MDR %    ARL1
MS-KGLRT    1.44     0        1         2.33     0        1
KGLRT       0        1.21     5         6.71     4.33     7
SPE         62       4.87     6         1.21     5.33     8

Both the SPE and KGLRT charts fail to detect most of the fault (as seen in Figs. 4.44 and 4.45), whereas the MS-KGLRT chart improves the detection performance. The performance of the different fault detection charts is summarized in Table 4.13.


Figure 4.46 Evolutions of SPE, KGLRT, and MS-KGLRT in faulty case.

Table 4.14 FAR %, MDR %, and ARL1 for the presented fault detection charts (CSTR process).

Charts      FAR %    MDR %    ARL1
MS-KGLRT    0        1        1
KGLRT       5        19       1
SPE         3        3.21     1

Application 2: nonisothermal CSTR process

Next, the performance of the developed MS-KGLRT fault detection method is illustrated and compared to that of the SPE and KGLRT charts. In this case study, the sensor measuring the temperature inside the reactor is assumed to be faulty with a bias fault: a sensor fault is introduced on variable x3 (temperature T) between samples 400 and 500, with a magnitude of 5σ of this variable. Fig. 4.46 shows that the MS-KGLRT chart improves on the KGLRT and SPE charts in terms of FAR, MDR, and ARL1 values, owing to the advantages of the multiscale representation. The missed detection rates (%), false alarm rates (%), and ARL1 values obtained using the three techniques are summarized in Table 4.14.

Application 3: Tennessee Eastman Process

To assess the effectiveness of the proposed MS-KGLRT method against the KGLRT and SPE charts, it must be tested on highly nonlinear process data such as those of the Tennessee Eastman Process (TEP). The Eastman Chemical Company created the TEP, which has been commonly used for evaluating process control and fault detection methods. The TEP is described in detail in the literature [13]. 1000 observations are used as training data to construct the reference model, and 1000 observations are used as testing data. To evaluate the capability of the monitoring methods, a number of faults, presented in [13], were introduced from observation 224. The TEP training data used to form the KPCA model include Gaussian noise with a standard deviation typical of each measurement type. Twenty-one testing sets were generated [13] using the preprogrammed faults (Faults 1–21). For each of the testing data sets, the fault starts at observation 224. To further validate the developed MS-KGLRT fault detection method, it would need to be tested on process data from a plant. The missed detection rates (%), false alarm rates (%), and ARL1 values obtained for all faults are summarized in Table 4.15. The results obtained using TEP data are consistent with those obtained from the synthetic and CSTR examples above. The MS-KGLRT technique provides lower missed detection rates, false alarm rates, and ARL1 values than KGLRT and SPE for most faults (see Table 4.15). These results are further illustrated in Figs. 4.47–4.51 for all TEP faults 1–21.

Table 4.15 Summary of MDR, FAR, and ARL1 values for TEP data.

Faults    SPE                       KGLRT                     MS-KGLRT
          MDR      FAR      ARL1    MDR      FAR      ARL1    MDR      FAR      ARL1
IDV 1     0        7.01     1.0     34.19    6.57     1.0     0        0        1
IDV 2     0.37     2.63     1.0     81.43    2.19     1.0     0        0        1
IDV 3     64.39    3.26     5.0     70.02    2.17     7.0     45.40    0        1
IDV 4     63.83    5.07     5.0     66.51    3.98     5.0     57.14    2.89     1
IDV 5     7.5      4.09     5.0     9.28     3.18     5.0     0        3.63     1
IDV 6     1.07     3.16     4.0     84.94    1.81     4.0     0        4.07     1
IDV 7     1.42     2.27     5.0     20.71    2.27     5.0     0        0        1
IDV 8     3.61     9.18     2.0     19.40    6.63     2.0     0        12.24    1
IDV 9     65.64    9.15     4.0     72.34    4.92     4.0     55.86    1.40     1
IDV 10    6.64     6.11     1.0     22.87    5.67     1.0     11.80    3.93     1
IDV 11    41.00    7.65     1.0     43.16    6.30     1.0     51.79    4.50     1
IDV 12    1.92     4.78     1.0     7.37     3.19     1.0     0        0        1
IDV 13    0.83     5.74     2.0     31.38    3.44     2.0     0        3.44     1
IDV 14    40.81    6.30     2       41.42    4.95     2       96.73    3.60     3
IDV 15    92.23    3.74     2       96.11    2.38     2       88.34    0        1
IDV 16    44.12    1.70     3.0     52.21    1.70     3.0     39.68    0.85     1
IDV 17    0.39     5.69     1.0     43.70    5.69     1.0     22.04    4.06     1
IDV 18    6.33     9.31     2.0     77.37    6.45     2.0     3.61     6.81     1
IDV 19    60.38    11.94    1.0     70.04    6.48     1.0     81.15    5.80     1
IDV 20    22.08    1.92     1.0     32.08    1.53     4.0     13.33    0        1
IDV 21    62.5     11.93    3.0     66.07    6.08     7.0     42.85    7.20     1


Figure 4.47 Monitoring faults (A) IDV1 , (B) IDV2 , (C) IDV3 and (D) IDV4 using SPE, KGLRT, and MS-KGLRT charts.

The MS-KGLRT chart shows further detection improvements in comparison to the KGLRT and SPE charts. Table 4.15 summarizes the monitoring performance of the three methods for all faults. The results recorded for the Tennessee Eastman process are coherent with those obtained from the two previous examples (synthetic and CSTR). It should be noted that the developed monitoring method returns a higher detection rate than the other methods for most faults, which can be explained by the limited ability of fault detection methods based on a fixed reference model to monitor the nonlinear TEP.


Figure 4.48 Monitoring faults (A) IDV5 , (B) IDV6 , (C) IDV7 and (D) IDV8 using SPE, KGLRT, and MS-KGLRT charts.

Figure 4.49 Monitoring faults (A) IDV9, (B) IDV10, (C) IDV11 and (D) IDV12 using SPE, KGLRT, and MS-KGLRT charts.

Figure 4.50 Monitoring faults (A) IDV13, (B) IDV14, (C) IDV15 and (D) IDV16 using SPE, KGLRT, and MS-KGLRT charts.

4.2.3 Multiscale kernel PLS-based GLRT for fault detection

4.2.3.1 Multiscale Kernel Partial Least Square (KPLS) method

Linear multiscale PLS was first developed in [20] by combining a multiscale representation with the linear PLS model. It has been applied mainly to remove long-term drift from the data set. The wavelet-based multiscale representation of data can help enhance the effectiveness of KPLS. In this work, we propose to use a multiscale KPLS algorithm that works similarly to the MSPCA proposed in [10], but with KPLS in place of PCA: the training data (both online and quality variables) are decomposed in the feature space by the discrete wavelet transform (DWT), a KPLS model with a statistical threshold is applied at each individual scale, and only the important scale coefficients are selected to reconstruct the data (see Fig. 4.52). For the fault detection decision, the KPLS model is applied on the global scale. The statistical thresholding at each individual scale acts like a data filtering stage and increases the efficiency of the KPLS technique. The MSKPLS algorithm is shown in Algorithm 6, and its representation model is shown in Fig. 4.52. The developed technique is validated using a simulated example and a simulated CSTR process.

Figure 4.51 Monitoring faults (A) IDV17, (B) IDV18, (C) IDV19, (D) IDV20 and (E) IDV21 using SPE, KGLRT, and MS-KGLRT charts.

4.2.3.2 MSKPLS-based GLRT technique and applications

In the developed MSKPLS-based GLRT, the KPLS model is constructed using the wavelet coefficients at different scales, and the GLRT is applied to detect faults based on the residual vector obtained from the MSKPLS model. The algorithm describing the proposed MSKPLS-based GLRT fault detection technique is presented in Algorithm 7. The effectiveness of the MSKPLS for fault detection will be demonstrated through two illustrative examples using a simulated synthetic data set [18] and a simulated CSTR process.
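The training and testing logic of Algorithm 7 can be outlined with a short Python sketch that covers only the standardization, statistic, and decision steps. The multiscale KPLS modeling that produces the residual vectors is not included. The sketch also makes several assumptions that are not taken from this chapter: test data are standardized with the training means and standard deviations, the residuals are treated as Gaussian with covariance σ²I (for which the generalized likelihood ratio statistic of a mean shift is ‖r‖²/σ²), and the threshold is taken as an empirical quantile of the fault-free training statistics. All function names are hypothetical.

```python
import math


def column_stats(rows):
    """Column means and population standard deviations of the training data."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(m)]
    return means, stds


def standardize(rows, means, stds):
    """Scale rows with the given statistics; constant columns are only centered."""
    return [[(x - mu) / (sd if sd > 0 else 1.0)
             for x, mu, sd in zip(r, means, stds)] for r in rows]


def glrt_statistic(residual, sigma2=1.0):
    """GLR statistic ||r||^2 / sigma2 under the Gaussian assumption stated above."""
    return sum(x * x for x in residual) / sigma2


def empirical_threshold(train_statistics, alpha=0.05):
    """(1 - alpha) empirical quantile of the training statistics (assumed rule)."""
    ordered = sorted(train_statistics)
    rank = math.ceil((1.0 - alpha) * len(ordered))
    return ordered[max(rank, 1) - 1]


def detect(residuals, threshold, sigma2=1.0):
    """Declare a fault at each sample whose statistic exceeds the threshold."""
    return [glrt_statistic(r, sigma2) > threshold for r in residuals]
```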


Figure 4.52 Representation of MSKPLS fault detection model.

Algorithm 6 Multiscale kernel PLS algorithm.
1. Decompose X and Y into coarse approximate scale and detail scales.
2. KPLS algorithm is applied at each scale to compute loading vectors and score vector of matrices X and Y.
3. At each scale for X and Y, the nonsignificant coefficients (lower than threshold values) are neglected.
4. KPLS algorithm is carried out on reconstructed X and Y (after threshold).

Application 1: synthetic example

The simulated nonlinear synthetic example (4.10) is used to generate the responses of a four-variable vector as functions of time, $x_t = [x_{t1}, x_{t2}, x_{t3}, x_{t4}]$, as follows:

\[
\begin{cases}
x_{t1} = [-2 \times \mathrm{randn}(200, 1);\ \mathrm{randn}(300, 1);\ 3 \times \mathrm{randn}(500, 1)], \\
x_{t2} = \sin(0.1 : 0.1 : 100)^T + \cos(0.1 : 0.1 : 100)^T, \\
x_{t3} = 0.5 \times x_{t1}^2 + 0.5 \times x_{t2}, \\
x_{t4} = 0.3 \times x_{t1} + 0.7 \times x_{t2}^2.
\end{cases}
\tag{4.10}
\]

Algorithm 7 MSKPLS-based GLRT algorithm.
Split the data into training set and testing set.
Training data:
1. Standardize the data to have zero mean and unit variance.
2. Decompose X and Y data into coarse approximate scale and detail scale after selecting the best decomposition depth.
3. Apply the KPLS at each scale.
4. Reconstruct the X and Y data with KPLS.
5. Construct the MSKPLS model.
6. Compute the GLRT statistic G.
7. Compute the GLRT threshold Gα.
Testing data:
1. Standardize the data to have zero mean and unit variance.
2. Decompose X and Y data into coarse approximate scale and detail scale after selecting the best decomposition depth.
3. Apply the KPLS at each scale.
4. Reconstruct the X and Y data with KPLS model.
5. Compute the GLRT statistic G.
If the statistic G is under the threshold Gα, then the system is under normal conditions. Else, a fault is declared.

These simulated states are assumed to be noise free. Then they are contaminated with zero-mean Gaussian errors, that is, a measurement noise $v_{k-1} \sim \mathcal{N}(0, \sigma_v^2)$, so that

\[
\begin{cases}
X_1(t) = x_{t1} + \sigma_v \times \mathrm{randn}(\mathrm{size}(x_{t1})), \\
X_2(t) = x_{t2} + \sigma_v \times \mathrm{randn}(\mathrm{size}(x_{t2})), \\
X_3(t) = x_{t3} + \sigma_v \times \mathrm{randn}(\mathrm{size}(x_{t3})), \\
X_4(t) = x_{t4} + \sigma_v \times \mathrm{randn}(\mathrm{size}(x_{t4})).
\end{cases}
\tag{4.11}
\]

The generated input data were arranged as a matrix X (t) = [X1 (t), X2 (t), X3 (t), X4 (t)] having 1000 samples and four model variables. The responses of the four input variables X1 , X2 , X3 , and X4 are shown in Fig. 4.53. The output data matrix Y is generated as follows: Y (t) = X 2 (t).

(4.12)

These data are first scaled and then are used to construct the KPLS model.
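The generation of the input data in Eqs. (4.10) and (4.11) can be sketched in Python as follows. The sketch reads `randn` as draws from a standard normal distribution and `0.1 : 0.1 : 100` as the grid 0.1, 0.2, ..., 100, following the MATLAB-style notation of the equations. The noise variance defaults to the value 0.2 used in this example, and the seeds and function names are assumptions. The output matrix Y of Eq. (4.12) is not generated here.

```python
import math
import random


def simulate_states(seed=0):
    """Noise-free states [xt1, xt2, xt3, xt4] of Eq. (4.10), 1000 samples each."""
    rng = random.Random(seed)
    xt1 = ([-2.0 * rng.gauss(0.0, 1.0) for _ in range(200)]
           + [rng.gauss(0.0, 1.0) for _ in range(300)]
           + [3.0 * rng.gauss(0.0, 1.0) for _ in range(500)])
    grid = [0.1 * (i + 1) for i in range(1000)]        # 0.1 : 0.1 : 100
    xt2 = [math.sin(t) + math.cos(t) for t in grid]
    xt3 = [0.5 * a ** 2 + 0.5 * b for a, b in zip(xt1, xt2)]
    xt4 = [0.3 * a + 0.7 * b ** 2 for a, b in zip(xt1, xt2)]
    return [xt1, xt2, xt3, xt4]


def add_measurement_noise(states, noise_var=0.2, seed=1):
    """Eq. (4.11): add zero-mean Gaussian noise and return a 1000 x 4 list of rows."""
    rng = random.Random(seed)
    sd = math.sqrt(noise_var)
    noisy = [[v + sd * rng.gauss(0.0, 1.0) for v in s] for s in states]
    return [list(row) for row in zip(*noisy)]
```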


Figure 4.53 The time evolution of the generated data X.

Figure 4.54 Time evolution of detection using PLS-based GLRT method.

Figure 4.55 Time evolution of detection using KPLS-based GLRT method.

Figure 4.56 Time evolution of detection using MSPLS-based GLRT method.

Under normal conditions, the data set consists of 1000 observations. It is split into 500 samples for training and 500 samples for testing. The fault detection performance of the GLRT chart is illustrated for linear and nonlinear PLS models. The kernel parameter of the RBF function is set to η = 6. Using the CPV criterion, the number of retained kernel principal components is equal to 2, and the noise variance $\sigma_v^2$ is fixed to 0.2. Here, a fault (a mean shift of size 1σ) is introduced in $X_1$ between samples 400 and 500 of the testing data set. Figs. 4.54–4.57 and Table 4.16 compare the detection performance of the linear and nonlinear PLS-based GLRT methods. These results show that both the PLS-based and kernel PLS-based methods are able to detect the fault (Figs. 4.54 and 4.55). However, detection based on the linear PLS model produces many false alarms and missed detections. This is because the PLS method assumes that the relationship between variables is linear, whereas the simulated synthetic example (4.10) is nonlinear, so the linear PLS method cannot handle the nonlinearity efficiently. To deal with this issue, the kernel PLS (KPLS) method is applied here for modeling purposes. Fig. 4.55 and Table 4.16 show the detection performance based on the KPLS model and its benefits over the linear PLS model in terms of MDR, FAR, and ARL1 values.


Figure 4.57 Time evolution of detection using MSKPLS-based GLRT method.

Table 4.16 Summary of missed detection rate (%), false alarm rate (%), and ARL1.

Chart/Fault Detection Metric    MDR (%)    FAR (%)    ARL1
PLS-based GLRT                  58.41      4.01       1
MSPLS-based GLRT                40.59      2.00       2
KPLS-based GLRT                 18.81      4.51       1
MSKPLS-based GLRT               0.99       4.01       2

Figs. 4.56 and 4.57 and Table 4.16 illustrate the detection abilities based on the multiscale representation. Exploiting the multiscale nature of the data may reduce the false alarm and missed detection rates and enhance the monitoring quality: the wavelet-based multiscale representation helps enhance the effectiveness of both the KPLS- and PLS-based methods (Figs. 4.56 and 4.57). The detection abilities of the nonlinear PLS methods (Figs. 4.55 and 4.57) also present clear improvements over those of the linear PLS models (Figs. 4.54 and 4.56).

Application 2: nonisothermal CSTR process

Figure 4.58 Time evolution of detection using KPLS-based GLRT method.

Figure 4.59 Time evolution of detection using MSKPLS-based GLRT method.

Next, the performance of the developed fault detection methods (PLS-, MSPLS-, KPLS-, and MSKPLS-based GLRT) is illustrated. In this case study, the sensor measuring the concentration inside the reactor is assumed to be faulty with a bias fault: a sensor fault is introduced on variable x4 (concentration CA) between samples 400 and 500, with a magnitude of 5σ of this variable. When applying the MSKPLS to construct the model, it is necessary to choose the decomposition depth that gives good detection with low MDR, FAR, and ARL1 values. Here the best decomposition depth is equal to 3. The fault detection performances of the KPLS- and MSKPLS-based GLRT methods are shown in Figs. 4.58 and 4.59 and Table 4.17. The monitoring results (Fig. 4.58 and Table 4.17) show that the KPLS-based GLRT chart has poor monitoring abilities. To deal with more practical processes, a multiscale KPLS is applied in the modeling phase. Fig. 4.59 illustrates the detection performance and shows the benefits of the multiscale representation when using the MSKPLS-based GLRT method, which is more effective than the KPLS-based GLRT method (see Table 4.17). This is because the multiscale representation of the data can be made robust to noise and has a great impact on the detection performance.

Table 4.17 Summary of missed detection rate (%), false alarm rate (%), and ARL1 (CSTR process).

Chart/Fault Detection Metric    MDR (%)    FAR (%)    ARL1
KPLS-based GLRT                 37.62      1.75       1
MSKPLS-based GLRT               0          3.76       1

4.2.4 Conclusion

The application of multiscale kernel PCA (MSKPCA)- and multiscale kernel PLS (MSKPLS)-based GLRT to nonlinear fault detection in chemical processes has been presented. The objective of the developed MSKPCA- and MSKPLS-based GLRT is to use the wavelet-based multiscale representation of data to further enhance the effectiveness of the KPCA- and KPLS-based GLRT fault detection methods. In the developed techniques, KPCA and KPLS models are constructed using the wavelet coefficients at different scales, and then the GLRT is applied using these models to improve the fault detection abilities. Two illustrative applications are used to evaluate the performance of the developed techniques. The first example used simulated continuously stirred tank reactor (CSTR) data, and the other used Tennessee Eastman Process (TEP) data. The fault detection results are assessed using the following detection indicators: missed detection rates, false alarm rates, and ARL1 values. The developed MSKPCA- and MSKPLS-based GLRT techniques have shown improved fault detection, with lower missed detection rates and ARL1 values, in comparison to the KPCA- and KPLS-based GLRT, and all of them provide better fault detection results than the conventional techniques.

References
[1] M. Nounou, H. Nounou, Reduced noise effect in nonlinear model estimation using multiscale representation, Modelling and Simulation in Engineering 2010 (2010) 217305.
[2] M. Nounou, B. Bakshi, Online multiscale filtering of random and gross errors without process models, AIChE Journal 45 (5) (1999) 1041–1058.
[3] M. Nounou, B. Bakshi, Multiscale methods for denoising and compression, in: B. Walczak (Ed.), Wavelets in Analytical Chemistry, Elsevier, Amsterdam, 2000, pp. 119–150.
[4] M. Nounou, H. Nounou, Multiscale fuzzy systems identification, Journal of Process Control 15 (2005) 763–770.
[5] G. Strang, Wavelets and dilation equations, SIAM Review 31 (1989) 614–627.
[6] I. Daubechies, Orthonormal bases for compactly supported wavelets, Communications on Pure and Applied Mathematics 41 (1988) 909–996.
[7] S. Mallat, A theory of multiresolution signal decomposition: the wavelet representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (7) (1989) 764.
[8] M. Piovoso, K. Kosanovich, PCA of wavelet transformed process data for monitoring, Intelligent Journal of Data Analysis 1 (2) (1997) 85–99.
[9] B. Bakshi, Multiscale PCA with application to multivariate statistical process monitoring, AIChE Journal 44 (1998) 1596–1610.
[10] B.R. Bakshi, Multiscale PCA with application to multivariate statistical process monitoring, AIChE Journal 44 (7) (1998) 1596–1610.
[11] M.R. Reynolds Jr, J. Lou, An evaluation of a GLR control chart for monitoring the process mean, Journal of Quality Technology 42 (3) (2010) 287.
[12] D.C. Montgomery, G.C. Runger, Applied Statistics and Probability for Engineers, John Wiley & Sons, 2010.
[13] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Computers & Chemical Engineering 17 (3) (1993) 245–255.
[14] P.R. Lyman, C. Georgakis, Plant-wide control of the Tennessee Eastman problem, Computers & Chemical Engineering 19 (3) (1995) 321–331.
[15] S. Yin, S.X. Ding, A. Haghani, H. Hao, P. Zhang, A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process, Journal of Process Control 22 (9) (2012) 1567–1581.
[16] T. Rato, M. Reis, E. Schmitt, M. Hubert, B. De Ketelaere, A systematic comparison of PCA-based statistical process monitoring methods for high-dimensional, time-dependent processes, AIChE Journal (2016).
[17] D.X. Tien, K.-W. Lim, L. Jun, Comparative study of PCA approaches in process monitoring and fault detection, in: 30th Annual Conference of IEEE Industrial Electronics Society, vol. 3, IECON 2004, IEEE, 2004, pp. 2594–2599.
[18] M. Mansouri, M. Nounou, H. Nounou, K. Nazmul, Kernel PCA-based GLRT for nonlinear fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 26 (1) (2016) 129–139.
[19] C. Botre, M. Mansouri, M. Nounou, H. Nounou, M.N. Karim, Kernel PLS-based GLRT method for fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 43 (2016) 212–224.
[20] P. Teppola, P. Minkkinen, Wavelet-PLS regression models for both exploratory data analysis and process monitoring, Journal of Chemometrics 14 (5–6) (2000) 383–399.
[21] Y. Zhang, Z. Hu, Multivariate process monitoring and analysis based on multi-scale KPLS, Chemical Engineering Research and Design 89 (12) (2011) 2667–2678.

CHAPTER 5

Linear and nonlinear interval latent variable approaches for fault detection

Contents
5.1. Interval latent variable approaches for fault detection
5.1.1 Introduction
5.1.2 Interval PCA-based GLRT for fault detection
5.1.2.1 Interval data description
5.1.2.2 Principal component analysis for interval-valued data
5.1.2.3 Interval-valued PCA model identification
5.1.2.4 Fault detection using complete information PCA-based GLRT
5.1.2.5 Complete information PCA-based GLRT and applications
5.1.2.6 Fault detection using midpoints radii PCA-based EWMA
5.1.2.7 Midpoints radii PCA-based EWMA and applications
5.1.3 Interval PLS-based GLRT for fault detection
5.1.3.1 Partial least squares for interval-valued data
5.1.3.2 Fault detection charts based on interval PLS
5.1.3.3 Fault detection using interval PLS-based GLRT
5.1.3.4 Interval PLS-based GLRT and applications
5.1.4 Conclusion
5.2. Interval nonlinear latent variable approaches for fault detection
5.2.1 Introduction
5.2.2 Interval kernel PCA-based GLRT for fault detection
5.2.2.1 Kernel PCA for interval-valued data (IKPCA)
5.2.2.2 Interval KPCA-based fault detection charts
5.2.2.3 Applications
5.2.3 Interval kernel PLS-based GLRT for fault detection
5.2.3.1 Kernel PLS for interval-valued data (IKPLS)
5.2.3.2 Interval KPLS-based fault detection charts
5.2.3.3 Interval KPLS-based GLRT and application
5.2.4 Conclusion
References

Data-Driven and Model-Based Methods for Fault Detection and Diagnosis https://doi.org/10.1016/B978-0-12-819164-4.00014-5


Copyright © 2020 Elsevier Inc. All rights reserved.


5.1 Interval latent variable approaches for fault detection

5.1.1 Introduction

Latent variable (LV)-based process monitoring approaches such as principal component analysis (PCA) and partial least squares (PLS) have been widely applied in various industrial scenarios [1–4]. The application of multivariate charts to the detection of abnormal events has been developed mainly in the area of process monitoring [5–12]. The process monitoring scheme involves building a PCA model of the system under normal operating conditions (NOC). The identification of the PCA/PLS model relies on estimating the process structure by solving an eigendecomposition problem [13,4]. However, PCA/PLS-based monitoring methods have only been developed for single-valued data. These data result from simplifications made during the data mining procedure and are imprecise.

PCA-based fault detection is a well-established data-driven approach, which has long been praised for its performance. However, data are often affected by different types of errors and uncertainties, including measurement noise, sensor imprecision, and variability of the measured quantity. These uncertainties have a negative impact on the established PCA model and thus on the fault detection performance. To represent the real data more precisely, this uncertainty can be treated by considering an interval representation instead of a single-valued representation. In this case, determining the PCA model requires new techniques adapted to interval-valued data.

The first interval principal component analysis (IPCA) methods proposed were the centers and vertices methods by Cazes et al. [14]. The centers PCA (CPCA) method uses the centers matrix of the input interval data set to compute the principal components. Thus the centers method only utilizes the variation between intervals, ignoring the variation within each interval. Lauro and Palumbo [15] also proposed the symbolic object and range-transformation methods to eliminate some of the shortcomings of the vertices method. The symbolic object method introduces an additional Boolean transformation matrix, whose purpose is to remove any interdependency between the vertices. However, this approach still suffers from the limitation of utilizing only the variance between the vertices, thereby ignoring some of the data set's internal variance. The work in [16] proposes new techniques that take into consideration the internal structure of symbolic variables. The authors of [17] proposed a three-way PCA of interval data to extract the dynamic main features of the copper futures market to reduce the dimension of the variable space. A new interval PCA method with an enhanced covariance matrix calculation, called complete-information principal component analysis (CIPCA), is proposed in [18]. The authors of [19] introduced a method that uses both interval centers and interval ranges. It is called the midpoints radii PCA (MRPCA) and enhances CPCA by including the radii of the data.

Regarding interval-based regression methods, PLS is widely used in process monitoring [20]. However, to deal with uncertain processes and interval-valued data, a new PLS model is required. An extension of the PLS approach is therefore proposed for interval-valued data. The proposed approach, called the center PLS method, consists of fitting a PLS regression on the centers of the intervals of the variables in the training set and then applying the identified PLS model to the lower and upper bounds of the interval values of the independent variables to predict the interval values of the dependent variables. A regression model for interval-valued data was introduced by Billard and Diday [21]. The model is built on the center points of the intervals and is then applied to the interval independent variables to predict the interval dependent variables [22].

The remainder of this chapter is organized as follows. In section 5.1.2, we present the developed interval PCA (IPCA)-based GLRT technique and its application to fault detection. In section 5.1.3, we present the proposed interval PLS-based GLRT and its application to monitoring chemical and environmental processes.

5.1.2 Interval PCA-based GLRT for fault detection

5.1.2.1 Interval data description

The uncertainty, which is strictly connected to measurement imprecision or process uncertainties, may be treated by considering the data as interval-valued. The interval data formulation offers a way to represent the available information where uncertainty or variability must be taken into account. In reality, the actual value $x_j^*(k)$ of a variable can deviate from the measured one $x_j^c(k)$. The measurement error is defined as $\delta x_j(k) = x_j^c(k) - x_j^*(k)$. Hence, once a measurement $x_j^c(k)$ is available, the actual (unknown) value $x_j^*(k)$ of the measured variable belongs to the interval

\[
x_j^*(k) \in [\underline{x}_j(k), \overline{x}_j(k)],
\]

where $\underline{x}_j(k) = x_j^c(k) - \delta x_j(k)$ and $\overline{x}_j(k) = x_j^c(k) + \delta x_j(k)$.


An interval-valued variable (IVV) $[X_j] \subset \mathbb{R}$ is represented by a set of values delimited by ordered couples: $[X_j] = \{[x_j(1)], [x_j(2)], \ldots, [x_j(n)]\}$, where $[x_j(k)] \equiv [\underline{x}_j(k), \overline{x}_j(k)]$ for $k = 1, \ldots, n$ and $\underline{x}_j(k) \le \overline{x}_j(k)$. The interval $[x_j(k)]$ can also be expressed by a couple $\{x_j^c(k), x_j^r(k)\}$, where

\[
x_j^c(k) = \frac{1}{2}\left(\overline{x}_j(k) + \underline{x}_j(k)\right)
\tag{5.1}
\]

and

\[
x_j^r(k) = \frac{1}{2}\left(\overline{x}_j(k) - \underline{x}_j(k)\right).
\tag{5.2}
\]
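As a small illustration of Eqs. (5.1) and (5.2), the conversion between the bounds of an interval and its center and radius can be written in Python as follows. The function names are hypothetical.

```python
def to_center_radius(lower, upper):
    """Center (5.1) and radius (5.2) of the interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return 0.5 * (upper + lower), 0.5 * (upper - lower)


def to_bounds(center, radius):
    """Inverse mapping: recover [lower, upper] from the center and radius."""
    return center - radius, center + radius
```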

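Typically, some standardization must be performed prior to processing data.

Definition 5.1.1. Let us introduce two basic concepts, the mean interval and variance [23,24]. The mean interval $[m_j]$ is defined as

\[
[m_j] = \frac{1}{n} \sum_{k=1}^{n} [x_j(k)].
\tag{5.3}
\]

The variance is given by

\[
\sigma^2 = \frac{1}{n} \sum_{k=1}^{n} \left( \left| x_j^c(k) - m_j^c \right| + \left| x_j^r(k) - m_j^r \right| \right)^2,
\tag{5.4}
\]

where

\[
[m_j] = \left[ \frac{1}{n} \sum_{k=1}^{n} \underline{x}_j(k),\ \frac{1}{n} \sum_{k=1}^{n} \overline{x}_j(k) \right], \quad
m_j^c = \frac{1}{n} \sum_{k=1}^{n} x_j^c(k), \quad \text{and} \quad
m_j^r = \frac{1}{n} \sum_{k=1}^{n} x_j^r(k).
\]

A standardized interval is given by

\[
\left[ \frac{1}{\sigma} \left( x_j^c(k) - m_j^c - \left| x_j^r(k) - m_j^r \right| \right),\
\frac{1}{\sigma} \left( x_j^c(k) - m_j^c + \left| x_j^r(k) - m_j^r \right| \right) \right].
\tag{5.5}
\]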
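The quantities of Definition 5.1.1 can be computed as in the following Python sketch, where each observation is a pair `(lower, upper)`. The absolute value applied to the radius term in the standardized interval is this sketch's reading of Eq. (5.5) and should be treated as an assumption. The function names are hypothetical.

```python
import math


def centers_radii(intervals):
    """Centers and radii of a list of (lower, upper) pairs."""
    centers = [0.5 * (up + lo) for lo, up in intervals]
    radii = [0.5 * (up - lo) for lo, up in intervals]
    return centers, radii


def interval_mean(intervals):
    """Mean interval (5.3): means of the lower and of the upper bounds."""
    n = len(intervals)
    return (sum(lo for lo, _ in intervals) / n,
            sum(up for _, up in intervals) / n)


def interval_variance(intervals):
    """Variance (5.4) built from center and radius deviations."""
    centers, radii = centers_radii(intervals)
    n = len(intervals)
    mc, mr = sum(centers) / n, sum(radii) / n
    return sum((abs(c - mc) + abs(r - mr)) ** 2
               for c, r in zip(centers, radii)) / n


def standardize_intervals(intervals):
    """Standardized intervals following the reading of Eq. (5.5) stated above."""
    centers, radii = centers_radii(intervals)
    n = len(intervals)
    mc, mr = sum(centers) / n, sum(radii) / n
    sigma = math.sqrt(interval_variance(intervals))
    if sigma == 0.0:
        raise ValueError("all intervals are identical; variance is zero")
    out = []
    for c, r in zip(centers, radii):
        half = abs(r - mr)
        out.append(((c - mc - half) / sigma, (c - mc + half) / sigma))
    return out
```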

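Definition 5.1.2. Given any interval-valued variables

\[
[X_j] = \left( [\underline{x}_j(1), \overline{x}_j(1)], \ldots, [\underline{x}_j(k), \overline{x}_j(k)], \ldots, [\underline{x}_j(n), \overline{x}_j(n)] \right)^T
\]

and

\[
[X_i] = \left( [\underline{x}_i(1), \overline{x}_i(1)], \ldots, [\underline{x}_i(k), \overline{x}_i(k)], \ldots, [\underline{x}_i(n), \overline{x}_i(n)] \right)^T, \quad j \neq i,
\]

the product of two intervals is defined as

\[
[\underline{x}_j(k), \overline{x}_j(k)] \, [\underline{x}_i(k), \overline{x}_i(k)] = [\min(s), \max(s)],
\tag{5.6}
\]

where $s = \{ \underline{x}_i(k)\underline{x}_j(k),\ \underline{x}_i(k)\overline{x}_j(k),\ \overline{x}_i(k)\underline{x}_j(k),\ \overline{x}_i(k)\overline{x}_j(k) \}$ [25], and the square of the interval is given by

\[
[\underline{x}_i(k), \overline{x}_i(k)]^2 =
\begin{cases}
[\min(s_1), \max(s_1)] & \text{if } 0 \notin [x_i(k)], \\
[0, \max(s_1)] & \text{otherwise},
\end{cases}
\tag{5.7}
\]

where $s_1 = \{ \underline{x}_i(k)^2,\ \overline{x}_i(k)^2 \}$.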

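Definition 5.1.3. For an interval-valued variable $[X_j]$, the squared norm is given by

\[
\left\langle [X_j], [X_j] \right\rangle = \left\| [X_j] \right\|^2 = \sum_{k=1}^{n} \left\| [x_j(k)] \right\|^2
= \frac{1}{3} \sum_{k=1}^{n} \left( \underline{x}_j^2(k) + \underline{x}_j(k)\overline{x}_j(k) + \overline{x}_j^2(k) \right).
\]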
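The interval product (5.6), the interval square (5.7), and the squared norm of Definition 5.1.3 translate directly into a few lines of Python. Intervals are represented as `(lower, upper)` pairs, and the function names are hypothetical.

```python
def interval_product(a, b):
    """Product of two intervals, Eq. (5.6): extremes of the four bound products."""
    s = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return min(s), max(s)


def interval_square(a):
    """Square of an interval, Eq. (5.7): the lower bound is 0 when 0 lies in a."""
    s1 = [a[0] ** 2, a[1] ** 2]
    if a[0] <= 0.0 <= a[1]:
        return 0.0, max(s1)
    return min(s1), max(s1)


def interval_squared_norm(intervals):
    """Squared norm of Definition 5.1.3 for a list of (lower, upper) pairs."""
    return sum(lo * lo + lo * up + up * up for lo, up in intervals) / 3.0
```

Note that the square of an interval is not the product of the interval with itself: for [-1, 2], Eq. (5.6) gives [-2, 4], whereas Eq. (5.7) gives [0, 4]. Each term of the squared norm, (a² + ab + b²)/3, equals the mean of x² for x uniformly distributed on [a, b].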

Definition 5.1.4. Given any interval-valued variables $[X_1], [X_2], \ldots, [X_m]$ of n observations and for all $a_j \in \mathbb{R}$, $j = 1, \ldots, m$, define the interval-valued variable $[Y]$ as a linear combination of $[X_1], [X_2], \ldots, [X_m]$, that is,

\[
[Y] = \sum_{j=1}^{m} a_j [X_j] = \left( [\underline{y}(1), \overline{y}(1)], \ldots, [\underline{y}(n), \overline{y}(n)] \right)^T.
\]

To avoid the problem that the predicted lower bound values of response variable are greater than the upper bound values, Moore’s linear combination algorithm [26] is adopted as follows: y(k) =

m  



aj τ xj (k) + (1 − τ ) xj (k) ,

j=1

y(k) =

m    aj (1 − τ ) xj (k) + τ xj (k) j=1

140

Data-Driven and Model-Based Methods for Fault Detection and Diagnosis

with  τ=

0 if aj ≤ 0, 1 otherwise.
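The rule above can be sketched for a single observation $k$ as follows. This is a minimal illustration, not the authors' implementation; the function name and the list-of-tuples data layout are assumptions made for the example.

```python
def moore_combination(coeffs, intervals):
    """Moore's linear combination for one observation.

    coeffs    : coefficients a_j
    intervals : (lower, upper) bounds of [x_j(k)], one tuple per variable
    Returns the bounds (y_lower, y_upper) of [y(k)].
    """
    y_lo = 0.0
    y_hi = 0.0
    for a, (lo, hi) in zip(coeffs, intervals):
        tau = 0.0 if a <= 0 else 1.0
        # A negative coefficient swaps the roles of the two bounds.
        y_lo += a * (tau * lo + (1.0 - tau) * hi)
        y_hi += a * ((1.0 - tau) * lo + tau * hi)
    return y_lo, y_hi


print(moore_combination([2.0, -1.0], [(1.0, 2.0), (3.0, 5.0)]))  # (-3.0, 1.0)
```

Because each term contributes its smaller value to the lower bound and its larger value to the upper bound, the result always satisfies y_lower ≤ y_upper.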

Before presenting interval-valued data-based PCA methods, let us define an interval-valued data matrix. Let $[X]$ be an $N \times m$ data matrix. Then
\[
[X] =
\begin{pmatrix}
[\underline{x}_1(1), \overline{x}_1(1)] & \cdots & [\underline{x}_m(1), \overline{x}_m(1)] \\
\vdots & \ddots & \vdots \\
[\underline{x}_1(N), \overline{x}_1(N)] & \cdots & [\underline{x}_m(N), \overline{x}_m(N)]
\end{pmatrix}, \tag{5.8}
\]
where $\underline{x}_j(k) \le \overline{x}_j(k)$ for all $k = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, m$.

The majority of the available methods rely on a strategy called symbolic-conventional-symbolic, which can be summarized as:
1. Transform the interval-valued data matrix to a numerical data matrix.
2. Apply PCA/PLS to the numerical data matrix, that is, determine numerical eigenvalues and eigenvectors.
3. Estimate the interval-valued principal components from the numerical principal components.

5.1.2.2 Principal component analysis for interval-valued data

Interval-valued data-based PCA has been an active research area over the last ten years. Chouakria et al. [27,28] extended PCA to interval-valued data by developing vertices principal component analysis (VPCA) and centers principal component analysis (CPCA). Other symbolic-data techniques, such as those based on a symbolic sample covariance matrix [29–31], have also been proposed for interval-valued data. Interval-valued data methods are commonly used to explain the total variance of a set of interval variables. The authors of [19] proposed an enhanced version of CPCA based on both interval centers and interval ranges, called the midpoints radii PCA (MRPCA). Another interval PCA method, which uses an enhanced covariance matrix calculation, is proposed in [18] and is called the complete-information PCA (CIPCA). The VPCA approach, however, suffers from high computational complexity. Therefore, in the current work, we propose to use the CIPCA and MRPCA methods to address the problem of modeling for process monitoring purposes. Next, the MRPCA and CIPCA methods are presented.


Midpoints radii PCA method

Midpoints radii PCA (MRPCA) is one of the best-known interval data-driven techniques used for process modeling. First, an MRPCA model is built off-line using fault-free data. From the obtained MRPCA model, interval residuals are generated and used for process monitoring purposes. Problems with the statistical analysis of interval data using standard interval arithmetic can be avoided by representing the intervals by their midpoints and radii. The midpoints radii PCA for interval-valued data, introduced in [32,19], is a hybrid method that improves on CPCA by including the radii. MRPCA is resolved in terms of midpoints ($X^c$) and radii ($X^r$), given in Eqs. (5.97) and (5.98), and their interconnection. According to MRPCA [19], two independent PCAs are applied to these two matrices. The solutions are given by the following eigensystems:
\[
X^{cT}X^{c}\,\Sigma^{-1}P^c = \Lambda^c P^c, \tag{5.9}
\]
\[
X^{rT}X^{r}\,\Sigma^{-1}P^r = \Lambda^r P^r, \tag{5.10}
\]
where $\Lambda^c$, $P^c$ and $\Lambda^r$, $P^r$ are, respectively, the eigenvalues and eigenvectors of the two partial eigendecompositions of the midpoints and radii matrices, and $\Sigma$ is the covariance matrix given by
\[
\Sigma = X^{cT}X^{c} + X^{rT}X^{r} + \left|X^{cT}X^{r}\right| + \left|X^{rT}X^{c}\right|. \tag{5.11}
\]

To get a logical graphical representation of the statistical units based on the MRPCA model, the rotated radii coordinates are superimposed on the midpoint PCs as supplementary points. This can be achieved by maximizing the Tucker congruence coefficient between midpoints and radii [23] or by using a rotation matrix $A = QP^T$ [19], given the following singular value decomposition:
\[
X^{cT}X^{r} = P\,\Lambda^{cr}Q^T. \tag{5.12}
\]

Let $C_c = P_cP_c^T$ and $C_r = P_rP_r^T$ be the PCA models for the centers and ranges data matrices, respectively. Interval-valued estimations based on the MRPCA model are then given by
\[
\hat{x}^c(k) = C_c\,x^c(k), \qquad \hat{x}^r(k) = C_r\,x^r(k), \tag{5.13}
\]
and
\[
\underline{\hat{x}}(k) = \hat{x}^c(k) - A\,\hat{x}^r(k), \qquad \overline{\hat{x}}(k) = \hat{x}^c(k) + A\,\hat{x}^r(k). \tag{5.14}
\]

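Complete information PCA method

The complete information PCA (CIPCA) method was developed in [18] to extend classical PCA to interval-valued data and to extract more of the information contained in interval measurements. According to CIPCA [18], given any two interval-valued variables $[X_i]$ and $[X_j]$, the inner product is defined as
\[
\left\langle [X_i], [X_j] \right\rangle = \sum_{k=1}^{n}\left\langle [x_i(k)], [x_j(k)] \right\rangle, \tag{5.15}
\]
where
\[
\left\langle [x_i(k)], [x_j(k)] \right\rangle = \frac{1}{4}\left(\underline{x}_i(k) + \overline{x}_i(k)\right)\left(\underline{x}_j(k) + \overline{x}_j(k)\right). \tag{5.16}
\]
In the case of autocorrelation, given by $\left\langle [X_j], [X_j] \right\rangle$, the squared norm $\left\|[X_j]\right\|^2$ for interval-valued data is defined as
\[
\left\|[X_j]\right\|^2 = \sum_{k=1}^{n}\left\|[x_j(k)]\right\|^2, \tag{5.17}
\]
where
\[
\left\|[x_j(k)]\right\|^2 = \frac{1}{3}\left(\underline{x}_j^2(k) + \underline{x}_j(k)\overline{x}_j(k) + \overline{x}_j^2(k)\right). \tag{5.18}
\]
Based on these definitions of the interval norm and inner product and with all data units preprocessed, the covariance matrix $\Sigma$ of $[X]$, with $n$ observations of $m$ variables, is given by
\[
\Sigma = \frac{1}{n}
\begin{pmatrix}
\langle [X_1], [X_1] \rangle & \langle [X_1], [X_2] \rangle & \cdots & \langle [X_1], [X_m] \rangle \\
\langle [X_2], [X_1] \rangle & \langle [X_2], [X_2] \rangle & \cdots & \langle [X_2], [X_m] \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle [X_m], [X_1] \rangle & \langle [X_m], [X_2] \rangle & \cdots & \langle [X_m], [X_m] \rangle
\end{pmatrix}. \tag{5.19}
\]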
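The covariance matrix of Eq. (5.19) can be assembled directly from the lower and upper bound matrices. The sketch below is a minimal illustration that assumes the data have already been preprocessed and are stored as two n × m lists of lists; the function name and data layout are hypothetical.

```python
def cipca_covariance(lower, upper):
    """Interval covariance matrix of Eq. (5.19).

    lower, upper : n x m lists of lists with the preprocessed bounds.
    Diagonal entries use the squared norm of Eq. (5.18),
    off-diagonal entries use the inner product of Eq. (5.16).
    """
    n = len(lower)
    m = len(lower[0])
    cov = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            total = 0.0
            for k in range(n):
                a_lo, a_hi = lower[k][i], upper[k][i]
                b_lo, b_hi = lower[k][j], upper[k][j]
                if i == j:
                    total += (a_lo ** 2 + a_lo * a_hi + a_hi ** 2) / 3.0
                else:
                    total += (a_lo + a_hi) * (b_lo + b_hi) / 4.0
            cov[i][j] = total / n
    return cov
```

When every interval is degenerate (lower equals upper), the diagonal entries reduce to the mean of $x_j^2(k)$ and the off-diagonal entries to the mean of $x_i(k)x_j(k)$, that is, to the classical covariance of centered single-valued data.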

The determination of the interval principal components [T ] in CIPCA method is based on a linear combination algorithm for interval-valued variables first developed by Moore [26].


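The CIPCA method starts by computing the covariance matrix $\Sigma$ of the interval data matrix $[X]$ using Eq. (5.19) and then performs an eigendecomposition, where $\lambda_1, \ldots, \lambda_m$ and $p_1, \ldots, p_m$ are the resulting eigenvalues and eigenvectors, respectively. The interval-valued principal components, using Moore's rule [26], are given by
\[
\begin{cases}
\underline{t}_j(k) = \displaystyle\sum_{i=1}^{m} p_{ij}\left(\tau\,\underline{x}_i(k) + (1-\tau)\,\overline{x}_i(k)\right), \\[2ex]
\overline{t}_j(k) = \displaystyle\sum_{i=1}^{m} p_{ij}\left((1-\tau)\,\underline{x}_i(k) + \tau\,\overline{x}_i(k)\right),
\end{cases} \tag{5.20}
\]
with
\[
\tau =
\begin{cases}
0, & p_{ij} \le 0, \\
1, & p_{ij} > 0.
\end{cases}
\]
The interval-valued estimations from the CIPCA model are given by
\[
\begin{cases}
\underline{\hat{x}}_j(k) = \displaystyle\sum_{q=1}^{m} C_{qj}\left(\tau\,\underline{x}_q(k) + (1-\tau)\,\overline{x}_q(k)\right), \\[2ex]
\overline{\hat{x}}_j(k) = \displaystyle\sum_{q=1}^{m} C_{qj}\left((1-\tau)\,\underline{x}_q(k) + \tau\,\overline{x}_q(k)\right),
\end{cases} \tag{5.21}
\]
with the same condition on $\tau$ (applied to $C_{qj}$), given that $C = \hat{P}\hat{P}^T$, where $\hat{P}$ is the matrix of retained eigenvectors.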
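Equations (5.20) and (5.21) apply the same sign-dependent rule, once with the loadings and once with the columns of $C$. The following minimal sketch assumes the eigenvectors are already available as the columns of a matrix stored as a list of rows; all function names are hypothetical.

```python
def moore_rule(weights, lower, upper):
    """Bounds of sum_i w_i [x_i] for one sample, using Moore's rule."""
    lo = 0.0
    hi = 0.0
    for w, x_lo, x_hi in zip(weights, lower, upper):
        if w > 0:            # tau = 1
            lo += w * x_lo
            hi += w * x_hi
        else:                # tau = 0: the bounds swap roles
            lo += w * x_hi
            hi += w * x_lo
    return lo, hi


def interval_scores(P, lower, upper):
    """Interval principal components of Eq. (5.20); P holds p_j as columns."""
    m = len(P)
    return [moore_rule([P[i][j] for i in range(m)], lower, upper)
            for j in range(len(P[0]))]


def interval_estimates(P_hat, lower, upper):
    """Interval estimations of Eq. (5.21) with C = P_hat P_hat^T."""
    m = len(P_hat)
    ell = len(P_hat[0])
    C = [[sum(P_hat[q][a] * P_hat[j][a] for a in range(ell))
          for j in range(m)] for q in range(m)]
    return [moore_rule([C[q][j] for q in range(m)], lower, upper)
            for j in range(m)]
```

For example, with two variables, loadings $p_1 = (1, 1)^T/\sqrt{2}$ and $p_2 = (1, -1)^T/\sqrt{2}$, and a sample with both intervals equal to [1, 3], the second score interval is symmetric about zero, while the estimates obtained from $p_1$ alone reproduce [1, 3] for both variables.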

5.1.2.3 Interval-valued PCA model identification

A key issue in identifying a PCA model is selecting an adequate number of principal components [33–35]. The number of retained principal components $\ell$ has a significant impact on each step of the process modeling and monitoring scheme. Qin and Dunia [36,33,35] proposed to determine this parameter by minimizing the variance of the reconstruction error. This method is retained here to determine $\ell$ in the case of an interval-valued data-based PCA model. First, we present the reconstruction approach and its extension to interval-valued data. Then a new criterion, based on the variance of the interval reconstruction errors, is proposed to select the number of retained principal components.

The reconstruction approach generally used for PCA with single-valued data consists of estimating a variable using the PCA model and the other process variables. The reconstruction accuracy is related to the ability of the PCA model to reveal the redundancy relations among variables [34,35].


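Here we introduce an extension of the variable reconstruction-based interval PCA method using the projection matrix $G^{(i)}$ for the first $\ell$ PCs. The reconstructed measurement $[z_i(k)]$ is given by the relations
\[
[z_i(k)] = [\underline{z}_i(k), \overline{z}_i(k)] = g_i^T\,[x(k)] \tag{5.22}
\]
with
\[
G^{(i)} = \left[\xi_1\ \ldots\ g_i\ \ldots\ \xi_m\right]^T, \qquad
g_i^T = \frac{\left[c_{-i}^T\ \ 0\ \ c_{+i}^T\right]}{1 - c_{ii}}, \tag{5.23}
\]
$[x(k)] = \left[[x_1(k)]\ \ldots\ [x_i(k)]\ \ldots\ [x_m(k)]\right]^T$, and $[x_i(k)] = [\underline{x}_i(k), \overline{x}_i(k)]$. By applying Moore's rule, the reconstruction can be expressed as
\[
\begin{cases}
\underline{z}_i(k) = \displaystyle\sum_{q=1,\ g_{i,q} \ge 0}^{m} \underline{x}_q(k)\,g_{i,q} + \sum_{q=1,\ g_{i,q} < 0}^{m} \overline{x}_q(k)\,g_{i,q}, \\[2ex]
\overline{z}_i(k) = \displaystyle\sum_{q=1,\ g_{i,q} \ge 0}^{m} \overline{x}_q(k)\,g_{i,q} + \sum_{q=1,\ g_{i,q} < 0}^{m} \underline{x}_q(k)\,g_{i,q}.
\end{cases} \tag{5.24}
\]
This expression can also be rewritten as
\[
\begin{cases}
\underline{z}_i(k) = \displaystyle\sum_{q=1}^{m} g_{i,q}\left(\tau\,\underline{x}_q(k) + (1-\tau)\,\overline{x}_q(k)\right), \\[2ex]
\overline{z}_i(k) = \displaystyle\sum_{q=1}^{m} g_{i,q}\left((1-\tau)\,\underline{x}_q(k) + \tau\,\overline{x}_q(k)\right),
\end{cases} \tag{5.25}
\]
where $g_{i,q}$ is the $q$th element of the vector $g_i$, and
\[
\tau =
\begin{cases}
0 & \text{for } g_{i,q} < 0, \\
1 & \text{for } g_{i,q} \ge 0.
\end{cases}
\]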

Once the interval reconstruction $[z_i(k)] = [\underline{z}_i(k), \overline{z}_i(k)]$ is obtained for the $i$th variable and for a fixed number of components $\ell$, the interval reconstruction error $[e_i^r(k)]$ can be calculated as
\[
[e_i^r(k)] = [x_i(k)] - [z_i(k)], \tag{5.26}
\]
where $[x_i(k)]$ is the $i$th interval-valued data sample at time $k$, $[z_i(k)]$ is its interval-valued reconstruction given by the interval PCA model, and $[e_i^r(k)] = [\underline{e}_i^r(k), \overline{e}_i^r(k)]$. Afterward, the variance of the interval reconstruction error (VIRE) is computed for each variable as
\[
\rho_i(\ell) = \frac{1}{n}\sum_{k=1}^{n}\left\|[e_i^r(k)]\right\|^2, \tag{5.27}
\]
where
\[
\left\|[e_i^r(k)]\right\|^2 = \frac{1}{3}\left((\underline{e}_i^r)^2(k) + \underline{e}_i^r(k)\,\overline{e}_i^r(k) + (\overline{e}_i^r)^2(k)\right). \tag{5.28}
\]
The variance of the interval reconstruction error for all variables and for a given number of retained components $\ell$ is defined as
\[
\mathrm{VIRE}(\ell) = \sum_{i=1}^{m}\rho_i(\ell). \tag{5.29}
\]
The determination of the number of components can be formulated as the following optimization problem with respect to the number of PCs $\ell$:
\[
\ell_{\mathrm{opt}} = \arg\min_{\ell}\ \mathrm{VIRE}(\ell). \tag{5.30}
\]
The number of principal components to be kept in the interval PCA model is the value of $\ell$ that minimizes $\mathrm{VIRE}(\ell)$. After process modeling under normal operating conditions, the obtained model is used for process monitoring.
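The selection procedure of Eqs. (5.22) to (5.30) can be sketched as follows. This is a minimal illustration under two assumptions that are not stated in the text: the reconstruction error bounds are taken bound-wise (lower minus lower, upper minus upper), and the case $c_{ii} = 1$, for which the variable cannot be reconstructed, is not handled. Function names are hypothetical.

```python
def reconstruction_vector(C, i):
    """g_i of Eq. (5.23): i-th column of C with entry i zeroed, over 1 - c_ii."""
    denom = 1.0 - C[i][i]
    return [0.0 if q == i else C[q][i] / denom for q in range(len(C))]


def vire(C, lower, upper):
    """VIRE of Eq. (5.29) for a model C = P_hat P_hat^T and n x m bound matrices."""
    n = len(lower)
    m = len(C)
    total = 0.0
    for i in range(m):
        g = reconstruction_vector(C, i)
        acc = 0.0
        for k in range(n):
            # Moore's rule, Eq. (5.24): negative weights swap the bounds.
            z_lo = sum(w * (lower[k][q] if w >= 0 else upper[k][q])
                       for q, w in enumerate(g))
            z_hi = sum(w * (upper[k][q] if w >= 0 else lower[k][q])
                       for q, w in enumerate(g))
            e_lo = lower[k][i] - z_lo   # assumed bound-wise error
            e_hi = upper[k][i] - z_hi
            acc += (e_lo ** 2 + e_lo * e_hi + e_hi ** 2) / 3.0  # Eq. (5.28)
        total += acc / n                # rho_i, Eq. (5.27)
    return total


def select_components(models, lower, upper):
    """Return the ell minimizing VIRE; models maps ell -> C matrix."""
    return min(models, key=lambda ell: vire(models[ell], lower, upper))
```

Here `models` would hold one matrix $C$ per candidate value of $\ell$, each built from the first $\ell$ eigenvectors of the interval covariance matrix.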

5.1.2.4 Fault detection using complete information PCA-based GLRT

The obtained PCA model describes normal process behavior, and unusual events are then detected by referencing the observed behavior against this model. Once the interval-valued data PCA model is identified, which means that the number of retained principal components $\ell$ is determined, interval residuals can be generated for fault detection using interval arithmetic [26]. From Eq. (5.21), residuals can be generated in the measurement space [36–39]. Let us consider the interval residual vector
\[
[e(k)] = \left[[e_1(k)], \ldots, [e_m(k)]\right]^T = [x(k)] - [\hat{x}(k)], \tag{5.31}
\]
\[
[e_j(k)] = [\underline{e}_j(k), \overline{e}_j(k)], \quad j = 1, \ldots, m. \tag{5.32}
\]
From Eq. (5.20), residuals can also be generated by taking the last $m - \ell$ principal components, which span the residual subspace [36–39]. Let us consider the $q$th interval residual,
\[
[t_q(k)] = p_q^T\,[x(k)], \quad q = \ell + 1, \ldots, m, \tag{5.33}
\]
\[
[t_q(k)] = [\underline{t}_q(k), \overline{t}_q(k)], \quad q = \ell + 1, \ldots, m. \tag{5.34}
\]
The GLR will be applied to residuals generated from the interval PCA model. From Eq. (2.14) and the residual equations (5.31) and (5.33), several cases can be considered. Next, we present two versions of the GLR for interval-valued data.

Univariate interval weighted GLR chart

From Eq. (2.14), let $E = e_j$ for $j = 1, \ldots, m$. Then the hypothesis testing problem for single-valued data can be written as
\[
\begin{cases}
H_0 = \{e_j \sim N(0, 1)\} & \text{(null hypothesis)}, \\
H_1 = \{e_j \sim N(\theta, 1)\} & \text{(alternative one)},
\end{cases} \tag{5.35}
\]
and the univariate statistic chart is given by
\[
G(e_j) = e_j^2 \sim \chi^2_{1,\alpha}, \quad j = 1, \ldots, m. \tag{5.36}
\]
From Eq. (5.35), an extension of the GLR to interval-valued data, the univariate interval GLR, can be developed. Let $E = [\underline{e}_j, \overline{e}_j]$ for $j = 1, \ldots, m$. Then
\[
\mathrm{UIGLR}(k) =
\begin{cases}
\underline{e}_j^2(k), \\
\overline{e}_j^2(k),
\end{cases}
\quad \text{for } j = 1, \ldots, m. \tag{5.37}
\]
Another formulation of the GLR can be expressed using the ignored principal components. From Eq. (2.14), let $E = t_j$ for $j = \ell + 1, \ldots, m$ with corresponding variance $\lambda_j$ as presented in Eq. (2.2). Then the hypothesis testing problem for single-valued data can be written as
\[
\begin{cases}
H_0 = \{t_j \sim N(0, \lambda_j)\} & \text{(null hypothesis)}, \\
H_1 = \{t_j \sim N(\theta, \lambda_j)\} & \text{(alternative one)},
\end{cases} \tag{5.38}
\]
and the univariate weighted statistic chart is given by
\[
G(t_j) = \frac{1}{\lambda_j}\,t_j^2 \sim \chi^2_{1,\alpha}, \quad j = \ell + 1, \ldots, m. \tag{5.39}
\]
For interval-valued data, the univariate interval weighted GLR based on $[\tilde{t}]$ can be derived. Let $E = [\underline{t}_j, \overline{t}_j]$ and $\sigma_j^2 = \lambda_j$ for $j = \ell + 1, \ldots, m$. Then
\[
\mathrm{UIWGLR}(k) =
\begin{cases}
\dfrac{\underline{t}_j^2(k)}{\lambda_j}, \\[2ex]
\dfrac{\overline{t}_j^2(k)}{\lambda_j},
\end{cases}
\quad \text{for } j = \ell + 1, \ldots, m. \tag{5.40}
\]

Multivariate interval weighted GLR chart

From Eq. (2.14), let $E = e \in \mathbb{R}^m$ with equal variances. Then the hypothesis testing problem can be written as
\[
\begin{cases}
H_0 = \{e \sim N(0, I_m)\} & \text{(null hypothesis)}, \\
H_1 = \{e \sim N(\theta, I_m)\} & \text{(alternative one)},
\end{cases} \tag{5.41}
\]
and the multivariate statistic chart is given by
\[
G(e) = \|e\|_2^2 \sim \chi^2_{m,\alpha}. \tag{5.42}
\]
For interval-valued data, let $E = [e]$. Then the multivariate interval GLR can be expressed as
\[
\mathrm{MIGLR}(k) = \left\|[e(k)]\right\|^2 = \sum_{j=1}^{m}\left\|[e_j(k)]\right\|^2, \tag{5.43}
\]
where the interval norm is given by
\[
\left\|[e_j(k)]\right\|^2 = \frac{1}{3}\left(\underline{e}_j^2(k) + \underline{e}_j(k)\,\overline{e}_j(k) + \overline{e}_j^2(k)\right). \tag{5.44}
\]
Now consider the case where $E = \tilde{t} \in \mathbb{R}^{m-\ell}$ with corresponding variance matrix $\tilde{\Lambda}_{m-\ell}$ as presented in Eq. (2.2). Then the hypothesis testing problem can be written as
\[
\begin{cases}
H_0 = \{\tilde{t} \sim N(0, \tilde{\Lambda}_{m-\ell})\} & \text{(null hypothesis)}, \\
H_1 = \{\tilde{t} \sim N(\theta, \tilde{\Lambda}_{m-\ell})\} & \text{(alternative one)},
\end{cases} \tag{5.45}
\]
and the multivariate statistic chart is given by
\[
G(\tilde{t}) = \left\|\tilde{\Lambda}_{m-\ell}^{-1/2}\,\tilde{t}\right\|_2^2 \sim \chi^2_{m-\ell,\alpha}. \tag{5.46}
\]
An extension of this result to interval-valued data can be formulated. Let $E = [\tilde{t}]$ be the interval-valued residual vector. Then the multivariate interval weighted GLR can be expressed as
\[
\mathrm{MIWGLR}(k) = \left\|\tilde{\Lambda}_{m-\ell}^{-1/2}\,[\tilde{t}(k)]\right\|^2 = \sum_{q=\ell+1}^{m}\left\|\frac{[t_q(k)]}{\sqrt{\lambda_q}}\right\|^2, \tag{5.47}
\]
\[
\left\|\frac{[t_q(k)]}{\sqrt{\lambda_q}}\right\|^2 = \frac{1}{3}\left(\left(\frac{\underline{t}_q(k)}{\sqrt{\lambda_q}}\right)^2 + \frac{\underline{t}_q(k)\,\overline{t}_q(k)}{\lambda_q} + \left(\frac{\overline{t}_q(k)}{\sqrt{\lambda_q}}\right)^2\right). \tag{5.48}
\]
Under the assumption that the $N$ samples are independent and the joint distribution of the $m$ variables is multivariate normal, the MIWGLR statistic follows a chi-squared ($\chi^2$) distribution [39]. The MIWGLR control limit (threshold) $G_\alpha$ can be computed from its approximate distribution, based on Box's formula [40], as
\[
G_\alpha = g\,\chi^2_{h,\alpha}, \tag{5.49}
\]
where $g = \dfrac{b}{2a}$, $h = \dfrac{2a^2}{b}$, and $a$ and $b$ are, respectively, the estimated mean and variance of the MIWGLR. An abnormal situation or fault is detected if
\[
\mathrm{MIWGLR}(k) > G_\alpha. \tag{5.50}
\]
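A minimal sketch of the MIWGLR statistic and its threshold is given below. Two points are assumptions of this example rather than details from the text: $\chi^2_{h,\alpha}$ is read as the quantile at probability $1 - \alpha$, and that quantile is obtained with the Wilson-Hilferty approximation so that only the Python standard library is needed (an exact chi-squared quantile routine could be substituted). Function names are hypothetical.

```python
from statistics import NormalDist, mean, variance


def interval_sq_norm(lo, hi):
    """Squared interval norm, as in Eq. (5.44)."""
    return (lo * lo + lo * hi + hi * hi) / 3.0


def miwglr(t_lower, t_upper, eigenvalues):
    """MIWGLR(k) of Eqs. (5.47)-(5.48) for one sample.

    t_lower, t_upper : bounds of the last m - ell interval scores
    eigenvalues      : the corresponding eigenvalues lambda_q
    """
    total = 0.0
    for lo, hi, lam in zip(t_lower, t_upper, eigenvalues):
        s = lam ** 0.5
        total += interval_sq_norm(lo / s, hi / s)
    return total


def chi2_quantile_wh(prob, dof):
    """Wilson-Hilferty approximation of the chi-squared quantile."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def box_threshold(training_stats, alpha=0.05):
    """G_alpha = g * chi2_{h,alpha} with g = b / (2a) and h = 2 a^2 / b."""
    a = mean(training_stats)       # estimated mean of the statistic
    b = variance(training_stats)   # estimated (sample) variance
    g = b / (2.0 * a)
    h = 2.0 * a * a / b
    return g * chi2_quantile_wh(1.0 - alpha, h)
```

In use, `box_threshold` would be evaluated once on MIWGLR values computed from fault-free training data, and a fault declared whenever a new MIWGLR value exceeds the returned limit, as in Eq. (5.50).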

5.1.2.5 Complete information PCA-based GLRT and applications

Numerical example

The efficiency of the proposed scheme is first verified through a numerical example. Consider the following linear simulation example based on 7 variables, $j = 1, \ldots, 7$, and $N = 1000$ measurements. The monitored variables are described at the different instants $k$ by the following relations:
\[
\begin{aligned}
x_1(k) &= u_1(k) + e_1(k), \qquad e_i(k) \sim N(0, 0.02), \\
x_2(k) &= u_2(k), \\
x_3(k) &= x_2(k) + e_3(k), \\
x_4(k) &= 2x_1(k) + x_3(k) + e_4(k), \\
x_5(k) &= x_2(k) + x_3(k) + e_5(k), \\
x_6(k) &= 2x_1(k) + x_2(k) + e_6(k), \\
x_7(k) &= x_1(k) + 2x_3(k),
\end{aligned} \tag{5.51}
\]
where $u(k)$ are randomly generated variables, and $e_1$–$e_6$ are independent Gaussian noises $N(0, 0.02)$. As training data, samples are generated under normal conditions. To obtain the interval-valued data matrix $[X]$, a variation $\delta x_j(k) = 1\%$, $j = 1, \ldots, 7$, which simulates the presence of uncertainties, is added to each variable. Hence the construction of intervals is given by
\[
[x_j(k)] = \left[x_j(k) - \delta x_j(k),\ x_j(k) + \delta x_j(k)\right].
\]
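The simulation of Eq. (5.51) can be reproduced along the following lines. Several details are not specified in the text and are assumptions of this sketch: the inputs $u_1$, $u_2$ are drawn from a standard normal distribution, 0.02 is treated as the noise variance, and the 1% variation is taken relative to the magnitude of each measurement. The function name and parameters are hypothetical.

```python
import random


def simulate_intervals(n_samples=1000, noise_var=0.02, rel_width=0.01, seed=0):
    """Generate lower/upper bound matrices for the 7-variable example."""
    rng = random.Random(seed)
    sd = noise_var ** 0.5
    lower, upper = [], []
    for _ in range(n_samples):
        u1 = rng.gauss(0.0, 1.0)                      # assumed input law
        u2 = rng.gauss(0.0, 1.0)
        e = [rng.gauss(0.0, sd) for _ in range(6)]    # e1..e6
        x1 = u1 + e[0]
        x2 = u2
        x3 = x2 + e[2]
        x4 = 2 * x1 + x3 + e[3]
        x5 = x2 + x3 + e[4]
        x6 = 2 * x1 + x2 + e[5]
        x7 = x1 + 2 * x3
        x = [x1, x2, x3, x4, x5, x6, x7]
        delta = [rel_width * abs(v) for v in x]       # assumed 1% of |x_j(k)|
        lower.append([v - d for v, d in zip(x, delta)])
        upper.append([v + d for v, d in zip(x, delta)])
    return lower, upper
```

The returned matrices have one row per sample and one column per variable, which matches the layout of the interval data matrix in Eq. (5.8).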

Fig. 5.1 shows the time evolution of the interval-valued variables $[x_1]$, $[x_4]$, and $[x_7]$ of the simulation example. Before applying PCA modeling, the data are standardized. The reconstruction-based criterion (VIRE) is then used to select the number of principal components to be kept in the interval PCA model. The proposed criterion is based on the minimization of the variance of the interval-valued reconstruction error. Results are given in Table 5.1, from which the number of principal components is chosen as $\ell = 2$, the value that gives the best reconstruction. Fig. 5.2 shows the time evolution of the measurements and their estimations given by the CIPCA model. From this figure it is clear that the obtained CIPCA model gives good estimations of the upper and lower bounds of the simulated data.

Figure 5.1 Time evolution of simulated interval-valued data.

Table 5.1 VIRE for different numbers of retained principal components $\ell$ for the CIPCA model.

$\ell$                     1        2        3        4        5        6
$\rho_1$                 0.725    0.013    0.013    0.013    0.013    0.039
$\rho_2$                 0.335    0.019    0.032    0.050    0.147    –
$\rho_3$                 0.330    0.014    0.016    0.015    0.012    0.011
$\rho_4$                 0.394    0.011    0.013    0.014    0.057    –
$\rho_5$                 0.332    0.012    0.012    0.352    7.156    32.151
$\rho_6$                 0.396    0.013    0.015    0.017    0.037    1888.131
$\rho_7$                 0.014    0.011    0.010    0.009    0.010    0.009
$\sum_i \rho_i(\ell)$    2.528    0.095    0.114    0.473    7.435

For process monitoring, a bias fault is introduced on variable $x_3$ from sample 700 to 1000, interval residuals are generated, and then the GLR and WGLR charts are computed. Figs. 5.3–5.6 show the time evolution of the GLR and WGLR indices for the univariate and multivariate cases, respectively. From those figures it is clear that the fault is detected, but with several false alarms in the nonfaulty zone when using the classical univariate and multivariate GLR charts. Table 5.2 shows the detection performances of the different techniques in terms of false alarm rate (FAR), missed detection rate (MDR), and average run length (ARL1). The proposed charts UIWGLR and MIWGLR give the best performance compared to the classical GLR-based charts.
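The three performance measures can be computed from a chart and its threshold as sketched below. The definitions used here follow common conventions and are assumptions of this example, since the text does not spell them out: FAR is the percentage of nonfaulty samples that raise an alarm, MDR is the percentage of faulty samples that raise no alarm, and ARL1 is the number of samples from fault onset up to and including the first alarm. The function name is hypothetical.

```python
def detection_metrics(stat, threshold, fault_start, fault_end):
    """Return (FAR %, MDR %, ARL1) for a one-sided chart.

    stat                    : chart values, one per sample
    fault_start, fault_end  : 0-based sample range [start, end) of the fault
    """
    alarms = [s > threshold for s in stat]
    normal = [a for k, a in enumerate(alarms)
              if not (fault_start <= k < fault_end)]
    faulty = alarms[fault_start:fault_end]
    far = 100.0 * sum(normal) / len(normal)
    mdr = 100.0 * sum(1 for a in faulty if not a) / len(faulty)
    # First alarm inside the faulty zone; None if the fault is never detected.
    arl1 = next((i + 1 for i, a in enumerate(faulty) if a), None)
    return far, mdr, arl1


print(detection_metrics([0, 2, 0, 0, 0, 2, 2, 2], 1.0, 4, 8))  # (25.0, 25.0, 2)
```

In the printed example, one of the four nonfaulty samples exceeds the threshold, one of the four faulty samples is missed, and the first alarm occurs at the second faulty sample.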


Figure 5.2 Measurements and estimations of interval-valued data using CIPCA model.

Figure 5.3 Time evolution of univariate interval GLR.

Distillation column benchmark

The presented FD strategy has been tested on a simulated distillation column process. A fuller description of this application is given in the Appendix.

Figure 5.4 Time evolution of univariate interval weighted GLR.

Figure 5.5 Time evolution of multivariate GLR.

Figure 5.6 Time evolution of multivariate interval weighted GLR.


Table 5.2 FAR, MDR, and ARL1.

Detection chart    Univariate charts       Multivariate charts
                   UIGLR      UIWGLR       MIGLR      MIWGLR
FAR (%)            11.58      8.58         5.57       4
MDR (%)            17.94      0            68.77      0
ARL1               2          1            4          1

Figure 5.7 Distillation column interval-valued measurements.

Fig. 5.7 shows the time evolution of the column interval-valued data. From those data the interval PCA model with $\ell = 6$ retained principal components is derived. Fig. 5.8 shows the evolution of the VIRE criterion with respect to the number of principal components. Fig. 5.9 presents the evolution of some variables and their estimations based on the interval PCA model; the measurements are well estimated, which demonstrates the accuracy of the identified model.

Figure 5.8 Evolution of the VIRE with respect to the number of principal components $\ell$.

Figure 5.9 Measurements and estimations.

To illustrate the performance of the proposed approach, a sensor fault is introduced on variable $x_3$ (feed temperature $T_F$) between samples 1000 and 2000. The magnitude of the fault represents 2% of the range of variation of this variable. Figs. 5.10, 5.11, 5.12, and 5.13 present the detection abilities of the classical and weighted GLR for interval data. Figs. 5.10 and 5.11 show that the developed interval univariate weighted GLR provides good detection performance in terms of MDR and FAR when compared to the classical univariate one. This is because the proposed interval chart takes into account the variance of each residual. The same conclusion holds in the multivariate case. Figs. 5.12 and 5.13 present the monitoring results using the multivariate GLR and weighted GLR charts. Table 5.3 shows the detection performances of the different techniques in terms of FAR, MDR, and ARL1. The classical interval GLR (see Fig. 5.12) is able to detect the fault between samples 1000 and 2000 but results in a high missed detection rate (64.23%), whereas the proposed chart detects the fault with zero missed detection rate (see Fig. 5.13).


Figure 5.10 Time evolution of univariate interval GLR.

Figure 5.11 Time evolution of univariate interval weighted GLR.

Figure 5.12 Time evolution of multivariate interval GLR.


Figure 5.13 Time evolution of multivariate interval weighted GLR.

Table 5.3 FAR, MDR, and ARL1 for the distillation column.

Detection chart    Univariate charts       Multivariate charts
                   UIGLR      UIWGLR       MIGLR      MIWGLR
FAR (%)            16.01      12.31        6          5
MDR (%)            10.78      0            64.23      0
ARL1               1          1            33         1

Air quality monitoring network

In this work, we considered six measurement stations. The data matrix $X$ contains 18 variables, $x_1$ to $x_{18}$, corresponding, respectively, to the ozone (O3) and nitrogen oxide (NO2 and NO) concentrations of each station. For process monitoring, 1500 observations are used, which correspond to data collected during two weeks. A description of this application is given in the Appendix. Note that each concentration measurement is taken as the mean of several measurements over 15 minutes (the sample time). To evaluate the tendency and variability of pollutant concentrations, more relevant information can be taken into account by considering the minimum and maximum concentrations recorded over the 15 minutes or by considering the sensor measurement precision. This means that the data are transformed from single-valued to interval-valued data. Fig. 5.14 shows the time evolution of ozone concentrations as single-valued and interval-valued data. The interval-valued CIPCA model is identified, and the number of principal components kept in the PCA model is $\ell = 5$, as shown in Fig. 5.15.
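The passage from single-valued to interval-valued data described above can be sketched as follows, assuming the raw readings arrive as a flat list with a fixed number of readings per 15-minute sample. The function name, the window parameter, and the returned layout are hypothetical.

```python
def to_intervals(raw, window):
    """Summarize each block of `window` raw readings.

    Returns one (minimum, mean, maximum) tuple per block: the mean is the
    single-valued sample, and (minimum, maximum) is the interval-valued one.
    Trailing readings that do not fill a whole block are ignored.
    """
    out = []
    for start in range(0, len(raw) - window + 1, window):
        chunk = raw[start:start + window]
        out.append((min(chunk), sum(chunk) / window, max(chunk)))
    return out


print(to_intervals([1, 2, 3, 4, 5, 9], 3))  # [(1, 2.0, 3), (4, 6.0, 9)]
```

When only the averaged value is stored, the alternative mentioned in the text applies: the interval can instead be built as the mean plus or minus the sensor precision.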


Figure 5.14 Ozone concentrations for single-valued and interval-valued data.

Figure 5.15 Evolution of VIRE with respect to the number of principal components.

The time evolution of the pollutant concentrations and their estimations obtained from the CIPCA model are illustrated in Figs. 5.16 and 5.17. Using the identified CIPCA model, the interval-valued measurements are well estimated.

Figure 5.16 Measurements and estimations of O3 station 1.

Figure 5.17 Measurements and estimations of O3 station 3.

To illustrate the performance of the proposed fault detection and isolation approach, a sensor fault is introduced on variable $x_7$ (O3 of the third station) between samples 1000 and 1500. The magnitude of the fault represents 30% of the range of variation of this variable. Figs. 5.18, 5.19, 5.20, and 5.21 show the time evolution of UIGLR, UIWGLR, MIGLR, and MIWGLR, respectively. From those figures it is clear that the classical univariate and multivariate GLR charts do not detect the fault reliably and present several false alarms in the nonfaulty zone. Table 5.4 shows the detection performances of the different techniques in terms of FAR, MDR, and ARL1. The proposed charts UIWGLR and MIWGLR give the best performance results compared to the classical GLR.

Figure 5.18 Time evolution of univariate interval GLR with a fault on $x_7$.

Figure 5.19 Time evolution of univariate interval weighted GLR with a fault on $x_7$.

Figure 5.20 Time evolution of multivariate interval GLR with a fault on $x_7$.

Figure 5.21 Time evolution of multivariate interval weighted GLR with a fault on $x_7$.

Table 5.4 FAR, MDR, and ARL1 for air quality data.

Detection chart    Univariate charts       Multivariate charts
                   UIGLR      UIWGLR       MIGLR      MIWGLR
FAR (%)            16.01      24.82        3.80       3.40
MDR (%)            31.93      0            86.62      8.78
ARL1               4          1            159        1

5.1.2.6 Fault detection using midpoints radii PCA-based EWMA

The MRPCA method presented above is applied in the proposed MRPCA-based EWMA technique to perform the modeling phase, whereas the EWMA chart is used to detect the fault. The EWMA chart is described next.

Exponentially weighted moving average-based single-valued data

The EWMA chart was introduced by Roberts in 1959 under the name geometric moving average (GMA) chart [41]. Later, the GMA chart became popularly referred to as the EWMA chart [42]. Like the CUSUM chart [43], the EWMA chart is capable of detecting smaller shifts in the mean than the Shewhart chart [44]. The single-valued EWMA statistic $Z$ may be calculated using [45]:
\[
Z_i = \lambda X_i + (1 - \lambda) Z_{i-1}, \quad i = 1, \ldots, N, \tag{5.52}
\]
where $\lambda$ denotes a smoothing parameter between 0 and 1, which changes the memory of the detection statistic, and $X_i$ is the value of the $i$th individual observation. The initial value $Z_0$ is set equal to the process in-control mean or target value $\mu_0$. The EWMA statistic detects a fault in the process when $Z_i$ exceeds the control limits. The control limits (UCL, upper control limit; LCL, lower control limit) for the EWMA control chart may be calculated as [46]
\[
\mathrm{UCL} = \mu_0 + L\sigma\sqrt{\frac{\lambda}{2-\lambda}\left[1 - (1-\lambda)^{2i}\right]}, \tag{5.53}
\]
\[
\mathrm{LCL} = \mu_0 - L\sigma\sqrt{\frac{\lambda}{2-\lambda}\left[1 - (1-\lambda)^{2i}\right]}, \tag{5.54}
\]
where $L$ represents the control width of the EWMA chart, and $\sigma$ is the in-control standard deviation of $X$. At steady state, $[1 - (1-\lambda)^{2i}]$ tends to unity, and the steady-state limits can be written as [46]
\[
\mathrm{UCL} = \mu_0 + L\sigma\sqrt{\frac{\lambda}{2-\lambda}}, \tag{5.55}
\]
\[
\mathrm{LCL} = \mu_0 - L\sigma\sqrt{\frac{\lambda}{2-\lambda}}. \tag{5.56}
\]
When the EWMA statistic lies between the control limits, the null hypothesis holds and no fault is declared; if the EWMA statistic exceeds a control limit, a fault is declared in the system.
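The recursion of Eq. (5.52) and the time-varying limits of Eqs. (5.53) and (5.54) can be sketched as follows. The function name, default parameter values, and return layout are choices made for this example only.

```python
def ewma_chart(x, mu0, sigma, lam=0.2, width=3.0):
    """Single-valued EWMA chart.

    Returns one tuple (Z_i, LCL_i, UCL_i, alarm_i) per observation, using
    the exact (non-steady-state) limits of Eqs. (5.53) and (5.54).
    """
    z = mu0                      # Z_0 is set to the in-control mean
    out = []
    for i, xi in enumerate(x, start=1):
        z = lam * xi + (1.0 - lam) * z
        half = width * sigma * (
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        ) ** 0.5
        lcl, ucl = mu0 - half, mu0 + half
        out.append((z, lcl, ucl, z > ucl or z < lcl))
    return out


chart = ewma_chart([0, 0, 0, 5, 5, 5], mu0=0.0, sigma=1.0, lam=0.5, width=3.0)
print([alarm for _, _, _, alarm in chart])
# [False, False, False, True, True, True]
```

Smaller values of `lam` give the statistic a longer memory and make it more sensitive to small persistent shifts, at the cost of a slower reaction to large ones. The factor under the square root approaches `lam / (2 - lam)` as `i` grows, which yields the steady-state limits of Eqs. (5.55) and (5.56).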

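Exponentially weighted moving average-based interval-valued data

In the interval-valued EWMA, the interval residuals $\underline{e}(k)$ and $\overline{e}(k)$ are obtained using the MRPCA model as
\[
\underline{e}(k) = \underline{x}(k) - \underline{\hat{x}}(k), \qquad \overline{e}(k) = \overline{x}(k) - \overline{\hat{x}}(k). \tag{5.57}
\]
Furthermore, the EWMA statistic for interval-valued data can be computed from the interval residuals as in the classical case (5.52), yielding an interval with a lower bound $\underline{Z}(k)$ and an upper bound $\overline{Z}(k)$, corresponding respectively to the lower and upper bounds of the calculated residuals:
\[
\underline{Z}(k) = \lambda\,\underline{e}(k) + (1 - \lambda)\,\underline{Z}(k-1), \qquad
\overline{Z}(k) = \lambda\,\overline{e}(k) + (1 - \lambda)\,\overline{Z}(k-1). \tag{5.58}
\]
The corresponding control limits ($\underline{\mathrm{UCL}}$, upper control limit of the lower chart; $\underline{\mathrm{LCL}}$, lower control limit of the lower chart; $\overline{\mathrm{UCL}}$, upper control limit of the upper chart; $\overline{\mathrm{LCL}}$, lower control limit of the upper chart) for the interval EWMA control chart can be computed as [46]
\[
\underline{\mathrm{UCL}} = \underline{\mu}_0 + L\,\underline{\sigma}\sqrt{\frac{\lambda}{2-\lambda}\left[1 - (1-\lambda)^{2k}\right]}, \tag{5.59}
\]
\[
\underline{\mathrm{LCL}} = \underline{\mu}_0 - L\,\underline{\sigma}\sqrt{\frac{\lambda}{2-\lambda}\left[1 - (1-\lambda)^{2k}\right]}, \tag{5.60}
\]
\[
\overline{\mathrm{UCL}} = \overline{\mu}_0 + L\,\overline{\sigma}\sqrt{\frac{\lambda}{2-\lambda}\left[1 - (1-\lambda)^{2k}\right]}, \tag{5.61}
\]
and
\[
\overline{\mathrm{LCL}} = \overline{\mu}_0 - L\,\overline{\sigma}\sqrt{\frac{\lambda}{2-\lambda}\left[1 - (1-\lambda)^{2k}\right]}, \tag{5.62}
\]
where $\underline{\mu}_0$ and $\overline{\mu}_0$ are the means of the lower and upper bounds of the EWMA chart, respectively, and $\underline{\sigma}$ and $\overline{\sigma}$ are the in-control standard deviations of the lower and upper bounds of $X$, respectively. At steady state, $[1 - (1-\lambda)^{2k}]$ tends to unity, and the following steady-state values are obtained [46]:
\[
\underline{\mathrm{UCL}} = \underline{\mu}_0 + L\,\underline{\sigma}\sqrt{\frac{\lambda}{2-\lambda}}, \tag{5.63}
\]
\[
\underline{\mathrm{LCL}} = \underline{\mu}_0 - L\,\underline{\sigma}\sqrt{\frac{\lambda}{2-\lambda}}, \tag{5.64}
\]
\[
\overline{\mathrm{UCL}} = \overline{\mu}_0 + L\,\overline{\sigma}\sqrt{\frac{\lambda}{2-\lambda}}, \tag{5.65}
\]
and
\[
\overline{\mathrm{LCL}} = \overline{\mu}_0 - L\,\overline{\sigma}\sqrt{\frac{\lambda}{2-\lambda}}. \tag{5.66}
\]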
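The interval EWMA amounts to running one EWMA recursion on each residual bound, each with its own mean and standard deviation. The sketch below is a minimal illustration for a single variable; reporting the two alarm flags separately, rather than combining them into one decision, is an assumption of this example, and the function name and defaults are hypothetical.

```python
def interval_ewma(e_lower, e_upper, mu0=(0.0, 0.0), sigma=(1.0, 1.0),
                  lam=0.2, width=3.0):
    """Interval EWMA on residual bounds, Eq. (5.58), with time-varying limits.

    mu0, sigma : (lower-chart value, upper-chart value)
    Returns one (alarm_lower, alarm_upper) pair per sample.
    """
    z_lo, z_hi = mu0
    flags = []
    for k, (lo, hi) in enumerate(zip(e_lower, e_upper), start=1):
        z_lo = lam * lo + (1.0 - lam) * z_lo
        z_hi = lam * hi + (1.0 - lam) * z_hi
        scale = (lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * k))) ** 0.5
        # Outside [LCL, UCL] is the same as |Z - mu0| > L * sigma * scale.
        out_lo = abs(z_lo - mu0[0]) > width * sigma[0] * scale
        out_hi = abs(z_hi - mu0[1]) > width * sigma[1] * scale
        flags.append((out_lo, out_hi))
    return flags


flags = interval_ewma([0, 0, 4, 4], [0.1, 0.1, 4.1, 4.1], lam=0.5)
print(flags)
# [(False, False), (False, False), (True, True), (True, True)]
```

A simple decision rule, if one is needed, would be to declare a fault when either bound leaves its limits.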

Next, the developed MRPCA-based EWMA technique is validated through two examples: the first one using a simulated example and the second one using an air quality monitoring network.

5.1.2.7 Midpoints radii PCA-based EWMA and applications

Simulation example

The efficiency of the proposed MRPCA-based EWMA scheme is first validated through a numerical example. Consider the following simulation example based on 7 variables and $N = 1000$ measurements. The monitored variables are described at the different instants $k$ by the following relations:
\[
\begin{aligned}
x_1(k) &= u_1(k) + e_1(k), \qquad e_i(k) \sim N(0, 0.05), \\
x_2(k) &= u_2(k), \\
x_3(k) &= x_2(k) + e_3(k), \\
x_4(k) &= 2x_1(k) + x_3(k) + e_4(k), \\
x_5(k) &= x_2(k) + x_3(k) + e_5(k), \\
x_6(k) &= 2x_1(k) + x_2(k) + e_6(k), \\
x_7(k) &= x_1(k) + 2x_3(k),
\end{aligned} \tag{5.67}
\]
where $u(k)$ are independent randomly generated variables, and $e_1$–$e_6$ are independent Gaussian noises $N(0, 0.05)$. As training data, samples are generated under normal conditions. To obtain the interval-valued data matrix $[X]$, a variation $\delta x_j(k)$, $j = 1, \ldots, 7$, which simulates the presence of uncertainties, is added to each variable. Hence the construction of intervals is given by
\[
[x_j(k)] = \left[x_j(k) - \delta x_j(k),\ x_j(k) + \delta x_j(k)\right].
\]

Figure 5.22 Time evolution of simulated data.

Fig. 5.22 shows the time evolution of the interval-valued variables $[x_1]$, $[x_4]$, and $[x_7]$ of the simulation example. Before applying MRPCA modeling, the data are scaled to zero mean and unit variance. The mean square estimation error (MSE) criterion is used to select the best model, which is then applied for monitoring purposes. The MSE criterion measures the distance between the interval-valued fault-free data and their estimates. It is expressed as
\[
\mathrm{MSE}_j = \frac{1}{N}\sum_{k=1}^{N}\left\|[e_j(k)]\right\|^2, \quad j = 1, \ldots, m, \tag{5.68}
\]
where
\[
\left\|[e_j(k)]\right\|^2 = \frac{1}{3}\left(\underline{e}_j^2(k) + \underline{e}_j(k)\,\overline{e}_j(k) + \overline{e}_j^2(k)\right). \tag{5.69}
\]

Table 5.5 MSE using CPCA, CIPCA, and MRPCA models.

Mean square error (MSE)    CPCA     CIPCA    MRPCA
MSE1                       0.013    0.013    0.009
MSE2                       0.039    0.039    0.038
MSE3                       0.025    0.025    0.024
MSE4                       0.020    0.020    0.020
MSE5                       0.020    0.020    0.019
MSE6                       0.026    0.026    0.026
MSE7                       0.030    0.030    0.030

The proposed criterion is based on the minimization of the estimation error using the three interval models: CPCA, CIPCA, and MRPCA. Table 5.5 gives the estimated MSE values obtained with the CPCA, CIPCA, and MRPCA methods. The results show that the MRPCA approach provides the lowest estimation MSE (best modeling). For the rest of the chapter, the MRPCA technique is applied for modeling purposes, and the EWMA chart is used to detect faults in the case of interval-valued data. The detection and monitoring performances of the MRPCA-based EWMA are compared with those of the MRPCA-based Shewhart, MRPCA-based GLRT, and MRPCA-based SPE techniques.

To test the fault detection performance of the presented MRPCA-based EWMA, two faults are simulated: one on variable $x_2$ from sample 300 to 500 and one on variable $x_3$ from sample 700 to 800. The EWMA chart control width $L$ and the smoothing parameter $\lambda$ are fixed to 3 and 0.95, respectively. Figs. 5.23, 5.24, 5.25, and 5.26 show the time evolution of the MRPCA-based SPE, MRPCA-based Shewhart, MRPCA-based GLRT, and MRPCA-based EWMA statistics, respectively, for the lower bound (LB) and upper bound (UB). From those figures it is clear that the MRPCA-based EWMA presents the best detection performance. Table 5.6 gives a summary of the false alarm (FA) and missed detection (MD) rates and ARL1 values for the four approaches.

Figure 5.23 The time evolution of the MRPCA-based SPE statistic in the presence of faults in x2 and x3 .

Figure 5.24 The time evolution of the MRPCA-based Shewhart statistic in the presence of faults in $x_2$ and $x_3$.

Air quality monitoring network

In this work, six measurement stations were considered. The data matrix X contains 18 variables, x1 to x18 , corresponding, respectively, to ozone concentrations O3 and nitrogen dioxide (NO2 and NO) of each station. For process monitoring, 1080 observations are used. For example, Figs. 5.27 and 5.28, respectively, present measurements and estimations of pollutant concentrations O3 for stations 1 and 3, the estimations being given by MRPCA method. Station 1 is a peri-urban

166

Data-Driven and Model-Based Methods for Fault Detection and Diagnosis

Figure 5.25 The time evolution of the MRPCA-based GLRT statistic in the presence of faults in x2 and x3 .

Figure 5.26 The time evolution of the MRPCA-based EWMA statistic in the presence of faults in x2 and x3 .

Table 5.6 Summary of missed detection rates (MDR %), false alarm rates (FAR %), and ARL1.

| Chart | MDR (%) | FAR (%) | ARL1 |
|---|---|---|---|
| MRPCA-based SPE | 14.5695 | 11.6071 | 2 |
| MRPCA-based Shewhart | 10.9272 | 1.7857 | 1 |
| MRPCA-based GLRT | 0 | 66.96436 | 1 |
| MRPCA-based EWMA | 1.6026 | 0 | 1 |


Figure 5.27 Measurements and estimations of O3 station 1.

Figure 5.28 Measurements and estimations of O3 station 3.

station, which has the highest ozone levels, and Station 3 behaves like the other peri-urban stations. Using the identified MRPCA model, the interval-valued measurements are well estimated. To illustrate the performance of the proposed fault detection approach, a sensor fault is introduced on variable x7 (O3 of the third station) between samples 1500 and 2000. The magnitude of the fault represents 30% of the range of variation of this variable. Figs. 5.29, 5.30, 5.31, and 5.32 show the fault detection results of the MRPCA-based SPE, MRPCA-based Shewhart, MRPCA-based GLRT, and MRPCA-based EWMA techniques, respectively.


Figure 5.29 The time evolution of the MRPCA-based SPE statistic in the presence of faults in O3 .

Figure 5.30 The time evolution of the MRPCA-based Shewhart statistic in the presence of faults in O3 .

These figures show that the SPE, Shewhart, and GLRT charts detect the fault in the ozone measurements, but with high false alarm and missed detection rates. In contrast, the MRPCA-based EWMA technique shows a good detection rate for the fault in ozone O3 (see Fig. 5.32). Table 5.7 compares the monitoring performances of the four charts and confirms the detection effectiveness of the MRPCA-based EWMA technique in terms of false alarm (FA) rate, missed detection (MD) rate, and ARL1 values.


Figure 5.31 The time evolution of the MRPCA-based GLRT statistic in the presence of faults in O3 .

Figure 5.32 The time evolution of the MRPCA-based EWMA statistic in the presence of faults in O3 .

Table 5.7 Summary of missed detection rates (MDR %), false alarm rates (FAR %), and ARL1.

| Chart | MDR (%) | FAR (%) | ARL1 |
|---|---|---|---|
| MRPCA-based SPE | 79.5 | 6.61 | 3 |
| MRPCA-based Shewhart | 9.92 | 68.87 | 1 |
| MRPCA-based GLRT | 12.33 | 3.96 | 1 |
| MRPCA-based EWMA | 0.71 | 3.26 | 1 |


5.1.3 Interval PLS-based GLRT for fault detection

5.1.3.1 Partial least squares for interval-valued data

Assume that $[X_1], \ldots, [X_m]$ are m independent interval-valued variables and $[Y_1], \ldots, [Y_M]$ are M dependent interval-valued variables:

$$
[X] = \begin{bmatrix} [x_1(1)] & \cdots & [x_m(1)] \\ \vdots & \ddots & \vdots \\ [x_1(n)] & \cdots & [x_m(n)] \end{bmatrix},
\qquad
[Y] = \begin{bmatrix} [y_1(1)] & \cdots & [y_M(1)] \\ \vdots & \ddots & \vdots \\ [y_1(n)] & \cdots & [y_M(n)] \end{bmatrix}.
\tag{5.70}
$$

Let $x_j^c(k)$ and $y_l^c(k)$, $k = 1, \ldots, n$, $j = 1, \ldots, m$, $l = 1, \ldots, M$, be the center points of the interval-valued data. Let the kth observed value of $[X_j]$ be $[x_j(k)] = [\underline{x}_j(k), \overline{x}_j(k)]$, and let the kth observed value of $[Y_l]$ be $[y_l(k)] = [\underline{y}_l(k), \overline{y}_l(k)]$. Hence,

$$
x_j^c(k) = \frac{\underline{x}_j(k) + \overline{x}_j(k)}{2},
\qquad
y_l^c(k) = \frac{\underline{y}_l(k) + \overline{y}_l(k)}{2},
\qquad k = 1, 2, \ldots, n.
\tag{5.71}
$$

Then the matrices $X^c$ and $Y^c$ are given by

$$
X^c = \begin{bmatrix} x_1^c(1) & \cdots & x_m^c(1) \\ \vdots & \ddots & \vdots \\ x_1^c(n) & \cdots & x_m^c(n) \end{bmatrix},
\qquad
Y^c = \begin{bmatrix} y_1^c(1) & \cdots & y_M^c(1) \\ \vdots & \ddots & \vdots \\ y_1^c(n) & \cdots & y_M^c(n) \end{bmatrix}.
\tag{5.72}
$$

The fitted linear regression model is then

$$
Y^c = X^c B^c + \varepsilon^c,
\tag{5.73}
$$

where $B \in \mathbb{R}^{m \times M}$ is the matrix of regression parameters. The PLS estimators of $B$ and $C$ are $\hat{B}$ and $\hat{C}$, respectively.


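Then the lower and upper bounds of the interval data estimations $[\hat{X}] = [\underline{\hat{X}}, \overline{\hat{X}}]$ and $[\hat{Y}] = [\underline{\hat{Y}}, \overline{\hat{Y}}]$ are given by

$$
[\hat{X}] = [\underline{\hat{X}}, \overline{\hat{X}}] = [X]\,\hat{C} = [\underline{X}, \overline{X}]\,\hat{C},
\tag{5.74}
$$

$$
[\hat{Y}] = [\underline{\hat{Y}}, \overline{\hat{Y}}] = [X]\,\hat{B} = [\underline{X}, \overline{X}]\,\hat{B}.
\tag{5.75}
$$

By taking into account the following property of interval-valued data:

$$
\gamma\,[\underline{x}, \overline{x}] =
\begin{cases}
[\gamma \underline{x}, \gamma \overline{x}] & \text{if } \gamma > 0, \\
[\gamma \overline{x}, \gamma \underline{x}] & \text{if } \gamma < 0,
\end{cases}
\tag{5.76}
$$

where $\gamma \in \mathbb{R}$ is a real scalar, the estimations can be rewritten explicitly as

$$
\begin{cases}
\underline{\hat{x}}_j(k) = \displaystyle\sum_{i=1,\ \hat{C}_{ij} > 0}^{m} \underline{x}_i(k)\,\hat{C}_{ij} + \sum_{i=1,\ \hat{C}_{ij} < 0}^{m} \overline{x}_i(k)\,\hat{C}_{ij}, \\[2ex]
\overline{\hat{x}}_j(k) = \displaystyle\sum_{i=1,\ \hat{C}_{ij} > 0}^{m} \overline{x}_i(k)\,\hat{C}_{ij} + \sum_{i=1,\ \hat{C}_{ij} < 0}^{m} \underline{x}_i(k)\,\hat{C}_{ij},
\end{cases}
\qquad j = 1, \ldots, m,
\tag{5.77}
$$

$$
\begin{cases}
\underline{\hat{y}}_j(k) = \displaystyle\sum_{i=1,\ \hat{B}_{ij} > 0}^{m} \underline{x}_i(k)\,\hat{B}_{ij} + \sum_{i=1,\ \hat{B}_{ij} < 0}^{m} \overline{x}_i(k)\,\hat{B}_{ij}, \\[2ex]
\overline{\hat{y}}_j(k) = \displaystyle\sum_{i=1,\ \hat{B}_{ij} > 0}^{m} \overline{x}_i(k)\,\hat{B}_{ij} + \sum_{i=1,\ \hat{B}_{ij} < 0}^{m} \underline{x}_i(k)\,\hat{B}_{ij},
\end{cases}
\qquad j = 1, \ldots, M.
\tag{5.78}
$$

Let $[x(k)]$ and $[y(k)]$ be new interval measurement vectors at time k. Their estimations given by the CPLS model are

$$
[\hat{x}(k)] = \hat{C}\,[x(k)],
\qquad
[\hat{y}(k)] = \hat{B}\,[x(k)],
\tag{5.79}
$$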
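The sign rule of Eq. (5.76) is what turns a product between an interval vector and a real coefficient matrix into explicit lower and upper bounds. The following is a minimal sketch of that computation, assuming the coefficients are stored as a list of rows `C[i][j]` (input i, output j); the function name `interval_estimate` and the toy numbers are illustrative and not part of the original method description.

```python
def interval_estimate(x_lo, x_hi, C):
    """Bounds of sum_i [x_i] * C[i][j] for each output j.

    Positive coefficients keep the order of the bounds, negative
    coefficients swap them, as in the scalar property (5.76).
    """
    n_in, n_out = len(x_lo), len(C[0])
    est_lo, est_hi = [], []
    for j in range(n_out):
        lo = hi = 0.0
        for i in range(n_in):
            c = C[i][j]
            if c >= 0:                  # a zero coefficient contributes nothing
                lo += c * x_lo[i]
                hi += c * x_hi[i]
            else:                       # negative coefficient: bounds swap
                lo += c * x_hi[i]
                hi += c * x_lo[i]
        est_lo.append(lo)
        est_hi.append(hi)
    return est_lo, est_hi

# Toy usage with two inputs and two outputs (hypothetical coefficients).
est_lo, est_hi = interval_estimate([1.0, 2.0], [2.0, 4.0],
                                   [[1.0, -1.0], [0.5, 2.0]])
```

The same routine applies to either coefficient matrix, so it can produce the bounds of the X estimate or of the Y estimate depending on which matrix is passed in.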


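and then the corresponding estimation errors can be generated, using interval arithmetic [25], as

$$
[e_x(k)] = [x(k)] - [\hat{x}(k)],
\qquad
[e_y(k)] = [y(k)] - [\hat{y}(k)],
\tag{5.80}
$$

where

$$
[e_x(k)] = [\underline{e}_x(k), \overline{e}_x(k)]
= \left[\underline{x}(k) - \overline{\hat{x}}(k),\ \overline{x}(k) - \underline{\hat{x}}(k)\right],
\tag{5.81}
$$

$$
[e_y(k)] = [\underline{e}_y(k), \overline{e}_y(k)]
= \left[\underline{y}(k) - \overline{\hat{y}}(k),\ \overline{y}(k) - \underline{\hat{y}}(k)\right].
\tag{5.82}
$$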
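As an illustration of the interval subtraction used for the residuals, the sketch below applies the usual interval-arithmetic rule [a, b] − [c, d] = [a − d, b − c] component by component. The function name `interval_residual` and the sample values are assumptions made for the example.

```python
def interval_residual(x_lo, x_hi, xhat_lo, xhat_hi):
    """Componentwise interval subtraction [x] - [x_hat].

    The lower bound subtracts the largest estimate, and the upper
    bound subtracts the smallest estimate.
    """
    e_lo = [a - d for a, d in zip(x_lo, xhat_hi)]
    e_hi = [b - c for b, c in zip(x_hi, xhat_lo)]
    return e_lo, e_hi

# Toy usage: one variable measured as [1, 2] and estimated as [0.5, 2.5].
e_lo, e_hi = interval_residual([1.0], [2.0], [0.5], [2.5])
```

With this rule the residual interval is always at least as wide as the measurement interval, which is why a fault-free residual typically contains zero.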

5.1.3.2 Fault detection charts based on interval PLS

The obtained interval PLS model describes normal process behavior, and unusual events are then detected by referencing the observed behavior against this model. From Eqs. (5.81) and (5.82) it is clear that we have to evaluate two sets of interval residuals. Based on those residuals, several indices can be computed. To reduce the number of equations and simplify the presentation, the index x or y in those equations is replaced by •. In this case, Eqs. (5.81) and (5.82) can be represented by a single equation:

$$
[e_\bullet(k)] = [\underline{e}_\bullet(k), \overline{e}_\bullet(k)]
= \Big[\,[e_{1,\bullet}(k)] \ \cdots\ [e_{j,\bullet}(k)] \ \cdots\ [e_{m,\bullet}(k)]\,\Big]^T,
\tag{5.83}
$$

where $\underline{e}_\bullet(k) = [\underline{e}_{1,\bullet}(k) \ \cdots\ \underline{e}_{j,\bullet}(k) \ \cdots\ \underline{e}_{m,\bullet}(k)]^T$ and $\overline{e}_\bullet(k) = [\overline{e}_{1,\bullet}(k) \ \cdots\ \overline{e}_{j,\bullet}(k) \ \cdots\ \overline{e}_{m,\bullet}(k)]^T$. This notation is adopted in the rest of the chapter.

From those primary interval residuals $[e_x]$ and $[e_y]$ several fault detection indices can be derived. The most used multivariate index for fault detection is the Q statistic, also called the squared prediction error (SPE), defined for single-valued data as [47,6,7,48,11]

$$
Q(k) = \|e(k)\|^2 = e^T(k)\,e(k) = \sum_{j=1}^{m} e_j^2(k).
\tag{5.84}
$$

From the definition of the Q statistic based on single-valued data approaches and its different expressions given by Eq. (5.84), several indices can be derived for interval-valued data.

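Norms of upper and lower bounds of residuals

This chart computes two classical single-valued Q charts using the lower and upper bounds of the interval residuals, respectively. It is defined as

$$
Q_{1,\bullet} = \left[\ \|\underline{e}_\bullet\|^2,\ \|\overline{e}_\bullet\|^2\ \right]
= \left[\ \sum_j \underline{e}_{j,\bullet}^2(k),\ \sum_j \overline{e}_{j,\bullet}^2(k)\ \right].
\tag{5.85}
$$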

Sum of squares of interval residuals

This chart is another way to compute a quadratic form for interval-valued residuals and is defined as [49,38]

$$
Q_{2,\bullet} = \sum_j [e_{j,\bullet}(k)]^2
= \sum_j
\begin{cases}
[\min(s_j),\ \max(s_j)] & \text{if } 0 \notin [e_{j,\bullet}(k)], \\
[0,\ \max(s_j)] & \text{otherwise},
\end{cases}
\tag{5.86}
$$

where $s_j = \{\underline{e}_{j,\bullet}^2(k),\ \overline{e}_{j,\bullet}^2(k)\}$.

Furthermore, the two fault detection indices $Q_{1,\bullet}$ and $Q_{2,\bullet}$ for interval-valued data, proposed in [49,38], are computed from the upper and lower bounds of the interval residuals, thus yielding an interval index with upper and lower bounds. Ait-Izem et al. propose to use the SPE control limit presented in [50] and extend it to interval data based on Box's quadratic form approximation [40]. Hence the limits for the corresponding indices can be computed from their respective estimated mean a and estimated variance b:

$$
\eta_\alpha = g_{\text{index}}\ \chi^2_{h_{\text{index}},\,\alpha},
\tag{5.87}
$$


where $g = b/(2a)$ and $h = 2a^2/b$. Note that the thresholds for the presented indices are calculated for each bound of the statistic. However, the computed thresholds (upper and lower bound) tend to be equal due to the symmetry of the bounds. So, for the two fault detection indices $Q_{1,\bullet}$ and $Q_{2,\bullet}$, only one control limit (threshold) is computed for each index and then used for fault detection.
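The control limit above only needs the sample mean and variance of the index on fault-free data and a chi-square quantile. The sketch below is one way to compute it with the Python standard library, under two assumptions that are not from the source: the quantile is taken at probability 1 − α (upper-tail convention), and the chi-square quantile for a non-integer number of degrees of freedom h is approximated with the Wilson-Hilferty formula instead of an exact inverse CDF. The function names are illustrative.

```python
from statistics import NormalDist, mean, variance

def chi2_quantile_wh(p, h):
    """Wilson-Hilferty approximation of the chi-square quantile.

    This is an approximation chosen to avoid external dependencies;
    an exact inverse CDF could be substituted if one is available.
    """
    z = NormalDist().inv_cdf(p)
    return h * (1.0 - 2.0 / (9.0 * h) + z * (2.0 / (9.0 * h)) ** 0.5) ** 3

def box_control_limit(index_values, alpha=0.05):
    """Control limit g * chi2_{h, alpha} with g = b/(2a), h = 2a^2/b."""
    a = mean(index_values)       # estimated mean of the index
    b = variance(index_values)   # estimated (sample) variance of the index
    g = b / (2.0 * a)
    h = 2.0 * a * a / b
    return g * chi2_quantile_wh(1.0 - alpha, h)

# Toy usage: mean 2 and sample variance 4 give g = 1 and h = 2.
limit = box_control_limit([0.0, 2.0, 4.0], alpha=0.05)
```

In practice `index_values` would hold one bound of the index evaluated on the training (fault-free) data.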

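Product of interval residuals

A global interval residual can be computed as a product of interval residual vectors:

$$
[Q_{3,\bullet}(k)] = [e_\bullet(k)]^T\,[e_\bullet(k)] = \sum_j [e_{j,\bullet}(k)]\,[e_{j,\bullet}(k)],
\tag{5.88}
$$

where

$$
[e_{j,\bullet}(k)]\,[e_{j,\bullet}(k)] = \left[\underline{Q}_{3,\bullet}(k),\ \overline{Q}_{3,\bullet}(k)\right],
\tag{5.89}
$$

$$
\underline{Q}_{3,\bullet}(k) = \min\left\{\underline{e}_{j,\bullet}^2(k),\ \underline{e}_{j,\bullet}(k)\,\overline{e}_{j,\bullet}(k),\ \overline{e}_{j,\bullet}^2(k)\right\},
\qquad
\overline{Q}_{3,\bullet}(k) = \max\left\{\underline{e}_{j,\bullet}^2(k),\ \underline{e}_{j,\bullet}(k)\,\overline{e}_{j,\bullet}(k),\ \overline{e}_{j,\bullet}^2(k)\right\}.
\tag{5.90}
$$

The result is an interval residual, and the fault is detected if $0 \notin \left[\underline{Q}_{3,\bullet}(k),\ \overline{Q}_{3,\bullet}(k)\right]$.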



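Interval norm of residuals

The interval norm of residual index $Q_{4,\bullet}$ is defined as [51,52]

$$
Q_{4,\bullet}(k) = \big\|[e_\bullet(k)]\big\|^2
= \frac{1}{3} \sum_{j=1}^{m} \left( \underline{e}_{j,\bullet}^2(k) + \underline{e}_{j,\bullet}(k)\,\overline{e}_{j,\bullet}(k) + \overline{e}_{j,\bullet}^2(k) \right).
\tag{5.91}
$$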
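The four indices defined above can be evaluated side by side from the lower and upper residual vectors at one sample. The sketch below is a minimal illustration; the function name `q_indices` is hypothetical, and for the product index it assumes that the per-component bounds (minimum and maximum of the three products) are summed over the components, which is one reading of Eqs. (5.88) to (5.90).

```python
def q_indices(e_lo, e_hi):
    """Interval fault detection indices at one sample time.

    e_lo, e_hi : lower and upper bounds of the residual components.
    Returns a dict with interval indices Q1, Q2, Q3 (as tuples) and
    the scalar interval-norm index Q4.
    """
    # Q1: squared norms of the lower and of the upper residual vectors.
    q1 = (sum(l * l for l in e_lo), sum(u * u for u in e_hi))

    q2_lo = q2_hi = q3_lo = q3_hi = q4 = 0.0
    for l, u in zip(e_lo, e_hi):
        squares = (l * l, u * u)
        # Q2: interval square; the lower bound is 0 when the interval contains 0.
        q2_lo += 0.0 if l <= 0.0 <= u else min(squares)
        q2_hi += max(squares)
        # Q3: bounds of the interval product [e][e] (assumed summed over j).
        products = (l * l, l * u, u * u)
        q3_lo += min(products)
        q3_hi += max(products)
        # Q4: interval norm of the residual.
        q4 += (l * l + l * u + u * u) / 3.0
    return {"Q1": q1, "Q2": (q2_lo, q2_hi), "Q3": (q3_lo, q3_hi), "Q4": q4}

# Toy usage: the first component contains zero, the second does not.
idx = q_indices([-1.0, 2.0], [1.0, 3.0])
q3_fault = not (idx["Q3"][0] <= 0.0 <= idx["Q3"][1])   # fault if 0 is outside [Q3]
```

Q1, Q2, and Q4 still need a control limit estimated from fault-free data, whereas the Q3 decision only checks whether zero lies inside the resulting interval.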

5.1.3.3 Fault detection using interval PLS-based GLRT

To deal with interval-valued data, we derive the interval GLRT and apply it to the interval residuals computed from the interval PLS method. Let us consider Eq. (2.14) and $E = [e_\bullet(k)]$. The interval GLR (IG) in this case is given by

$$
IG_\bullet(k) = \sum_{j=1}^{m} \left\| \frac{[e_{j,\bullet}(k)]}{\sigma_{j,\bullet}} \right\|^2
= \frac{1}{3} \sum_{j=1}^{m} \frac{\underline{e}_{j,\bullet}^2(k) + \underline{e}_{j,\bullet}(k)\,\overline{e}_{j,\bullet}(k) + \overline{e}_{j,\bullet}^2(k)}{\sigma_{j,\bullet}^2},
\tag{5.92}
$$

where $\sigma_{j,\bullet}^2$ is the variance of the jth interval residual given by Eq. (5.4). The control limits $G_\alpha$ for the described fault detection charts, that is, $IG_\bullet$ and $Q_{4,\bullet}$, can be computed from their corresponding approximate distribution as [51,52]

$$
G_\alpha = g\,\chi^2_{h,\alpha}.
\tag{5.93}
$$

This control limit is based on Box's equation [40]. Considering that a is the estimated mean of the fault detection chart and b is its estimated variance, we note that

$$
g = \frac{b}{2a},
\qquad
h = \frac{2a^2}{b}.
\tag{5.94}
$$

Therefore, the proposed detection approach combines the benefits of PLS and IGLRT: PLS is used for modeling, and IGLRT is used for fault detection. To effectively assess the performance of the developed technique, it is necessary to examine its performance using a synthetic example and an actual benchmark process as well.
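To make the relation between the interval GLR index and the interval norm index concrete, the following sketch evaluates the variance-weighted sum at one sample. The residual variances are passed in as `sigma2`, since their estimator (Eq. (5.4)) is defined outside this section; the function name `interval_glr` and the numbers are illustrative assumptions.

```python
def interval_glr(e_lo, e_hi, sigma2):
    """Interval GLR index: interval norm of each residual component,
    weighted by the inverse of that component's variance."""
    return sum((l * l + l * u + u * u) / (3.0 * s2)
               for l, u, s2 in zip(e_lo, e_hi, sigma2))

# Toy usage with two residual components of different variance.
ig = interval_glr([-1.0, 2.0], [1.0, 3.0], sigma2=[1.0, 2.0])

# With unit variances the index reduces to the unweighted interval norm.
ig_unit = interval_glr([-1.0, 2.0], [1.0, 3.0], sigma2=[1.0, 1.0])
```

The weighting matters when residual components have very different variances: a component with a large fault-free variance no longer dominates the index.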

5.1.3.4 Interval PLS-based GLRT and applications

Synthetic data

At this stage, to illustrate the proposed partial least squares (PLS)-based interval generalized likelihood ratio test (GLRT), we consider the following simulation example based on seven variables with N = 500 samples. The monitored variables are described at different instants k by the following relations:

$$
\begin{cases}
x_1(k) = 0.4\,v_1(k) - 1.3 \sin(k/2) + \varepsilon_1(k), & v_1(k) \sim N(0, \sigma^2), \\
x_2(k) = v_2(k) - 2 \cos(k/4) + \varepsilon_2(k), & v_2(k) \sim N(0, \sigma^2), \\
x_3(k) = \cos(k/3)\,e^{(v_3(k) - 1)} + \varepsilon_3(k), & v_3(k) \sim N(0, \sigma^2), \\
x_4(k) = x_1(k) + x_2(k) + \varepsilon_4(k), \\
x_5(k) = x_2(k) + x_3(k) + \varepsilon_5(k), \\
x_6(k) = 2 x_1(k) + x_3(k) + \varepsilon_6(k), \\
x_7(k) = x_4(k) + 2 x_5(k) + \varepsilon_7(k),
\end{cases}
\tag{5.95}
$$

where $\varepsilon_j(k)$, $j = 1, \ldots, 7$, represent Gaussian noise with small variance added to the measurements. To generate interval-valued data $[x]$, a variation $\delta x_j$, $j = 1, \ldots, 7$, is added to each variable; $\delta x_j$ is set to 10% of the variation range of the corresponding variable $x_j$. Hence the construction of intervals is given by

$$
[x_j(k)] = \left[x_j(k) - \delta x_j(k),\ x_j(k) + \delta x_j(k)\right],
$$

$$
[X(k)] = \big[\,[x_1(k)], \ldots, [x_5(k)]\,\big],
\qquad
[Y(k)] = \big[\,[x_6(k)] \ \ [x_7(k)]\,\big].
\tag{5.96}
$$
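The data generation of Eqs. (5.95) and (5.96) can be sketched as follows. The source does not give the value of σ or of the measurement-noise variance, so `sigma=1.0` and `noise_sd=0.05` are placeholder assumptions, as are the function names and the fixed random seed.

```python
import math
import random

def simulate(n=500, sigma=1.0, noise_sd=0.05, seed=0):
    """Generate n samples of the seven simulated variables (Eq. (5.95)).

    sigma and noise_sd are assumed values; the text only states that
    the measurement noise has a small variance.
    """
    rng = random.Random(seed)
    rows = []
    for k in range(1, n + 1):
        v1, v2, v3 = (rng.gauss(0.0, sigma) for _ in range(3))
        eps = [rng.gauss(0.0, noise_sd) for _ in range(7)]
        x1 = 0.4 * v1 - 1.3 * math.sin(k / 2) + eps[0]
        x2 = v2 - 2.0 * math.cos(k / 4) + eps[1]
        x3 = math.cos(k / 3) * math.exp(v3 - 1.0) + eps[2]
        x4 = x1 + x2 + eps[3]
        x5 = x2 + x3 + eps[4]
        x6 = 2.0 * x1 + x3 + eps[5]
        x7 = x4 + 2.0 * x5 + eps[6]
        rows.append([x1, x2, x3, x4, x5, x6, x7])
    return rows

def to_intervals(rows, frac=0.10):
    """Build [x - delta, x + delta] with delta = frac * range of each variable."""
    m = len(rows[0])
    delta = [frac * (max(r[j] for r in rows) - min(r[j] for r in rows))
             for j in range(m)]
    lower = [[r[j] - delta[j] for j in range(m)] for r in rows]
    upper = [[r[j] + delta[j] for j in range(m)] for r in rows]
    return lower, upper

rows = simulate()
lower, upper = to_intervals(rows)
# Inputs are x1..x5 and outputs are x6, x7, as in Eq. (5.96).
X_lo, Y_lo = [r[:5] for r in lower], [r[5:] for r in lower]
X_hi, Y_hi = [r[:5] for r in upper], [r[5:] for r in upper]
```

Faults can then be injected by adding an offset to one column of `rows` over the chosen sample window before calling `to_intervals`.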

Fig. 5.33 shows the time evolution of the interval-valued data of the simulated example. The CPLS model is identified with ℓ = 2 retained latent variables. Scatter plots of the measured and predicted variables y1 and y2 are presented in Figs. 5.34 and 5.35, respectively. Those plots indicate a good performance of the identified PLS model. After a process model has been successfully identified, we can proceed with fault detection. Two faults are simulated: one on variable x3 between samples 120 and 130 and one on variable x7 (y2 ) between samples 300 and 340. To quantify the efficiency of the proposed interval fault detection indices, we use two metrics, the false alarm rate (FAR) and the missed detection rate (MDR). The FAR is the number of normal observations that are wrongly judged as faulty (false alarms) over the total number of fault-free samples. The MDR is the number of faulty samples that are wrongly considered as normal (missed detections) over the total number of faulty samples. Figs. 5.36, 5.38, 5.40, 5.42, and 5.44 represent the time evolution of the fault detection indices Q1,x , Q2,x , Q3,x , Q4,x , and IGLRx , respectively, computed from the interval residuals [ex ]. Figs. 5.37, 5.39, 5.41, 5.43, and 5.45 represent the time evolution of the fault detection indices Q1,y , Q2,y , Q3,y ,


Figure 5.33 Time evolution of interval-valued simulated variables.

Figure 5.34 Scatter plots of predicted and observed training data y1 .

Q4,y , and IGLRy , respectively, computed based on the interval residuals [ey ]. The performances of the different fault detection charts are summarized in Table 5.8. From those results it should be noted that the index Q3,• presents a false alarm rate FAR = 0 because the evaluation does not need any control limit estimation as in the other cases of fault detection charts, where an approximation of the chart distribution is needed, and a probabilistic control limit is computed with certain confidence level. However, this chart


Figure 5.35 Scatter plots of predicted and observed training data y2 .

Figure 5.36 Evolution of Q1,x in both fault-free and faulty cases.

presents a fairly high rate of missed detection, with MDR = 68.88 for the index Q3,x (respectively, MDR = 35.24 for Q3,y ), as reported in Table 5.8. The fault detection chart Q1,• (respectively, Q2,• ) is formed by quadratic upper and lower bounds as expressed in Eq. (5.85) (respectively, Eq. (5.86)). During the training phase, two control limits are computed for each chart, corresponding to the upper and lower values. The computed control limits are very close, so only one control limit is used for fault detection. The detection charts Q1,• and Q2,• present the same rates of false alarms and missed detection with FAR = 62% and MDR = 3.86 when the indices


Figure 5.37 Evolution of Q1,y in both fault-free and faulty cases.

Figure 5.38 Evolution of Q2,x in both fault-free and faulty cases.

are computed from the interval residuals [ex ]. This is due to the expressions of those indices, given by Eqs. (5.85) and (5.86), which are very close. At each sample time, each chart has two values, upper and lower, in quadratic form, and those bounds are $\sum_j \underline{e}_{j,\bullet}^2$ and $\sum_j \overline{e}_{j,\bullet}^2$. It should be noted that the IGLR• index is a weighted Q4,• , where the weights are the inverses of the interval residual variances. The index Q4,y presents


Figure 5.39 Evolution of Q2,y in both fault-free and faulty cases.

Figure 5.40 Evolution of Q3,x in both fault-free and faulty cases.

similar performance to the IGLRy index because variances of interval residuals [ey ] are close to each other, but IGLRx presents a better performance than Q4,x . This is due to the fact that the Q4,• takes advantage of the squared prediction error statistic computed in residual subspace as the interval norm of residual components. When the variances of interval residuals are quite different from one another, the detection ability of the Q4,• visibly declines. To accurately monitor faults, the IGLR• is proposed to take into account


Figure 5.41 Evolution of Q3,y in both fault-free and faulty cases.

Figure 5.42 Evolution of Q4,x in both fault-free and faulty cases.

the difference in variances between residuals. In this simulated example the interval residuals [ey ] have comparable variances, which is why the performances of Q4,y are close to those of IGLRy . In conclusion, the proposed IGLR• gives in general the best results in terms of FAR and MDR compared to the other indices. To confirm this observation, we apply the proposed fault detection approach to a distillation column benchmark.
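The FAR and MDR metrics used in this comparison follow directly from their definitions: false alarms over fault-free samples and missed detections over faulty samples. The sketch below computes both as percentages from per-sample alarm flags and ground-truth fault labels; the function name `far_mdr` and the toy sequences are illustrative assumptions.

```python
def far_mdr(alarms, faulty):
    """Return (FAR %, MDR %) from per-sample booleans.

    alarms : True where the chart exceeds its control limit
    faulty : True where a fault is actually present
    """
    n_faulty = sum(1 for f in faulty if f)
    n_normal = len(faulty) - n_faulty
    false_alarms = sum(1 for a, f in zip(alarms, faulty) if a and not f)
    missed = sum(1 for a, f in zip(alarms, faulty) if f and not a)
    far = 100.0 * false_alarms / n_normal
    mdr = 100.0 * missed / n_faulty
    return far, mdr

# Toy usage: 4 fault-free samples (1 false alarm), 4 faulty samples (1 missed).
alarms = [False, True, False, False, True, False, True, True]
faulty = [False, False, False, False, True, True, True, True]
far, mdr = far_mdr(alarms, faulty)
```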


Figure 5.43 Evolution of Q4,y in both fault-free and faulty cases.

Figure 5.44 Evolution of IGx in both fault-free and faulty cases.

Distillation column benchmark

In this section, we test the presented fault detection strategy on a simulated distillation column process (see Appendix). Fig. 5.46 shows the time evolution of the column interval-valued data.


Figure 5.45 Evolution of IGy in both fault-free and faulty cases.

Table 5.8 FAR % and MDR % for the presented fault detection charts (simulated example).

| Chart | X: FAR % | X: MDR % | Y: FAR % | Y: MDR % |
|---|---|---|---|---|
| Q1,• | 62 | 4.87 | 6.13 | 1.21 |
| Q2,• | 62.03 | 3.86 | 6 | 2 |
| Q3,• | 0 | 68.88 | 0 | 35.24 |
| Q4,• | 29.26 | 3.86 | 5.37 | 0 |
| IG• | 3.44 | 0 | 4.95 | 0 |

The output matrix contains two variables Y = [x6 x7 ] (dependent variables), and the remaining variables form the input data matrix X (independent variables). From those data we derive the interval CPLS model with ℓ = 6 retained latent variables. For fault detection, we test two scenarios. In the first scenario, a fault is introduced on the variable x2 (input variable) between sample times 1000 and 2000. Figs. 5.47, 5.48, 5.49, 5.50, and 5.51 show, respectively, the time evolution of the fault detection indices Q1,• , Q2,• , Q3,• , Q4,• , and IGLR•. The performances of the different fault detection indices in terms of FAR and MDR are presented in Table 5.9.


Figure 5.46 Distillation column interval-valued measurements.

Figure 5.47 Evolution of Q1,x and Q1,y with a fault on variable x2 .

From those results it is clear that indices IGLRx and Q4,x present the best performances. However, the index IGLRx presents slightly better performances than Q4,x . The second scenario consists of introducing a fault on variable y2 (output variable). Here only indices Q4,• and IGLR• are presented as depicted in Figs. 5.52 and 5.53, respectively. FAR and MDR results in this case are illustrated in Table 5.10. The index IGLRy gives the best performances compared to the other indices with MDR = 0 and FAR = 5.3.


Figure 5.48 Evolution of Q2,x and Q2,y with a fault on variable x2 .

Figure 5.49 Evolution of Q3,x and Q3,y with a fault on variable x2 .

In conclusion, the proposed interval fault detection index IGLR• and the index Q4,• give similar performances in terms of FAR and MDR when the interval residuals have comparable variances. However, when the variances of the interval residuals differ from one another, the detection ability of Q4,• visibly declines, and the proposed index gives better performance than the other presented fault detection indices.

5.1.4 Conclusion

In this chapter, we proposed a new fault detection technique for monitoring chemical processes based on interval latent variables (LV) and


Figure 5.50 Evolution of Q4,x and Q4,y with a fault on variable x2 .

Figure 5.51 Evolution of IGLRx and IGLRy with a fault on variable x2 .

generalized likelihood ratio test (GLRT). The proposed interval LV (including PCA and PLS)-based GLRT approaches deal with the problem of uncertainties in the systems by using PCA and PLS models based on interval-valued data. The developed interval PCA and PLS models are applied to generate the interval residuals to use later for fault detection. The idea behind the novel approaches is to widen their applicability for


Table 5.9 FAR % and MDR % for the presented fault detection charts (scenario 1 of distillation column).

| Chart | X: FAR % | X: MDR % | Y: FAR % | Y: MDR % |
|---|---|---|---|---|
| Q1,• | 4.40 | 79.02 | 3.40 | 92.10 |
| Q2,• | 4.40 | 79.02 | 3.40 | 92.10 |
| Q3,• | 0 | 72.12 | 0.40 | 100 |
| Q4,• | 4.80 | 2.10 | 4.50 | 92.00 |
| IGLR• | 4.90 | 0 | 5.30 | 91.60 |

Figure 5.52 Evolution of Q4,x and Q4,y with a fault on variable y2 .

processes represented by interval-valued data. This helps provide a more accurate model of uncertain systems and, in turn, supports better decision making with respect to fault detection. Three examples are used to evaluate the fault detection performances of the proposed approaches: a simulated example, a distillation column benchmark, and an air quality monitoring network. The detection abilities of the proposed techniques are evaluated in terms of missed detection and false alarm rates.


Figure 5.53 Evolution of IGLRx and IGLRy with a fault on variable y2 .

Table 5.10 FAR % and MDR % for the presented fault detection charts (scenario 2 of distillation column).

| Chart | X: FAR % | X: MDR % | Y: FAR % | Y: MDR % |
|---|---|---|---|---|
| Q1,• | 4.40 | 92.20 | 3.40 | 94.80 |
| Q2,• | 4.40 | 92.20 | 3.40 | 94.80 |
| Q3,• | 0 | 100 | 0.40 | 73.82 |
| Q4,• | 4.80 | 92.30 | 4.50 | 34.36 |
| IGLR• | 4.90 | 92.30 | 5.30 | 0 |

5.2 Interval nonlinear latent variable approaches for fault detection

5.2.1 Introduction

Data-driven process monitoring approaches based on latent variables (LV), including PCA and PLS, have gained wide application in various industrial processes [2,4]. The commonly used statistics $T^2$ and Q for fault detection have been thoroughly studied [5–9,11,12]. The basic LV modeling process was investigated in greater detail in [11]. However, the application of LV-based modeling and monitoring to nonlinear processes may lead to


inefficient and unreliable process monitoring, because the linear LV is inappropriate to describe the nonlinear behavior of the process [53]. To cope with this problem, nonlinear LV models (such as kernel PCA and kernel PLS) have been developed [54–58]. Kernel LV (KLV) has already shown better performance than LV in several fields, especially in the field of fault diagnosis [59–62]. However, diagnosis methods based on multivariate statistical approaches have only been developed for single-valued data [62]. The single-valued data representation is the result of a simplification made during the data mining procedure.

The need for interval-valued data may arise in connection with the imprecision of measurement devices, process uncertainties, or data fluctuations when measurements are recorded over a specific interval of time. In fact, considering the minimum and maximum recorded values offers a more complete insight into the measured phenomenon than considering the average values. So, for more precision, data uncertainty can be represented by an interval-valued data representation. There are many cases in which measurements can be represented by interval-valued data. For example, in air quality data (O3 , NO2 , and NO), where a measurement sample is taken as the mean of several measurements over 15 minutes (sample time), the minimum and maximum concentrations recorded over those 15 minutes provide more accurate information for evaluating the variability of pollutant concentrations. For this purpose, several linear LV models for interval-valued data have been proposed in the literature [27,19,63,31]. However, in the field of statistical process monitoring, few works have dealt with this problem. Recently, a PCA-based process monitoring approach for interval-valued data was proposed [51,52].

PLS was first proposed by Wold [64]. It has been successfully applied in several research areas such as process modeling, fault detection, and process monitoring, and it deals with highly correlated and noisy data. Geladi and Kowalski [65] presented a more detailed description and tutorial with some applications of the PLS method. However, in several situations the process is uncertain, and the available information is formalized in terms of intervals. To enhance the detection abilities for interval-valued data, the authors of [66,67] proposed to use exponentially weighted moving average (EWMA) and generalized likelihood ratio test (GLRT) control charts. However, the developed techniques addressed only linear process models and assumed that the relationships between variables are linear. This limits the applicability of those methods to nonlinear relationships. Nonlinear interval-valued data-based LV approaches (such as kernel PCA and kernel PLS) have been introduced for classification problems [68–70].

Therefore, in the current chapter, we propose to extend the techniques developed in the previous section to address the problem of nonlinear uncertain process modeling and monitoring. To do that, GLRT approaches based on interval kernel PCA and interval kernel PLS are proposed for chemical process monitoring. The detection effectiveness of the developed interval approaches is evaluated through two examples, synthetic data and the Tennessee Eastman process benchmark. The fault detection results are evaluated in terms of false alarm (FAR) and missed detection (MDR) rates. The remainder of this chapter is organized as follows. In section 5.2.2, we present the developed interval KPCA (IKPCA)-based GLRT technique and its application to fault detection. Section 5.2.3 presents the proposed interval kernel PLS-based GLRT and its application to monitoring chemical processes.

5.2.2 Interval kernel PCA-based GLRT for fault detection

5.2.2.1 Kernel PCA for interval-valued data (IKPCA)

Based on the kernel PCA approach developed for single-valued data (see section 3.1), in this section, we propose an interval kernel PCA for interval-valued data. Interval-valued data offer a way to represent the available information where uncertainty or variability must be taken into account. Next, we introduce two interval KPCA (IKPCA) methods. The first one is the bivariate center and range IKPCA method, which consists of fitting a KPCA model on the new numerical matrix formed by concatenating the centers and ranges of the interval matrix. The second one is the upper and lower IKPCA method, which consists of fitting two KPCA models on the lower and upper bounds of the interval-valued data matrix.

An interval-valued variable $[X_j] \subset \mathbb{R}$ is represented by a set of values delimited by ordered couples of bounds, referred to as minimum and maximum [30]: $[X_j] = \{[x_j(1)], [x_j(2)], \ldots, [x_j(N)]\}$, where $[x_j(k)] = [\underline{x}_j(k), \overline{x}_j(k)]$, $k \in \{1, \ldots, N\}$, and $\underline{x}_j(k) \le \overline{x}_j(k)$. The generic interval $[x_j(k)]$ can also be expressed by a couple $\{x_j^c(k), x_j^r(k)\}$, where

$$
x_j^c(k) = \frac{1}{2}\left(\underline{x}_j(k) + \overline{x}_j(k)\right)
\tag{5.97}
$$

and

$$
x_j^r(k) = \frac{1}{2}\left(\overline{x}_j(k) - \underline{x}_j(k)\right).
\tag{5.98}
$$

Before presenting the interval-valued data-based KPCA methods, let us define an interval-valued data matrix. Let $[X]$ be an N × m data matrix. Then

$$
[X] = \begin{pmatrix}
[\underline{x}_1(1), \overline{x}_1(1)] & \cdots & [\underline{x}_m(1), \overline{x}_m(1)] \\
\vdots & \ddots & \vdots \\
[\underline{x}_1(N), \overline{x}_1(N)] & \cdots & [\underline{x}_m(N), \overline{x}_m(N)]
\end{pmatrix}
= ([x_1], \ldots, [x_N])^T,
\tag{5.99}
$$

where $\underline{x}_j(k) \le \overline{x}_j(k)$ for all $k = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, m$, and $[x_k] = \left([\underline{x}_1(k), \overline{x}_1(k)], \ldots, [\underline{x}_m(k), \overline{x}_m(k)]\right)$.

The steps of the proposed method can be summarized as follows:
1. Transform the interval-valued data matrix into a numerical data matrix.
2. Apply KPCA to the numerical data matrix, that is, determine the numerical eigenvalues and eigenvectors.
3. Compute the fault detection indices in the feature space.

5.2.2.2 Interval KPCA-based fault detection charts

Interval centers and ranges (IKPCACR ) chart

The Bivariate Center and Range (BCR) method was introduced in linear interval-valued data regression [71–73], where the center and range variables are used together to build a single regression model. In the proposed KPCA approach for interval-valued data, a new data matrix is formed by concatenating the center and range data matrices, and a KPCA method is applied to this new data matrix. The new data matrix $X_{CR}$ can be expressed as

$$
X_{CR} = \left[\,X^c \ \ X^r\,\right],
\tag{5.100}
$$

where $X^c = [x_1^c, x_2^c, \ldots, x_N^c]^T \in \mathbb{R}^{N \times m}$ and $X^r = [x_1^r, x_2^r, \ldots, x_N^r]^T \in \mathbb{R}^{N \times m}$ are the center and range matrices, respectively.
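The transformation from interval bounds to the concatenated center and range matrix can be sketched as follows, using the center and range definitions of Eqs. (5.97) and (5.98). The function name `center_range_matrix` and the toy data are illustrative assumptions; the bounds are stored as lists of rows, one row per sample.

```python
def center_range_matrix(X_lo, X_hi):
    """Build the N x 2m matrix [X^c  X^r] from lower and upper bounds.

    Each output row holds the m centers followed by the m ranges
    (half-widths) of the corresponding interval sample.
    """
    rows = []
    for lo, hi in zip(X_lo, X_hi):
        centers = [(l + u) / 2.0 for l, u in zip(lo, hi)]
        ranges = [(u - l) / 2.0 for l, u in zip(lo, hi)]
        rows.append(centers + ranges)
    return rows

# Toy usage: two samples of two interval-valued variables.
X_cr = center_range_matrix([[1.0, 2.0], [0.0, 5.0]],
                           [[3.0, 6.0], [2.0, 5.0]])
```

Any single-valued KPCA routine can then be applied to `X_cr`, since the matrix no longer contains intervals.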


Let $x_{CR} = [x^c \ x^r] \in \mathbb{R}^{2m}$ be a new data sample. The kernel matrix $K_{CR}$ of the center and range method is given by

$$
K_{CR} = X_{CR} X_{CR}^T =
\begin{bmatrix}
\phi_{CR,1}^T \phi_{CR,1} & \cdots & \phi_{CR,1}^T \phi_{CR,N} \\
\vdots & \ddots & \vdots \\
\phi_{CR,N}^T \phi_{CR,1} & \cdots & \phi_{CR,N}^T \phi_{CR,N}
\end{bmatrix}
=
\begin{bmatrix}
k(x_{CR,1}, x_{CR,1}) & \cdots & k(x_{CR,1}, x_{CR,N}) \\
\vdots & \ddots & \vdots \\
k(x_{CR,N}, x_{CR,1}) & \cdots & k(x_{CR,N}, x_{CR,N})
\end{bmatrix},
\tag{5.101}
$$

$$
k(x_{CR}) = \left[k(x_{CR,1}, x_{CR}), \ldots, k(x_{CR,N}, x_{CR})\right]^T,
\tag{5.102}
$$

and

$$
v_{CR} = \lambda_{CR}^{-1} X_{CR}^T \alpha_{CR},
\tag{5.103}
$$

$$
P_{CR} = \left[\lambda_{CR,1}^{-1} X_{CR}^T \alpha_{CR,1}, \ldots, \lambda_{CR,\ell}^{-1} X_{CR}^T \alpha_{CR,\ell}\right],
\tag{5.104}
$$

where $\alpha_{CR}$ is the eigenvector of the matrix $K_{CR}$, and $\lambda_{CR}$ is its corresponding eigenvalue. From this model the fault detection indices $T^2_{CR}$ and $SPE_{CR}$ are computed in the feature space:

$$
T^2_{CR} = k(x_{CR})^T P_{CR} \Lambda_{CR}^{-1} P_{CR}^T k(x_{CR}),
\tag{5.105}
$$

$$
SPE_{CR} = k(x_{CR}, x_{CR}) - k^T(x_{CR})\, C_{CR}\, k(x_{CR}),
\tag{5.106}
$$

where $\phi_{CR} = \phi(x_{CR})$, $\Lambda_{CR} = \operatorname{diag}(\lambda_{CR,1}, \ldots, \lambda_{CR,\ell})$, and $C_{CR} = P_{CR} P_{CR}^T$. A fault is detected if $T^2_{CR} > \tau_\alpha^{T^2_{CR}}$ or $SPE_{CR} > \tau_\alpha^{spe_{CR}}$, where $\tau_\alpha^{T^2_{CR}}$ and $\tau_\alpha^{spe_{CR}}$ represent the control limits of $T^2_{CR}$ and $SPE_{CR}$, respectively. Those control limits are computed using Eqs. (3.19) and (3.20).


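Interval upper and lower bounds (IKPCAUL ) chart

Let $[x_j(k)] = [\underline{x}_j(k)\ \overline{x}_j(k)]$ be a bounded closed interval with lower bound $\underline{x}_j(k)$ and upper bound $\overline{x}_j(k)$, where $j = 1, \ldots, m$ and $k = 1, \ldots, N$. Let $\underline{x}_i = (\underline{x}_1(i), \ldots, \underline{x}_m(i))$ and $\overline{x}_i = (\overline{x}_1(i), \ldots, \overline{x}_m(i))$. For interval-valued kernel PCA, two single-valued KPCA models are built based on the lower and upper bounds of the interval-valued data. The kernel matrix for the lower bound data is given by

$$
\underline{K} = \underline{X}\,\underline{X}^T =
\begin{bmatrix}
\underline{\phi}_1^T \underline{\phi}_1 & \cdots & \underline{\phi}_1^T \underline{\phi}_N \\
\vdots & \ddots & \vdots \\
\underline{\phi}_N^T \underline{\phi}_1 & \cdots & \underline{\phi}_N^T \underline{\phi}_N
\end{bmatrix}
=
\begin{bmatrix}
k(\underline{x}_1, \underline{x}_1) & \cdots & k(\underline{x}_1, \underline{x}_N) \\
\vdots & \ddots & \vdots \\
k(\underline{x}_N, \underline{x}_1) & \cdots & k(\underline{x}_N, \underline{x}_N)
\end{bmatrix},
\tag{5.107}
$$

$$
k(\underline{x}) = \left[k(\underline{x}_1, \underline{x}), \ldots, k(\underline{x}_N, \underline{x})\right]^T,
\tag{5.108}
$$

$$
k(\underline{x}_i, \underline{x}_j) = \exp\left(-\frac{\|\underline{x}_i - \underline{x}_j\|^2}{2\sigma^2}\right),
\tag{5.109}
$$

and

$$
\underline{v} = \underline{\lambda}^{-1} \underline{X}^T \underline{\alpha},
\tag{5.110}
$$

$$
\underline{P} = \left[\underline{\lambda}_1^{-1} \underline{X}^T \underline{\alpha}_1, \ldots, \underline{\lambda}_\ell^{-1} \underline{X}^T \underline{\alpha}_\ell\right],
\tag{5.111}
$$

where $\underline{\alpha}$ is the eigenvector of the matrix $\underline{K}$, and $\underline{\lambda}$ is the corresponding eigenvalue.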
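As a small illustration of the quantities entering the bound-wise KPCA models, the sketch below builds the Gaussian kernel matrix of Eq. (5.109) and the kernel vector for a new sample from one set of bounds. The kernel width `sigma`, the function names, and the toy data are assumptions; the eigen-decomposition and the centering of the kernel matrix are deliberately left out.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_matrix(X, sigma):
    """N x N matrix of pairwise kernel values between training samples."""
    return [[gaussian_kernel(xi, xj, sigma) for xj in X] for xi in X]

def kernel_vector(X, x_new, sigma):
    """Kernel values between every training sample and a new sample."""
    return [gaussian_kernel(xi, x_new, sigma) for xi in X]

# Toy usage on three lower-bound samples (hypothetical values).
X_lower = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K_lower = kernel_matrix(X_lower, sigma=1.0)
k_new = kernel_vector(X_lower, [0.0, 0.0], sigma=1.0)
```

The same two functions would be called a second time on the upper-bound samples to obtain the second kernel matrix of the upper and lower approach.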


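The second kernel matrix, for the upper bound of the interval-valued data, is given by

$$
\overline{K} = \overline{X}\,\overline{X}^T =
\begin{bmatrix}
\overline{\phi}_1^T \overline{\phi}_1 & \cdots & \overline{\phi}_1^T \overline{\phi}_N \\
\vdots & \ddots & \vdots \\
\overline{\phi}_N^T \overline{\phi}_1 & \cdots & \overline{\phi}_N^T \overline{\phi}_N
\end{bmatrix}
=
\begin{bmatrix}
k(\overline{x}_1, \overline{x}_1) & \cdots & k(\overline{x}_1, \overline{x}_N) \\
\vdots & \ddots & \vdots \\
k(\overline{x}_N, \overline{x}_1) & \cdots & k(\overline{x}_N, \overline{x}_N)
\end{bmatrix},
\tag{5.112}
$$

$$
k(\overline{x}) = \left(k(\overline{x}_1, \overline{x}), \ldots, k(\overline{x}_N, \overline{x})\right)^T,
\tag{5.113}
$$

$$
k(\overline{x}_i, \overline{x}_j) = \exp\left(-\frac{\|\overline{x}_i - \overline{x}_j\|^2}{2\sigma^2}\right),
\tag{5.114}
$$

and

$$
\overline{v} = \overline{\lambda}^{-1} \overline{X}^T \overline{\alpha},
\tag{5.115}
$$

$$
\overline{P} = \left[\overline{\lambda}_1^{-1} \overline{X}^T \overline{\alpha}_1, \ldots, \overline{\lambda}_\ell^{-1} \overline{X}^T \overline{\alpha}_\ell\right],
\tag{5.116}
$$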

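where $\overline{\alpha}$ is the eigenvector of the matrix $\overline{K}$, and $\overline{\lambda}$ is the corresponding eigenvalue. From those models, residuals are generated, and the fault detection indices $T^2$ and SPE are computed for the upper and lower bounds of the interval-valued data in the feature space:

$$
\underline{T}^2 = k(\underline{x})^T \underline{P}\, \underline{\Lambda}^{-1} \underline{P}^T k(\underline{x}),
\tag{5.117}
$$

$$
\overline{T}^2 = k(\overline{x})^T \overline{P}\, \overline{\Lambda}^{-1} \overline{P}^T k(\overline{x}),
\tag{5.118}
$$

$$
\underline{SPE} = k(\underline{x}, \underline{x}) - k^T(\underline{x})\, \underline{C}\, k(\underline{x}),
\tag{5.119}
$$

$$
\overline{SPE} = k(\overline{x}, \overline{x}) - k^T(\overline{x})\, \overline{C}\, k(\overline{x}),
\tag{5.120}
$$

where $\underline{\phi} = \phi(\underline{x})$, $\underline{\Lambda} = \operatorname{diag}(\underline{\lambda}_1, \ldots, \underline{\lambda}_\ell)$, $\underline{C} = \underline{P}\,\underline{P}^T$, $\overline{\phi} = \phi(\overline{x})$, $\overline{\Lambda} = \operatorname{diag}(\overline{\lambda}_1, \ldots, \overline{\lambda}_\ell)$, and $\overline{C} = \overline{P}\,\overline{P}^T$.

A fault is detected using the $T^2$ index if both the lower and upper indices $\underline{T}^2$ and $\overline{T}^2$ are out of their respective control limits, that is,

$$
\underline{T}^2 > \tau_\alpha^{\underline{T}^2}
\quad \text{and} \quad
\overline{T}^2 > \tau_\alpha^{\overline{T}^2},
\tag{5.121}
$$

where the control limits $\tau_\alpha^{\underline{T}^2}$ and $\tau_\alpha^{\overline{T}^2}$ are computed as in Eq. (3.19). A fault is also detected using SPE if both the lower and upper indices $\underline{SPE}$ and $\overline{SPE}$ are out of their respective control limits, that is,

$$
\underline{SPE} > \underline{\tau}_\alpha^{spe}
\quad \text{and} \quad
\overline{SPE} > \overline{\tau}_\alpha^{spe},
\tag{5.122}
$$

where the control limits $\underline{\tau}_\alpha^{spe}$ and $\overline{\tau}_\alpha^{spe}$ are computed as in Eq. (3.20).

Interval GLRT chart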

Based on the KGLRT developed for single-valued data (section 3.1.4), in this section we propose an interval kernel GLR test (IKGLRT) for interval-valued data. Based on the two KPCA models developed in the previous section, IKPCA_CR and IKPCA_UL, we propose two versions of IKGLRT: IKGLRT_CR and IKGLRT_UL.

IKPCA_UL-based GLRT

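In this section, we propose a GLRT fault detection index based on KPCA for interval-valued data. The proposed approach, IKPCA_UL, uses two classical KPCA models for the upper and lower bounds of interval-valued data, so two GLR tests (lower and upper) are generated for fault detection. To compute the first GLR test using the data matrix of lower values of interval-valued data, let the eigenvectors corresponding to nonzero eigenvalues of the matrix $\underline{K}$ be $\underline{\alpha} = (\underline{\alpha}_1, \underline{\alpha}_2, \dots, \underline{\alpha}_\ell)$ and $\phi(\underline{X}) = \left(\phi(\underline{x}_1), \phi(\underline{x}_2), \dots, \phi(\underline{x}_N)\right)^T$. According to Eq. (3.34), $\phi(\underline{x})^T \underline{C}_\phi\, \phi(\underline{x})$ can be represented as

$$\phi(\underline{x})^T \underline{P}_\phi \underline{P}_\phi^T \phi(\underline{x}) = \phi(\underline{x})^T \phi(\underline{X})^T\, \underline{\alpha}\,\underline{\alpha}^T \phi(\underline{X})\, \phi(\underline{x}) = \underline{k}^T \underline{\alpha}\,\underline{\alpha}^T \underline{k}, \tag{5.123}$$

where

$$\underline{k} = \begin{pmatrix} k(\underline{x}, \underline{x}_1) \\ k(\underline{x}, \underline{x}_2) \\ \vdots \\ k(\underline{x}, \underline{x}_N) \end{pmatrix} \tag{5.124}$$

and

$$\underline{G} = \frac{1}{1 - \underline{k}^T \underline{\alpha}\,\underline{\alpha}^T \underline{k}}. \tag{5.125}$$

Starting from the second KPCA model, which uses the upper values of interval-valued data, let the eigenvectors corresponding to nonzero eigenvalues of the matrix $\overline{K}$ be $\overline{\alpha} = (\overline{\alpha}_1, \overline{\alpha}_2, \dots, \overline{\alpha}_\ell)$ and $\phi(\overline{X}) = \left(\phi(\overline{x}_1), \phi(\overline{x}_2), \dots, \phi(\overline{x}_N)\right)^T$. According to Eq. (3.34), $\phi(\overline{x})^T \overline{C}_\phi\, \phi(\overline{x})$ can be represented as

$$\phi(\overline{x})^T \overline{P}_\phi \overline{P}_\phi^T \phi(\overline{x}) = \phi(\overline{x})^T \phi(\overline{X})^T\, \overline{\alpha}\,\overline{\alpha}^T \phi(\overline{X})\, \phi(\overline{x}) = \overline{k}^T \overline{\alpha}\,\overline{\alpha}^T \overline{k}, \tag{5.126}$$

where

$$\overline{k} = \begin{pmatrix} k(\overline{x}, \overline{x}_1) \\ k(\overline{x}, \overline{x}_2) \\ \vdots \\ k(\overline{x}, \overline{x}_N) \end{pmatrix} \tag{5.127}$$

and

$$\overline{G} = \frac{1}{1 - \overline{k}^T \overline{\alpha}\,\overline{\alpha}^T \overline{k}}. \tag{5.128}$$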
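The statistic in Eqs. (5.125) and (5.128) is a one-line computation once the kernel vector and the retained eigenvectors are available. The sketch below is only illustrative: the names `glrt_index` and `interval_alarm` are hypothetical, and the tiny vectors are toy values (a real model would use a different eigenvector matrix for each bound).

```python
import numpy as np

def glrt_index(k_vec, alpha):
    """G = 1 / (1 - k^T alpha alpha^T k) for one kernel vector k and eigenvectors alpha."""
    proj = alpha.T @ k_vec  # alpha^T k, one entry per retained eigenvector
    return 1.0 / (1.0 - float(proj @ proj))

def interval_alarm(G_lower, G_upper, thr_lower, thr_upper):
    """Flag a fault only when both the lower and the upper index exceed their thresholds."""
    return bool(G_lower > thr_lower and G_upper > thr_upper)

# Toy values: N = 2 training samples and a single retained eigenvector.
alpha = np.array([[1.0], [0.0]])
k_lower = np.array([0.5, 0.0])
k_upper = np.array([0.6, 0.0])

G_lower = glrt_index(k_lower, alpha)  # 1 / (1 - 0.25)
G_upper = glrt_index(k_upper, alpha)  # 1 / (1 - 0.36)
```

The denominator approaches zero as the projected norm approaches one, so a practical implementation would probably guard against division by very small values.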

Algorithm 1 IKGLRT_UL algorithm.
Inputs: N × m interval-valued data matrix [X].
1. Acquire the data set under normal operation, {[x_1], [x_2], ..., [x_N]}.
2. Determine the normalized data matrices $\underline{X}$ and $\overline{X}$ for the lower and upper values of interval-valued data, respectively.
3. Determine the kernel matrices $\underline{K}$ and $\overline{K}$.
4. Determine the number of principal components (PCs), the eigenvalues, and the eigenvectors of each KPCA model.
5. Normalize the received m-dimensional vector $\underline{x}_i$ (respectively, $\overline{x}_i$) by $\underline{x}_i = \underline{x}_i / \|\underline{x}_i\|_2$ (respectively, $\overline{x}_i = \overline{x}_i / \|\overline{x}_i\|_2$).
6. Compute the kernel vector $\underline{k} = k(\underline{x}, \underline{x}_i)$ (respectively, $\overline{k} = k(\overline{x}, \overline{x}_i)$) by Eq. (3.35).
7. Compute the values of $\underline{G}$ defined in Eq. (5.125) (respectively, $\overline{G}$ defined in Eq. (5.128)).
8. Determine a threshold value $\underline{G}_\alpha$ (respectively, $\overline{G}_\alpha$) for a desired false alarm rate α.
9. Detect the presence or absence of a fault in [x_i] by checking whether $\underline{G} > \underline{G}_\alpha$ and $\overline{G} > \overline{G}_\alpha$ or not.

IKPCA_CR-based GLRT

In this approach, we apply IKPCA_CR for modeling and investigate the GLRT-based fault detection index. From the previously presented IKPCA_CR model, let the eigenvectors corresponding to nonzero eigenvalues of the matrix $K_{CR}$ be $\alpha_{CR} = (\alpha_{CR,1}, \alpha_{CR,2}, \dots, \alpha_{CR,\ell})$ and $\phi(X_{CR}) = \left(\phi(x_{CR,1}), \phi(x_{CR,2}), \dots, \phi(x_{CR,N})\right)^T$. According to Eq. (3.34), $\phi(x_{CR})^T C_{CR,\phi}\, \phi(x_{CR})$ can be represented as

$$\phi(x_{CR})^T P_{CR,\phi} P_{CR,\phi}^T \phi(x_{CR}) = \phi(x_{CR})^T \phi(X_{CR})^T \alpha_{CR}\,\alpha_{CR}^T \phi(X_{CR})\, \phi(x_{CR}) = k_{CR}^T \alpha_{CR}\,\alpha_{CR}^T k_{CR}, \tag{5.129}$$

where

$$k_{CR} = \begin{pmatrix} k(x_{CR}, x_{CR,1}) \\ k(x_{CR}, x_{CR,2}) \\ \vdots \\ k(x_{CR}, x_{CR,N}) \end{pmatrix} \tag{5.130}$$

and

$$G_{CR} = \frac{1}{1 - k_{CR}^T \alpha_{CR}\,\alpha_{CR}^T k_{CR}}. \tag{5.131}$$


Algorithm 2 IKGLRT_CR algorithm.
Inputs: N × 2m data matrix $X_{CR} = [X^c \; X^r]$, where $X^c$ and $X^r$ are the center and range data matrices of the interval-valued data matrix [X].
1. Acquire the data set under normal operation, {x_CR,1, x_CR,2, ..., x_CR,N}.
2. Determine the normalized data matrix $X_{CR}$.
3. Determine the kernel matrix $K_{CR}$.
4. Determine the number of principal components (PCs), the eigenvalues, and the eigenvectors of the interval KPCA model.
5. Normalize the received 2m-dimensional vector $x_{CR,i}$ by $x_{CR,i} = x_{CR,i} / \|x_{CR,i}\|_2$.
6. Compute the kernel vector $k_{CR} = k(x_{CR}, x_{CR,i})$ by Eq. (3.35).
7. Compute the values of $G_{CR}$ defined in Eq. (5.131).
8. Determine a threshold value $G_{CR,\alpha}$ for a desired false alarm rate α.
9. Detect the presence or absence of a fault in [x_i] by checking whether $G_{CR} > G_{CR,\alpha}$ or not.
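The last two steps of both algorithms (threshold selection and the decision rule) can be sketched as follows. The algorithms do not state how the threshold is obtained, so taking the empirical (1 - α) quantile of the statistic over fault-free training data is purely an assumption made for this example; all function names are hypothetical.

```python
import numpy as np

def empirical_threshold(train_stats, alpha=0.05):
    """Assumed rule: the (1 - alpha) quantile of the statistic on fault-free training data."""
    return float(np.quantile(train_stats, 1.0 - alpha))

def detect_ul(G_lower, G_upper, thr_lower, thr_upper):
    """IKGLRT_UL-style decision: both bound statistics must exceed their thresholds."""
    return (np.asarray(G_lower) > thr_lower) & (np.asarray(G_upper) > thr_upper)

def detect_cr(G_cr, thr_cr):
    """IKGLRT_CR-style decision: a single statistic is compared with a single threshold."""
    return np.asarray(G_cr) > thr_cr

# Toy training statistics 1, 2, ..., 100 and three test samples.
thr = empirical_threshold(np.arange(1, 101), alpha=0.05)
flags_ul = detect_ul([96.0, 96.0, 10.0], [96.0, 10.0, 96.0], thr, thr)
flags_cr = detect_cr([96.0, 10.0], thr)
```

Requiring both bounds to exceed their limits makes the UL rule more conservative than a test on either bound alone, which is one plausible reason for the lower false alarm rates it reports.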

5.2.2.3 Applications

Next, we illustrate the detection improvement obtained with interval KPCA through its $T^2$ and SPE charts.

Simulation example

To validate the proposed fault detection scheme, we use a nonlinear simulation example based on three variables, j = 1, 2, 3, and n = 600 samples:

$$\begin{aligned} x_1(k) &= e_1(k), & e_1(k) &\sim \mathcal{N}(0, 2), \\ x_2(k) &= x_1^2(k) + \sin(2\pi x_1(k)) + e_2(k), & e_2(k) &\sim \mathcal{N}(0, 0.01), \\ x_3(k) &= x_1^3(k) + x_1(k) + 1 + e_3(k), & e_3(k) &\sim \mathcal{N}(0, 0.01). \end{aligned} \tag{5.132}$$

To generate the interval-valued data matrix [X], a variation $\delta_j = 2\%$, j = 1, 2, 3, which simulates the sensor imprecision, is added to each variable. Hence the lower and upper bounds of the intervals are given by

$$\underline{x}_j(k) = x_j(k) - \delta_j x_j(k), \qquad \overline{x}_j(k) = x_j(k) + \delta_j x_j(k), \qquad j = 1, 2, 3. \tag{5.133}$$

Fig. 5.54 shows the 3-D scatter plot of the interval-valued variables [x_1], [x_2], and [x_3] of the simulated example.

Figure 5.54 3-D scatter plot of the generated interval-valued data.

A KPCA model for single-valued data and the interval KPCA models are identified. To quantify the efficiency of the proposed interval fault detection indices, two metrics are used: the false alarm rate (FAR) and the missed detection rate (MDR). The FAR is the number of normal observations that are wrongly judged as faulty (false alarms) over the total number of fault-free samples. The MDR is the number of faulty samples that are wrongly considered as normal (missed detections) over the total number of faulty samples. A fault is simulated between sample times 300 and 600. Figs. 5.55, 5.56, and 5.57 show the time evolution of the fault detection indices $T^2$ and SPE in the KPCA, IKPCA_CR, and IKPCA_UL models, respectively. The performances in terms of FAR and MDR of the different fault detection charts are summarized in Table 5.11. The bold values highlight the best performance of the proposed method, corresponding to the minimum values of FAR and MDR. IKPCA_UL, which is based on the interval upper and lower bounds, gives the best results compared to KPCA and IKPCA_CR, with FAR = 3.6% and MDR = 0% for faults on variables x_1 and x_2, and FAR = 3.3% and MDR = 1.3% for a fault on variable x_3 (SPE index in Table 5.11).
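The data generation of Eqs. (5.132) and (5.133) could be reproduced along the following lines. This is a sketch under stated assumptions: the second argument of N(0, ·) is read as a variance, the seed is arbitrary, and the `ordered` option (which swaps the bounds for negative measurements so that the lower bound never exceeds the upper one) is an addition of this example rather than something stated in the text.

```python
import numpy as np

def simulate(n=600, seed=0):
    """Draw n samples of (x1, x2, x3) following Eq. (5.132)."""
    rng = np.random.default_rng(seed)
    e1 = rng.normal(0.0, np.sqrt(2.0), n)   # assumption: N(0, 2) means variance 2
    e2 = rng.normal(0.0, np.sqrt(0.01), n)
    e3 = rng.normal(0.0, np.sqrt(0.01), n)
    x1 = e1
    x2 = x1**2 + np.sin(2.0 * np.pi * x1) + e2
    x3 = x1**3 + x1 + 1.0 + e3
    return np.column_stack([x1, x2, x3])

def to_intervals(X, delta=0.02, ordered=True):
    """Build interval bounds x -/+ delta * x as in Eq. (5.133)."""
    lower = X - delta * X
    upper = X + delta * X
    if ordered:
        # Hypothetical safeguard: for negative x the literal formula swaps the bounds.
        lower, upper = np.minimum(lower, upper), np.maximum(lower, upper)
    return lower, upper

X = simulate()
X_lower, X_upper = to_intervals(X, delta=0.02)
```

The fault itself is not generated here because the text does not specify its type or magnitude.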

Figure 5.55 Time evolution of indices $T^2$ and SPE in KPCA model with a fault on variable x_1.

Figure 5.56 Time evolution of indices $T^2$ and SPE in IKPCA_CR model with a fault on variable x_2.

Figure 5.57 Time evolution of x_3.

Table 5.11 Summary of MDR and FAR values (%) for the simulation example.

| Fault | KPCA $T^2$ FAR | KPCA $T^2$ MDR | KPCA SPE FAR | KPCA SPE MDR | IKPCA_CR $T^2$ FAR | IKPCA_CR $T^2$ MDR | IKPCA_CR SPE FAR | IKPCA_CR SPE MDR | IKPCA_UL $T^2$ FAR | IKPCA_UL $T^2$ MDR | IKPCA_UL SPE FAR | IKPCA_UL SPE MDR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x_1 | 6 | 97.3 | 5.3 | 4 | 5.3 | 95 | 7.3 | 1.3 | 3.3 | 100 | 3.6 | 0 |
| x_2 | 6 | 84.6 | 5.3 | 23.6 | 5.3 | 79.3 | 7.3 | 17.6 | 3.3 | 85.3 | 3.6 | 0 |
| x_3 | 6 | 97 | 5 | 14.3 | 5.3 | 91.6 | 7.0 | 4.6 | 3.3 | 96 | 3.3 | 1.3 |
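The two metrics reported in Table 5.11 follow directly from their definitions. A minimal sketch, with a hypothetical function name and toy alarm sequence:

```python
import numpy as np

def far_mdr(alarms, is_faulty):
    """Return (FAR %, MDR %) from boolean alarm flags and ground-truth fault labels."""
    alarms = np.asarray(alarms, dtype=bool)
    is_faulty = np.asarray(is_faulty, dtype=bool)
    far = 100.0 * np.mean(alarms[~is_faulty])   # false alarms / fault-free samples
    mdr = 100.0 * np.mean(~alarms[is_faulty])   # missed detections / faulty samples
    return far, mdr

# Toy example: four fault-free samples followed by four faulty ones.
is_faulty = [0, 0, 0, 0, 1, 1, 1, 1]
alarms = [0, 1, 0, 0, 1, 1, 0, 1]
far, mdr = far_mdr(alarms, is_faulty)
```

Here one of the four fault-free samples raises an alarm and one of the four faulty samples is missed, so both rates come out at 25%.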

Tennessee Eastman process

We apply the proposed method to the well-known Tennessee Eastman process (TEP; see the process description in the Appendix). Interval-valued data are generated by taking into account the imprecision of the measurement sensors: if we suppose that all sensors have an imprecision of 2%, then each measurement is represented as an interval-valued sample. Interval-valued training and testing data matrices of the TEP are generated in this way. The two proposed IKPCA approaches (IKPCA_CR and IKPCA_UL) are applied to the TEP. The training data under normal operating conditions are used to build the IKPCA models and to determine the control limits $\tau_\alpha^{T^2}$ and $\tau_\alpha^{spe}$ with α = 5%. The cumulative percentage of variance (CPV) is used to select the number of principal components to be kept in the KPCA and IKPCA models. Figs. 5.58, 5.59, and 5.60 show the time evolution of the fault detection indices $T^2$ and SPE in the KPCA, IKPCA_CR, and IKPCA_UL models, respectively. Figs. 5.61 and 5.62 present, respectively, the time evolution of the GLRT fault detection indices in the IKPCA_UL and IKPCA_CR models with fault IDV 1. The performances of the fault detection approaches based on the KPCA, IKPCA_CR, and IKPCA_UL models, in terms of FAR and MDR, are summarized in Table 5.12. From this table we can see that both interval approaches, IKPCA_CR and IKPCA_UL, reduce the MDR compared to the classical KPCA-based SPE index. However, IKPCA_CR presents a large FAR compared to the KPCA and IKPCA_UL approaches. It should be noted that only the IKPCA-based GLRT approach achieves a tradeoff between MDR and FAR: the interval KPCA provides the best MDR and FAR reduction when the GLRT statistic is used. The bold values highlight the best performance of the proposed method.

Figure 5.58 Time evolution of indices $T^2$ and SPE in KPCA model with IDV-1 fault.

Figure 5.59 Time evolution of indices $T^2$ and SPE in IKPCA_CR model with IDV-1 fault.

Figure 5.60 Time evolution of indices $T^2$ and SPE in IKPCA_UL model with IDV-1 fault.

Figure 5.61 Time evolution of indices IGLRT_UL in IKPCA model with IDV-1 fault.

Figure 5.62 Time evolution of indices IGLRT_CR in IKPCA model with IDV-1 fault.

5.2.3 Interval kernel PLS-based GLRT for fault detection

5.2.3.1 Kernel PLS for interval-valued data (IKPLS)

Although linear PLS regression is often preferred in process monitoring for its simplicity, there are many practical situations where nonlinearity exists and linear methods are not sufficient to tackle those problems. Therefore developing nonlinear PLS regression methods for interval-valued data is important. Based on the kernel PLS approach developed for single-valued data (see section 3.2.2), in this section we propose two interval kernel PLS models for interval-valued data. The first approach is based on the upper and lower bounds of the intervals, and the second uses the centers and ranges of the intervals. Let [X] and [Y] be the interval-valued data matrices of inputs and outputs, respectively. Similarly to interval KPCA, interval KPLS transforms the

Table 5.12 Missed detection rate (MDR %) and false alarm rate (FAR %) values for TEP data sets. Methods and indices: KPCA ($T^2$, SPE), interval KPCA_CR ($T^2$, SPE, GLRT), and interval KPCA_UL ($T^2$, SPE, GLRT); FAR and MDR are reported for each index.

Faults

IDV 1 IDV 2 IDV 3 IDV 4 IDV 5 IDV 6 IDV 7 IDV 8 IDV 9 IDV 10 IDV 11 IDV 12 IDV 13 IDV 14 IDV 15 IDV 16 IDV 17 IDV 18 IDV 19 IDV 20 IDV 21

5.8 4.9 5.3 3.5 3.5 2.2 0.8 5.3 16.9 3.5 4.4 8 2.6 3.1 4.4 19.1 4.9 4 4.4 2.6 10.2

0 1.1 77.6 0 25 0 0 1.2 78.2 19.1 12 0.3 3.6 0 72.8 21.6 2.1 8.2 32.5 20 37

0.44 0.00 5.8 0.89 0.89 0.00 0.00 1.33 15.62 0.89 1.78 5.35 1.33 0.44 0.00 23.21 0.00 1.33 0.44 0.00 4.91

2.6 3.5 2.2 1.3 1.3 0.4 1.3 0.8 4.4 0.8 1.7 4.4 1.7 2.2 1.3 6.2 1.3 2.6 2.2 2.2 4.4

0.5 1.2 93.1 36.4 67.9 49.6 0 2.6 93.6 55.4 39.7 1 4.6 0 92.1 72.8 18.1 70.9 79 53.1 50

12.5 9.3 11.1 8.03 8 6.6 6.2 9.8 16.5 9.3 11.1 10.2 5.3 8.9 7.5 15.1 8.9 10.7 13.3 4.9 16.5

0 1 82.6 0 36.4 0 0.2 1.6 85.4 23.6 18.6 0.3 3.6 0 82.4 29.1 2.1 8.8 42.8 26.1 39.7

6.9 4.9 12.9 5.8 5.8 1.7 4 4.4 16 3.5 4.4 9.8 2.6 5.3 2.6 22.3 4.9 4.9 2.2 3.5 7.1

0.5 1.2 78.5 0 61 80.2 0 2 80.6 38.6 18.3 0.6 4.3 0 77.1 51.5 4.6 71.6 63.7 35.6 47.7

23.21 16.9 33 25 25 14.2 19.1 26.7 29.4 21.4 19.6 27.6 12.9 22.3 25 33 26.7 24.1 25.4 18.7 37

0.89 0.3 59.7 0 7.8 0 0 0.7 58.7 13.5 9.2 0 2.6 0 58.5 10.6 1.7 6.1 31.7 12.3 23.1

0.50 0.89 7.58 0.44 0.44 0.44 0.00 1.33 15.17 0.44 1.33 5.80 1.33 0.44 0.44 23.66 0.00 0.44 0.44 0.00 4.46

0 1.37 87.87 6.12 65.37 0.00 0.00 1.87 88.75 44.37 31.12 0.62 5.25 0.00 83.87 58.75 9.25 9.25 90.12 44.12 56.75

3.1 3.5 2.2 0.8 0.8 0 0 1.3 5.3 0.8 1.3 3.5 2.2 1.7 1.3 4 1.7 2.2 2.2 1.7 3.2

0.6 1.3 92 36 67.6 92.1 0 2.7 92.6 51.1 39.7 4.2 8.8 0.1 90.8 70.1 32.5 84.7 78.7 50.2 50

SPE

GLRT MDR

0.75 1.35 86.37 54.50 65.12 0.62 0.00 2.00 88.25 44.12 44.00 1.00 5.37 0.12 82.87 60.50 16.00 9.25 90.50 47.50 57.37


interval-valued data matrices [X] and [Y] into new single-valued matrices, and a classical KPLS is then applied to the new single-valued matrices. Recall that the input and output interval-valued measurement vectors are given, respectively, by $[x] = [\underline{x}, \overline{x}]$ and $[y] = [\underline{y}, \overline{y}]$. The KPLS algorithm for single-valued data is presented in section 3.2.2. The first proposed interval KPLS method fits two classical KPLS models using input and output data matrices formed, respectively, by the lower and upper bounds of the interval measurements. To deal with nonlinearities, $\underline{x}_i$ (respectively, $\overline{x}_i$), $i = 1, \dots, N$, are mapped into a feature space $\mathcal{H}$: $\underline{x}_i \to \phi(\underline{x}_i)$ (respectively, $\overline{x}_i \to \phi(\overline{x}_i)$). According to the KPLS algorithm (Algorithm 2), the kernel matrices $\underline{K}$ and $\overline{K}$ must be centered as

$$\underline{K} \leftarrow \left(I_N - \tfrac{1}{N} 1_N 1_N^T\right) \underline{K} \left(I_N - \tfrac{1}{N} 1_N 1_N^T\right), \tag{5.134}$$

$$\overline{K} \leftarrow \left(I_N - \tfrac{1}{N} 1_N 1_N^T\right) \overline{K} \left(I_N - \tfrac{1}{N} 1_N 1_N^T\right), \tag{5.135}$$
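The centering step of Eqs. (5.134) and (5.135) can be checked numerically. In the sketch below (hypothetical function name, random toy data), a linear kernel is used only because the centered kernel must then equal the Gram matrix of the mean-centered data, which gives a simple sanity check.

```python
import numpy as np

def center_kernel(K):
    """Apply (I - 11^T/N) K (I - 11^T/N) to an N x N kernel matrix."""
    N = K.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    return J @ K @ J

# Sanity check with a linear kernel on toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = X @ X.T
K_centered = center_kernel(K)
Xc = X - X.mean(axis=0)
```

For the interval models the same function would be applied separately to the lower-bound and upper-bound kernel matrices.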

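where $1_N$ is an (N × 1) vector with elements equal to one, and $I_N$ is the N-dimensional identity matrix. Based on the matrix of mapped input data in the feature space, the NIPALS-PLS algorithm is modified, and the resulting KPLS is presented in Algorithm 2 (section 3.2.2) [74]. On this basis the regression coefficient matrix $\underline{B}$ (respectively, $\overline{B}$) can be obtained from

$$\underline{B} = \underline{\Phi}^T \underline{U} \left(\underline{T}^T \underline{K}\, \underline{U}\right)^{-1} \underline{T}^T \underline{Y} \tag{5.136}$$

and

$$\overline{B} = \overline{\Phi}^T \overline{U} \left(\overline{T}^T \overline{K}\, \overline{U}\right)^{-1} \overline{T}^T \overline{Y}. \tag{5.137}$$

The prediction of the output variables is given by

$$\hat{\underline{Y}} = \underline{\Phi}\,\underline{B} = \underline{K}\,\underline{U} \left(\underline{T}^T \underline{K}\, \underline{U}\right)^{-1} \underline{T}^T \underline{Y} \tag{5.138}$$

and

$$\hat{\overline{Y}} = \overline{\Phi}\,\overline{B} = \overline{K}\,\overline{U} \left(\overline{T}^T \overline{K}\, \overline{U}\right)^{-1} \overline{T}^T \overline{Y}. \tag{5.139}$$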

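For a new observation $\underline{x}$ ($\overline{x}$) of the input variables, the output is estimated by

$$\hat{\underline{y}} = \underline{B}^T \phi(\underline{x}) = \underline{Y}^T \underline{T} \left[\underline{U} \left(\underline{T}^T \underline{K}\, \underline{U}\right)^{-1}\right]^T \underline{k}(\underline{x}) \tag{5.140}$$

and

$$\hat{\overline{y}} = \overline{B}^T \phi(\overline{x}) = \overline{Y}^T \overline{T} \left[\overline{U} \left(\overline{T}^T \overline{K}\, \overline{U}\right)^{-1}\right]^T \overline{k}(\overline{x}). \tag{5.141}$$

The residuals are then given by

$$\underline{e} = \underline{y} - \hat{\underline{y}}, \qquad \overline{e} = \overline{y} - \hat{\overline{y}}. \tag{5.142}$$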
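To make the prediction and residual formulas concrete, here is a compact kernel PLS sketch. It uses a NIPALS-style loop as commonly described for kernel PLS, which is an assumption: it is not necessarily identical to Algorithm 2 of section 3.2.2, which is not reproduced here. The function names and the linear-kernel toy data are hypothetical; K and Y are assumed to be centered already.

```python
import numpy as np

def kpls_scores(K, Y, n_components, max_iter=500, tol=1e-10):
    """Extract score matrices T and U from a centered kernel K (N x N) and centered Y (N x p)."""
    N = K.shape[0]
    Kd, Yd = K.copy(), Y.copy()
    T = np.zeros((N, n_components))
    U = np.zeros((N, n_components))
    for a in range(n_components):
        u = Yd[:, [0]] / np.linalg.norm(Yd[:, 0])
        for _ in range(max_iter):
            t = Kd @ u
            t = t / np.linalg.norm(t)
            c = Yd.T @ t
            u_new = Yd @ c
            u_new = u_new / np.linalg.norm(u_new)
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        T[:, [a]] = t
        U[:, [a]] = u
        # Deflate K and Y with respect to the new score vector t.
        D = np.eye(N) - t @ t.T
        Kd = D @ Kd @ D
        Yd = Yd - t @ (t.T @ Yd)
    return T, U

def kpls_predict(K_train, K_new, Y, T, U):
    """Y_hat = K_new U (T^T K U)^{-1} T^T Y, with one row of K_new per new sample."""
    M = T.T @ K_train @ U
    dual_coef = U @ np.linalg.solve(M, T.T @ Y)
    return K_new @ dual_coef

# Toy check: with a linear kernel and as many components as inputs,
# a noise-free linear output should be reproduced on the training set.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
Xc = X - X.mean(axis=0)
Yc = Xc @ np.array([[1.0], [-2.0], [0.5]])
K = Xc @ Xc.T
T, U = kpls_scores(K, Yc, n_components=3)
Y_hat = kpls_predict(K, K, Yc, T, U)
E = Yc - Y_hat   # residuals, in the sense of Eq. (5.142)
```

In the interval setting, the two functions would be run once on the lower-bound matrices and once on the upper-bound matrices, giving one residual vector per bound.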

5.2.3.2 Interval KPLS-based fault detection charts

Interval upper and lower bounds IKPLS_UL chart

From the interval residuals (Eq. (5.142)), two GLRT charts can be computed using the formulation presented in Eq. (2.13): the univariate GLRT chart and the multivariate GLRT chart. The univariate GLRT chart computed using the lower and upper bounds of the interval residuals is given by

$$\underline{UG}_j = \frac{1}{\underline{\sigma}^2}\,\underline{e}_j^2, \quad j = 1, \dots, m, \tag{5.143}$$

$$\overline{UG}_j = \frac{1}{\overline{\sigma}^2}\,\overline{e}_j^2, \quad j = 1, \dots, m. \tag{5.144}$$

Let $\underline{G}_{j,\alpha}$ and $\overline{G}_{j,\alpha}$ be their corresponding thresholds, respectively. A fault is detected if both the lower and upper univariate GLRT charts exceed their respective thresholds,

$$\underline{UG}_j > \underline{G}_{j,\alpha} \quad \text{and} \quad \overline{UG}_j > \overline{G}_{j,\alpha} \quad \text{for } j = 1, \dots, m. \tag{5.145}$$

The multivariate GLRT chart is given by

$$\underline{MG} = \frac{1}{\underline{\sigma}^2}\,\|\underline{e}\|^2, \tag{5.146}$$

$$\overline{MG} = \frac{1}{\overline{\sigma}^2}\,\|\overline{e}\|^2. \tag{5.147}$$

Let $\underline{G}_\alpha$ and $\overline{G}_\alpha$ be their corresponding thresholds, respectively. A fault is detected if both the lower and upper multivariate GLRT charts exceed their respective thresholds,

$$\underline{MG} > \underline{G}_\alpha \quad \text{and} \quad \overline{MG} > \overline{G}_\alpha. \tag{5.148}$$

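The interval KPLS algorithm based on the lower and upper bounds of intervals (IKPLS_UL) is shown in Algorithm 3.

Algorithm 3 IKPLS-GLRT_UL algorithm.
Training step:
Inputs: N × m input data matrices $\underline{X}$, $\overline{X}$ and N × p output data matrices $\underline{Y}$, $\overline{Y}$.
1. Standardize the data matrices $\underline{X}$, $\overline{X}$, $\underline{Y}$, and $\overline{Y}$ to zero mean and unit variance.
2. Determine the kernel matrices $\underline{K}$ and $\overline{K}$.
3. Center the kernel matrices $\underline{K}$ and $\overline{K}$ as given by Eqs. (5.134) and (5.135), respectively.
4. Determine the number of principal components (PCs) to be kept in the IKPLS model.
5. Compute the regression coefficient matrices $\underline{B}$ and $\overline{B}$ as given by Eqs. (5.136) and (5.137).
6. Compute the estimations of the output matrices $\hat{\underline{Y}}$ and $\hat{\overline{Y}}$.
7. Compute the interval estimation errors as in Eq. (5.142), and then compute the GLRT charts $\underline{G}$ and $\overline{G}$ and their corresponding thresholds $\underline{G}_\alpha$ and $\overline{G}_\alpha$, respectively.
Testing step:
1. For new testing data vectors $\underline{x}$, $\overline{x}$, $\underline{y}$, and $\overline{y}$, map the input data vectors into the feature space to get $\underline{k}(\underline{x})$ and $\overline{k}(\overline{x})$.
2. Compute the output estimations $\hat{\underline{y}}$ and $\hat{\overline{y}}$.
3. Compute the interval residuals as in Eq. (5.142) and compute the GLRT charts $\underline{G}$ and $\overline{G}$.
4. Detect the presence or absence of a fault by checking whether $\underline{MG} > \underline{G}_\alpha$ and $\overline{MG} > \overline{G}_\alpha$, or $\underline{UG}_j > \underline{G}_{j,\alpha}$ and $\overline{UG}_j > \overline{G}_{j,\alpha}$ for j = 1, ..., m.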
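The chart computations of Eqs. (5.143) to (5.148) reduce to a few array operations once the residuals of both bounds are available. The sketch below uses hypothetical names, toy residuals, an arbitrary variance of 2, and an arbitrary threshold of 2; how the variance and the thresholds are estimated in practice is not specified here.

```python
import numpy as np

def univariate_glrt(E, sigma2):
    """UG_j = e_j^2 / sigma^2 for each sample (row) and each residual component (column)."""
    return np.asarray(E, dtype=float) ** 2 / sigma2

def multivariate_glrt(E, sigma2):
    """MG = ||e||^2 / sigma^2 for each sample (row)."""
    return np.sum(np.asarray(E, dtype=float) ** 2, axis=1) / sigma2

# Toy residuals for two samples and two outputs, one array per bound.
E_lower = np.array([[1.0, 2.0], [3.0, 4.0]])
E_upper = np.array([[2.0, 2.0], [0.0, 1.0]])
sigma2 = 2.0

UG_lower = univariate_glrt(E_lower, sigma2)
MG_lower = multivariate_glrt(E_lower, sigma2)
MG_upper = multivariate_glrt(E_upper, sigma2)

# Interval decision: both bound charts must exceed their (here identical) thresholds.
fault = (MG_lower > 2.0) & (MG_upper > 2.0)
```

In this toy case only the first sample is flagged, because the upper-bound chart of the second sample stays below its threshold.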

Interval center and range IKPLS_CR chart

In this proposed interval kernel PLS, the interval data matrices [X] and [Y] are transformed into numerical matrices using the interval centers and ranges. The center and range variables are used together to build a single regression model. In the proposed KPLS approach for interval-valued data, new data matrices are formed by concatenating the center and range data matrices, and a KPLS method is applied to these new data matrices. The new input data matrix $X_{CR}$ is given in Eq. (5.100), and the new output data matrix is given by

$$Y_{CR} = \left[Y^c \;\; Y^r\right], \tag{5.149}$$

where $Y^c = [y_1^c, y_2^c, \dots, y_N^c]^T \in \mathbb{R}^{N \times p}$ and $Y^r = [y_1^r, y_2^r, \dots, y_N^r]^T \in \mathbb{R}^{N \times p}$ are the output center and range matrices, respectively. The new IKPLS approach consists of applying a KPLS using the new input data matrix $X_{CR}$ and output data matrix $Y_{CR}$. First, the input data matrix $X_{CR}$ is mapped to the feature space. Let $x_{CR} = [x^c \; x^r] \in \mathbb{R}^{2m}$ be a new data sample. The kernel matrix $K_{CR}$ of the center and range method is expressed in Eq. (5.101). The regression coefficient $B_{CR}$ is given by

$$B_{CR} = \Phi_{CR}^T U_{CR} \left(T_{CR}^T K_{CR} U_{CR}\right)^{-1} T_{CR}^T Y_{CR}. \tag{5.150}$$

The prediction of the output variables is given by

$$\hat{Y}_{CR} = \Phi_{CR} B_{CR} = K_{CR} U_{CR} \left(T_{CR}^T K_{CR} U_{CR}\right)^{-1} T_{CR}^T Y_{CR}. \tag{5.151}$$

For a new observation $x_{CR}$ of the input variables, the output is estimated by

$$\hat{y}_{CR} = B_{CR}^T \phi_{CR}(x_{CR}) = Y_{CR}^T T_{CR} \left[U_{CR} \left(T_{CR}^T K_{CR} U_{CR}\right)^{-1}\right]^T k_{CR}(x_{CR}). \tag{5.152}$$

The residuals are then generated as

$$e_{CR} = y_{CR} - \hat{y}_{CR}. \tag{5.153}$$

From those interval residuals two GLRT charts can be computed using the formulation presented in Eq. (2.13).
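The center and range transformation that produces matrices such as $X_{CR}$ and $Y_{CR}$ can be sketched as below. Eq. (5.100) is not reproduced in this section, so the definition of the range as the half-width of the interval is an assumption made for this example (a full-width definition would only rescale the range block); the function name is hypothetical.

```python
import numpy as np

def center_range(lower, upper):
    """Concatenate interval centers and ranges column-wise: [center | range]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    center = (lower + upper) / 2.0
    half_range = (upper - lower) / 2.0   # assumption: range taken as half-width
    return np.hstack([center, half_range])

# One sample with m = 2 interval-valued variables: [1, 3] and [2, 6].
X_cr = center_range([[1.0, 2.0]], [[3.0, 6.0]])
```

An N × m pair of bound matrices thus becomes a single N × 2m matrix, to which an ordinary single-valued KPLS can be applied.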

Table 5.13 Selected monitored variables in the TE process.

| No. | Monitored variables | No. | Monitored variables |
|---|---|---|---|
| 1 | A Feed (Stream 1) | 9 | Product Sep. Temp. |
| 2 | D Feed (Stream 2) | 10 | Product Sep. Pressure |
| 3 | E Feed (Stream 3) | 11 | Prod. Sep. Underflow (Stream 10) |
| 4 | Total Feed Flow (Stream 4) | 12 | Stripper Pressure |
| 5 | Recycle Flow (Stream 8) | 13 | Stripper Temperature |
| 6 | Reactor Feed Rate (Stream 6) | 14 | Stripper Stream Flow |
| 7 | Reactor Temperature | 15 | Reactor Cooling Water Outlet Temp. |
| 8 | Purge Rate (Stream 9) | 16 | Separator Cooling Water Outlet Temp. |

Both a univariate and a multivariate GLRT chart are considered. The univariate GLRT chart is given by

$$UG_{j,CR} = \frac{1}{\sigma^2}\, e_{j,CR}^2, \quad j = 1, \dots, 2m. \tag{5.154}$$

Let the corresponding threshold be $G_{j,CR,\alpha}$. A fault is detected if

$$UG_{j,CR} > G_{j,CR,\alpha}, \quad j = 1, \dots, 2m, \tag{5.155}$$

and the multivariate GLRT chart is given by

$$MG_{CR} = \frac{1}{\sigma^2}\, \|e_{CR}\|^2. \tag{5.156}$$

Let $G_{CR,\alpha}$ be the corresponding threshold. A fault is detected if the multivariate GLRT exceeds its threshold,

$$MG_{CR} > G_{CR,\alpha}. \tag{5.157}$$

The interval KPLS algorithm based on the center and range of intervals (IKPLS_CR) is shown in Algorithm 4.

Algorithm 4 IKPLS-GLRT_CR algorithm.
Training step:
Inputs: N × m input interval data matrix [X] and N × p output interval data matrix [Y]. Compute the center and range matrices $X^c$, $X^r$, $Y^c$, and $Y^r$. Form the new data matrices $X_{CR} = [X^c \; X^r]$ and $Y_{CR} = [Y^c \; Y^r]$.
1. Standardize the data matrices $X_{CR}$ and $Y_{CR}$ to zero mean and unit variance.
2. Determine the kernel matrix $K_{CR}$ using the data matrix $X_{CR}$.
3. Center the kernel matrix $K_{CR}$.
4. Determine the number of principal components (PCs) to be kept in the IKPLS model.
5. Compute the regression coefficient matrix $B_{CR}$ as given by Eq. (5.150).
6. Compute the estimation of the output matrix $\hat{Y}_{CR}$.
7. Compute the estimation error as in Eq. (5.153), and then compute the GLRT charts $UG_{j,CR}$ and $MG_{CR}$ and their corresponding thresholds $G_{j,CR,\alpha}$ and $G_{CR,\alpha}$, respectively.
Testing step:
1. For new testing data vectors $x_{CR}$ and $y_{CR}$, map the input data vector into the feature space to get $k(x_{CR})$.
2. Compute the output estimation $\hat{y}_{CR}$.
3. Compute the residuals as in Eq. (5.153) and compute the GLRT charts $UG_{j,CR}$ and $MG_{CR}$.
4. Detect the presence or absence of a fault by checking whether $MG_{CR} > G_{CR,\alpha}$ or $UG_{j,CR} > G_{j,CR,\alpha}$ for j = 1, ..., 2m.

5.2.3.3 Interval KPLS-based GLRT and application

To evaluate the performance of the IKPLS method, the well-known TE chemical process is used (see the Appendix for a detailed description of the TE process). The measured variables listed in Table 5.13 are chosen as X, and those listed in Table 5.14 are chosen as Y [75]. Figs. 5.63 and 5.64 show the time evolution of the univariate and multivariate GLRT charts, respectively, based on the IKPLS_UL approach. Figs. 5.65 and 5.66 show the time evolution of the univariate and multivariate GLRT charts based on the IKPLS_CR approach. Performance results in terms of false alarm rate (FAR %) and missed detection rate (MDR %) are given in Table 5.15. The multivariate GLRT chart shows better results in comparison to the univariate chart.

Table 5.14 Selected output measured variables in the TE process.

| No. | Measured variables | No. | Measured variables |
|---|---|---|---|
| 1 | Composition of A in Reactor Feed | 6 | Composition of F in Reactor Feed |
| 2 | Composition of B in Reactor Feed | 7 | Composition of E in Product Flow |
| 3 | Composition of C in Reactor Feed | 8 | Composition of F in Product Flow |
| 4 | Composition of D in Reactor Feed | 9 | Composition of G in Product Flow |
| 5 | Composition of E in Reactor Feed | 10 | Composition of H in Product Flow |

Figure 5.63 Time evolution of univariate GLRT index based on IKPLS_UL model.

Figure 5.64 Time evolution of multivariate GLRT index based on IKPLS_UL model.

Figure 5.65 Time evolution of univariate GLRT index based on IKPLS_CR model.

Figure 5.66 Time evolution of multivariate GLRT index based on IKPLS_CR model.

The performance of the proposed interval KPLS methods, in terms of MDR using the multivariate GLRT index for all 21 faults, is compared to that of the KPLS approach proposed by Xie et al. [75]. Table 5.16 shows the MDR for the 21 faults of the TEP obtained using the proposed IKPLS_CR and IKPLS_UL methods and the KPLS approach [75]. The bold values highlight the best performance of the proposed methods. The IKPLS_UL- and IKPLS_CR-based GLRT indices provide the best MDR reduction for faults 1, 2, 3, 6, 8, 9, 12, 13, 15, 18, and 19.


Table 5.15 Missed detection rate (MDR %) and false alarm rate (FAR %) values for TEP data sets using the IKPLS-based GLRT approach.

| Fault | IKPLS-GLRT_CR univariate FAR | IKPLS-GLRT_CR univariate MDR | IKPLS-GLRT_CR multivariate FAR | IKPLS-GLRT_CR multivariate MDR | IKPLS-GLRT_UL univariate FAR | IKPLS-GLRT_UL univariate MDR | IKPLS-GLRT_UL multivariate FAR | IKPLS-GLRT_UL multivariate MDR |
|---|---|---|---|---|---|---|---|---|
| IDV 1 | 38.83 | 0.25 | 7.58 | 0.62 | 41.51 | 0.37 | 7.58 | 0.62 |
| IDV 2 | 28.12 | 0.75 | 6.69 | 1.50 | 30.35 | 0.75 | 5.80 | 1.62 |
| IDV 3 | 30.35 | 64.62 | 9.37 | 89.87 | 32.58 | 62.87 | 9.82 | 90.37 |
| IDV 4 | 33.03 | 63.50 | 10.71 | 87.50 | 33.48 | 62.50 | 10.71 | 86.87 |
| IDV 5 | 33.03 | 47.25 | 10.71 | 69.25 | 33.48 | 46.12 | 10.71 | 68.12 |
| IDV 6 | 30.35 | 0.37 | 4.01 | 0.87 | 33.48 | 0 | 4.018 | 0.12 |
| IDV 7 | 31.69 | 45.25 | 4.01 | 60.37 | 32.58 | 44 | 4.46 | 59.87 |
| IDV 8 | 24.55 | 3.75 | 6.69 | 4.87 | 26.78 | 3.62 | 6.69 | 4.87 |
| IDV 9 | 33.03 | 67.37 | 8.03 | 89.87 | 37.05 | 67.25 | 7.148 | 89.62 |
| IDV 10 | 28.12 | 16.87 | 4.01 | 28.62 | 29.01 | 16.25 | 4.918 | 28.37 |
| IDV 11 | 24.10 | 59.75 | 4.46 | 82.75 | 24.10 | 58.62 | 4.91 | 83 |
| IDV 12 | 32.14 | 2.75 | 6.69 | 4.37 | 33.92 | 2.37 | 7.14 | 3.83 |
| IDV 13 | 27.67 | 3.00 | 6.25 | 5.25 | 31.25 | 3 | 6.698 | 5.12 |
| IDV 14 | 25.89 | 1.25 | 5.35 | 3.62 | 28.57 | 1.37 | 6.25 | 3 |
| IDV 15 | 29.46 | 62.75 | 5.80 | 85.12 | 32.58 | 61.25 | 5.35 | 84.87 |
| IDV 16 | 33.48 | 48.25 | 11.16 | 73.62 | 37.50 | 46.62 | 11.16 | 71.62 |
| IDV 17 | 25.89 | 17.62 | 4.46 | 25.62 | 29.91 | 16.37 | 4.01 | 22.50 |
| IDV 18 | 32.14 | 6.37 | 10.26 | 9.87 | 34.82 | 6.25 | 10.71 | 9.87 |
| IDV 19 | 23.66 | 62.25 | 4.91 | 86.00 | 26.78 | 60.62 | 4.46 | 85.75 |
| IDV 20 | 25.44 | 30.50 | 4.46 | 45.12 | 29.01 | 29.12 | 3.578 | 45.50 |
| IDV 21 | 32.58 | 47.25 | 9.82 | 68.62 | 37.05 | 43 | 9.378 | 64.75 |

5.2.4 Conclusion

In this chapter, we proposed new techniques based on interval kernel PCA- and interval kernel PLS-based GLRT for nonlinear fault detection. The idea behind these approaches is to widen the applicability of kernel methods to processes represented by interval-valued data. This provides a more accurate model of uncertain nonlinear systems and, in turn, supports better decision making with respect to fault detection. Two examples were used to evaluate the fault detection performances of the proposed interval kernel PCA- and kernel PLS-based GLRT approaches. The first was a simulated example, and the second was the Tennessee Eastman process benchmark. The detection abilities of the proposed techniques were evaluated in terms of missed detection and false alarm rates. The detection results demonstrated the effectiveness of the proposed techniques over the classical methods.


Table 5.16 Missed detection ratio (MDR %) based on Q and GLRT for the 21 faults of the TEP.

| Fault | Xie et al. [75]: Q-KPLS | Proposed IKPLS approach: multivariate GLRT-IKPLS_CR | Proposed IKPLS approach: multivariate GLRT-IKPLS_UL |
|---|---|---|---|
| 1 | 3 | 0.62 | 0.62 |
| 2 | 5.97 | 1.50 | 1.62 |
| 3 | 99.4 | 89.87 | 90.37 |
| 4 | 12.7 | 87.50 | 86.87 |
| 5 | 8.6 | 69.25 | 68.12 |
| 6 | 2.99 | 0.87 | 0.12 |
| 7 | 4.66 | 60.37 | 59.87 |
| 8 | 19.36 | 4.87 | 4.87 |
| 9 | 99.5 | 89.87 | 89.62 |
| 10 | 19.2 | 28.62 | 28.37 |
| 11 | 20.28 | 82.75 | 83 |
| 12 | 9.44 | 4.37 | 3.83 |
| 13 | 13.57 | 5.25 | 5.12 |
| 14 | 2.22 | 3.62 | 3 |
| 15 | 99.3 | 85.12 | 84.87 |
| 16 | 66.4 | 73.62 | 71.62 |
| 17 | 18.62 | 25.62 | 22.50 |
| 18 | 15.4 | 9.87 | 9.87 |
| 19 | 93.74 | 86.00 | 85.75 |
| 20 | 42.18 | 45.12 | 45.50 |
| 21 | 61.46 | 68.62 | 64.75 |

References

[1] S. Wold, N. Kettaneh-Wold, B. Skagerberg, Nonlinear PLS modeling, Chemometrics and Intelligent Laboratory Systems 7 (1989) 53–65. [2] T. Kourti, J. MacGregor, Process analysis, monitoring and diagnosis using multivariate projection methods: a tutorial, Chemometrics and Intelligent Laboratory Systems 28 (3) (1995) 3–21. [3] M.-F. Harkat, G. Mourot, J. Ragot, An improved PCA scheme for sensor FDI: application to an air quality monitoring network, Journal of Process Control 16 (6) (2006) 625–634. [4] E.L. Russell, L.H. Chiang, R.D. Braatz, Data-Driven Methods for Fault Detection and Diagnosis in Chemical Processes, Springer Science & Business Media, 2012. [5] J.E. Jackson, G.S. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (3) (1979) 341–349.


[6] J.V. Kresta, J.F. MacGregor, T.E. Marlin, Multivariate statistical monitoring of process operating performance, The Canadian Journal of Chemical Engineering 69 (1) (1991) 35–47. [7] J.F. MacGregor, T. Kourti, Statistical process control of multivariate processes, Control Engineering Practice 3 (3) (1995) 403–414. [8] H. Wang, Z. Song, P. Li, Fault detection behavior and performance analysis of principal component analysis based process monitoring methods, Industrial & Engineering Chemistry Research 41 (10) (2002) 2455–2464. [9] J. Gertler, T.J. McAvoy, Principal component analysis and parity relations – a strong duality, IFAC Proceedings Volumes 30 (18) (1997) 833–838. [10] S.J. Qin, W. Li, Detection, identification, and reconstruction of faulty sensors with maximized sensitivity, AICHE Journal 45 (9) (1999) 1963–1976. [11] S. Joe Qin, Statistical process monitoring: basics and beyond, Journal of Chemometrics 17 (8–9) (2003) 480–502. [12] S.J. Qin, Survey on data-driven industrial process monitoring and diagnosis, Annual Reviews in Control 36 (2) (2012) 220–234. [13] P. Geladi, B.R. Kowalski, Partial least-squares regression: a tutorial, Analytica Chimica Acta 185 (1986) 1–17. [14] P. Cazes, A. Chouakria, E. Diday, Y. Schektman, Extension de l’analyse en composantes principales a des données de type intervalle, Revue de Statistique Appliquée 45 (3) (1997) 5–24. [15] C. Lauro, F. Palumbo, Principal component analysis of interval data: a symbolic data analysis approach, Computational Statistics 15 (1) (2000) 73–87. [16] J.G. Le-Rademacher, Principal Component Analysis for Interval-Valued and Histogram-Valued Data and Likelihood Functions and Some Maximum Likelihood Estimators for Symbolic Data, PhD dissertation, University of Georgia, 2008. [17] M. Jie, Three-way PCA of interval data for dynamic features extraction in futures market, in: Chinese Control and Decision Conference, CCDC 2008, IEEE, 2008, pp. 1083–1086. [18] H. Wang, J. Wu, R. 
Guan, Cipca: complete-information-based principal component analysis for interval-valued data, Neurocomputing 86 (2012) 158–169. [19] F. Palumbo, C.N. Lauro, A pca for interval-valued data based on midpoints and radii, in: New Developments in Psychometrics, Springer, 2003, pp. 641–648. [20] J.F. MacGregor, C. Jaeckle, Process monitoring and diagnosis by multiblock PLS methods, AIChE Journal 40 (5) (1994). [21] L. Billard, E. Diday, Regression analysis for interval-valued data, in: Data Analysis, Classification, and Related Methods, Springer, 2000, pp. 369–374. [22] B. Sinova, A. Colubi, G. González-Rodrı, et al., Interval arithmetic-based simple linear regression between interval data: discussion and sensitivity analysis on the choice of the metric, Information Sciences 199 (2012) 109–124. [23] N.C. Lauro, F. Palumbo, Principal component analysis on subpopulations: an interval data approach, in: IMPS Conference’01, Osaka (Japan), 2001. [24] V. Kreinovich, H.T. Nguyen, B. Wu, On-line algorithms for computing mean and variance of interval data, and their use in intelligent systems, Information Sciences 177 (16) (2007) 3228–3238. [25] T. Hickey, Q. Ju, M.H. Van Emden, Interval arithmetic: from principles to implementation, Journal of the ACM (JACM) 48 (5) (2001) 1038–1068. [26] R.E. Moore, Interval Analysis, Prentice Hall, Englewood Cliffs, NJ, 1966.


[27] A. Chouakria, E. Diday, P. Cazes, Vertices principal components analysis with an improved factorial representation, in: Advances in Data Science and Classification, Springer, 1998, pp. 397–402. [28] A. Chouakria, E. Diday, P. Cazes, An improved factorial representation of symbolic objects, in: Knowledge Extraction from Statistical Data, 1998, pp. 301–305. [29] J. Le-Rademacher, L. Billard, Likelihood functions and some maximum likelihood estimators for symbolic data, Journal of Statistical Planning and Inference 141 (4) (2011) 1593–1602. [30] H.-H. Bock, E. Diday, Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information From Complex Data, Springer Science & Business Media, 2012. [31] J.L. Rademacher, L. Billard, Symbolic covariance principal component analysis and visualization for interval-valued data, Journal of Computational and Graphical Statistics 21 (2) (2012) 413–432. [32] C.N. Lauro, F. Palumbo, Principal component analysis of interval data: a symbolic data analysis approach, Computational Statistics 15 (1) (2000) 73–87. [33] S.J. Qin, R. Dunia, Determining the number of principal components for best reconstruction, IFAC Proceedings Volumes 31 (11) (1998) 357–362. [34] S. Valle, W. Li, S.J. Qin, Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods, Industrial & Engineering Chemistry Research 38 (11) (1999) 4389–4401. [35] S.J. Qin, R. Dunia, Determining the number of principal components for best reconstruction, Journal of Process Control 10 (2) (2000) 245–250. [36] R. Dunia, S.J. Qin, T.F. Edgar, T.J. McAvoy, Identification of faulty sensors using principal component analysis, AIChE Journal 42 (10) (1996) 2797–2812. [37] R. Dunia, S.J. Qin, Joint diagnosis of process and sensor faults using principal component analysis, Control Engineering Practice 6 (4) (1998) 457–469. [38] T. AitIzem, W. Bougheloum, M.F. Harkat, M. 
Djeghaba, Fault detection and isolation using interval principal component analysis methods, IFAC-PapersOnLine 48 (21) (2015) 1402–1407. [39] T. AitIzem, W. Bougheloum, M.F. Harkat, M. Djeghaba, Interval PCA based fault detection and isolation with new interval spe statistic, in: Proceedings of International Conference on Automatic Control, Telecommunication and Signals, ICATS’15, Annaba, Algeria, 2015. [40] G.E.P. Box, Some theorems on quadratic forms applied in the study of analysis of variance problems: effect of inequality of variance in one-way classification, The Annals of Mathematical Statistics 25 (1954) 290–302. [41] S.W. Roberts, Control chart tests based on geometric moving averages, Technometrics 1 (3) (1959) 239–250. [42] S.V. Crowder, M.D. Hamilton, An EWMA for monitoring a process standard deviation, Journal of Quality Technology 24 (1) (1992) 12–21. [43] E.S. Page, Continuous inspection schemes, Biometrika (1954) 100–115. [44] M. Hart, R. Hart, Shewhart control charts for individuals with time-ordered data, in: Frontiers in Statistical Quality Control, vol. 4, 1992, p. 123. [45] J.S. Hunter, The exponentially weighted moving average, Journal of Quality Technology 18 (4) (1986) 203–210. [46] D.C. Montgomery, Introduction to Statistical Quality Control, John Wiley& Sons, New York, 2005.


[47] J. Gertler, D. Singer, A new structural framework for parity equation-based failure detection and isolation, Automatica 26 (2) (1990) 381–388. [48] U. Kruger, Q. Chen, X. Wang, J.S. Qin, An alternative pls algorithm for the monitoring of industrial process, IEEE American Control Conference 6 (2001) 4455–4459. [49] A. Benaicha, G. Mourot, J. Ragot, K. Benothman, Fault detection and isolation with interval principal component analysis, in: Proceedings Engineering & Technology, vol. 1, 2013, pp. 162–167. [50] J. Jackson, G. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (1979) 341–349. [51] T. Ait-Izem, M.-F. Harkat, M. Djeghaba, F. Kratz, Sensor fault detection based on principal component analysis for interval-valued data, Quality Engineering (2017) 1–13. [52] T. Ait-Izem, M.-F. Harkat, M. Djeghaba, F. Kratz, On the application of interval PCA to process monitoring: a robust strategy for sensor FDI with new efficient control statistics, Journal of Process Control 63 (2018) 29–46. [53] S. Mika, B. Schölkopf, A.J. Smola, K.-R. Müller, M. Scholz, G. Rätsch, Kernel PCA and de-noising in feature spaces, in: Advances in Neural Information Processing Systems, 1999, pp. 536–542. [54] M. Kramer, Nonlinear principal component analysis using autoassociative neural networks, AICHE Journal 37 (1991) 23–243. [55] S. Tan, M. Mavrovouniotis, Reduction data dimensionality through optimizing neural network inputs, AIChE Journal 41 (6) (1995) 1471–1480. [56] D. Dong, T. McAvoy, Nonlinear principal component analysis based on principal curves and neural networks, Computers and Chemical Engineering 20 (1) (1996) 65–78. [57] B. Schölkopf, A. Smola, K.-R. Müller, Kernel principal component analysis, in: 7th International Conference on Artificial Neural Networks, ICANN 1997, Lausanne, Switzerland, 1997, pp. 583–588. [58] B. Schölkopf, A. Smola, K. 
Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10 (5) (1998) 1299–1319. [59] J.-M. Lee, C. Yoo, S.W. Choi, P.A. Vanrolleghem, I.-B. Lee, Nonlinear process monitoring using kernel principal component analysis, Chemical Engineering Science 59 (1) (2004) 223–234. [60] S.W. Choi, C. Lee, J.-M. Lee, J.H. Park, I.-B. Lee, Fault detection and identification of nonlinear processes based on kernel PCA, Chemometrics and Intelligent Laboratory Systems 75 (1) (2005) 55–67. [61] Z. Ge, C. Yang, Z. Song, Improved kernel PCA-based monitoring approach for nonlinear processes, Chemical Engineering Science 64 (9) (2009) 2245–2255. [62] S. Qin, Survey on data-driven industrial process monitoring and diagnosis, Annual Reviews in Control 36 (2) (2012) 220–234. [63] P.D.U.P. Giordani, A least squares approach to principal component analysis for interval valued data, Chemometrics and Intelligent Laboratory Systems 70 (2) (2004) 179–192. [64] H. Wold, Estimation of principal components and related models by iterative least squares, Multivariate Analysis (1966) 391–420. [65] P. Geladi, B. Kowalski, Partial least-squares regression: a tutorial, Analytica Chimica Acta 185 (1986) 1–17. [66] M.-F. Harkat, M. Mansouri, M. Nounou, H. Nounou, Enhanced data validation strategy of air quality monitoring network, Environmental Research 160 (2018) 183–194.


[67] M. Mansouri, M.-F. Harkat, M. Nounou, H. Nounou, Midpoint-radii principal component analysis-based EWMA and application to air quality monitoring network, Chemometrics and Intelligent Laboratory Systems (2018). [68] A. Costa, B. Pimentel, R. Souza, K-means clustering for symbolic interval data based on aggregated kernel functions, in: 22nd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2010, IEEE, 2010, pp. 375–376. [69] B. Pimentel, A. Costa, R. Souza, A partitioning method for symbolic interval data based on kernelized metric, in: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM, 2011, pp. 2189–2192. [70] A.F. da Costa, B.A. Pimentel, R.M. de Souza, Clustering interval data through kernel-induced feature space, Journal of Intelligent Information Systems 40 (1) (2013) 109–140. [71] L. Billard, Dependencies and variation components of symbolic interval-valued data, Selected Contributions in Data Analysis and Classification (2007) 3–12. [72] E.d.A.L. Neto, F.d.A.de Carvalho, Centre and range method for fitting a linear regression model to symbolic interval data, Computational Statistics & Data Analysis 52 (3) (2008) 1500–1515. [73] M.A. Domingues, R.M. de Souza, F.J.A. Cysneiros, A robust method for linear regression of symbolic interval data, Pattern Recognition Letters 31 (13) (2010) 1991–1996. [74] R. Rosipal, L.J. Trejo, Kernel partial least squares regression in reproducing kernel Hilbert space, Journal of Machine Learning Research 2 (2001) 97–123. [75] Y. Xie, Y. Zhang, Q. Jia, L. Zhai, Fault detection based on probabilistic kernel partial least square regression for industrial processes, Journal of Chemical Engineering of Japan 51 (1) (2018) 89–99.

CHAPTER 6

Model-based approaches for fault detection

Contents
6.1. Introduction
6.2. State estimation
  6.2.1 State estimation problem formulation
  6.2.2 State estimation techniques
    6.2.2.1 Extended Kalman filter (EKF)
    6.2.2.2 Unscented Kalman filter (UKF)
    6.2.2.3 Particle filter (PF)
6.3. Fault detection-based state estimation approaches
  6.3.1 Fault detection using multiscale EWMA chart
    6.3.1.1 EWMA chart
    6.3.1.2 Multiscale EWMA chart
  6.3.2 Application to wastewater treatment plant
    6.3.2.1 State estimation results
    6.3.2.2 Fault detection results
6.4. Fault detection-based state estimation approach
  6.4.1 Fault detection using optimized weighted SS-DEWMA chart
  6.4.2 Optimized WSS-DEWMA and application to fault detection
    6.4.2.1 Application 1: synthetic example
    6.4.2.2 Application 2: Cad System in E. coli (CSEC)
6.5. Conclusions
References

https://doi.org/10.1016/B978-0-12-819164-4.00015-7
Copyright © 2020 Elsevier Inc. All rights reserved.

6.1 Introduction

Fault detection is usually the first step in process monitoring. In biological systems such as wastewater treatment plants (WWTPs) and the Cad System in E. coli (CSEC), where the state variables cannot be measured directly, state estimation-based fault detection techniques are needed to enhance monitoring. Model-based state estimation is one of the most widely applied approaches for estimating unmeasured states from an available process model. The estimated states can then be used to generate residuals, which serve for fault detection (FD). For instance, proper operation of a wastewater treatment plant requires a good understanding of its behavior and tight monitoring of its key variables to achieve the desired operating effectiveness and to maintain the required safety standards and protocols.

The main objective of this chapter is therefore to develop an improved model-based fault detection technique that enhances the monitoring of industrial systems. This objective is twofold: first, a state estimation technique that can accurately estimate the state variables in such systems is developed; second, a new fault detection chart is proposed.

When the process model is available and has a predefined structure (obtained from material and energy balances), the state variables are estimated using state estimation techniques. These include the extended Kalman filter (EKF) [1,2], unscented Kalman filter (UKF) [3,4], central difference Kalman filter (CDKF) [5], square-root unscented Kalman filter (SRUKF) [6], square-root central difference Kalman filter (SRCDKF) [7], and particle filtering (PF) [8]. The PF offers a significant advantage over Kalman filter-based techniques because it can be applied to nonlinear models with non-Gaussian errors.

After the unknown variables have been estimated and the monitored residuals computed with the state estimation method, the sensor fault detection problem is addressed. Here we assume that there is no change in the relations between pairs of variables (no fault in the process). The term sensor fault denotes a discrepancy between the value that a sensor should indicate under normal operating conditions and the value that it actually indicates. This does not necessarily mean that the sensor itself is faulty. The discrepancy may be caused by an abrupt fault (i.e., offset or bias), an incipient or evolutive fault (i.e., drift), or a change in precision (i.e., a change in the variance). An example of each fault appears in Fig. 6.1.

Figure 6.1 Plots of samples of normal and faulty signals.
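The three sensor fault types named above can be illustrated with a small simulation. The following is a minimal sketch, not the procedure used to produce Fig. 6.1: the function name `inject_fault`, the linear drift profile, and the Gaussian form of the precision degradation are all assumptions made for illustration.

```python
import random


def inject_fault(signal, kind, start, magnitude, seed=0):
    """Return a copy of `signal` with a sensor fault added from index `start` on.

    Hypothetical helper: `kind` is "bias", "drift", or "precision".
    """
    rng = random.Random(seed)
    faulty = list(signal)
    for i in range(start, len(faulty)):
        if kind == "bias":
            # Abrupt fault: constant offset added to every later sample.
            faulty[i] += magnitude
        elif kind == "drift":
            # Incipient fault: offset that grows linearly with time.
            faulty[i] += magnitude * (i - start + 1)
        elif kind == "precision":
            # Precision change: extra zero-mean noise raises the variance.
            faulty[i] += rng.gauss(0.0, magnitude)
        else:
            raise ValueError(f"unknown fault kind: {kind}")
    return faulty


normal = [0.0] * 10
biased = inject_fault(normal, "bias", start=5, magnitude=2.0)
drifting = inject_fault(normal, "drift", start=5, magnitude=0.5)
noisy = inject_fault(normal, "precision", start=5, magnitude=1.0)
```

In each case the samples before `start` are left untouched, so the fault-free and faulty segments can be compared directly.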


Model-based fault detection has mainly relied on the Shewhart chart [9], the exponentially weighted moving average (EWMA) chart [10,21], the cumulative sum (CUSUM) chart [22], and the generalized likelihood ratio test (GLRT) chart [23,24] to improve fault detection (FD) capability. Model-based FD methods generally depend on the dynamic structure of the system: the selected measurements are compared with the output of a mathematical model under fault-free conditions. For important industrial parameters that cannot be measured, estimation is necessary before monitoring can be applied.

Whereas the Shewhart chart considers only the present data sample to evaluate performance, the CUSUM and EWMA charts consider a weighted sum of past observations. The CUSUM chart gives the same weight to all past observations, whereas the EWMA chart gives more importance to the more recent observations [25–27]. Both charts perform almost equally well in detecting small mean shifts, but the EWMA chart is somewhat easier to set up and operate. Moreover, since the EWMA statistic is a weighted average of all previous and present observations, it is less sensitive to the normality assumption [25,28]. Thus, in this chapter, we propose two enhanced control charts based on EWMA to address the fault detection problem.

The EWMA chart has two parameters, the control width L and the smoothing parameter λ, which must be specified by the user so that the chart is optimal at detecting a specific change size. We therefore propose an optimized EWMA based on the best selection of L and λ. The selection is posed as a multiobjective optimization (MOO) problem with three objective functions: i) the missed detection rate (MDR), ii) the false alarm rate (FAR), and iii) the ARL1 value. The idea behind the developed EWMA is to compute a chart that takes into account the current and previous data while giving more weight to the more recent data, so that faults are detected quickly and reliably.

In addition, the multiscale nature of the data provides a representation that can be made robust to noise and errors and has a great impact on the quality of fault detection. Hence we propose to combine the EWMA chart with a wavelet-based multiscale representation to improve the monitoring effectiveness. The first novel chart is the so-called multiscale EWMA (MS-EWMA) chart. In the MS-EWMA strategy, the state variables are estimated using the PF method, and the faults are detected using the MS-EWMA chart.
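To make the weighting argument concrete, the EWMA recursion can be unrolled. The following worked expansion assumes the standard recursion $Z_i = \lambda X_i + (1-\lambda) Z_{i-1}$ with starting value $Z_0$; the notation is used here for illustration only.

```latex
\begin{align*}
Z_i &= \lambda X_i + (1-\lambda) Z_{i-1} \\
    &= \lambda X_i + \lambda (1-\lambda) X_{i-1} + (1-\lambda)^2 Z_{i-2} \\
    &= \lambda \sum_{j=0}^{i-1} (1-\lambda)^{j} X_{i-j} + (1-\lambda)^{i} Z_0 .
\end{align*}
```

The observation $j$ steps in the past therefore receives the weight $\lambda(1-\lambda)^j$, which decays geometrically. For $\lambda = 0.2$, the three most recent samples carry the weights 0.2, 0.16, and 0.128. Setting $\lambda = 1$ keeps only the current sample, which corresponds to the Shewhart chart.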


Another improved EWMA chart, called the Max-Double EWMA (M-DEWMA) chart, was proposed in the literature and has shown better performance than the classical EWMA chart in detecting minor and moderate shifts in the mean and/or variance [29]. The M-DEWMA chart takes the larger of the absolute values of two EWMA statistics, one monitoring the mean and the other the variance. It has been shown to outperform the Max-EWMA chart in detecting shifts in the mean and/or variance. The authors of [30,31] developed an enhanced single chart, the Sum of Squares-DEWMA (SS-DEWMA) chart, which aims at detecting shifts of all sizes in the mean and/or variance. The SS-DEWMA chart has been shown to outperform the M-DEWMA chart in detecting such shifts, and both outperform the classical EWMA chart [30,31].

The second proposed chart therefore considers the sum of weighted squared values of the EWMA statistics and improves on the classical SS-DEWMA chart. It is a weighted version that makes a tradeoff between the two EWMA statistics. However, the weighted SS-DEWMA (WSS-DEWMA) chart has two tuning parameters, the weight α and the smoothing parameter λ, which should be optimized. To this end, an enhanced WSS-DEWMA chart that optimizes the two parameters is proposed: the so-called optimized WSS-DEWMA (OWSS-DEWMA) chart. In the OWSS-DEWMA approach, the state variables are estimated using the PF method, and the OWSS-DEWMA chart is applied for fault detection.

The detection performance of the proposed strategies is compared with that of the classical techniques in terms of missed detection rate (MDR) and false alarm rate (FAR). The monitoring capability is assessed using a synthetic example and two biological systems: wastewater treatment plants (WWTPs) and the Cad System in E. coli (CSEC). For the simulated CSEC model, the developed chart aims to detect single and multiple faults by monitoring the key variables (cadaverine, transport proteins, enzymes, lysine, and regulatory proteins). The performance of the WWTP is monitored through six state variables: the chemical oxygen demand, dissolved oxygen concentration, active heterotrophic biomass, ammonia concentration, nitrate concentration, and active autotrophic biomass.


6.2 State estimation

6.2.1 State estimation problem formulation

Here the state estimation problem is formulated for a general system model. Let a nonlinear state space model be described as follows:

\[
\dot{x} = F(x, u, \theta, w), \qquad y = L(x, u, \theta, v), \qquad w \sim \mathcal{N}(0, Q), \qquad v \sim \mathcal{N}(0, R), \tag{6.1}
\]

where $x \in \mathbb{R}^n$ is a vector of the state variables, $u \in \mathbb{R}^p$ is a vector of the input variables, $\theta \in \mathbb{R}^q$ is an unknown parameter vector, $y \in \mathbb{R}^m$ is a vector of the measured variables, $w \in \mathbb{R}^n$ is a white process noise vector with covariance $Q$, $v \in \mathbb{R}^m$ is a measurement noise vector with covariance $R$, and $F$ and $L$ are nonlinear functions. Discretizing the state space model (6.1), the discrete model can be written as follows:

\[
x_k = F(x_{k-1}, u_{k-1}, \theta_{k-1}, w_{k-1}), \qquad y_k = L(x_k, u_k, \theta_k, v_k), \tag{6.2}
\]

which describes the state variables at any time step $k$ in terms of their values at the previous time step $k-1$. The process and measurement noise vectors are assumed to have the following properties: $E[w_k] = 0$, $E[w_k w_k^T] = Q_k$, $E[v_k] = 0$, and $E[v_k v_k^T] = R_k$.

The objective of the state estimation problem is to estimate the state vector $x_k$, given the measurement vector $y_k$ and the dynamic model (6.1) in its discretized form (6.2). Many techniques have been developed to solve this problem, including the extended Kalman filter (EKF), unscented Kalman filter (UKF), particle filter (PF), and others. In this section, we propose to use the PF approach to estimate the nonlinear state variables. The PF provides significant benefits over the EKF and UKF techniques and can be used for the estimation of nonlinear processes with non-Gaussian noise. A brief introduction to the EKF, UKF, and PF techniques is presented in the next section.

6.2.2 State estimation techniques

6.2.2.1 Extended Kalman filter (EKF)

As the name indicates, the EKF is an extension of the Kalman filter (KF) [32] in which the model is linearized to estimate the covariance matrix of the state vector. As in the KF, the state vector $x_k$ is estimated by minimizing a weighted covariance matrix of the estimation error, i.e., $E[(x_k - \hat{x}_k) M (x_k - \hat{x}_k)^T]$, where $M$ is a symmetric nonnegative definite weighting matrix. If all the states are equally important, $M$ can be taken as the identity matrix, which reduces the covariance matrix to $P = E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T]$. Such a minimization problem can be solved by minimizing the following objective function:

\[
J = \frac{1}{2}\,\mathrm{Tr}\left( E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T] \right) \tag{6.3}
\]

subject to the model defined in Eq. (6.1). To minimize the objective function (6.3), the EKF estimates the state vector using a two-step algorithm, prediction and estimation (or update), which is described in Algorithm 1.
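The prediction and update recursion of Algorithm 1 can be sketched for a scalar state. This is a minimal illustration under stated assumptions, not the implementation used in this chapter: the noise is assumed additive (so that G = H = 1), the update uses the standard innovation $y_k - \hat{y}_{k|k-1}$, and the function names, the toy model in `run_demo`, and all numerical values are hypothetical.

```python
def ekf_step(x_est, P, y, F, L, dF, dL, Q, R):
    """One scalar EKF iteration (additive noise assumed, so G = H = 1).

    F, L   : state-transition and measurement functions
    dF, dL : their derivatives with respect to the state
    """
    # Prediction step: propagate the estimate through the nonlinear model.
    x_pred = F(x_est)
    y_pred = L(x_pred)
    A = dF(x_est)                      # linearization of F at the last estimate
    P_pred = A * P * A + Q             # predicted error variance

    # Update step: correct the prediction with the measurement innovation.
    C = dL(x_pred)                     # linearization of L at the prediction
    K = P_pred * C / (C * P_pred * C + R)
    x_new = x_pred + K * (y - y_pred)
    P_new = (1.0 - K * C) * P_pred
    return x_new, P_new


def run_demo(steps=50):
    """Track a toy system with linear dynamics and a quadratic measurement."""
    F = lambda x: 0.9 * x + 1.0
    dF = lambda x: 0.9
    L = lambda x: x * x / 20.0
    dL = lambda x: x / 10.0
    x_true, x_est, P = 5.0, 2.0, 1.0
    for _ in range(steps):
        x_true = F(x_true)
        y = L(x_true)                  # noise-free measurement, for reproducibility
        x_est, P = ekf_step(x_est, P, y, F, L, dF, dL, Q=0.01, R=0.01)
    return x_est, x_true


x_est, x_true = run_demo()
```

For a linear model the step reduces to the ordinary Kalman filter, which gives a convenient check of the gain and variance expressions.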

Algorithm 1 Extended Kalman filter algorithm.

Initialization step:
  $\hat{x}_0 = E[x]$,
  $P_{x_0} = E[(x - \hat{x}_0)(x - \hat{x}_0)^T]$.

Prediction step:
  $\hat{x}_{k|k-1} = F(\hat{x}_{k-1|k-1}, u_{k-1})$,
  $\hat{y}_{k|k-1} = L(\hat{x}_{k|k-1}, u_k)$.

Estimation (update) step:
  $P_{k|k-1} = A_{k-1} P_{k-1|k-1} A_{k-1}^T + G_{k-1} Q G_{k-1}^T$,
  $K_k = P_{k|k-1} C_k^T (C_k P_{k|k-1} C_k^T + H_k R H_k^T)^{-1}$,
  $P_{k|k} = (I - K_k C_k) P_{k|k-1}$,
  $\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (y_k - \hat{y}_{k|k-1})$.

Return the augmented state estimate $\hat{x}_k$.

Here $A \approx \left.\frac{\partial F}{\partial x}\right|_{\hat{x}}$, $C \approx \left.\frac{\partial L}{\partial x}\right|_{\hat{x}}$, $G \approx \left.\frac{\partial F}{\partial w}\right|_{\hat{x}}$, and $H \approx \left.\frac{\partial L}{\partial v}\right|_{\hat{x}}$ are the matrices of the linearized system model at every time step.

6.2.2.2 Unscented Kalman filter (UKF)

The unscented Kalman filter (UKF) is an extension of the Kalman filter to nonlinear systems. It utilizes the unscented transformation, which is a method for calculating statistics (such as the mean and covariance matrix) of a random variable that undergoes a nonlinear mapping. Assume that a random variable $x \in \mathbb{R}^L$ with mean $\bar{x}$ and covariance $P_x$ is transformed by a nonlinear function $y = f(x)$. To find the statistics of $y$, define $2L + 1$ sigma vectors as follows:

\[
\begin{aligned}
\mathcal{X}_0 &= \bar{x}, \\
\mathcal{X}_i &= \bar{x} + \left( \sqrt{(L + \lambda) P_x} \right)_i, && i = 1, \dots, L, \\
\mathcal{X}_i &= \bar{x} - \left( \sqrt{(L + \lambda) P_x} \right)_{i-L}, && i = L + 1, \dots, 2L,
\end{aligned} \tag{6.4}
\]

where $\lambda = e^2 (L + k) - L$ is a scaling parameter, $L$ is the dimension of the state $x$, $\left( \sqrt{(L + \lambda) P_x} \right)_i$ denotes the $i$th column of the matrix square root of $(L + \lambda) P_x$, and the constant $10^{-4} < e < 1$ determines the spread of the sigma points around $\bar{x}$. The constant $k$ is a secondary scaling parameter, which is usually set to zero or to $3 - L$ [33]. These sigma points are then propagated through the nonlinear function, that is,

\[
\mathcal{Y}_i = f(\mathcal{X}_i), \qquad i = 0, \dots, 2L, \tag{6.5}
\]

and the mean and covariance matrix of $y$ can be approximated by the weighted sample mean and covariance of the transformed sigma points $\mathcal{Y}_i$ as follows:

\[
\bar{y} \approx \sum_{i=0}^{2L} W_i^{(m)} \mathcal{Y}_i, \tag{6.6}
\]

and

\[
P_y \approx \sum_{i=0}^{2L} W_i^{(c)} (\mathcal{Y}_i - \bar{y})(\mathcal{Y}_i - \bar{y})^T,
\]

where the weights are given by

\[
W_0^{(m)} = \frac{\lambda}{\lambda + L}, \qquad
W_0^{(c)} = \frac{\lambda}{\lambda + L} + (1 - e^2 + \xi), \qquad
W_i^{(m)} = W_i^{(c)} = \frac{1}{2(\lambda + L)}, \quad i = 1, \dots, 2L. \tag{6.7}
\]

The parameter $\xi$ is used to incorporate prior knowledge about the distribution of $x$. It has been found that for Gaussian and non-Gaussian variables, the unscented transformation results in approximations that are accurate up to the third and second orders, respectively [33].

The UKF algorithm includes two steps, prediction and update. In the prediction step, the predicted state estimate $\hat{x}_k^-$ and the predicted covariance matrix estimate $P_k^-$ are calculated. In the update step, the updated state estimate $\hat{x}_k$ and the updated covariance matrix estimate $P_k$ are calculated, after computing the cross-covariance matrix $P_{x_k y_k}$ and the optimal Kalman gain $K_k$. The UKF algorithm does not always provide satisfactory performance, especially for highly nonlinear and high-dimensional processes. In such cases, more powerful techniques, such as particle filters, are used. An introduction to particle filtering is presented next.
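The unscented transformation can be checked on a scalar example. The sketch below is an illustration under stated assumptions rather than the chapter's implementation: it treats only the case L = 1, takes the weight denominators to be λ + L, and uses hypothetical default values (e = 1, k = 3 − L = 2, ξ = 0) and a hypothetical function name.

```python
import math


def unscented_transform_1d(mean, var, f, e=1.0, k=2.0, xi=0.0):
    """Approximate the mean and variance of y = f(x) for a scalar x (L = 1)."""
    L = 1
    lam = e ** 2 * (L + k) - L                 # scaling parameter lambda
    spread = math.sqrt((L + lam) * var)        # scalar "matrix square root"

    # 2L + 1 = 3 sigma points: the mean and one point on each side of it.
    sigma = [mean, mean + spread, mean - spread]

    # Weights for the mean (w_m) and for the covariance (w_c).
    w_m = [lam / (L + lam)] + [1.0 / (2.0 * (L + lam))] * (2 * L)
    w_c = list(w_m)
    w_c[0] += 1.0 - e ** 2 + xi

    # Propagate the sigma points and form the weighted statistics.
    ys = [f(s) for s in sigma]
    y_mean = sum(w * y for w, y in zip(w_m, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(w_c, ys))
    return y_mean, y_var


# Quadratic map of a variable with mean 1 and variance 0.25.
y_mean, y_var = unscented_transform_1d(1.0, 0.25, lambda x: x * x)
```

With these settings the three sigma points reproduce the exact Gaussian moments of $y = x^2$, namely $\bar{x}^2 + P_x$ for the mean and $4\bar{x}^2 P_x + 2P_x^2$ for the variance, which makes the quadratic map a useful sanity check before moving to a vector-valued implementation.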

6.2.2.3 Particle filter (PF)

The advantage of the PF is that it is not restricted by linearity and Gaussianity assumptions on the system and data, which makes it applicable to a wide range of systems. The basic form of the PF is simple; however, it may be computationally expensive. Its objective is to compute the conditional probability (posterior distribution) of the state variables, given some noisy and possibly partial observations. Given process observations (measurements) and a dynamical system describing the evolution of the state variables to be estimated, the state estimation problem can be treated as an optimal filtering problem [34] within a Bayesian context, in which the posterior distribution $p(x_k | y_{0:k})$ is recursively estimated. The PF method approximates this posterior distribution, where the dynamical model is defined by an observation model $p(y_k | x_k)$ and a state evolution model $p(x_k | x_{0:k-1}) = p(x_k | x_{k-1})$. In a Bayesian framework, state estimation can be performed by recursively computing the predictive distribution function $p(x_k | y_{0:k-1})$ and the filtering distribution function $p(x_k | y_{0:k})$ as

\[
\begin{aligned}
p(x_k | y_{0:k-1}) &= \int_{\mathbb{R}^n} p(x_k | x_{k-1})\, p(x_{k-1} | y_{0:k-1})\, dx_{k-1}, \\
p(x_k | y_{0:k}) &= \frac{p(y_k | x_k)\, p(x_k | y_{0:k-1})}{p(y_k | y_{0:k-1})},
\end{aligned} \tag{6.8}
\]

where $p(y_k | y_{0:k-1}) = \int_{\mathbb{R}^n} p(y_k | x_k)\, p(x_k | y_{0:k-1})\, dx_k$. The nonlinear form of the model can make the integrals in (6.8) intractable. Thus a Monte Carlo approximation is used to solve the filtering problem, where the joint posterior distribution function $p(x_{0:k} | y_{0:k})$ is approximated by a set of particles (weighted samples) $\{x_{0:k}^{(i)}, \omega_k^{(i)}\}_{i=0}^{N}$ as follows [35]:

\[
\hat{p}_N(x_{0:k} | y_{0:k}) = \sum_{i=0}^{N} \omega_k^{(i)}\, \delta_{x_{0:k}^{(i)}}(dx_{0:k}) \Big/ \sum_{i=0}^{N} \omega_k^{(i)}, \tag{6.9}
\]

where $\omega_k^{(i)}$ is the importance weight of the $i$th sample, $\delta_{x_{0:k}^{(i)}}(dx_{0:k})$ is the Dirac delta function of the corresponding sample, and $N$ is the total number of samples. Using the same set of samples, the posterior distribution $p(x_k | y_{0:k})$ of interest can be approximated as [36]

\[
\hat{p}_N(x_k | y_{0:k}) = \sum_{i=0}^{N} \omega_k^{(i)}\, \delta_{x_k^{(i)}}(dx_k) \Big/ \sum_{i=0}^{N} \omega_k^{(i)}. \tag{6.10}
\]

To avoid the problem of degeneracy in the PF technique, resampling is used [36], and the state estimate $\hat{x}_k$ can be obtained using a Monte Carlo methodology as follows [35]:

\[
\hat{x}_k = \sum_{i=0}^{N} \omega_k^{(i)} x_k^{(i)}, \tag{6.11}
\]

where the (normalized) weight $\omega_k^{(i)}$ is given by [36]

\[
\omega_k^{(i)} \propto \frac{p(y_{0:k} | x_{0:k}^{(i)})\, p(x_{0:k}^{(i)})}{q(x_{0:k}^{(i)} | y_{0:k})}, \tag{6.12}
\]

and $q(\cdot)$ denotes the importance (proposal) distribution from which the particles are drawn.

The state and measurement residuals are computed as $R_k^s = x_k - \hat{x}_k$ and $R_k^m = y_k - \hat{y}_k$, respectively.
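The recursion above can be illustrated with a bootstrap particle filter for a scalar state. This is a minimal sketch under stated assumptions, not the filter used in the applications of this chapter: the proposal is taken to be the state transition model (so the weights reduce to the likelihood $p(y_k | x_k)$), both noises are assumed Gaussian and additive, resampling is multinomial at every step, and the toy model, function names, and parameter values are hypothetical.

```python
import math
import random


def particle_filter(ys, f, h, q_std, r_std, n_particles=500,
                    x0_mean=0.0, x0_std=1.0, seed=0):
    """Bootstrap particle filter for x_k = f(x_{k-1}) + w, y_k = h(x_k) + v."""
    rng = random.Random(seed)
    particles = [rng.gauss(x0_mean, x0_std) for _ in range(n_particles)]
    estimates = []
    for y in ys:
        # Propagate every particle through the state evolution model.
        particles = [f(x) + rng.gauss(0.0, q_std) for x in particles]
        # Importance weights: Gaussian likelihood of the new measurement.
        weights = [math.exp(-0.5 * ((y - h(x)) / r_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # State estimate: weighted mean of the particles, as in Eq. (6.11).
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


def simulate(n_steps=50, q_std=0.1, r_std=0.2, seed=1):
    """Generate states and noisy measurements from a toy scalar model."""
    rng = random.Random(seed)
    x, xs, ys = 0.0, [], []
    for _ in range(n_steps):
        x = 0.9 * x + 1.0 + rng.gauss(0.0, q_std)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, r_std))
    return xs, ys


xs, ys = simulate()
estimates = particle_filter(ys, f=lambda x: 0.9 * x + 1.0, h=lambda x: x,
                            q_std=0.1, r_std=0.2)
# Measurement residuals R^m_k = y_k - y_hat_k (here h is the identity).
residuals = [y - x_hat for y, x_hat in zip(ys, estimates)]
```

The list `residuals` plays the role of $R_k^m$: under fault-free conditions it should fluctuate around zero, and it is this sequence that a control chart subsequently monitors.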

6.3 Fault detection-based state estimation approaches

6.3.1 Fault detection using multiscale EWMA chart

6.3.1.1 EWMA chart

The EWMA chart was introduced by Roberts in 1959 under the name geometric moving average (GMA) chart [37] and later became widely known as the EWMA chart [38]. Like the CUSUM chart [22], the EWMA chart can detect smaller shifts in the mean than the Shewhart chart [9]. The EWMA chart is computed from the $m$ sensor residuals obtained from the state estimation technique, which gives $m$ vectors $Z_j$, $j = 1, \dots, m$. They are computed using [39]

\[
Z_{i,j} = \lambda X_{i,j} + (1 - \lambda) Z_{i-1,j}, \qquad i = 1, \dots, N, \quad j = 1, \dots, m, \tag{6.13}
\]

where $N$ is the number of observations, $m$ is the number of sensor residuals, $\lambda$ denotes the smoothing parameter between 0 and 1, which changes the memory of the detection statistic, and $X$ is the $N \times m$ data matrix of residuals. The control limits for the $j$th chart ($UCL_j$ is the upper control limit, and $LCL_j$ is the lower control limit) are calculated as [40]

\[
UCL_j = \mu_{0,j} + L \sigma_j \sqrt{\frac{\lambda}{2 - \lambda} \left[ 1 - (1 - \lambda)^{2i} \right]}, \tag{6.14}
\]

\[
LCL_j = \mu_{0,j} - L \sigma_j \sqrt{\frac{\lambda}{2 - \lambda} \left[ 1 - (1 - \lambda)^{2i} \right]}, \tag{6.15}
\]

where $L$ represents the control width of the EWMA chart, $\sigma_j$ is the in-control standard deviation of $X_j$, and the initial value $Z_{0,j}$ is set equal to the process in-control mean (or target value) $\mu_{0,j}$ of the $j$th chart. Since $X$ is mean centered, $Z$ (Eq. (6.13)) is also centered, so the charts $Z_j$, $j = 1, \dots, m$, have equal means and approximately equal variances. The overall control limits (UCL and LCL) are therefore computed as the averages

\[
UCL = \frac{1}{m} \sum_{j} UCL_j, \tag{6.16}
\]

\[
LCL = \frac{1}{m} \sum_{j} LCL_j. \tag{6.17}
\]

When the EWMA statistic $Z$ lies between the control limits (UCL and LCL), no fault is declared; if $Z$ exceeds the control limits, then a fault is declared in the system.
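Equations (6.13)–(6.15) translate directly into a few lines of code for a single residual sequence. The sketch below assumes that the in-control mean and standard deviation are known; the function name, the default values λ = 0.2 and L = 3, and the synthetic residual sequence are illustrative choices only.

```python
import math


def ewma_chart(x, lam=0.2, L=3.0, mu0=0.0, sigma=1.0):
    """EWMA statistic with time-varying control limits for one residual sequence."""
    z_prev = mu0                       # Z_0 is set to the in-control mean
    z, lcl, ucl = [], [], []
    for i, xi in enumerate(x, start=1):
        # Recursion (6.13).
        z_prev = lam * xi + (1.0 - lam) * z_prev
        # Half-width of the limits (6.14)-(6.15) at sample i.
        half_width = L * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i)))
        z.append(z_prev)
        ucl.append(mu0 + half_width)
        lcl.append(mu0 - half_width)
    # A fault is declared whenever the statistic leaves [LCL, UCL].
    alarms = [not (lo <= zi <= hi) for zi, lo, hi in zip(z, lcl, ucl)]
    return z, lcl, ucl, alarms


# Fault-free residuals followed by a sustained mean shift of 1.5 sigma.
residuals = [0.0] * 20 + [1.5] * 20
z, lcl, ucl, alarms = ewma_chart(residuals)
first_alarm = alarms.index(True)
```

In this idealized noise-free sequence the statistic needs a few samples after the shift to cross the limit, which illustrates the memory effect controlled by λ: a smaller λ smooths more but reacts more slowly to a given shift.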

6.3.1.2 Multiscale EWMA chart

Here, multiobjective optimization is the problem of choosing the most preferred solution ($\hat{L}$ and $\hat{\lambda}$) when three objective functions, i) the missed detection rate (MDR), ii) the false alarm rate (FAR), and iii) the average run length (ARL1), are to be minimized simultaneously. A central difficulty in such problems is that, unlike in single-objective optimization problems, there is no obvious or simple way to define the concept of a most preferred solution. The multiobjective optimization problem is defined by a set of two decision variables $x = (x_1 = L, x_2 = \lambda)$ and a set of three objective functions $f = (f_1 = \mathrm{MDR}, f_2 = \mathrm{FAR}, f_3 = \mathrm{ARL}_1)$. The objective functions are functions of the decision variables, and the aim of the optimization is to minimize $y = f(x) = (f_1(x), f_2(x), f_3(x))$.

To compare two candidate solutions in multiobjective optimization problems, the concepts of Pareto dominance and Pareto optimal solutions are used. Pareto dominance also provides the optimality criterion for the problem to be solved. In the current work, after selecting the Pareto optimal (nondominated) solutions, we choose the solution ($\hat{L}$ and $\hat{\lambda}$) that offers a tradeoff between FAR, MDR, and ARL1. The optimized EWMA statistic is computed using the optimal values $\hat{L}$ and $\hat{\lambda}$. The general flow-chart of the multiobjective optimization process is shown in Fig. 6.2, and the optimized EWMA algorithm is presented in Algorithm 2.

The EWMA chart assumes that the process residuals evaluated in the training phase follow a normal distribution. In practice, measured data do not necessarily follow a normal distribution, and modeling errors can also be a source of non-Gaussian errors in the residuals. Multiscale decomposition helps to address the issue of non-Gaussian errors, as it provides detail signals that are closer to normal (Gaussian) at different scales. Thus a wavelet-based multiscale representation of the data will be applied to enhance the performance of EWMA.


Figure 6.2 General flow-chart of the multiobjective optimization process.

Algorithm 2 Optimized EWMA algorithm.
1. Set L = 0.1, 0.2, . . . , 3 and λ = 0.05, 0.1, . . . , 1.
2. Apply the EWMA chart for all L and λ.
3. Compute MDR, FAR, and ARL1 for all L and λ.
4. Generate the Pareto front.
5. Extract the nondominated solution.
6. Compute the corresponding $\hat{L}$ and $\hat{\lambda}$.
7. Compute the optimized EWMA chart and its control limits LCL and UCL using $\hat{L}$ and $\hat{\lambda}$.
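Steps 4 to 6 of Algorithm 2 can be sketched as follows. The Pareto filter itself is standard, but the rule for picking one compromise among the nondominated solutions is not specified in the text, so the normalized-sum rule in `pick_compromise` is an assumption made for illustration. The objective values in `candidates` are placeholder numbers chosen only to exercise the code; they are not results from this chapter.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep the (params, objectives) pairs that no other candidate dominates."""
    return [(p, f) for p, f in candidates
            if not any(dominates(g, f) for _, g in candidates)]


def pick_compromise(front):
    """Hypothetical tradeoff rule: smallest sum of range-normalized objectives."""
    n_obj = len(front[0][1])
    lo = [min(f[j] for _, f in front) for j in range(n_obj)]
    hi = [max(f[j] for _, f in front) for j in range(n_obj)]

    def score(f):
        return sum((f[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
                   for j in range(n_obj))

    return min(front, key=lambda pf: score(pf[1]))


# Placeholder entries: ((L, lambda), (MDR, FAR, ARL1)).
candidates = [
    ((3.0, 0.2), (0.10, 0.01, 5.0)),
    ((2.0, 0.2), (0.05, 0.05, 3.0)),
    ((1.0, 0.5), (0.02, 0.20, 2.0)),
    ((3.0, 1.0), (0.30, 0.02, 9.0)),   # dominated by the first entry
]
front = pareto_front(candidates)
(L_hat, lam_hat), objectives = pick_compromise(front)
```

In a full implementation, `candidates` would be filled by running the EWMA chart over the grid of step 1 and evaluating MDR, FAR, and ARL1 on labeled data for each pair (L, λ).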

The proposed multiscale EWMA monitoring technique is performed in two phases. In the first phase, fault-free training data are normalized to zero mean and unit variance and are then decomposed at multiple scales using wavelet-based multiscale decomposition. The EWMA chart is then applied to the detail signals at the different scales and to the last scaled signal. The control limits are calculated at all scales and are then used to threshold the wavelet coefficients of the detail signals. If any wavelet coefficient violates the control limits at a certain scale, all the wavelet coefficients at that scale are retained (see Fig. 6.3). If no violation of the limits occurs at a certain scale, then all wavelet coefficients at that scale are ignored. The retained detail signals and the last scaled signal are then reconstructed to obtain the final reconstructed signal. Finally, EWMA is applied to the reconstructed signal to obtain the final multiscale EWMA detection statistic and the control limits.

In the second phase, the testing data are normalized using the mean and standard deviation obtained in training and are then decomposed at multiple scales with the same wavelet filters applied in the first phase. The control limits obtained from the training phase are applied to the detail signals of the testing data at the respective scales and also to the last scaled signal. At any scale, the wavelet coefficients that violate the control limits are retained, whereas those that do not violate the limits are ignored. A reconstructed signal is then computed from all the retained coefficients. Finally, the control limits previously obtained from the reconstructed training data are applied to the EWMA statistic of the reconstructed testing data to detect possible faults. The multiscale EWMA algorithm is illustrated schematically in Fig. 6.3. The developed technique is validated using a simulated COST wastewater treatment BSM1 model.

Figure 6.3 Multiscale EWMA strategy.
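The decompose, threshold, and reconstruct pipeline described above can be sketched with a Haar wavelet. This is a simplified illustration and not the chapter's implementation: the Haar filter, the fixed per-scale limits supplied by the caller (standing in for the per-scale EWMA control limits), and the function names are all assumptions. The scale-level rule follows the training-phase description, where a whole scale is kept if any of its coefficients violates the limit.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(x, depth):
    """Orthonormal Haar decomposition; len(x) must be divisible by 2**depth."""
    approx, details = list(x), []
    for _ in range(depth):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details             # details[0] is the finest scale


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest scale."""
    x = list(approx)
    for d in reversed(details):
        nxt = []
        for a, di in zip(x, d):
            nxt.extend([(a + di) / SQRT2, (a - di) / SQRT2])
        x = nxt
    return x


def multiscale_filter(x, limits, depth):
    """Keep a detail scale only if one of its coefficients exceeds its limit."""
    approx, details = haar_decompose(x, depth)
    kept = [d if any(abs(c) > lim for c in d) else [0.0] * len(d)
            for d, lim in zip(details, limits)]
    return haar_reconstruct(approx, kept)


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
approx, details = haar_decompose(signal, depth=3)
roundtrip = haar_reconstruct(approx, details)
# With very large limits every detail scale is discarded, leaving the mean.
smoothed = multiscale_filter(signal, limits=[1e9, 1e9, 1e9], depth=3)
```

The filtered output would then be passed to the EWMA chart to obtain the multiscale detection statistic; in practice a wavelet library and limits estimated from fault-free training data would replace the hand-written transform and the fixed thresholds used here.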

6.3.2 Application to wastewater treatment plant

The idea behind the developed multiscale optimized EWMA algorithm is to combine the advantages of the PF state estimation technique with those of the EWMA chart, multiobjective optimization, and multiscale representation. The proposed technique is presented in Algorithm 3 and Fig. 6.4. The objective of our contribution is to combine the PF state estimation technique with the multiscale representation-based EWMA chart to detect faults in the residual vector $R^m$ obtained from the PF, using a simulated COST wastewater treatment BSM1 model.

Figure 6.4 PF-based MS-EWMA fault detection strategy.

Algorithm 3 PF-based MS-EWMA fault detection algorithm.
Input: $y_k$
• Training data
  – Estimate the state variables using PF
  – Compute the measurement residual $R^m$
  – Compute the MS-EWMA chart Z and its control limits LCL and UCL
• Testing data
  – Estimate the state variables using PF
  – Compute the measurement residual $R^m$
  – Compute the MS-EWMA chart Z
  – If LCL < Z < UCL, then the process is operating under normal operating conditions. Else, a fault is declared.

6.3.2.1 State estimation results

The developed state estimation-based fault detection technique is validated using a well-defined simulated activated sludge wastewater treatment plant model. The benchmark was first developed by the International Association on Water Quality (IAWQ) and then modified by the European Co-Operation in the field of Scientific and Technical Research (COST) [41–43].


To perform this illustrative example, wastewater treatment data are generated using the simulated dynamic model described in [44]. In this dynamic model, the influent data associated with the dry conditions are used (see the dry influent data files in [41]), where the influent flow rate varies in the range 15000–35000 m^3/d. The performance of the plant is monitored through six state variables: the chemical oxygen demand (XCOD, written XDCO in the equations, figures, and tables below), dissolved oxygen concentration (SO), active heterotrophic biomass (XB,H), ammonia concentration (SNH), nitrate concentration (SNO), and active autotrophic biomass (XB,A). Thus the data generated by the wastewater benchmark model contain [44]:
• Inputs: 6 state variables: XCOD, SO, XB,H, SNH, SNO, and XB,A.
• Outputs: 4 sensor measurements: XCOD, SO, SNH, and SNO.

The dynamic model is used to generate 1400 observations for each variable, which were initially assumed to be noise-free. Then all variables were contaminated with zero-mean Gaussian noise having a signal-to-noise ratio of 20 to simulate real measurements. The noisy data set was then split into two subsets, training and testing, each consisting of 700 observations. To estimate the six state variables of the COST WWTP, the performances of the UKF and PF methods are compared. The state vector X, measured output vector y, and input vector u used in this example are [44]

$$X = [X_{DCO}, S_O, X_{B,H}, S_{NH}, S_{NO}, X_{B,A}]^T, \quad y = [X_{DCO}, S_O, S_{NH}, S_{NO}]^T, \quad u = [X_{DCO,in}, Q_a, Q_{in}]^T. \qquad (6.18)$$

All six state variables (XDCO, SO, XB,H, SNH, SNO, and XB,A) are estimated using the dynamic model and the measurements of only four variables (XDCO, SO, SNH, and SNO). Here Qa is the bioreactor air flow rate and Qin is the influent flow rate; in addition, all parameters are assumed to be constant and known (given in [41]). The state variables are simulated by changing the input in the dry influent data, as described in the files [41], and using initial values of [20, 10, 20, 6, 650, 160]. Then all state variables were contaminated with zero-mean Gaussian noise (having a standard deviation of 5% of the standard deviation of each state variable) to simulate real measurements. The results of the state estimation using UKF and PF are shown in Figs. 6.5 and 6.6, and the mean square errors (MSE) between the noise-free and the estimated state variables are summarized in Table 6.1. The MSE is defined as follows:


Figure 6.5 State estimation of the variables (A) XDCO , (B) SO and (C) XBH using UKF and PF.


Figure 6.6 State estimation of the variables (A) SNH , (B) SNO and (C) XBA using UKF and PF.


Table 6.1 Comparison of the MSE for the UKF and PF techniques.
Technique | XDCO   | SO     | XB,H   | SNH    | SNO    | XB,A
UKF       | 0.7012 | 0.7144 | 0.7194 | 0.7463 | 0.7162 | 0.7067
PF        | 0.6541 | 0.6716 | 0.6677 | 0.7098 | 0.6739 | 0.6805

$$\mathrm{MSE} = E\left[(\hat{X} - X_{nf})^2\right], \qquad (6.19)$$

where $X_{nf}$, $\hat{X}$, and $E$ are the noise-free state variable vector, the estimated state variable vector, and the expectation operator, respectively. The results presented in Figs. 6.5 and 6.6 show that the PF outperforms the UKF. These advantages of the PF technique will be used to improve monitoring of WWTP.
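As a small worked example of Eq. (6.19), the expectation can be replaced by a sample average over the available observations of one state variable. The function name and the numbers below are illustrative assumptions, not data from the benchmark.

```python
def mse(estimated, noise_free):
    """Sample version of Eq. (6.19): average squared estimation error."""
    if len(estimated) != len(noise_free):
        raise ValueError("both sequences must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimated, noise_free)) / len(estimated)


# Hypothetical noise-free values and estimates of a single state variable.
x_nf = [1.0, 2.0, 3.0, 4.0]
x_hat = [1.1, 1.9, 3.2, 3.8]
example_mse = mse(x_hat, x_nf)
```

Applying the same function to each state variable in turn would give one row of a table such as Table 6.1.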

6.3.2.2 Fault detection results

As described above, each variable in the model has 1400 observations, which were contaminated with zero-mean Gaussian noise having a signal-to-noise ratio of 20 and then divided into training and testing subsets of 700 observations each. The performance of the proposed MS-EWMA chart is tested using data generated with the simulated COST wastewater treatment BSM1 model and is compared to that of the Shewhart and EWMA charts. In this section, we study two types of faults: an abrupt fault (i.e., offset or bias) and an incipient or evolutive fault (i.e., drift). First, a bias fault with a magnitude of twice the standard deviation of the second variable SO is added to the testing set between samples 500 and 700. The statistics Zj, j = 1, 2, 3, 4, are the four statistics computed using the sensor residuals obtained from the PF method. Figs. 6.7(A)–6.7(C) and Table 6.2 compare the FD results of the Shewhart, EWMA, and MS-EWMA methods. They show that the MS-EWMA method improves on the EWMA and Shewhart methods in terms of false alarm and missed detection rates because of the advantages of the multiscale representation. Next, a drift fault with slope 0.005 is injected in the second state SO between samples 500 and 700. The resulting figures show a good improvement of the MS-EWMA chart in terms of MDR and FAR values


Figure 6.7 Monitoring a bias fault in SO using (A) Shewhart, (B) EWMA, and (C) MS-EWMA methods.

(see Fig. 6.8(C)) when compared to the EWMA chart (see Fig. 6.8(B)) and the Shewhart chart (see Fig. 6.8(A)). Table 6.3 presents the detection comparison between the three charts. We can also see from this table that the proposed MS-EWMA chart


Table 6.2 Summary of MDR (%) and FAR (%) for bias fault.
Chart    | MDR (%) | FAR (%)
Shewhart | 62      | 5.2
EWMA     | 10      | 10.20
MS-EWMA  | 2.5     | 2

outperforms the classical detection charts in terms of false alarm and missed detection rates. We next investigate the effect of the size of the mean shift on the detection results of the proposed MS-EWMA chart. The mean shift s is varied from 0.5 to 3, and the MDR and FAR are computed for each fault size using a Monte Carlo simulation of 1000 runs. Table 6.4 shows that as s increases, the MDR and FAR decrease, and the proposed technique still provides good detection results.

Effect of change rate parameter value on the detection performance

The change rate of the parameter value is slow compared to the system dynamics. The drift term can be approximated as Zi = a(i − i0), i ≥ i0, where a is a constant parameter that defines a linear rate of change, and i0 is the time point at which the incipient fault first occurs. To study the effect of the value of a on the detection performance of the MS-EWMA method, the FAR and MDR are computed using a Monte Carlo simulation of 1000 runs in which a varies from 0.001 to 0.01. Table 6.5 shows that the developed MS-EWMA technique has good detection results and provides small FAR and MDR values under different simulation conditions.
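The drift model Zi = a(i − i0) and the two detection metrics can be sketched as follows. This is only an illustration: the helper names, the zero-based sample indices, and the fixed threshold used as a stand-in for a chart are assumptions, and the definitions of FAR (alarms raised on fault-free samples) and MDR (faulty samples without alarm) are the usual ones rather than formulas quoted from this chapter.

```python
def add_drift(signal, slope, start):
    """Add the linear drift slope * (i - start) to every sample with i >= start."""
    return [x + slope * (i - start) if i >= start else x
            for i, x in enumerate(signal)]


def far_mdr(alarms, faulty):
    """Return (FAR, MDR) in percent from boolean alarms and ground-truth fault flags."""
    n_normal = sum(1 for f in faulty if not f)
    n_faulty = sum(1 for f in faulty if f)
    false_alarms = sum(1 for a, f in zip(alarms, faulty) if a and not f)
    missed = sum(1 for a, f in zip(alarms, faulty) if f and not a)
    far = 100.0 * false_alarms / n_normal if n_normal else 0.0
    mdr = 100.0 * missed / n_faulty if n_faulty else 0.0
    return far, mdr


# Toy example: a drift with slope 0.5 starts at index 6 of a flat signal.
drifted = add_drift([0.0] * 10, slope=0.5, start=6)
faulty = [i >= 6 for i in range(10)]
alarms = [x > 0.75 for x in drifted]   # hypothetical fixed threshold
far, mdr = far_mdr(alarms, faulty)
```

The toy example shows why small slopes give high MDR values: the first faulty samples are still too close to the fault-free level to cross the limit.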

6.4 Fault detection-based state estimation approach

6.4.1 Fault detection using optimized weighted SS-DEWMA chart

The SS-DEWMA single-variable chart was developed by Teh et al. [30,31]. It is a superior alternative to the M-DEWMA chart for monitoring shifts in both the process mean and variance simultaneously [30,31]. The SS-DEWMA chart is simple to build and understand. Suppose that the underlying process consists of a sequence of observations $X_{ij}$, $j = 1, 2, \ldots, n_i$, from samples of size $n_i$ taken at times $i = 1, 2, \ldots$ and following a normal distribution such that $X_{ij} \sim N(\mu_0 + a\sigma_0, b^2\sigma_0^2)$, where $\mu_0$ and $\sigma_0$ are respectively


Figure 6.8 Monitoring a drift fault in SO using (A) Shewhart, (B) EWMA, and (C) MS-EWMA methods.

the in-control mean and in-control standard deviation of the process. If a = 0 and b = 1, then the process is statistically in control. If not, then there is a shift or alteration. Assume that $\bar{X}_i$ and $S_i^2$ represent the mean and variance of sample i, respectively. Both statistics are independent of each other.


Table 6.3 Summary of MDR (%) and FAR (%) for drift fault.
Chart    | MDR (%) | FAR (%)
Shewhart | 10.75   | 6.2
EWMA     | 8.375   | 4.8
MS-EWMA  | 7.125   | 2.8

Table 6.4 Summary of MDR (%) and FAR (%) for different values of s.
s   | MDR (%) | FAR (%)
0.5 | 12.49   | 3.81
1   | 8.15    | 2.8
1.5 | 3.85    | 1.6
2   | 0       | 1
2.5 | 0       | 0.75
3   | 0       | 0

Table 6.5 Summary of MDR (%) and FAR (%) for different values of a.
a     | FAR (%) | MDR (%)
0.001 | 47.26   | 9.02
0.002 | 25.87   | 10.22
0.003 | 26.37   | 12.83
0.004 | 11.94   | 9.22
0.005 | 19.90   | 6.81
0.006 | 9.45    | 3.01
0.007 | 10.45   | 3.01
0.008 | 9.95    | 4.01
0.009 | 5.47    | 4.61
0.01  | 6.97    | 3.21

The sample means $\bar{X}_i$ are independent normal random variables such that $\bar{X}_i \sim N(\mu_0 + a\sigma_0, b^2\sigma_0^2/n_i)$, whereas $(n_i - 1)S_i^2/(b^2\sigma_0^2)$ are independent chi-square random variables with $n_i - 1$ degrees of freedom (DoFs). In the structure of the SS-DEWMA chart, the following pair of independent statistics is defined:

$$U_i = \frac{\bar{X}_i - \mu_0}{\sigma_0/\sqrt{n_i}} \qquad (6.20)$$

and

$$V_i = \Phi^{-1}\left\{F\left[\frac{(n_i - 1)S_i^2}{\sigma_0^2};\; n_i - 1\right]\right\}, \qquad (6.21)$$

where $\Phi^{-1}$ denotes the inverse of the standard normal distribution function, and $F(w; v)$ denotes the chi-square distribution function with $v$ DoFs [45]. When the process is in control, both $U_i$ and $V_i$ in Eqs. (6.20) and (6.21) are independent standard normal random statistics. The distributions of both $U_i$ and $V_i$ do not depend on the sample size $n_i$. The two plotting statistics of the EWMA chart, one for the mean and one for the variance, are exponentially weighted combinations of the current and past values of $U_i$ and $V_i$:

$$Y_i = (1 - \lambda)Y_{i-1} + \lambda U_i \quad \text{for } i = 1, 2, \ldots \qquad (6.22)$$

and

$$x_i = (1 - \lambda)x_{i-1} + \lambda V_i \quad \text{for } i = 1, 2, \ldots, \qquad (6.23)$$

where $\lambda$ is the smoothing constant such that $0 < \lambda \le 1$. When the process is in control, the initial values are usually set as $Y_0 = 0$ and $x_0 = 0$. Both $Y_i$ and $x_i$ are independent because $U_i$ and $V_i$ are independent. Using Eqs. (6.22) and (6.23), we can compute the following statistics:

$$W_i = (1 - \lambda)W_{i-1} + \lambda Y_i \quad \text{for } i = 1, 2, \ldots \qquad (6.24)$$

and

$$Q_i = (1 - \lambda)Q_{i-1} + \lambda x_i \quad \text{for } i = 1, 2, \ldots \qquad (6.25)$$

Similarly, if the process is in control, then both $W_i$ and $Q_i$ are initially set to zero. Based on the two DEWMA statistics in Eqs. (6.24) and (6.25), Khoo et al. [29] defined the M-DEWMA statistic as

$$M_i = \max(|W_i|, |Q_i|) \quad \text{for } i = 1, 2, \ldots. \qquad (6.26)$$

Teh et al. [30] have proposed an improved univariate single chart called Sum of Squares Double EWMA (SS-DEWMA), defined as

$$L_i = W_i^2 + Q_i^2 \quad \text{for } i = 1, 2, \ldots. \qquad (6.27)$$
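The recursions in Eqs. (6.22)–(6.27) can be sketched in a few lines. The sketch assumes that the standardized statistics of Eqs. (6.20) and (6.21) have already been computed and are passed in as two sequences; the function names and the toy inputs are illustrative assumptions.

```python
def ewma(seq, lam, start=0.0):
    """One EWMA pass: z_i = (1 - lam) * z_{i-1} + lam * s_i, with z_0 = start."""
    z, out = start, []
    for s in seq:
        z = (1.0 - lam) * z + lam * s
        out.append(z)
    return out


def dewma_charts(u, v, lam):
    """Return (W, Q, M, L) for standardized mean and variance statistics u and v."""
    y = ewma(u, lam)                                       # Eq. (6.22)
    x = ewma(v, lam)                                       # Eq. (6.23)
    w = ewma(y, lam)                                       # Eq. (6.24)
    q = ewma(x, lam)                                       # Eq. (6.25)
    m = [max(abs(wi), abs(qi)) for wi, qi in zip(w, q)]    # Eq. (6.26), M-DEWMA
    l = [wi * wi + qi * qi for wi, qi in zip(w, q)]        # Eq. (6.27), SS-DEWMA
    return w, q, m, l


# Toy input: a persistent unit shift in the mean statistic, none in the variance.
w, q, m, l = dewma_charts([1.0, 1.0], [0.0, 0.0], lam=0.5)
```

The double smoothing makes W respond more slowly than Y to a single outlier, which is what gives the DEWMA family its sensitivity to small persistent shifts.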


The SS-DEWMA chart has shown better detection performance than the M-DEWMA chart [30] in detecting minor and moderate shifts in the mean and/or variance. Thus, in the current work, we apply the SS-DEWMA chart for fault detection purposes. Note that $L_i$ takes only nonnegative values because it is the sum of the squares of the two DEWMA statistics. To improve the detection abilities of SS-DEWMA, an optimized weighted SS-DEWMA (OWSS-DEWMA) chart is developed. Its chart statistic is given by

$$O_i = \alpha W_i^2 + (1 - \alpha)Q_i^2 \quad \text{for } i = 1, 2, \ldots, \qquad (6.28)$$

where α is the weighting parameter. The two parameters α and λ are to be jointly optimized using a multiobjective optimization scheme (presented in Algorithm 4).

Algorithm 4 Optimized WSS-DEWMA algorithm.
1. Set α = 0, 0.2, . . . , 1 and λ = 0, 0.2, . . . , 1.
2. Compute the WSS-DEWMA chart and its control limit for all α and λ.
3. Compute MDR, FAR, and ARL1 for all α and λ.
4. Generate the Pareto front.
5. Extract the nondominated solutions.
6. Compute the corresponding αˆ and λˆ.
7. Compute the optimized WSS-DEWMA chart and its control limit UCL (UCLsd) using αˆ and λˆ.
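Steps 4 and 5 of Algorithm 4 can be illustrated with a small Pareto filter. The sketch assumes that MDR, FAR, and ARL1 have already been computed for every (α, λ) pair and that all three objectives are to be minimized. The objective values below are made-up placeholders, and the rule for picking a single (αˆ, λˆ) from the front is not specified here, so it is left out.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def nondominated(candidates):
    """Keep the entries of {(alpha, lam): (MDR, FAR, ARL1)} that no other entry dominates."""
    front = {}
    for key, objectives in candidates.items():
        dominated = any(dominates(other, objectives)
                        for other_key, other in candidates.items()
                        if other_key != key)
        if not dominated:
            front[key] = objectives
    return front


# Hypothetical objective values for four (alpha, lambda) grid points.
candidates = {
    (0.2, 0.2): (5.0, 3.0, 4.0),
    (0.4, 0.2): (2.0, 1.0, 3.0),
    (0.6, 0.4): (1.0, 2.0, 3.5),
    (0.8, 0.4): (6.0, 6.0, 6.0),
}
front = nondominated(candidates)
```

The quadratic scan is adequate for a coarse grid such as the one in step 1; a larger grid would call for a sorted or incremental filter.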

The OWSS-DEWMA chart requires only an upper control limit (UCL), which is defined as [30]

$$UCL_{sd} = E(O_i) + K_{sd}\sqrt{V(O_i)}, \qquad (6.29)$$

where $E(O_i)$ and $V(O_i)$ are the mean and variance of $O_i$, respectively, given that the process is in control, whereas $K_{sd}$ is a constant controlling the width of $UCL_{sd}$.


A formula providing a quick computation of $UCL_{sd}$ for the initial state SS-DEWMA chart based on $K_{sd}$ values is expressed as [30,31]

$$UCL_{sd} = 2(1 + K_{sd}) \times \frac{\lambda^4}{\left[1 - (1 - \lambda)^2\right]^3}\, K_{\lambda,i}, \qquad (6.30)$$

where

$$K_{\lambda,i} = 1 + (1 - \lambda)^2 - (i^2 + 2i + 1)(1 - \lambda)^{2i} + (2i^2 + 2i - 1)(1 - \lambda)^{2i+2} - i^2(1 - \lambda)^{2i+4}.$$
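Eq. (6.30) can be checked numerically. Unrolling the two EWMA passes with zero initial values and independent standard normal inputs gives W_i as a weighted sum of U_1, ..., U_i with weights λ²(k + 1)(1 − λ)^k, so the factor λ⁴ K_{λ,i} / [1 − (1 − λ)²]³ should equal the variance of W_i. The sketch below compares the closed form with that direct sum; the function names are assumptions, and the check concerns the unweighted SS-DEWMA limit only, not the α-weighted statistic.

```python
def k_term(lam, i):
    """K_{lambda,i} as written below Eq. (6.30)."""
    r = 1.0 - lam
    return (1.0 + r ** 2
            - (i * i + 2 * i + 1) * r ** (2 * i)
            + (2 * i * i + 2 * i - 1) * r ** (2 * i + 2)
            - i * i * r ** (2 * i + 4))


def dewma_variance(lam, i):
    """Closed-form factor multiplying 2(1 + K_sd) in Eq. (6.30)."""
    return lam ** 4 / (1.0 - (1.0 - lam) ** 2) ** 3 * k_term(lam, i)


def dewma_variance_direct(lam, i):
    """Variance of W_i obtained by summing the squared DEWMA weights."""
    r = 1.0 - lam
    return lam ** 4 * sum((k + 1) ** 2 * r ** (2 * k) for k in range(i))


def ucl_sd(lam, i, k_sd):
    """Upper control limit of Eq. (6.30) at time index i."""
    return 2.0 * (1.0 + k_sd) * dewma_variance(lam, i)
```

With W_i and Q_i independent and of equal variance v, the sum of squares has mean 2v and standard deviation 2v, which is consistent with the factor 2(1 + K_sd) in Eq. (6.30).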

Algorithm 5 illustrates the main steps of the proposed OWSS-DEWMA fault detection chart.

Algorithm 5 Proposed fault detection algorithm.
Input: Data matrices X and Y.
• Modeling Phase
  1. Generate residuals using particle filter (PF).
• Training Phase
  1. Compute the OWSS-DEWMA chart.
  2. Compute the OWSS-DEWMA control limit UCL (UCLsd).
• Testing Phase
  1. Compute the new residuals for the new sample time.
  2. Compute the OWSS-DEWMA chart.
  3. If the computed chart violates its threshold, then the process is considered out of control, and a fault is declared.
  4. Else, there is no fault, and the process is operating under normal operating conditions.

6.4.2 Optimized WSS-DEWMA and application to fault detection

The developed technique is validated using two applications: the first is a synthetic example, and the second is a simulated Cad System in E. coli (CSEC) model.

6.4.2.1 Application 1: synthetic example

The efficiency of the OWSS-DEWMA chart is first validated through a numerical example and compared to the SS-DEWMA and EWMA charts.


Consider the following simulation example with n = 1000 measurements. The monitored variable is described by the following relation:

$$X = \mu + a\sigma + b\sigma \times \mathrm{randn}(1, n). \qquad (6.31)$$

As training data, samples are generated under normal conditions. To evaluate the fault detection performance, two types of faults are simulated. The first one is a mean shift (bias) of magnitude aσ, where σ is the standard deviation of the variable X, and the second one is a variance change of magnitude bσ, introduced in X from sample 100 to 300. When a = 0 and b = 1, the process is fault-free. Figs. 6.9(A) to 6.9(C) show the time evolution of the EWMA, SS-DEWMA, and OWSS-DEWMA charts for a = 0.5 and b = 1.1. From these figures it is clear that the OWSS-DEWMA chart gives better detection performance than the SS-DEWMA chart in terms of false alarm rate (FAR), and both of them outperform the classical EWMA chart. Table 6.6 gives a summary of the FAR and missed detection rate (MDR) for the three approaches. Next, the developed state estimation-based fault detection technique is validated using a simulated CSEC model.
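A data set of this kind can be generated with the standard library alone. The sketch assumes that a shifts the mean by aσ and that b scales the standard deviation to bσ, in line with the distribution N(μ0 + aσ0, b²σ0²) used earlier in the chapter. The function name, the values μ = 0 and σ = 1, the seed, and the zero-based half-open fault window are illustrative choices.

```python
import random


def simulate(n, mu, sigma, a, b, fault_start, fault_end, seed=0):
    """Gaussian samples with a mean shift a*sigma and a standard deviation
    b*sigma inside [fault_start, fault_end); fault-free elsewhere."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        noise = rng.gauss(0.0, 1.0)
        if fault_start <= i < fault_end:
            data.append(mu + a * sigma + b * sigma * noise)
        else:
            data.append(mu + sigma * noise)
    return data


# Testing data with the fault sizes quoted in the text (a = 0.5, b = 1.1).
x_test = simulate(1000, mu=0.0, sigma=1.0, a=0.5, b=1.1,
                  fault_start=100, fault_end=300)
# Training data: the same generator with a = 0 and b = 1 (no fault).
x_train = simulate(1000, mu=0.0, sigma=1.0, a=0.0, b=1.0,
                   fault_start=0, fault_end=0, seed=1)
```

Fixing the seed keeps a comparison between charts reproducible, since all of them then see exactly the same noise realization.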

6.4.2.2 Application 2: Cad System in E. coli (CSEC)

Applying particle filter to estimate state variables

Here the state variables are estimated from noisy measurements using the PF method. Its performance is evaluated on the CSEC model and compared to that of the classical EKF and UKF methods. The CSEC comprises the cytoplasmic protein CadA and the transmembrane proteins CadB and CadC. CadA is the decarboxylase that converts Lys (lysine) into Cadav (cadaverine) in a reaction that consumes a cytoplasmic proton. CadB is the protein that exports Cadav and imports Lys. CadC is the positive regulator of cadBA that senses the external conditions and ensures that the enzyme CadA and the protein CadB are made only under conditions of lysine abundance and low pH [46,47]. The (simplified) qualitative model of the CSEC is illustrated in Fig. 6.10. The dynamic model that describes the relationship between the enzyme CadA, the transport protein CadBA, lysine Lys, and cadaverine Cadav is given


Figure 6.9 Monitoring a fault in X using (A) EWMA, (B) SS-DEWMA, and (C) OWSS-DEWMA charts.


Table 6.6 MDR (%) and FAR (%) evaluation.
Chart      | MDR (%) | FAR (%)
EWMA       | 3.9801  | 13.3917
SS-DEWMA   | 0       | 3.7509
OWSS-DEWMA | 0       | 0

Figure 6.10 Qualitative model of the CSEC (simplified).

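by

$$\begin{aligned}
\frac{d[\mathrm{CadA}]}{dt} &= \alpha_1 [\mathrm{Cadav}]^{g_{13}} - \beta_1 [\mathrm{CadA}]^{h_{11}}, \\
\frac{d[\mathrm{cadBA}]}{dt} &= \alpha_2 [\mathrm{CadA}]^{g_{21}} - \beta_2 [\mathrm{cadBA}]^{h_{22}}, \\
\frac{d[\mathrm{Cadav}]}{dt} &= \alpha_3 [\mathrm{cadBA}]^{g_{32}} - \beta_3 [\mathrm{Cadav}]^{h_{33}} [\mathrm{Lys}]^{h_{34}}, \\
\frac{d[\mathrm{Lys}]}{dt} &= \alpha_4 [\mathrm{CadA}]^{g_{41}} - \beta_4 [\mathrm{Lys}]^{h_{44}},
\end{aligned} \qquad (6.32)$$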

where the model parameters are presented in Table 6.7. For a more detailed description of the CSEC, see [47–49]. The estimation algorithms are applied to simulated time-series metabolic data, namely the concentrations of four metabolites related to the branched pathway given in Fig. 6.10, that is, the enzyme CadAk , the


Table 6.7 CSEC parameters.
Parameter | Value | Parameter | Value | Parameter | Value | Parameter | Value
α1        | 12    | β1        | 10    | g13       | −0.8  | h11       | 0.5
α2        | 8     | β2        | 3     | g21       | 0.5   | h22       | 0.75
α3        | 3     | β3        | 5     | g32       | 0.75  | h33 & h34 | 0.5 & 0.2
α4        | 2     | β4        | 6     | g41       | 0.5   | h44       | 0.8
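With the values of Table 6.7, the S-system of Eq. (6.32) can be integrated numerically to produce time series of the four concentrations. The following is a minimal sketch using a fixed-step fourth-order Runge-Kutta scheme. The initial concentrations, the step size, and the horizon are assumptions that the text does not give, and the exponent of Lys in the last equation is taken as h44 = 0.8 (the source table labels that entry h41).

```python
# Parameter values from Table 6.7 (a = alpha, b = beta).
PARAMS = {
    "a1": 12.0, "a2": 8.0, "a3": 3.0, "a4": 2.0,
    "b1": 10.0, "b2": 3.0, "b3": 5.0, "b4": 6.0,
    "g13": -0.8, "g21": 0.5, "g32": 0.75, "g41": 0.5,
    "h11": 0.5, "h22": 0.75, "h33": 0.5, "h34": 0.2,
    "h44": 0.8,  # listed as h41 in the source table; assumed to be h44 of Eq. (6.32)
}


def csec_rhs(state, p):
    """Right-hand side of Eq. (6.32) for state = (CadA, cadBA, Cadav, Lys)."""
    cad_a, cad_ba, cadav, lys = state
    return (
        p["a1"] * cadav ** p["g13"] - p["b1"] * cad_a ** p["h11"],
        p["a2"] * cad_a ** p["g21"] - p["b2"] * cad_ba ** p["h22"],
        p["a3"] * cad_ba ** p["g32"] - p["b3"] * cadav ** p["h33"] * lys ** p["h34"],
        p["a4"] * cad_a ** p["g41"] - p["b4"] * lys ** p["h44"],
    )


def rk4_step(f, state, dt, p):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state, p)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), p)
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), p)
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), p)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(state0, dt, steps, p=PARAMS):
    """Return the list of states visited, starting from state0."""
    trajectory = [tuple(state0)]
    for _ in range(steps):
        trajectory.append(rk4_step(csec_rhs, trajectory[-1], dt, p))
    return trajectory


# Hypothetical initial concentrations and step size.
trajectory = simulate((1.0, 1.0, 1.0, 1.0), dt=0.001, steps=2000)
```

A small step is used because the power-law terms are only defined for positive concentrations; a step that pushed a state below zero would produce complex values.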

Figure 6.11 Estimation of state variables using various state estimation techniques.

transport protein CadBk , the regulatory protein CadCk , and lysine Lysk . Hereafter we will assume that the state vector to be estimated is

$$x_k = \left[\mathrm{CadA}_k \;\; \mathrm{CadB}_k \;\; \mathrm{CadC}_k \;\; \mathrm{Lys}_k\right]^T,$$

and all model parameters (α1, α2, α3, α4, β1, β2, β3, β4, g13, g21, g32, g41, h11, h22, h33, h34, h44) are considered to be known. The simulation results for estimating the four states CadA, CadB, CadC, and Lys by EKF, UKF, and PF are shown in Figs. 6.11(A, B, C). It is obvious from Fig. 6.11 that EKF provides the lowest performance among the compared approaches. This is due to the restricted capability of EKF to precisely estimate the mean and covariance matrix of the estimated states through linearization of the nonlinear process model. The figures also show that the PF delivers a substantial improvement over the UKF, which is due to the fact that, by using UKF, linearizing the process


Table 6.8 RMSE of estimated states using EKF, UKF, and PF methods.
Technique | CadAk   | CadBk  | CadCk  | Lysk
EKF       | 0.067   | 0.12   | 0.12   | 0.031
UKF       | 0.06    | 0.094  | 0.11   | 0.02
PF        | 0.00074 | 0.0014 | 0.0009 | 0.0013

model does not necessarily provide good estimates of the mean of the state vector and the covariance matrix of the estimation error used in state estimation. Although EKF and UKF are the most widely used estimators in biological systems, the linearization and Jacobian matrix computation can lead to some limitations. The PF provides a significant advantage over the EKF and UKF techniques and can be applied to nonlinear models with non-Gaussian errors. The RMSE values of EKF, UKF, and PF are compared in Table 6.8, which shows that PF is the most accurate and yields a lower RMSE than EKF and UKF for all four states.

Applying OWSS-DEWMA to detect faults

The time evolutions of the four variables CadA, CadBA, Lys, and Cadav can be obtained using the dynamic system (6.32). The generated data have four variables and 800 measurements: the training data contain 400 samples (used for state estimation), and the testing data contain 400 samples (used for fault detection). The size of a mean shift is aσ of the relevant variable, and the size of a variance change is bσ of the relevant variable. When applying the multiscale representation, it is necessary to choose the decomposition depth that gives good detection with low MDR and FAR. In the current work the best decomposition depth is equal to 3. In this section, we use the simplest wavelet transform, the so-called Haar wavelet transform. In the first case, faults (a mean shift of 1.5σ and a variance change of 0.5σ) are introduced in the cadaverine Cadav between samples 200 and 400. The fault detection results are shown in Table 6.9 and Figs. 6.12(A) to 6.12(C). The SS-DEWMA chart (see Fig. 6.12(B)) shows good improvement with respect to the EWMA chart (Fig. 6.12(A)). We can also see that the developed OWSS-DEWMA chart delivers better results with respect to


Figure 6.12 Monitoring multiple faults in cadaverine Cadav using (A) EWMA, (B) SS-DEWMA, and (C) OWSS-DEWMA charts.


Table 6.9 MDR (%) and FAR (%) evaluation.
Chart      | MDR (%) | FAR (%)
EWMA       | 5.4726  | 18.5930
SS-DEWMA   | 0       | 7.55
OWSS-DEWMA | 0       | 1.2

Table 6.10 MDR (%) and FAR (%) evaluation.
Chart      | MDR (%) | FAR (%)
EWMA       | 52.2388 | 23.5678
SS-DEWMA   | 0.21    | 16.24
OWSS-DEWMA | 0.035   | 2.23

SS-DEWMA chart (Figs. 6.12(B) and 6.12(C)). Both charts perform better than the EWMA chart (Fig. 6.12(A)) in terms of the FAR and MDR indicators. Next, a single fault consisting of a variance change of 2σ is added to the cadaverine Cadav between samples 200 and 400. The fault detection methods (EWMA, SS-DEWMA, and OWSS-DEWMA) are compared in Table 6.10 and Figs. 6.13(A)–6.13(C). The SS-DEWMA chart provides better monitoring than the EWMA chart in terms of the FAR and MDR indicators. Additionally, these results show that the OWSS-DEWMA chart (Fig. 6.13(C)) provides the best performance with respect to the SS-DEWMA (Fig. 6.13(B)) and EWMA (Fig. 6.13(A)) charts. Finally, we consider a mean shift of size 1σ and a variance change of size 2σ at [150 to 250] (in Cadav) and [300 to 400] (in Lys). The monitoring results obtained using the same three methods are shown in Table 6.11 and Figs. 6.14(A) to 6.14(C). The OWSS-DEWMA chart (Fig. 6.14(C)) detects the faults better than the SS-DEWMA chart. The SS-DEWMA chart (Fig. 6.14(B)) in turn performs better than the classical EWMA chart (Fig. 6.14(A)). More details on the results obtained are presented in [11–21,50,51]. State estimation results using Bayesian approaches are obtained in [52–56].

6.5 Conclusions

In this chapter, we proposed enhanced EWMA-based fault detection techniques to improve monitoring of biological processes. To do that, novel


Figure 6.13 Monitoring a fault in Cadav using (A) EWMA, (B) SS-DEWMA, and (C) OWSS-DEWMA charts.


Figure 6.14 Monitoring multiple faults in Cadav and Lys using (A) EWMA, (B) SS-DEWMA, and (C) OWSS-DEWMA charts.


Table 6.11 MDR (%) and FAR (%) evaluation.
Chart      | MDR (%) | FAR (%)
EWMA       | 1.4950  | 20.2020
SS-DEWMA   | 0.21    | 13.1
OWSS-DEWMA | 0       | 2.34

statistical strategies combining the benefits of the EWMA chart and state estimation techniques were developed. To deal with scenarios where the process model is available and has a predefined structure, the monitored residuals were computed using the state estimation technique. Then the improved EWMA-based charts were applied for fault detection. In the first part, we proposed an enhanced fault detection technique, the optimized weighted sum of squares double EWMA (OWSS-DEWMA) chart, to improve monitoring of the biological Cad System in E. coli (CSEC) process. In the second part, a multiscale EWMA chart was applied to monitor a wastewater treatment plant (WWTP) process and detect different types of faults, including bias and drift faults. The detection performances of the proposed strategies were compared to those of the classical techniques in terms of missed detection rate (MDR) and false alarm rate (FAR). The monitoring capability was assessed using a synthetic example and two biological applications, the WWTP and CSEC processes. For the simulated CSEC model, the developed chart aims to detect single and multiple faults by monitoring the key variables (cadaverine, transport proteins, enzymes, lysine, and regulatory proteins). The performance of the WWTP is monitored through the state variables, which are the chemical oxygen demand, dissolved oxygen concentration, active heterotrophic biomass, ammonia concentration, nitrate concentration, and active autotrophic biomass.

References

[1] D. Simon, Optimal State Estimation: Kalman, H∞, and Nonlinear Approaches, John Wiley and Sons, 2006.
[2] Y. Kim, S. Sul, M. Park, Speed sensorless vector control of induction motor using extended Kalman filter, IEEE Transactions on Industry Applications 30 (5) (1994) 1225–1233.


[3] E. Wan, R.V.D. Merwe, The unscented Kalman filter for nonlinear estimation, in: Adaptive Systems for Signal Processing, Communications, and Control Symposium, 2000, pp. 153–158.
[4] S. Sarkka, On unscented Kalman filtering for state estimation of continuous-time nonlinear systems, IEEE Transactions on Automatic Control 52 (9) (2007) 1631–1641.
[5] J. Zhu, N. Zheng, Z. Yuan, Q. Zhang, X. Zhang, Y. He, A slam algorithm based on the central difference Kalman filter, in: 2009 IEEE Intelligent Vehicles Symposium, IEEE, 2009, pp. 123–128.
[6] R.V.D. Merwe, E. Wan, The square-root unscented Kalman filter for state and parameter-estimation, in: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 6, 2001, pp. 3461–3464.
[7] M. Nørgaard, N.K. Poulsen, O. Ravn, New developments in state estimation for nonlinear systems, Automatica 36 (11) (2000) 1627–1638.
[8] F. Gustafsson, F. Gunnarsson, N. Bergman, U. Forssell, J. Jansson, R. Karlsson, P. Nordlund, Particle filters for positioning, navigation, and tracking, IEEE Transactions on Signal Processing 50 (2) (2002) 425–437.
[9] M. Hart, R. Hart, Shewhart control charts for individuals with time-ordered data, in: Frontiers in Statistical Quality Control, vol. 4, 1992, p. 123.
[10] G.J. Ross, N.M. Adams, D.K. Tasoulis, D.J. Hand, Exponentially weighted moving average charts for detecting concept drift, Pattern Recognition Letters 33 (2012) 191.
[11] M. Mansouri, R. Fazai, K. Abodayeh, V. Puig, M-I. Noori, H. Nounou, M. Nounou, Multiscale Gaussian process regression-based generalized likelihood ratio test for fault detection in water distribution networks, Engineering Applications of Artificial Intelligence 85 (2019) 474–491.
[12] R. Fazai, K. Abodayeh, M. Mansouri, M. Trabelsi, H. Nounou, M. Nounou, Machine learning-based statistical testing hypothesis for fault detection in photovoltaic systems, Solar Energy 190 (2019) 405–413.
[13] M. Chaabane, K. Abodayeh, M. Mansouri, A. Ben Hamida, H. Nounou, M. Nounou, Effective fault detection in structural health monitoring systems, Advances in Mechanical Engineering 11 (2019).
[14] R. Fazai, M. Mansouri, M. Trabelsi, H. Nounou, M. Nounou, Online reduced kernel GLRT technique for improved fault detection in photovoltaic systems, Energy 179 (2019) 1133–1154.
[15] M-F. Harkat, M. Mansouri, M. Nounou, H. Nounou, Fault detection of uncertain chemical processes using interval partial least squares-based generalized likelihood ratio test, Information Sciences 190 (2019) 265–284.
[16] M-F. Harkat, M. Mansouri, M. Nounou, H. Nounou, Fault detection of uncertain nonlinear process using interval-valued data-driven approach, Chemical Engineering Science 205 (2019) 36–45.
[17] R. Fazai, M. Mansouri, K. Abodayeh, H. Nounou, M. Nounou, Online reduced kernel PLS combined with GLRT for fault detection in chemical systems, Process Safety and Environmental Protection 179 (2019) 1133–1154.
[18] I. Baklouti, M. Mansouri, A. Ben Hamida, H. Nounou, M. Nounou, Enhanced operation of wastewater treatment plant using state estimation-based fault detection strategies, International Journal of Control 11 (2019) 1–12.
[19] M. Mansouri, R. Baklouti, M-F. Harkat, H. Nounou, M. Nounou, Kernel generalized likelihood ratio test for fault detection of biological systems, IEEE Transactions on NanoBioscience 17 (2018) 498–506.


[20] O. Amri, M. Mansouri, A. Ben Hamida, H. Nounou, M. Nounou, Improved model based fault detection technique and application to humanoid robots, Mechatronics 53 (2019) 140–151.
[21] M. Mansouri, A. Al-khazraji, M. Hajji, M.F. Harkat, H. Nounou, M. Nounou, Wavelet optimized EWMA for fault detection and application to photovoltaic systems, Solar Energy 167 (2018) 125–136.
[22] E.S. Page, Continuous inspection schemes, Biometrika (1954) 100–115.
[23] C. Botre, M. Mansouri, M. Nounou, H. Nounou, M.N. Karim, Kernel PLS-based GLRT method for fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 43 (2016) 212–224.
[24] M. Mansouri, M. Nounou, H. Nounou, K. Nazmul, Kernel PCA-based GLRT for nonlinear fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 26 (1) (2016) 129–139.
[25] D.C. Montgomery, R. Gerth, Introduction to statistical quality control, IIE Transactions 30 (6) (1998) 571.
[26] P. Castagliola, G. Celano, S. Fichera, Monitoring process variability using EWMA, in: Springer Handbook of Engineering Statistics, Springer, 2006, pp. 291–325.
[27] P.E. Maravelakis, P. Castagliola, An EWMA chart for monitoring the process standard deviation when parameters are estimated, Computational Statistics & Data Analysis 53 (7) (2009) 2653–2664.
[28] L. Shu, W. Jiang, S. Wu, A one-sided EWMA control chart for monitoring process means, Communications in Statistics, Simulation and Computation 36 (4) (2007) 901–920.
[29] M.B. Khoo, S. Teh, Z. Wu, Monitoring process mean and variability with one double EWMA chart, Communications in Statistics Theory and Methods 39 (20) (2010) 3678–3694.
[30] S.Y. Teh, M.B. Khoo, Z. Wu, A sum of squares double exponentially weighted moving average chart, Computers & Industrial Engineering 61 (4) (2011) 1173–1188.
[31] T.S. Yin, M.B. Khoo, L.C. Kit, Comparing the performances of the optimal SS-DEWMA and Max-DEWMA control charts, Journal of Statistical Modeling and Analytics 1 (2) (2010) 1–9.
[32] J. Lee, N. Ricker, Extended Kalman filter based nonlinear model predictive control, Industrial & Engineering Chemistry Research 33 (6) (1994) 1530–1541.
[33] S. Julier, J. Uhlmann, New extension of the Kalman filter to nonlinear systems, Proceedings of SPIE 3 (1) (1997) 182–193.
[34] B. Andrews, T. Yi, P. Iglesias, Optimal noise filtering in the chemotactic response of Escherichia coli, PLoS Computational Biology 2 (11) (2006) e154.
[35] M. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Transactions on Signal Processing 50 (2) (2002) 174–188.
[36] A. Doucet, A. Johansen, A tutorial on particle filtering and smoothing: fifteen years later, in: Handbook of Nonlinear Filtering, 2009, pp. 656–704.
[37] S.W. Roberts, Control chart tests based on geometric moving averages, Technometrics 1 (3) (1959) 239–250.
[38] S.V. Crowder, M.D. Hamilton, An EWMA for monitoring a process standard deviation, Journal of Quality Technology 24 (1) (1992) 12–21.
[39] J.S. Hunter, The exponentially weighted moving average, Journal of Quality Technology 18 (4) (1986) 203–210.


[40] D.C. Montgomery, Introduction to Statistical Quality Control, John Wiley & Sons, New York, 2005.
[41] U. Européenne, Direction générale de la recherche, The COST Simulation Benchmark: Description and Simulator Manual, Directorate-General for Research, 2002.
[42] C. Rosén, J. Lennox, Multivariate and multiscale monitoring of wastewater treatment operation, Water Research 35 (14) (2001) 3402–3410.
[43] M. Fuente, D. Garcia-Alvarez, G. Sainz-Palmero, P. Vega, Fault detection in a wastewater treatment plant based on neural networks and PCA, in: 2012 20th Mediterranean Conference on Control & Automation (MED), IEEE, 2012, pp. 758–763.
[44] J. Alex, J. Beteau, J. Copp, C. Hellinga, U. Jeppsson, S. Marsili-Libelli, M. Pons, H. Spanjers, H. Vanhooren, Benchmark for evaluating control strategies in wastewater treatment plants, in: European Control Conference, vol. 99, Karlsruhe, Germany, 1999.
[45] C.P. Quesenberry, On properties of q charts for variables, Journal of Quality Technology 27 (3) (1995) 184–203.
[46] E.O. Voit, J. Almeida, Decoupling dynamical systems for pathway identification from metabolic profiles, Bioinformatics 20 (11) (2004) 1670–1681.
[47] O.R. Gonzalez, et al., Parameter estimation using simulated annealing for s-system models of biochemical networks, Bioinformatics 23 (4) (2007) 480–486.
[48] M. Mansouri, H. Nounou, M. Nounou, A.A. Datta, Modeling of nonlinear biological phenomena modeled by s-systems using Bayesian method, in: 2012 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES), IEEE, 2012, pp. 305–310.
[49] N. Meskin, H. Nounou, M. Nounou, A. Datta, Parameter estimation of biological phenomena: an unscented Kalman filter approach, IEEE/ACM Transactions on Computational Biology and Bioinformatics 10 (2) (2013) 537–543.
[50] I. Baklouti, M. Mansouri, A.B. Hamida, H. Nounou, M. Nounou, Monitoring of wastewater treatment plants using improved univariate statistical technique, Process Safety and Environmental Protection 116 (2018) 287–300.
[51] M. Mansouri, M.-F. Harkat, M. Nounou, H. Nounou, Midpoint-radii principal component analysis-based EWMA and application to air quality monitoring network, Chemometrics and Intelligent Laboratory Systems (2018).
[52] M.M. Mansouri, H.N. Nounou, M.N. Nounou, A.A. Datta, Modeling of nonlinear biological phenomena modeled by s-systems, Mathematical Biosciences 249 (2014) 75–91.
[53] M.M. Mansouri, H.N. Nounou, M.N. Nounou, A.A. Datta, State and parameter estimation for nonlinear biological phenomena modeled by s-systems, Digital Signal Processing 28 (2014) 1–17.
[54] M. Mansouri, M.-F. Destain, Predicting biomass and grain protein content using Bayesian methods, Stochastic Environmental Research and Risk Assessment 29 (4) (2015) 1167–1177.
[55] M. Mansouri, M.-F. Destain, An improved particle filtering for time-varying nonlinear prediction of biomass and grain protein content, Computers and Electronics in Agriculture 114 (2015) 145–153.
[56] M. Mansouri, O. Avci, H. Nounou, M. Nounou, Iterated square root unscented Kalman filter for nonlinear states and parameters estimation: three DOF damped system, Journal of Civil Structural Health Monitoring 5 (4) (2015) 493–508.

CHAPTER 7

Conclusions and perspectives

Contents
7.1. Conclusions
7.2. Perspectives and research proposals
7.2.1 Project 1: water distribution networks: modeling, sensor placement, leak and quality monitoring
7.2.2 Project 2: enhanced operation of wastewater treatment plants
7.2.3 Project 3: enhanced monitoring of photovoltaic systems
7.2.4 Project 4: enhanced data validation of an air quality monitoring networks

Data-Driven and Model-Based Methods for Fault Detection and Diagnosis. https://doi.org/10.1016/B978-0-12-819164-4.00016-9
Copyright © 2020 Elsevier Inc. All rights reserved.

7.1 Conclusions
The following broad conclusions are reached from the analysis of the simulation and experimental results:
• Developed latent variable-based hypothesis testing fault detection techniques for cases where the process model is not available. These techniques can enhance the monitoring of processes represented by linear or nonlinear input-space models (such as PCA) or input–output models (such as PLS), and they widen the applicability of hypothesis testing-based fault detection in practice. Kernel PCA (kPCA) and kernel PLS (kPLS) were used to deal with nonlinearities in the process data.
• Developed multiscale latent variable-based hypothesis testing fault detection techniques that use a multiscale representation to help deal with uncertainty in the data and minimize its effect on fault detection.
• Developed interval PCA (IPCA) and interval PLS (IPLS) fault detection methods to account for uncertainty in the data. The advantages of IPCA and IPLS were combined with those of hypothesis testing by developing interval PCA- and PLS-based hypothesis testing fault detection methods, which allowed incorporating prior knowledge about the data variability into the monitoring problem to further enhance the quality of fault detection.
• Developed model-based detection techniques that can improve process monitoring using state estimation-based fault detection approaches. When the process model is available, the state variables are estimated using a particle filter (PF). Enhanced univariate charts based on the exponentially weighted moving average (EWMA) are then applied to the monitored residuals obtained from the PF for fault detection purposes.
• Demonstrated the effectiveness of the proposed strategies by conducting simulation and experimental studies on the following problems:
1. Simulated examples:
– Synthetic data
– Continuously stirred tank reactor
– Tennessee Eastman process
– Wastewater treatment plant
– Cad system in E. coli
– Simulated distillation column
– Simulated photovoltaic systems
2. Real air quality monitoring network data
3. Real photovoltaic data
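
The model-based conclusion above, an EWMA chart applied to residuals from a state estimator, can be illustrated with a minimal sketch. It is not the authors' implementation: the residuals are synthetic Gaussian noise standing in for particle-filter residuals, and the function name `ewma_chart`, the smoothing parameter `lam = 0.2`, and the limit width `L = 3` are assumptions chosen only for illustration.

```python
import numpy as np

def ewma_chart(residuals, lam=0.2, L=3.0, mu0=0.0, sigma0=1.0):
    """Return the EWMA statistic, its control limits, and alarm flags.

    lam, L, mu0 and sigma0 are illustrative defaults (assumptions),
    not values taken from the text.
    """
    r = np.asarray(residuals, dtype=float)
    z = np.empty_like(r)
    prev = mu0
    for t, x in enumerate(r):
        # Recent samples get more weight; older ones decay geometrically.
        prev = lam * x + (1.0 - lam) * prev
        z[t] = prev
    k = np.arange(1, r.size + 1)
    # Time-varying limits of the standard EWMA chart.
    half_width = L * sigma0 * np.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * k)))
    ucl = mu0 + half_width
    lcl = mu0 - half_width
    alarms = (z > ucl) | (z < lcl)
    return z, lcl, ucl, alarms

# Synthetic residuals: fault-free up to sample 200, then a mean shift of 1.5.
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, 300)
residuals[200:] += 1.5
z, lcl, ucl, alarms = ewma_chart(residuals)
print("alarm rate before fault:", alarms[:200].mean())
print("alarm rate after fault: ", alarms[200:].mean())
```

A smaller `lam` makes the chart more sensitive to small, slow shifts at the cost of a slower reaction to large ones, which is why the choice of the smoothing parameter matters for the detection results.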

7.2 Perspectives and research proposals

7.2.1 Project 1: water distribution networks: modeling, sensor placement, leak and quality monitoring

Proposal summary
Environmental, safety, and health issues have gained great importance worldwide. These issues are closely related to the availability and quality of water that can be used in several industrial and domestic applications. Water is a unique commodity because nothing else can substitute for it, especially in areas exposed to drought conditions. Also, in most water distribution systems (WDS), it is estimated that about 10% to 30% of the water is lost in transportation from treatment plants to consumers. Therefore proper operation of these systems is crucial to maintain the desired efficiency of water distribution. Thus this project aims at enhancing the operation of WDS by developing innovative hydraulic and water quality modeling, leak and contaminant monitoring, and sensor placement techniques that are capable of improving the performance of these systems. Specifically, four main parts (Part A, Part B, Part C, and Part D) will be addressed in this project.

The objective of Part A is to enhance hydraulic and water quality modeling in WDS by developing various multivariate, nonlinear, uncertain, and multiscale modeling frameworks. The hydraulic model describes the behavior of water pressure, flow, and consumer demand, whereas the water quality model describes the behavior of the contaminant concentrations (e.g., chlorine, pH, turbidity). For hydraulic modeling, empirical data-based modeling techniques will be developed. Latent variable regression (LVR) methods are well-known empirical data-based modeling techniques. They are multivariate techniques that aim to reduce data dimensionality and rely on a linear transformation of the data through an orthonormal matrix calculated from the dataset itself. However, most practical systems are nonlinear, multivariate, and uncertain. To extend these methods to nonlinear systems, kernel LVR (KLVR) models, which are widely used to handle nonlinear components, will be used. The multiscale nature of the WDS data provides a representation that can be made robust to noise and errors and has a great impact on the quality of monitoring. Hence we propose to combine the modeling frameworks with a wavelet-based multiscale representation of WDS data, so that LVR, KLVR, and multiscale KLVR will be applied to improve the hydraulic modeling performance. It is known that water quality systems are dynamic. Therefore dynamic LVR and KLVR modeling methods will be developed to take into consideration the dynamic nature of the network.

To deal with scenarios where the process model is available and has a predefined structure, the state variables are estimated using state estimation techniques. These include the extended Kalman filter (EKF), the unscented Kalman filter (UKF), and the particle filter (PF). The PF provides a significant advantage over the EKF and UKF because it can be applied to nonlinear models with non-Gaussian errors. However, the classical PF ignores the information given by the most recent measurements when the sampling phase is performed. Thus an improved PF will be developed in this project to incorporate information from recent process measurements into the sampling phase, which will help improve on the classical PF.
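
The limitation of the classical PF mentioned above can be seen in a minimal bootstrap particle filter sketch: in the sampling phase the particles are drawn from the state-transition (prior) model only, and the new measurement enters afterwards through the weights. This is the classical scheme, not the improved PF proposed in the text. The scalar state model `f`, the measurement model `h`, the noise levels, and the function name `bootstrap_pf` are all hypothetical choices made for illustration.

```python
import numpy as np

def f(x):
    """Hypothetical nonlinear state transition (an assumption)."""
    return 0.8 * x + np.sin(x)

def h(x):
    """Hypothetical measurement function (an assumption)."""
    return x

def bootstrap_pf(y, q_std, r_std, n_particles=500, x0_std=1.0, seed=0):
    """Classical bootstrap particle filter for a scalar state."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, x0_std, n_particles)
    estimates = np.empty(len(y))
    for k, yk in enumerate(y):
        # Sampling phase: propagate through the prior; y[k] is NOT used here.
        particles = f(particles) + rng.normal(0.0, q_std, n_particles)
        # Weighting phase: Gaussian measurement likelihood.
        log_w = -0.5 * ((yk - h(particles)) / r_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates[k] = np.sum(w * particles)
        # Multinomial resampling to limit weight degeneracy.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return estimates

# Simulate the hypothetical system and run the filter.
rng = np.random.default_rng(1)
n_steps, q_std, r_std = 200, 0.5, 0.5
x = np.empty(n_steps)
y = np.empty(n_steps)
state = 0.0
for k in range(n_steps):
    state = f(state) + rng.normal(0.0, q_std)
    x[k] = state
    y[k] = h(state) + rng.normal(0.0, r_std)

x_hat = bootstrap_pf(y, q_std, r_std)
rmse = np.sqrt(np.mean((x_hat - x) ** 2))
print("RMSE of the state estimate:", rmse)
```

An improved sampling distribution would replace the line that propagates the particles, so that the proposal also depends on `yk`; how that proposal is built is exactly what the project sets out to design.
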
Moreover, the data collected from WDS are often affected by noise and measurement uncertainties. These uncertainties have a negative impact on the established models and thus on the monitoring performance. To represent the real data more precisely, this uncertainty can be treated by considering an interval-valued representation instead of a single-valued representation. In this case, determining the model requires new approaches adapted to interval-valued data. Therefore we will extend the developed hydraulic and water quality modeling methods to deal with interval-valued data.

In Part B, we propose enhanced data-based and model-based leak and contaminant monitoring techniques for WDS. The first objective of Part B is to develop a new technique for leak or contaminant detection in WDS. The developed technique exploits the benefits of the exponentially weighted filter with the generalized likelihood ratio test (GLRT) and multiobjective optimization to detect the leak or the contaminant source in the network. The idea behind the developed chart is to compute a statistic that takes into consideration both the current and the previous data by giving more weight to the more recent data. Once it has been determined that there are leaks or contaminants in the network, they must be identified. For example, the developed chart combined with sensitivity analysis will be used to identify the leak location. To identify the contamination source, an enhanced technique that integrates the advantages of the developed chart with a genetic algorithm will be developed.

The objective of Part C is to optimize the sensor placement in WDS. To do that, we will develop a new sensor location technique that combines the advantages of the chart and multiobjective optimization. The optimal sensor location is selected so that the detection rate, detection redundancy, and isolation rate are maximized and the detection speed is minimized for leak and contaminant monitoring.

In Part D, software for modeling, monitoring, and sensor placement will be developed. The software will be available at the end of this project for use by researchers interested in continuing work in this direction and by practitioners who would like to use it to improve the performance of WDS. The proposed frameworks will be evaluated through three applications: the first using the EPANET simulator, the second using real data from ASTAD water plants at Qatar Foundation, and the third using a benchmark WDS available at TAMU-Qatar.
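
The chart idea described for Part B, a GLRT computed from data that are weighted more heavily when they are more recent, can be sketched as follows. The exact statistic of the proposed chart is not given in this section, so this is only one plausible reading: an exponentially weighted estimate of the residual mean is plugged into a GLRT-type statistic for a mean shift with known noise standard deviation. The function name `ew_glrt`, the window length, `lam = 0.3`, and the synthetic residuals are assumptions.

```python
import numpy as np

# 99% quantile of a chi-square distribution with 1 degree of freedom.
CHI2_1_Q99 = 6.635

def ew_glrt(residuals, lam=0.3, window=20, sigma=1.0):
    """Exponentially weighted GLRT-type statistic for a mean shift.

    Under the fault-free hypothesis (zero-mean Gaussian residuals with
    standard deviation sigma), each value is chi-square(1) distributed.
    """
    r = np.asarray(residuals, dtype=float)
    w = lam * (1.0 - lam) ** np.arange(window)  # w[0] weights the newest sample
    stat = np.zeros(r.size)
    for k in range(r.size):
        seg = r[max(0, k - window + 1): k + 1][::-1]  # newest sample first
        wk = w[: seg.size]
        theta_hat = np.sum(wk * seg) / np.sum(wk)     # weighted mean estimate
        var_theta = sigma ** 2 * np.sum(wk ** 2) / np.sum(wk) ** 2
        stat[k] = theta_hat ** 2 / var_theta
    return stat

# Synthetic pressure residuals with a leak-like mean shift at sample 300.
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, 400)
residuals[300:] += 2.0
stat = ew_glrt(residuals)
print("false alarm rate:", (stat[:300] > CHI2_1_Q99).mean())
print("detection rate:  ", (stat[300:] > CHI2_1_Q99).mean())
```

Repeating this computation over a grid of `lam` values and recording the missed detection rate, false alarm rate, and detection delay for each would give the raw material for the multiobjective selection of the smoothing parameter.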

Technical objectives
Water distribution systems are essential for providing quality water for domestic and industrial applications. Proper operation of water distribution systems requires good understanding of their behavior and tight monitoring of their key variables to achieve the desired effectiveness of operation (to produce the sought water quality) and to ensure that the desired safety standards and protocols are maintained. Therefore the main objective of this project is to propose a general framework for enhancing the operation of water distribution systems by developing:
• Part A: Enhanced hydraulic and quality modeling techniques in water distribution systems:
1. by developing various multivariate, nonlinear, interval-valued, and multiscale hydraulic modeling frameworks.
2. by developing various multivariate, nonlinear, interval-valued, and multiscale water quality modeling techniques.
• Part B: Enhanced leak and contaminant monitoring techniques in water distribution systems as follows:
1. Develop new leak detection and isolation techniques using optimized exponentially weighted-GLRT-based sensitivity analysis.
2. Develop new contamination event detection and isolation techniques using optimized exponentially weighted-GLRT-based genetic algorithms.
• Part C: Optimal sensor placement techniques in water distribution systems.
• Part D: Software implementation for modeling, monitoring, and sensor placement in water distribution systems.

To achieve the first objective of Part A, which is enhancing the hydraulic models in water distribution networks, multivariate, nonlinear, and multiscale methods will be developed. Several modeling techniques that can accurately predict the behavior of water distribution networks will be developed. These techniques include latent variable regression (LVR) methods, such as principal component analysis (PCA) and partial least squares (PLS). Both PLS and PCA assume that the data lie on a linear subspace; however, most practical systems are nonlinear, multivariate, and uncertain. To make the extension to nonlinear systems, kernel LVR (KLVR) models, including kernel PCA (KPCA) and kernel PLS (KPLS), will be used. The multiscale nature of the data collected from water distribution networks has a great impact on the quality of monitoring. Thus we propose to merge the developed hydraulic models with a wavelet-based multiscale representation to enhance the modeling performance. For example, in this project, multiscale LVR and multiscale KLVR methods will be developed to enhance the hydraulic models. The second objective of Part A is to propose enhanced water quality modeling techniques.
To do that, the developed LVR, KLVR, multiscale LVR, and multiscale KLVR methods can also be applied to water quality modeling. Since the contaminant variables have dynamic behavior and the techniques presented above are only suitable for use under static or weakly dynamic conditions, dynamic LVR, KLVR, multiscale LVR, and multiscale KLVR methods will be developed.

To deal with scenarios where the process model is available, state estimation techniques will be used. Particle filtering (PF) is a nonlinear and non-Gaussian state estimation technique, which has been successfully applied in various applications and has shown better performance when compared to the linearization-based techniques, such as the extended Kalman filter (EKF) and unscented Kalman filter (UKF). However, the classical PF estimation technique uses the prior distribution when performing the sampling phase, which ignores the information given by the likelihood function and by recent process measurements and thus degrades the estimation performance. Therefore, in this project, an improved PF (IPF) will be developed to address this issue and improve the estimation convergence and accuracy. The proposed IPF will use the Kullback–Leibler divergence (KLD) to generate a better importance sampling distribution and thus will be referred to as the KLD-particle filter.

However, collected data are generally affected by uncertainties in the system parameters, measurement errors, sensor inaccuracies, and computation errors. These challenges can affect the modeling abilities in water distribution systems, and such data can be studied using interval-valued data techniques. Therefore the developed hydraulic and water quality modeling methods based on single-valued data will be extended to interval-valued data. Thus interval KLVR, interval multiscale KLVR, interval dynamic KLVR, and interval dynamic multiscale KLVR modeling methods will be developed to account for uncertainties in water distribution systems.

In Part B, to achieve the first task, which is enhancing leak monitoring performance, a new improved chart that combines the advantages of the exponentially weighted (EW) filter with those of the GLRT chart will be developed for leak detection. However, the EW filter has a main parameter λ (the smoothing parameter) that must be specified by the user to improve detection for a specific change size. Thus an optimized EW (OEW) filter based on the best selection of the smoothing parameter will be proposed. Multiobjective optimization (MOO) is applied to compute the optimal value λ̂.
The MOO is addressed using three objective functions: i) missed detection rate (MDR), ii) false alarm rate (FAR), and iii) average run length (ARL1). The developed chart (called OEW-GLRT) is expected to provide fast detection with good detection rates. The idea behind the developed OEW-GLRT is to compute a new chart that takes into account the current and previous data in an exponentially decreasing fashion, giving more weight to the more recent data. Therefore, in this task, single- and interval-valued model-based OEW-GLRT techniques will be developed to detect leaks in water distribution networks. Once it has been determined that there is a leak in the network, it should be isolated. Thus we will develop new techniques that aim at identifying the leak location using the developed OEW-GLRT-based sensitivity matrix analysis. The sensitivity matrix is determined from pressure residuals generated considering all possible leak locations. The residuals are obtained by comparing the observed pressures with the estimated ones given by the model. The closest match between the actual residuals and the leak sensitivity matrix, which contains the effect of each possible leak, determines the most probable leak location. To identify the location of the leak, angle, correlation, Euclidean distance, and least-squares optimization metrics will be used.

The second task in Part B focuses on water quality monitoring, which is often posed as a contaminant detection and isolation problem. Thus the developed single- and interval-valued model-based OEW-GLRT chart will be used to enhance the contaminant detection abilities. To localize the contaminant source, a multiobjective Pareto optimization-based pollution matrix will be used. The pollution matrix, which represents the impact of different contaminants on the nodes, is computed using the measured contaminant concentrations and their estimates obtained from the developed water quality models.

The objective of Part C is to optimize the sensor placement for leak and contaminant monitoring in water distribution systems. In fact, the positions of sensors must be well selected to enhance monitoring performance. Thus, in this project, we will develop new optimization techniques that exploit the advantages of the developed modeling/monitoring techniques and genetic algorithms to enhance the performance of water distribution systems. The developed sensor placement algorithms will help improve the monitoring of water distribution systems. To select the optimal sensor placement, we will build a multiobjective optimization scheme containing four objective functions: detection rate (DR), detection speed (DS), detection redundancy (DRE), and isolation rate (IR). This scheme gives a tradeoff between the four metrics.
Thus the optimal sensor location is selected so that DR is maximized, DS is minimized, DRE is minimized, and IR is maximized. To deal with modeling uncertainties, parameter uncertainty of pipe roughness coefficients, and uncertainty in water demands, the developed sensor placement methods based on single-valued data will be extended to interval-valued data.

The last part of this proposal is to implement software for modeling, monitoring, and sensor placement in water distribution systems. Since modeling, monitoring, and sensor placement are inherent parts of water distribution systems, software implementation of more advanced techniques helps to operate these systems more efficiently. It will be shown that the developed software is capable of producing compact modeling, monitoring, and sensor placement frameworks that are especially tailored for WDS operation purposes.

The developed modeling, monitoring, and sensor placement frameworks will be evaluated using the EPANET simulator and real data from ASTAD water plants at Qatar Foundation. Also, the developed techniques will be validated using the benchmark water distribution system available at Texas A&M University at Qatar, under the operation of Prof. Ahmed Abdel-Wahab. The proposed project falls under Technology Readiness Level (TRL) 4, “Lab Testing/Validation of Alpha Prototype Component/Process”. A framework for enhancing the hydraulic and water quality modeling, leak and contaminant monitoring, and sensor placement will be developed in this project. The developed software will be available at the end of this project for use by researchers interested in continuing work in this direction and by practitioners who would like to use it to improve the performance of water distribution systems. This application will be facilitated through our collaborators and consultants from the Technical University of Catalonia (UPC) and the Massachusetts Institute of Technology (MIT).
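
The leak isolation step described in Part B, matching the observed pressure residuals against a leak sensitivity matrix, can be sketched as below. This is a simplified illustration rather than the proposed OEW-GLRT-based procedure: the sensitivity matrix is a small made-up example (four sensors, three candidate leak nodes), and the function name `locate_leak` is hypothetical. Three of the metrics named in the text are shown: angle, correlation, and a least-squares fit of the leak magnitude.

```python
import numpy as np

def locate_leak(residual, sensitivity):
    """Rank candidate leak nodes by similarity to the observed residual.

    sensitivity has shape (n_sensors, n_candidates); column j holds the
    pressure residual expected at the sensors for a leak at candidate j.
    """
    r = np.asarray(residual, dtype=float)
    S = np.asarray(sensitivity, dtype=float)

    # Angle metric: cosine between the residual and each column.
    cosine = (S.T @ r) / (np.linalg.norm(S, axis=0) * np.linalg.norm(r))

    # Correlation metric: cosine after removing the means.
    rc = r - r.mean()
    Sc = S - S.mean(axis=0)
    corr = (Sc.T @ rc) / (np.linalg.norm(Sc, axis=0) * np.linalg.norm(rc))

    # Least-squares metric: fit the unknown leak magnitude a_j for each
    # candidate and keep the misfit norm ||r - a_j * s_j||.
    a = (S.T @ r) / np.sum(S ** 2, axis=0)
    misfit = np.linalg.norm(r[:, None] - S * a, axis=0)

    return {
        "angle": int(np.argmax(cosine)),
        "correlation": int(np.argmax(corr)),
        "least_squares": int(np.argmin(misfit)),
    }

# Made-up sensitivity matrix (an assumption, not data from the text).
S = np.array([[-1.0, -0.2, -0.1],
              [-0.5, -0.9, -0.2],
              [-0.2, -0.6, -0.8],
              [-0.1, -0.3, -1.0]])
# Observed residual: a leak at candidate 1 with a small perturbation.
observed = 0.7 * S[:, 1] + np.array([0.01, -0.02, 0.01, 0.0])
result = locate_leak(observed, S)
print(result)
```

Because the leak magnitude is unknown, each metric compares the direction of the residual vector rather than its size; with noisy data the metrics can disagree, which is one reason for evaluating several of them.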

7.2.2 Project 2: enhanced operation of wastewater treatment plants

Proposal summary
Environmental, health, and safety concerns are of major importance worldwide. These concerns are closely tied to the availability and quality of water that can be used in various domestic and industrial applications. Water is a precious commodity, especially in regions that are exposed to drought or severe weather conditions, such as Qatar. Also, the high demand for water due to the rapid growth of the world population and global economic development is another key factor that greatly affects the availability of water. The water resources in Qatar continue to decrease and are becoming insufficient to meet its civil and industrial needs. Due to the low levels of precipitation (annual average of 82 mm) and high evaporation rate (average of 2,200 mm per annum), Qatar relies on groundwater sources (which are not abundant), desalinated seawater (which is expensive), and treated water. Thus the reuse of treated wastewater is becoming an absolute necessity, not only to preserve the environment, but also to avoid being completely dependent on the limited groundwater resources or desalinated seawater. Wastewater treatment offers significant water savings while providing great financial benefits, as it is much less expensive than seawater desalination. Wastewater treatment is used by various governmental and industrial entities in Qatar, such as Ashghal and Qatar Shell, which operate wastewater treatment plants that utilize different technologies, such as biological treatment or reverse osmosis. Proper operation of these wastewater treatment plants is crucial to maintain the sought effectiveness and desirable water quality. Therefore the main objective of this proposal is to develop a general framework of modeling, monitoring, and control techniques that aim at enhancing the operation of wastewater treatment plants. Specifically, the following objectives will be sought.

First, different modeling techniques that can accurately predict the behavior of wastewater treatment plants will be developed. Some of these modeling techniques will be based on a predefined model structure, which is derived using material and energy balances and for which the model parameters are estimated from measurements of the plant variables using state estimation techniques, such as particle filtering (PF). In fact, an improved PF method will be developed to better handle the nonlinear and high-dimensional state estimation problem involved in modeling wastewater treatment plants. When such a model structure is not available, an empirical modeling strategy will be developed instead, where the model is estimated entirely from wastewater plant data. Well-known empirical modeling techniques include the latent variable regression (LVR) models. In this proposal, we propose to develop dynamic and multiscale LVR models that account for the dynamic nature and the uncertainty of measurements obtained from wastewater treatment plants.
Even though the developed modeling techniques can be applied to any type of wastewater treatment plant, in this project they will be validated using a biological treatment process that has been widely used in the water quality research community as a benchmark process model.

The second objective of this project is to develop effective monitoring techniques that can ensure normal and safe operation of wastewater treatment plants. Two types of monitoring strategies will be developed. One strategy will rely on fault detection techniques, where anomalies in the process variables are quickly detected to ensure safe handling of such faults. The proposed fault detection techniques include LVR-based multivariate detection indices (e.g., the Q statistic) and generalized likelihood ratio test (GLRT)-based methods. In fact, GLRT fault detection methods that can detect changes in the mean or variance of the measured variables will be developed. The other monitoring strategy will aim at developing techniques that detect drifts in the operating conditions away from the nominal region in which the process was intended to operate. These techniques will rely on monitoring the model parameters to make sure that they remain close to their nominal values. The developed modeling and monitoring techniques will be validated using both simulated data (to assess their effectiveness under controlled conditions) and real data from a biological treatment pilot plant at Qatar Shell.
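
The LVR-based Q statistic mentioned above can be illustrated with a minimal PCA sketch. The data are synthetic (two latent factors driving five measured variables), the mixing matrix `A` and the function names `fit_pca` and `q_statistic` are hypothetical, and the detection threshold is taken as an empirical 99% quantile of the training Q values, which is an assumption made to keep the example short rather than the threshold used in the book.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA model on autoscaled training data using the SVD."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T  # loading matrix (n_variables, n_components)
    return mean, std, P

def q_statistic(X, mean, std, P):
    """Squared prediction error of each sample with respect to the PCA model."""
    Z = (np.atleast_2d(X) - mean) / std
    E = Z - Z @ P @ P.T  # part of the data not explained by the model
    return np.sum(E ** 2, axis=1)

# Synthetic fault-free training data (all values are assumptions).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5, -0.5, 0.8, 0.2],
              [0.2, -0.7, 0.9, 0.3, 1.0]])
X_train = rng.normal(size=(500, 2)) @ A + 0.1 * rng.normal(size=(500, 5))

mean, std, P = fit_pca(X_train, n_components=2)
q_train = q_statistic(X_train, mean, std, P)
threshold = np.quantile(q_train, 0.99)  # empirical limit (assumption)

# A new fault-free sample and the same sample with a bias on sensor 2.
x_new = rng.normal(size=(1, 2)) @ A + 0.1 * rng.normal(size=(1, 5))
x_fault = x_new.copy()
x_fault[0, 2] += 3.0
q_ok = q_statistic(x_new, mean, std, P)[0]
q_fault = q_statistic(x_fault, mean, std, P)[0]
print("threshold:", threshold, "Q (normal):", q_ok, "Q (faulty):", q_fault)
```

A sensor bias breaks the correlation structure captured by the retained components, so it shows up in the residual subspace and inflates Q even when each variable on its own still looks plausible.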

Technical objectives
Wastewater treatment is essential for providing quality water for domestic and industrial applications, and it is performed by governmental and industrial entities all over the world. Proper operation of wastewater treatment plants requires good understanding of their behavior and tight monitoring and control of their key variables to achieve the desired effectiveness of operation (to produce the sought water quality) and to ensure that the desired safety standards and protocols are maintained. Therefore the main objective of this project is to develop a general framework for enhancing the operation of wastewater treatment plants by developing:
• modeling techniques that can accurately predict the behavior of wastewater treatment plants and help understand their behavior, and
• monitoring techniques that can detect sensor faults or deviations from a plant’s intended operating region.

To achieve the first objective, which is developing accurate wastewater treatment plant modeling techniques, two modeling approaches will be followed. The first modeling approach will utilize a predefined dynamic model structure derived from basic principles (i.e., mass and energy balances) and then estimate the model parameters from measurements of the process variables using state estimation techniques, such as particle filtering (PF). Particle filtering is a Bayesian estimation method that has been successfully utilized in various applications and has shown advantages over other classical state estimation techniques, such as the extended Kalman filter (EKF) and unscented Kalman filter (UKF), especially when dealing with high-dimensional problems. In fact, an improved PF will be developed in this project to deal with the nonlinear, complex, and high-dimensional models of wastewater treatment plants.
The idea behind the proposed improved PF is that it will utilize a sampling distribution that incorporates information from recent process measurements, which will help improve the convergence and accuracy of the estimated model parameters. This modeling approach, however, is feasible only when a dynamic model structure can be derived using the conservation laws and our physical understanding of the process. In some cases, deriving such dynamic model structures for complex systems, such as wastewater treatment plants, is a challenging task. In such cases, we propose another, empirical modeling approach, which relies on the availability of measurements of the process variables. Specifically, we propose to develop linear and nonlinear dynamic latent variable regression (LVR) models that can account for the multivariate, dynamic, and nonlinear nature of wastewater treatment plants. For example, in this project, dynamic kernel principal component analysis (DkPCA) and dynamic kernel partial least squares (DkPLS) models will be developed to deal with scenarios where the process variables may not all be measurable. Furthermore, the performance of all developed LVR modeling techniques will be enhanced using multiscale representation, which is a powerful data analysis tool that has been shown to improve several types of models. Even though the developed models (using both proposed modeling approaches) are applicable to a wide range of wastewater treatment plants, they will be validated in this project using simulated and real biological wastewater treatment plant data. The simulated biological plant model used in this project is a benchmark model developed by the European Co-operation in the field of Scientific and Technical Research (COST) and has been extensively used in water quality research.

The second objective of this project is to develop effective monitoring techniques that can help detect anomalies in key measured wastewater treatment plant variables (such as the chemical oxygen demand, dissolved oxygen concentration, ammonia concentration, and others) and also detect drifts in the process operating conditions from their normal or nominal values.
To detect sensor faults or anomalies in the process measurements, different fault detection indices will be utilized, which include the LVR-based Q statistic and the generalized likelihood ratio test (GLRT). GLRT is a statistical fault detection method that relies on maximizing the probability of detection for a given false alarm rate. In this project, various GLRT fault detection statistics will be developed. These include indices that aim at detecting shifts in the mean of the model residuals (which will help detect malfunctioning sensors that may get stuck at some point during the process operation) and indices that aim at detecting changes in the variance of the residuals (which will help detect changes in the uncertainty of the measuring devices). Another monitoring objective is to detect drifts in the wastewater treatment process from its nominal operating region even if there are no sensor faults occurring. This can be done by monitoring the values of estimated model parameters obtained using state estimation. Control charts that are sensitive to small and slow changes in the process operating conditions, such as exponentially weighted moving average (EWMA), multivariate EWMA, and multiscale EWMA charts, will be developed and used in this regard. Again, the developed monitoring techniques will be validated using simulated and real wastewater plant data obtained from the COST benchmark wastewater plant model and Qatar Shell, respectively.
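
The GLRT for a change in the residual variance mentioned above can be written down explicitly under simple assumptions: zero-mean Gaussian residuals, a known fault-free standard deviation `sigma0`, and a sliding window of `N` samples. With the mean square `s2` of the residuals in the window, twice the log likelihood ratio is `N * (s2/sigma0**2 - 1 - log(s2/sigma0**2))`. The sketch below implements that statistic; the function name `glrt_variance`, the window length, and the synthetic data are assumptions, and the chi-square threshold is only an asymptotic approximation.

```python
import numpy as np

# 99% quantile of a chi-square distribution with 1 degree of freedom,
# used here as an approximate (asymptotic) detection threshold.
CHI2_1_Q99 = 6.635

def glrt_variance(residuals, sigma0=1.0, window=50):
    """Sliding-window GLRT statistic for a change in residual variance.

    Entries before the first full window are NaN. A window of exactly
    zero residuals is not handled (the logarithm would diverge).
    """
    r = np.asarray(residuals, dtype=float)
    stat = np.full(r.size, np.nan)
    for k in range(window - 1, r.size):
        s2 = np.mean(r[k - window + 1: k + 1] ** 2)  # ML variance estimate
        ratio = s2 / sigma0 ** 2
        stat[k] = window * (ratio - 1.0 - np.log(ratio))
    return stat

# Synthetic residuals whose standard deviation doubles at sample 300,
# mimicking an increase in the uncertainty of a measuring device.
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, 400)
residuals[300:] *= 2.0
stat = glrt_variance(residuals)
print("statistic at the last sample:", stat[-1], "threshold:", CHI2_1_Q99)
```

The statistic is zero when the windowed variance equals `sigma0**2` and grows whether the variance increases or decreases, so the same index flags both a noisier sensor and one that has become suspiciously quiet.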

7.2.3 Project 3: enhanced monitoring of photovoltaic systems

Proposal summary
Effective operation of various engineering systems requires tight monitoring of some of their key process variables. For example, detection of anomalies in photovoltaic (PV) power systems is crucial for their efficient application to convert solar energy to usable power. Most practical processes, however, are multivariate, that is, they involve many variables that need to be monitored at the same time. In a previous research effort, we developed PCA- and kernel PCA (kPCA)-based GLRT fault detection schemes, in which PCA and kPCA were used as a modeling framework for fault detection. In this project, our objective is three-fold: to improve the performance of the GLRT, to extend its applicability to a wide range of practical systems, and to apply the developed techniques to enhance the monitoring of PV systems.

First, to improve the performance of the GLRT, a new statistical fault detection method that combines the advantages of the exponentially weighted moving average (EWMA) filter with those of the GLRT will be developed. The developed method, which is called EWMA-based GLRT, is expected to provide improved properties, such as smaller missed detection and false alarm rates and a smaller average run length.

The second objective of this project is to extend the applicability of the developed GLRT methods to a wide range of practical systems. Most real systems are nonlinear and multivariate and are best represented by input–output models. Latent variable models, such as partial least squares (PLS), have been widely used to represent such systems. Therefore, in this project, linear and nonlinear PLS-based GLRT and EWMA-based GLRT methods will be developed to widen the applicability of these techniques in practice. For nonlinear systems, kernel PLS (kPLS), which is capable of dealing with high-dimensional nonlinear data, will be used to make such an extension.
Also, in most practical situations, fault detection is needed online, that is, as the data are measured. The nonlinear latent variable models, however, are batch, that is, they require the entire data sets to be available a priori.


Therefore recursive kPCA and kPLS modeling schemes will be developed to extend the advantages of the GLRT methods for online systems. Also, practical data are usually contaminated with measurement errors. Therefore multiscale representation of data, which is an effective tool for dealing with measurement noise, will be used to further enhance the performances of the fault detection techniques developed in this project. Finally, the developed fault detection techniques will be utilized in practice to help improve various applications. First, they will be used to enhance monitoring the operation of grid-connected photovoltaic power systems through monitoring some of the key variables involved in these systems. Validation of the developed techniques will be made using real PV system data obtained in the “Smart Grid Center” at Texas A&M University at Qatar. Also, the univariate fault detection techniques will be used to provide more accurate detection of aberrations in genomic copy number data, which will help pave the road to better patient-specific diagnosis of diseases and more personalized medicine.

Technical objectives

Process monitoring is becoming increasingly important in various applications to ensure safe, reliable, efficient, and economical operation of many engineering systems. For example, failure in a power system is not only inefficient from a productivity point of view, but it can be disastrous from a safety perspective. Also, effective detection of aberrations in genomic data has been shown to help provide better diagnosis of major diseases, such as cancer, which can help lead to more targeted (or personalized) medicine. Various fault detection techniques have been developed and used in practice. The statistical hypothesis testing-based fault detection methods, such as the generalized likelihood ratio test (GLRT), have been shown to be among the most effective univariate techniques. In a previous effort, we extended the GLRT to handle multivariate and nonlinear systems by developing principal component analysis (PCA)- and kernel PCA (kPCA)-based GLRT methods. This project has three main objectives:
• Enhance the performance of the GLRT by developing a new GLRT statistic with improved fault detection abilities,
• Widen the applicability of the GLRT methods to a broad range of systems and model structures by developing various multivariate, nonlinear, recursive, and multiscale modeling frameworks,
• Utilize the developed fault detection techniques in important practical applications by enhancing the monitoring of photovoltaic (PV) power systems and by making a step forward toward better genomic data-based personalized medicine.

To achieve the first objective, which is enhancing the effectiveness of the GLRT method, a new method that combines the advantages of the exponentially weighted moving average (EWMA) filter with those of the GLRT method will be developed. The developed method, which will be referred to as the EWMA-based GLRT, will provide fast and effective detection while maintaining a low false alarm rate. The idea behind the EWMA-based GLRT is to compute a new GLRT statistic that integrates current and previous data in an exponentially decreasing fashion, giving more weight to the more recent data. This will help provide a more accurate estimate of the GLRT statistic and a stronger memory, which will enable better decision making with respect to fault detection. The second objective of this proposal is to widen the applicability of the GLRT methods to a wide range of systems. Most practical systems are nonlinear and multivariate and are best described using input–output models, such as partial least squares (PLS) regression models. Therefore, to achieve this objective, linear and nonlinear PLS-based GLRT fault detection methods will be developed. To make the extension to nonlinear systems, kernel PLS (kPLS) will be used. Kernel latent variable regression (LVR) models rely on transforming the data into a higher-dimensional space, in which the data become linear, making the kernel-based approach an attractive choice for modeling nonlinear systems. Unfortunately, LVR models are batch, that is, they require the availability of the process data before constructing the model. In most situations, however, fault detection is needed online, that is, as the data are collected from the process. Therefore recursive kPCA and kPLS modeling techniques will be developed to extend the advantages of the GLRT to online processes.
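The EWMA-based GLRT idea described above can be illustrated with a minimal sketch. It assumes a scalar residual that is zero-mean Gaussian with known standard deviation `sigma` under fault-free conditions, in which case a GLRT-type statistic for an unknown mean shift reduces to the squared (filtered) residual divided by its variance. The smoothing parameter `lam`, the threshold of 3.84 (the 95% quantile of a chi-square distribution with one degree of freedom), and the function names are illustrative assumptions, not values or definitions taken from the project.

```python
def ewma(residuals, lam=0.3):
    """EWMA filter: z_t = lam * r_t + (1 - lam) * z_{t-1}, starting from z = 0.

    Recent samples receive exponentially larger weights than older ones.
    """
    z, out = 0.0, []
    for r in residuals:
        z = lam * r + (1.0 - lam) * z
        out.append(z)
    return out


def ewma_glrt(residuals, sigma, lam=0.3, threshold=3.84):
    """Return (statistics, alarms) for an EWMA-smoothed GLRT-type test.

    Assumption: fault-free residuals are i.i.d. N(0, sigma^2), so the
    asymptotic variance of the EWMA output is sigma^2 * lam / (2 - lam).
    """
    var_z = sigma ** 2 * lam / (2.0 - lam)
    stats = [z * z / var_z for z in ewma(residuals, lam)]
    alarms = [s > threshold for s in stats]
    return stats, alarms
```

For example, `ewma_glrt([0.0] * 5 + [2.0] * 10, sigma=1.0)` raises no alarm on the first five samples and flags the sustained shift once the filtered residual has accumulated enough evidence.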
Also, measured data are usually contaminated with errors, which degrade their quality and limit their usefulness. Multiscale representation of data has been shown to improve the performances of various fault detection methods. Therefore multiscale representation will be used to further enhance the effectiveness of the fault detection methods developed in this project. The third objective of this project is to utilize the developed fault detection techniques to 1) improve monitoring the operation of PV systems, and 2) better utilize genomic copy number data for more effective diagnosis of diseases. Grid-connected PV systems are among the top power technologies with the highest rate of growth. Therefore their proper operation and safe handling are top priorities. Various key variables will be monitored in PV


systems, which include the voltage and frequency of the grid, the voltage and the current of the AC and DC converters, and climate data, such as the temperature and irradiance. Tight monitoring of these variables will help provide more effective and less interrupted energy supplies. In this application the developed fault detection methods will be applied and validated using real data from the “Smart Grid Center” at Texas A&M University at Qatar, which is operated under Prof. Haitham Abu Rub. Dr. Mohamed Trabelsi, an assistant research scientist at the Smart Grid Center, is one of the PIs on this project. On the other hand, the univariate fault detection methods developed in this project, such as the EWMA-based GLRT, will be used to help detect abnormal changes (aberrations) in genomic copy number data. Medical practitioners can use such information in the diagnosis of various genetic diseases. Improved detection of such genetic aberrations can therefore help medical doctors provide case-specific treatment plans, which can be an important step forward toward more personalized medicine.

7.2.4 Project 4: enhanced data validation of air quality monitoring networks

Proposal summary

Many human activities produce primary pollutants, such as nitrogen oxides (NO2 and NO), sulfur dioxide, and volatile organic compounds. Secondary pollutants, such as ozone, are then formed in the lower atmosphere by chemical or photochemical reactions. A number of these pollutants are likely to cause problems for both human health and ecological systems. To perform air quality management, air quality monitoring networks have the following missions: the production of data (pollutant concentrations and a range of meteorological parameters related to pollution events), including the network management, the diffusion of data for permanent information of the population and public authorities, and surveillance in reference to norms. Because these data lie at the intersection of economic, health, ecological, social, scientific, and technical interests, the validity of the data and the credibility of the delivered information are essential. Sensor data validation is therefore an issue of great importance for the development of reliable environmental monitoring and management systems. Until now, sensor data validation has been performed either by using “outlier” detection methods, which only identify extreme values outside the measurement range, or manually by an operator. Unfortunately, this approach is too subjective and impractical in real time due to the high network dimensionality and the large amount of collected


data. However, in the field of fault diagnosis, more modern methods have been developed. Model-based diagnosis relies on information redundancy concepts. Its principle is generally based on consistency checking between an observed behavior of the process provided by sensors and an expected behavior provided by a mathematical representation of the process. The analytical redundancy is an explicit input–output relationship that may be difficult to obtain (high process nonlinearity, complexity of the process, and high process dimensionality). As an alternative, methods based on latent variable (LV) (PCA, PLS, kernel PCA, kernel PLS) methods that are data-driven could be very attractive for failure detection. The LV approach is used to model normal process behavior, and faults are then detected by referencing the observed behavior against this model. Because of the nice features of LV, this method can handle high-dimensional and correlated process variables. By way of their interaction the different pollutants constitute a dynamic chemical system strongly influenced by atmospheric conditions. The physico-chemical mechanisms taking place are poorly understood, but it clearly appears that these processes are multivariable and strongly nonlinear. Furthermore, most existing models take into account the atmospheric chemistry of one hundred reactions, also the emissions of primary pollutants, as well as vertical and horizontal exchanges linked to movements of the atmosphere. These models therefore combine a large number of equations with numerous parameters of inaccessible and unknown quantities. These models then are very complex, computationally costly, and, above all, need measurements that are seldom available in air quality monitoring networks. 
Monitoring an air quality network is therefore a sensor data validation problem, which requires the following steps:
• Process modeling,
• Sensor fault detection,
• Sensor fault isolation, and
• Fault identification and replacement values for faulty measurements.

In this project the main objectives are improving the performance of the latent variable process modeling, monitoring, and diagnosis methods, extending their applicability to a wide range of practical systems, and applying the developed techniques to enhance the operation and performance of air quality systems. Real situations are more complex than those encountered in simulation or on a laboratory pilot, the latter being generally well controlled, with assumptions that are more easily testable. In addition, a laboratory pilot remains easy to instrument, and most variables are accessible to measurement. Models of physical phenomena introduce


many parameters, some of which are difficult to obtain and calibrate from a practical point of view. To overcome this problem, a significant modeling effort is necessary. This effort can be made by taking into account model and measurement inaccuracies and uncertainties. Therefore, to improve the performance of the linear and nonlinear input (PCA and kernel PCA) and input–output (PLS and kernel PLS) modeling techniques, new statistical modeling methods, based on combining the advantages of LV methods with those of interval-valued methods, will be developed. The developed methods, which are called LV (PCA, PLS, kernel PCA, kernel PLS) and interval LV (ILV) (interval PCA, interval PLS, interval kernel PCA, interval kernel PLS), will provide improved properties, such as smaller interval modeling errors using single and interval data. Thus the first objective of this proposal is to develop robust LV and interval LV models of the air quality process that can correctly estimate the concentrations of all involved variables. The second objective is to improve the performance of monitoring by using enhanced statistical fault detection methods that combine the advantages of the generalized likelihood ratio test (GLRT) with those of the single- and interval-valued LV methods. The developed methods, called LV (PCA, PLS, kernel PCA, kernel PLS)- and interval LV (ILV) (interval PCA, interval PLS, interval kernel PCA, interval kernel PLS)-based GLRT, will provide improved properties, such as smaller missed detection and false alarm rates and a smaller average run length. The third objective is to extend the reconstruction principle proposed in the literature, which is generally applied to PCA models of single-valued data, to input–output models by using the PLS method and to nonlinear models by using kernel PCA and kernel PLS. Hence PLS-, kernel PCA-, and kernel PLS-based reconstruction methods will be developed.
Also, the developed methods will be extended to deal with interval-valued data, and thus interval PLS-, kernel PCA-, and kernel PLS-based reconstruction methods will be proposed to address the fault isolation problem for environmental process of air quality and which can correctly estimate all involved variables concentrations. The fourth objective of this proposal is the correction of the system. Once the faulty variable is identified, a corrective action is necessary to give replacement values for faulty measurements. Since the reconstruction approach uses the other variables to reconstruct the considered one, the faulty variable is then reconstructed from the other variables (single fault case), and the reconstructions are taken as replacement measurements. It should be noted that this approach is easily extended to the multiple simultaneous faulty case


under some specific conditions (reconstruction conditions). Therefore the developed single and interval PLS-, kernel PCA-, and kernel PLS-based reconstruction methods will be proposed to achieve the correction phase. The data collected from AQMNs are generally noisy and of multiscale nature, which can greatly affect the quality of monitoring. Hence we propose to integrate wavelet-based multiscale representation of AQMN data with the proposed techniques. The various techniques developed in this project will be applied to enhance air pollution monitoring. We will seek to address the monitoring and diagnosis problems of the concentrations of various air pollutants, such as ozone, sulfur dioxide, carbon monoxide, nitrogen oxides, and dust particles. Practical air pollution data from different countries (e.g., Qatar, United States, and France) will be used to design monitoring systems that can provide early alert mechanisms in the cases of abnormal changes in the concentrations of these pollutants.
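To make the notion of a wavelet-based multiscale representation concrete, the following minimal sketch decomposes a signal into detail coefficients at several scales plus a coarse approximation. It uses the Haar wavelet purely as an illustrative assumption (the proposal does not name a wavelet family), and it assumes that the signal length is divisible by 2 raised to the number of levels. The function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail), each half the length of the input.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Multiscale representation: [detail_1, ..., detail_L, approx_L].

    Assumption: len(signal) is divisible by 2 ** levels.
    """
    coeffs, current = [], list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        coeffs.append(detail)  # fine-scale details first
    coeffs.append(current)     # coarsest approximation last
    return coeffs
```

In a monitoring context, small detail coefficients at the finest scales are often attributed to measurement noise, which is why a representation of this kind can be combined with fault detection statistics.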

Technical objectives Process monitoring is becoming increasingly important in various applications to ensure safe, reliable, efficient, and economical operation of many engineering systems. For example, failure in environmental processes is not only inefficient from a productivity point of view, but it can be disastrous from a safety perspective. Effective detection of aberrations in such processes has been shown to help provide better diagnosis of major diseases, which can help lead to more targeted treatment development. Abnormal atmospheric pollution levels negatively affect the public health, animals, plants, and climate and damage the natural resources. Therefore monitoring air quality is also crucial for the safety of humans and the environment. Thus the main aim of this proposal is to develop enhanced operation and performance of air quality monitoring process. In addition, real-world data analysis is often affected by different types of errors such as measurement errors, computation errors, and imprecision related to the method adopted for estimating the data. The uncertainty in the data, which is strictly connected to the errors mentioned, may be treated by considering the interval of values into which the data may fall, rather than a single value for every data. The true value of the quantity is a concept. In almost all cases the true value cannot be measured, and the collected data on a process are only approximations given by sensors and thus are imprecise. This is due mainly to the uncertainties induced by measurement errors or determined by specific experimental conditions. Statistical methods have been mainly developed for the analysis of single-valued variables. However, in real life, there are


many situations in which the use of these variables may cause severe loss of information. When dealing with quantitative variables, there are many cases where more complete information can be obtained by describing a set of statistical units in terms of interval data. For example, daily temperatures registered as minimum and maximum values offer a more realistic view of the variations of weather conditions than simple average values. Another example is given by air quality data (O3, NO2, and NO), where each concentration measurement is taken as the mean of several measurements over 15 minutes (sample time), and the minimum and maximum concentrations recorded over those 15 minutes provide more relevant information to experts for evaluating the tendency and variability of pollutant concentrations. Therefore, in this project, we propose to develop enhanced fault detection and isolation methods using latent variable-based single and interval data methods and then use these techniques to improve the monitoring of the Air-Quality Monitoring Network (AQMN) process. The main objective of this project is to enhance the operation and performance of the AQMN through the development and integration of innovative modeling, monitoring, and diagnosis techniques. In this project, several objectives will be sought. The first objective is to improve the performance of the linear and nonlinear input (PCA and kernel PCA) and input–output (PLS and kernel PLS) modeling techniques by using single- and interval-valued methods. The developed methods, which are called LV (PCA, PLS, kernel PCA, kernel PLS) and interval LV (ILV) (interval PCA, interval PLS, interval kernel PCA, interval kernel PLS), will provide improved properties, such as smaller interval modeling errors using single and interval data.
Therefore the aim of this task is to develop a robust LV and interval LV models for environmental process of air quality and which can correctly estimate all involved variables concentrations. The second objective of this proposal is to improve the performance of monitoring using a new statistical fault detection method, which combines the advantages of generalized likelihood ratio test (GLRT) with those of the single- and interval-valued LV methods. The developed methods PCA, PLS, kernel PCA, kernel PLS and interval PCA, interval PLS, interval kernel PCA, interval kernel PLS-based GLRT will provide improved properties, such as smaller missed detection and false alarm rates and smaller average run length using single and interval data. The third objective is to develop robust fault identification techniques; to do that, we propose to extend the reconstruction PCA method-based single valued-data to deal with nonlinear models by using kernel PCA method


and input–output models by using PLS and kernel PLS methods. Hence, kernel PCA-, PLS-, and kernel PLS-based reconstruction methods will be developed. Also, the developed methods will be extended to deal with interval-valued data, and thus interval kernel PCA-, interval PLS-, and interval kernel PLS-based reconstruction methods will be proposed to address the fault isolation problem for the air quality process and to correctly estimate the concentrations of all involved variables. The fourth objective is the correction of the system. Once the faulty variable is identified, a corrective action is necessary to give replacement values for faulty measurements. Since the reconstruction approach uses the other variables to reconstruct the considered one, the faulty variable is reconstructed from the other variables (single fault case), and the reconstructions are taken as replacement measurements. It should be noted that this approach is easily extended to the case of multiple simultaneous faults under some specific conditions (reconstruction conditions). Therefore the developed single and interval kernel PCA-, PLS-, and kernel PLS-based reconstruction methods will be proposed to achieve the correction phase. The data collected from AQMNs are generally noisy and of multiscale nature, which can greatly affect the quality of monitoring. Hence we propose to integrate wavelet-based multiscale representation of AQMN data with the proposed techniques. The developed techniques will be used to enhance modeling, monitoring, and diagnosis of the concentration levels of various air pollutants, such as ozone, nitrogen oxides, sulfur oxides, dust, and others. Real air pollution data from various countries (e.g., Qatar, United States, and France) will be used in this important application.

Appendix

Applications

Tennessee Eastman Process (TEP)

The proposed method is applied to the well-known Tennessee Eastman (TE) process. The TE process is a benchmark chemical process that has been commonly used in process monitoring research [1–4]. The process consists of five main units: a reactor, a condenser, a recycle compressor, a separator, and a stripper. The TE process contains eight components: G and H are the main products, A, C, D, and E are reactants, F is a byproduct, and B is an inert component. Details of this process are widely presented in the literature [1,2,5]. The process flow sheet is given in Fig. 1. The process contains 41 measured variables presented in Table 2 (19 composition measurements and 22 continuous process measurements) and 12 manipulated variables (Table 1) [2,6]. There are 21 different faults simulated on the TEP benchmark, and their descriptions are presented in Table 3. Interval-valued

Figure 1 Tennessee Eastman process.


Table 1 Manipulated variables.

Variable   Description
XMV(1)     D Feed flow (stream 2)
XMV(2)     E Feed flow (stream 3)
XMV(3)     A Feed flow (stream 1)
XMV(4)     Total feed flow (stream 4)
XMV(5)     Compressor Recycle Valve
XMV(6)     Purge Valve (stream 9)
XMV(7)     Separator Pot Liquid Flow (stream 10)
XMV(8)     Stripper Liquid Product Flow (stream 11)
XMV(9)     Stripper Stream Valve
XMV(10)    Reactor Cooling Water Flow
XMV(11)    Condenser Cooling Water Flow
XMV(12)    Agitator Speed

data are generated by taking into account imprecision of measurement sensors. If we suppose that all sensors have an imprecision of 2%, then each measurement is represented as an interval-valued sample. Interval-valued data matrices, training and testing, of the TEP are generated.
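The conversion from single-valued to interval-valued samples can be sketched as follows. The sketch assumes that the 2% imprecision is relative to the measured value and symmetric around it; this reading, and the function names, are assumptions made for illustration only.

```python
def to_interval(x, imprecision=0.02):
    """Return (lower, upper) bounds for one measurement.

    Assumption: the sensor imprecision is a symmetric fraction of |x|.
    """
    radius = imprecision * abs(x)
    return (x - radius, x + radius)


def to_interval_matrix(X, imprecision=0.02):
    """Apply to_interval to every entry of a data matrix (list of rows)."""
    return [[to_interval(x, imprecision) for x in row] for row in X]
```

For instance, a reading of 100.0 becomes the interval from 98.0 to 102.0, and the same function applies unchanged to training and testing matrices.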

Distillation column

Distillation is one of the most common liquid–liquid separation processes and can be carried out in a continuous or batch system. Distillation can be used to separate binary or multicomponent mixtures. Many variables, such as column pressure, temperature, size, and diameter, are determined by the properties of the feed and the desired products. The plant is a linearized dynamic model of a continuous distillation column. The process model and simulation conditions are similar to those provided by [7]. The so-called column A with LV-configuration has 40 theoretical stages and separates a binary mixture with a relative volatility of 1.5 into products of 99% purity. Fig. 2 shows the diagram of a simple distillation column, and the corresponding variables are summarized in Table 4. We simulated the distillation column plant for 2 hours under normal operating conditions (NOC). The generated data set contains 2000 samples. Table 4 shows the 12 process variables to be monitored. Measurements of each sensor are noisy and imprecise, with the uncertainty ratio for each variable

Table 2 Measured variables.

Variable    Description
XMEAS(1)    A Feed (Stream 1)
XMEAS(2)    D Feed (Stream 2)
XMEAS(3)    E Feed (Stream 3)
XMEAS(4)    Total feed flow (Stream 4)
XMEAS(5)    Recycle Flow (Stream 8)
XMEAS(6)    Reactor Feed Rate (Stream 6)
XMEAS(7)    Reactor Pressure
XMEAS(8)    Reactor Level
XMEAS(9)    Reactor Temperature
XMEAS(10)   Purge Rate (Stream 9)
XMEAS(11)   Product Sep. Temp.
XMEAS(12)   Product Sep. Level
XMEAS(13)   Product Sep. Pressure
XMEAS(14)   Prod. Sep. Underflow (Stream 10)
XMEAS(15)   Stripper Level
XMEAS(16)   Stripper Pressure
XMEAS(17)   Stripper Underflow (Stream 11)
XMEAS(18)   Stripper Temperature
XMEAS(19)   Stripper Stream Flow
XMEAS(20)   Compressor Work
XMEAS(21)   Reactor Cooling Water Outlet Temp.
XMEAS(22)   Separator Cooling Water Outlet Temp
XMEAS(23)   Composition of A in Reactor Feed
XMEAS(24)   Composition of B in Reactor Feed
XMEAS(25)   Composition of C in Reactor Feed
XMEAS(26)   Composition of D in Reactor Feed
XMEAS(27)   Composition of E in Reactor Feed
XMEAS(28)   Composition of F in Reactor Feed
XMEAS(29)   Composition of A in Purge Gas Flow
XMEAS(30)   Composition of B in Purge Gas Flow
XMEAS(31)   Composition of C in Purge Gas Flow
XMEAS(32)   Composition of D in Purge Gas Flow
XMEAS(33)   Composition of E in Purge Gas Flow
XMEAS(34)   Composition of F in Purge Gas Flow
XMEAS(35)   Composition of G in Purge Gas Flow
XMEAS(36)   Composition of H in Purge Gas Flow
XMEAS(37)   Composition of D in Product Flow
XMEAS(38)   Composition of E in Product Flow
XMEAS(39)   Composition of F in Product Flow
XMEAS(40)   Composition of G in Product Flow
XMEAS(41)   Composition of H in Product Flow


Table 3 Process faults of TEP.

Fault   Description                                                 Type
1       A/C Feed ratio, B Composition (Stream 4)                    Step
2       B Composition, A/C ratio constant (Stream 4)                Step
3       D Feed temperature (Stream 2)                               Step
4       Reactor cooling water inlet temperature                     Step
5       Condenser cooling water inlet temperature                   Step
6       Feed loss (Stream 1)                                        Step
7       C Header pressure loss, Reduced availability (Stream 4)     Step
8       A, B, C feed composition (Stream 4)                         Random Variation
9       D Feed temperature (Stream 2)                               Random Variation
10      C Feed temperature (Stream 4)                               Random Variation
11      Reactor cooling water inlet temperature                     Random Variation
12      Condenser cooling water inlet temperature                   Random Variation
13      Reaction kinetics                                           Slow drift
14      Reaction cooling water valve                                Sticking
15      Condenser cooling water valve                               Sticking
16      Unknown                                                     Unknown
17      Unknown                                                     Unknown
18      Unknown                                                     Unknown
19      Unknown                                                     Unknown
20      Unknown                                                     Unknown
21      Valve position constant (Stream 4)                          Constant position

supposedly represented by 1% of its variability, and so interval-valued data are generated.
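One possible reading of this construction is sketched below. It interprets the "variability" of a variable as its standard deviation over the data set and takes 1% of that value as the half-width of every interval for that variable; both the interpretation and the function name are assumptions made for illustration.

```python
import statistics


def interval_from_variability(column, ratio=0.01):
    """Build interval-valued samples for one process variable.

    Assumption: the interval half-width equals `ratio` times the population
    standard deviation of the variable over the whole data set.
    """
    radius = ratio * statistics.pstdev(column)
    return [(x - radius, x + radius) for x in column]
```

Unlike an imprecision that scales with each reading, this choice gives every sample of a variable the same interval width, so variables that fluctuate more receive proportionally wider intervals.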

Figure 2 Basic distillation column controlled with LV-configuration.

Air quality monitoring network

Air pollution poses a significant threat to human health and quality of life. To protect public health, air quality is monitored through several techniques. The most widely used techniques for air quality monitoring are model-based approaches. Most existing models take into account the atmospheric chemistry reactions and the emissions of primary pollutants. These models therefore use a large number of parameters, are computationally costly, and need measurements that are seldom available in air quality monitoring networks [8–10]. To perform air quality management, air quality monitoring networks have the following missions: the production of data (pollutant concentrations and parameters related to pollution events), including the network management, the diffusion of data for permanent information of the population and public authorities, and surveillance in reference to norms. To achieve those missions, the validity of the data and the credibility of the delivered information are essential. Sensor data validation is therefore an issue of great importance for the development of reliable environmental monitoring and management systems. Until now, sensor data validation has been performed either by using outlier detection methods, which only identify extreme values outside the measurement range, or manually by an operator. Unfortunately, this approach is too subjective and impractical in real time due to the high network dimensionality and the large amount of collected data. Monitoring an air quality network is therefore a sensor data validation problem that needs i) process modeling, ii) sensor fault detection, iii) sensor fault isolation, and iv) correction. In this work, we focus on the first two tasks, process modeling and sensor fault detection. We consider six measurement stations. The data matrix X contains 18 variables, x1 to x18, corresponding, respectively, to the ozone (O3) and nitrogen oxide (NO2 and NO) concentrations of each station. For process monitoring, 1500 observations are used, which correspond to data collected during two weeks.


Table 4 Distillation column process variables.

Variable   Description
qF         fraction of liquid in feed
zF         feed composition [mole fraction]
TF         feed temperature [°C]
FM         feed molar flow [kmol/min]
FV         feed volumetric flow [kmol/min]
D          distillate flow [kmol/min]
B          bottom flow [kmol/min]
L          reflux flow [kmol/min]
V          boilup flow [kmol/min]
MD         condenser holdup [kmol]
MB         reboiler holdup [kmol]
xB         bottom composition [mole fraction]

It should be noted that each concentration measurement is taken as a mean of several measurements over 15 minutes (sample time). To evaluate tendency and variability of pollutants concentrations, more relevant information can be taken into account by considering the minimum and maximum concentrations, recorded over 15 minutes, or by considering the sensor measure precision. This means that data will be transformed from single-valued to interval-valued data. Fig. 3 shows the time evolution of ozone, NO2 , and NO concentrations for the first three stations. Fig. 4 shows time evolution of ozone concentration of the first station for single- and interval-valued data.
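The transformation from raw readings to single-valued and interval-valued samples described above can be sketched as follows. The sketch assumes that the raw readings arrive as a flat list and that a fixed number of them, `window`, falls into each 15-minute sample; the number of raw readings per sample is not given in the text, so `window` and the function names are hypothetical.

```python
def aggregate_window(readings):
    """Summarize one sampling window: the mean is the single-valued sample,
    and (min, max) are the bounds of the interval-valued sample."""
    return {
        "mean": sum(readings) / len(readings),
        "min": min(readings),
        "max": max(readings),
    }


def to_interval_series(raw, window):
    """Split raw readings into consecutive non-overlapping windows.

    A trailing incomplete window is dropped.
    """
    return [
        aggregate_window(raw[start:start + window])
        for start in range(0, len(raw) - window + 1, window)
    ]
```

By construction the mean of each window lies inside its interval, so the single-valued series is always contained in the interval-valued one.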

Figure 3 Ozone concentrations for the first three stations.

Figure 4 Ozone concentrations for the first station, single-valued and interval-valued representations.

Continuously stirred tank reactor (CSTR)

In this subsection, we present a controlled nonisothermal continuous stirred tank reactor (CSTR) in which the irreversible first-order reaction (A → B) takes place. Using material and energy balances, the following dynamic model describing the changes in the concentration and temperature inside the CSTR can be derived:

$$\frac{dC_A}{dt} = \frac{F}{V}\,(C_{A0} - C_A) - k_0 e^{-E/RT} C_A, \tag{1}$$

$$\frac{dT}{dt} = \frac{F}{V}\,(T_0 - T) + \frac{(-\Delta H_r)\,k_0}{\rho C_p}\, e^{-E/RT} C_A - \frac{Q}{V \rho C_p}, \tag{2}$$

$$Q = \frac{a F_c^{\,b+1}}{F_c + \dfrac{a F_c^{\,b}}{2 \rho_c C_{pc}}}\,(T - T_{cin}), \tag{3}$$

where $C_{A0}$ and $C_A$ are the concentrations of A in the inlet stream and inside the reactor, respectively, $T_0$ and $T$ are the inlet and reactor temperatures, respectively, $F$ is the volumetric flow rate into and out of the reactor, $F_c$ is the coolant flow rate, $Q$ is the heat transfer rate from the reactor to the coolant, $E$ is the activation energy of the reaction, $R$ is the universal gas constant, $k_0$ is the reaction rate constant, $V$ is the reactor volume, $\Delta H_r$ is the heat of reaction, and $\rho$, $\rho_c$, $C_p$, and $C_{pc}$ are the densities and specific heats of the reacting material and the coolant, respectively. This CSTR model has two state variables, the reactor temperature $T$ and the concentration $C_A$, which are both assumed to be measurable. The two state variables $T$ and $C_A$ are controlled using two proportional integral (PI) controllers by manipulating $F_c$ and $F$, respectively [11]. The parameters of the two PI controllers are $K_{C1} = -0.8$ and $\tau_1 = 0.1$ for the temperature controller and $K_{C2} = 2$ and $\tau_2 = 0.1$ for the concentration controller. The nominal values of the CSTR model parameters are presented in [12] (pages 897–908). The simulated CSTR model is used to generate training and testing data sets by changing the set-points of the two controllers in a stepwise fashion. To better represent practical process data, the simulated data of the two state variables (concentration and temperature), which are assumed to be noise-free, are then contaminated with zero-mean Gaussian noise having standard deviations of $\sigma_c = 0.005$ and $\sigma_T = 0.02$, respectively.
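The right-hand sides of Eqs. (1)–(3) and the noise contamination step can be sketched as follows. All model parameters are passed in through a dictionary `p` whose keys (for example `E_over_R` for the ratio E/R and `dH` for the heat of reaction) are hypothetical names chosen for this illustration; no nominal values are assumed here, since the text refers to [12] for them. The PI controllers are not included.

```python
import math
import random


def cooling_rate(T, Fc, p):
    """Heat transfer rate Q from the reactor to the coolant, Eq. (3)."""
    num = p["a"] * Fc ** (p["b"] + 1.0)
    den = Fc + p["a"] * Fc ** p["b"] / (2.0 * p["rho_c"] * p["Cp_c"])
    return num / den * (T - p["T_cin"])


def cstr_rhs(CA, T, F, Fc, p):
    """Right-hand sides of Eqs. (1) and (2): returns (dCA/dt, dT/dt)."""
    rate = p["k0"] * math.exp(-p["E_over_R"] / T) * CA  # reaction rate term
    dCA = F / p["V"] * (p["CA0"] - CA) - rate
    Q = cooling_rate(T, Fc, p)
    dT = (F / p["V"] * (p["T0"] - T)
          + (-p["dH"]) * rate / (p["rho"] * p["Cp"])
          - Q / (p["V"] * p["rho"] * p["Cp"]))
    return dCA, dT


def add_noise(values, sigma, rng):
    """Contaminate noise-free data with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in values]
```

A simulated concentration trajectory could then be contaminated with `add_noise(CA_series, 0.005, random.Random(0))` and a temperature trajectory with a standard deviation of 0.02, matching the values quoted above; `CA_series` is a placeholder for data produced by an integrator of the reader's choice.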

References

[1] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Computers & Chemical Engineering 17 (3) (1993) 245–255.
[2] E.L. Russell, L.H. Chiang, R.D. Braatz, Tennessee Eastman process, in: Data-driven Methods for Fault Detection and Diagnosis in Chemical Processes, Springer, 2000, pp. 99–108.
[3] M. Kano, S. Hasebe, I. Hashimoto, H. Ohno, Statistical process monitoring based on dissimilarity of process data, AIChE Journal 48 (6) (2002) 1231–1240.
[4] G. Li, C.F. Alcala, S.J. Qin, D. Zhou, Generalized reconstruction-based contributions for output-relevant fault diagnosis with application to the Tennessee Eastman process, IEEE Transactions on Control Systems Technology 19 (5) (2011) 1114–1127.
[5] P.R. Lyman, C. Georgakis, Plant-wide control of the Tennessee Eastman problem, Computers & Chemical Engineering 19 (3) (1995) 321–331.
[6] M. Ammiche, A. Kouadri, A. Bakdi, A combined monitoring scheme with fuzzy logic filter for plant-wide Tennessee Eastman process fault detection, Chemical Engineering Science 187 (2018).
[7] S. Skogestad, Dynamics and control of distillation columns: a tutorial introduction, Chemical Engineering Research and Design 75 (6) (1997) 539–562.
[8] M.-F. Harkat, G. Mourot, J. Ragot, An improved PCA scheme for sensor FDI: application to an air quality monitoring network, Journal of Process Control 16 (6) (2006) 625–634.
[9] A. Herrera-Gomez, M. Bravo-Sanchez, O. Ceballos-Sanchez, M. Vazquez-Lepe, Practical methods for background subtraction in photoemission spectra, Surface and Interface Analysis 46 (10–11) (2014) 897–905.


[10] I. Stanimirova, V. Simeonov, Modeling of environmental four-way data from air quality control, Chemometrics and Intelligent Laboratory Systems 77 (1–2) (2005) 115–121.
[11] F. Harrou, M. Nounou, H. Nounou, M. Madakyaru, Statistical fault detection using PCA-based GLR hypothesis testing, Journal of Loss Prevention in the Process Industries 26 (2013) 129–139.
[12] T. Marlin, Process Control: Designing Processes and Control Systems for Dynamic Performance, second ed., McGraw-Hill, New York, 2000.

Index

A Abilities, 29, 72 detection, 30, 50, 59, 65, 117, 129, 153, 180, 185, 187, 189, 214, 244 fault detection, 12, 21, 26, 30, 50, 73, 83, 86, 99, 105, 132, 271 monitoring, 6, 131 Abnormal ozone levels, 37 measurements, 29 Abrupt fault, 222, 238 Air quality, 1, 160, 189, 273–275, 277, 278, 282, 283 monitoring, 5, 12, 276 Anomaly false, 26, 29 true, 26, 28–30 Applicability, 5, 38, 99, 187, 189, 214, 259, 270–272, 274 AQMN data, 276, 278

B Bayesian framework, 3, 229 Benchmark process, 175, 267

C Cadaverine, 224, 246, 250, 252, 255 Canonical variate analysis (CVA), 5 Centers principal component analysis (CPCA), 140 Central difference Kalman filter (CDKF), 222 Chemical process, 1, 12, 13, 37, 38, 41, 43, 49, 50, 64, 65, 80, 94, 107, 115, 210 monitoring, 190 reactors, 99 CIPCA, 137, 140, 142, 164 method, 143 model, 143, 149, 157 Complete-information principal component analysis (CIPCA), 137 Contaminant detection, 261, 265

Contaminant monitoring, 260–263, 265, 266 Contamination event detection, 263 Continuously stirred tank reactor (CSTR), 13, 23, 32, 37, 45, 63, 65, 69, 74, 100, 132, 284 Control charts, 189, 270 limit, 54, 160, 161, 173–175, 177, 178, 192, 195, 201, 230–233 estimation, 177 process, 119 Conventional GLRT, 86 method, 87 statistic, 84, 87 KPCA, 63 KPLS, 65, 74 PCA, 23, 31, 86, 89, 95 approach, 82 fault detection, 35, 36 method, 13, 29, 31 CPCA, 136, 137, 140, 141, 164 CPLS model, 171, 176 interval, 183 CSEC, 221, 224, 245, 246, 248, 255 model, 246 state variables, 250 CSTR, 13, 23, 32, 37, 41, 45, 50, 63, 65, 69, 74, 100, 109, 115, 121, 132 data, 32, 120 model, 32, 33 Cumulative percent variance (CPV), 16, 26 CUSUM, 4, 223 chart, 4, 160, 223, 230

D Damage detection performances, 131 Data analysis, 81, 99 tool, 269 filtering, 87, 96 stage, 116, 123

interval, 137, 143, 173, 209 matrix, 13, 21, 22, 36, 140, 191, 198 mining procedure, 136, 189 multiscale representation, 79, 271, 272 ozone, 24 process, 21, 32, 80, 94, 119, 120, 259, 272 sample, 4, 18, 29, 30, 32, 192, 209, 223 set, 30, 51, 59, 64, 65, 94, 101, 116, 117, 120, 122, 128, 197, 198, 270 validation, 273 wastewater treatment, 235 Dataset, 136, 261 Decomposition depth, 82, 131, 250 Degree of freedom (DoF), 242, 243 Detection, 16, 22, 45, 56, 63, 65, 74, 86, 128, 132, 164, 168, 214, 239, 269–271 abilities, 30, 50, 59, 65, 117, 129, 153, 180, 185, 187, 189, 214, 244 accuracy, 35 charts, 95, 178, 240 effectiveness, 50, 115, 168, 190 efficiency, 97 improvements, 121 indicators, 41, 107 performances, 4, 97, 129, 131, 132, 149, 153, 154, 158, 164, 224, 240, 255 probability, 22, 36, 106, 114 rate, 36, 121, 262, 264, 265 for fault, 70, 168 redundancy, 262, 265 results, 129, 240 stage, 45 statistic, 98, 160, 230 system, 37 thresholds, 29, 35 Detection rate (DR), 265 Detection speed (DS), 265 DEWMA statistics, 243, 244 Discrete wavelet transform (DWT), 116, 123 Dissolve oxygen concentration, 224, 235, 255, 269 Distillation, 280 Dynamic kernel partial least square (DkPLS), 269

Dynamic kernel principal component analysis (DkPCA), 269

E Effectiveness detection, 50, 115, 168, 190 monitoring, 223 superior, 131 Eigenvectors, 56, 195, 196 Enhanced monitoring, 37, 54, 270 Environmental data, 80 monitoring, 273 processes, 1, 5, 6, 12, 38, 137, 275–278 Enzyme, 224, 246, 248, 255 EWMA, 4, 6, 50, 114, 189, 223, 232, 238, 246, 252, 255, 260, 270, 272 control chart, 160, 161 filter, 98 multiscale, 223, 233 multivariate, 270 performance, 231 representation, 61 statistic, 4, 160, 161, 223, 224, 231, 233 Exponentially weighted (EW), 264 Exponentially weighted moving average (EWMA), 4, 6, 50, 114, 189, 223, 260, 270, 272 Extended Kalman filter (EKF), 222, 225, 226, 261, 264, 268

F False alarm (FA), 86, 164 False alarm rate (FAR), 50, 59, 98, 115, 117, 150, 176, 199, 223, 224, 231, 246, 255, 264 Fault bias, 60, 119, 130, 149, 238 detection abilities, 12, 21, 26, 30, 50, 73, 83, 86, 99, 105, 132, 271 algorithms, 72 charts, 6, 59, 60, 117, 118, 172, 175, 177, 178, 199, 222 method, 32 performance, 41–43, 45, 50, 63, 65, 74, 80, 83, 84, 86, 92, 96, 100, 107, 109–111, 114, 136, 187, 214, 246


probability, 12, 72, 99 process, 65 technique, 13, 38, 59, 65, 80, 103, 107, 115, 117, 267 diagnosis, 189 free data, 41, 232 measurements, 274, 275, 278 region, 86, 89, 94 samples, 176, 199 Fault detection (FD), 37, 64, 79, 221, 223

G Generalized likelihood ratio test (GLRT), 4, 6, 12, 20, 36, 37, 50, 57, 62, 64, 73, 96, 113, 175, 186, 189, 223, 267, 269, 271, 275, 277 Geometric Moving Average (GMA), 160, 230 GLR charts, 146, 147, 149, 154, 157 statistic, 41 test statistic, 40 GLRT approach, 55, 190 chart, 13, 50, 68, 83, 84, 112, 128, 195, 207–209, 212, 264 conventional, 86 fault detection, 86, 100, 195, 267, 269, 270 techniques, 6 hypothesis testing, 21 interval, 174 kernel, 50 methods, 271, 272 multivariate, 210, 213 performance, 63, 74 statistic, 72, 86, 87, 96–98, 115, 202, 271, 272 techniques, 69, 70, 73 univariate, 207, 210 GMA chart, 160, 230

H Hypothesis alternative, 19, 22, 23, 146, 147 null, 20–23, 146, 147, 161 Hypothesis testing, 5, 12, 18–20, 22, 23, 36, 45, 80, 82, 146, 147, 259


fault detection, 5, 13, 21, 38, 82, 99, 259 GLRT, 21 problem, 19, 23

I IKGLRT, 195 IKPCA, 190 models, 201, 202 IKPLS, 204 method, 210 Independent component analysis (ICA), 5 Interval CPLS model, 183 data, 137, 143, 173, 209 description, 137 estimation, 171 methods, 277 GLRT, 174 KLVR, 264 KPCA, 202, 204 model, 198, 199 KPLS, 204 method, 206, 213 PCA, 259, 275, 277 model, 144–146, 149, 153 PLS, 172, 275, 277 method, 174 model, 172 residuals, 145, 149, 161, 172–174, 177, 179–181, 185, 186, 207 Interval kernel GLR test (IKGLRT), 195 Interval Principal Component Analysis (IPCA), 136 IPCA, 136, 137, 259 Isolation rate (IR), 265

K Kalman filter (KF), 2, 226 Kernel GLRT, 50 approach, 57 chart, 54 fault detection chart, 57 multiscale, 114–116, 122 PCA, 49, 50, 56, 132, 190, 259, 263, 270, 271, 275, 277 method, 277


PLS, 66, 73, 128, 129, 190, 204, 209, 259, 263, 270, 272, 275, 277 algorithm, 67 methods, 278 models, 204 Kernel Partial Least Squares (KPLS), 64, 65, 122 Kernel principal component analysis (KPCA), 50, 62 Key variables, 102, 222, 224, 255, 262, 268, 270, 272 KGLRT, 50, 57, 59, 60, 117, 119, 120 KLVR, 261, 263 interval, 264 methods, 261 multiscale, 263 modeling methods, 261 multiscale, 264 multiscale, 261, 264 KPCA, 49–51, 53, 62–64, 116, 132, 190, 199, 202, 259, 263, 270, 271 algorithm, 116 approach, 50 conventional, 63 interval, 202, 204 method, 50, 53, 63 model, 49, 59, 116, 117, 120, 190, 196, 197, 199 multiscale, 116 technique, 116 KPLS, 64–66, 122, 129, 132, 209, 259, 263, 270, 272 algorithm, 64, 123, 206 approach, 64, 73, 213 method, 66, 73, 74, 209 model, 123, 124, 127, 129 techniques, 69, 124

L Latent variable (LV), 5, 12, 18, 65, 80, 99, 113, 136, 185, 188, 270, 274 Latent variable regression (LVR), 5, 261, 263, 267, 269, 272 Leak detection, 263, 264 Likelihood ratio for fault detection, 12, 37, 50, 79 monotone, 19

Linear PCA model, 30, 32, 116 PLS, 64, 66, 72 PLS model, 122, 129 Lysine, 224, 246, 249, 255

M Measurements, 99, 117 fault, 274, 275, 278 ozone, 23–26 process, 2, 99, 269 Midpoints radii PCA (MRPCA), 137, 140, 141, 158, 162 Missed detection rate (MDR), 50, 59, 86, 98, 115, 117, 150, 164, 176, 199, 223, 224, 231, 246, 255, 264 Monitored residuals, 222, 255, 260 variables, 57, 117, 148, 163, 175 Monitoring abilities, 6, 131 air quality, 12, 276 capability, 224, 255 chemical processes, 185, 190 effectiveness, 223 environmental, 273 network air quality, 156, 162, 165, 187, 260, 274 ozone, 23, 31 performances, 121, 164, 168, 261, 264, 265, 275, 277 process, 1, 2, 12, 13, 32, 82, 100, 121, 136, 189, 221, 271, 276 PV systems, 270 scheme, 143 strategy, 267 techniques, 2, 49, 265, 267–269 MRPCA, 137, 140, 141, 164 approach, 164 methods, 140, 164, 165 model, 141, 161 technique, 164 MSKPCA, 114, 132 algorithm, 116


MSKPLS, 114, 131, 132 algorithm, 124 for fault detection, 125 model, 125 MSPCA, 13, 80, 82, 84, 89, 95, 112, 116, 123 fault detection, 82 methods, 86, 89, 92 model, 84 Multiobjective optimization, 223, 231, 233, 244, 262, 264 problems, 231 process, 231 scheme, 265 Multiple faults, 19, 31, 33, 35, 61, 69, 73, 224, 255 Multiscale data prefiltering, 82 EWMA, 223, 233 chart, 230, 231, 255, 270 detection statistic, 232 monitoring technique, 232 kernel, 114–116, 122 PCA, 116 PLS, 132 KLVR, 261, 264 methods, 263 modeling methods, 264 KPCA, 116 KPLS, 131 algorithm, 123 LVR, 263, 267 nature, 80, 132, 223, 261, 263, 276, 278 PCA, 13, 82, 101, 112, 116 method, 80 model, 82 representation, 80, 82, 97, 99, 100, 114, 116, 119, 122, 131, 233, 250, 259, 269, 272 Multivariate charts for abnormal events detection, 136 EWMA, 270 GLRT, 210, 213 chart, 207, 210, 212 statistic chart, 147, 148 statistical approaches, 189 statistical modeling techniques, 13


N Noise, 15, 22, 79, 80, 87, 102, 126, 128 correlated, 80 Gaussian, 59, 117, 120, 149, 163, 176 measurement, 2, 21, 225, 271 non-Gaussian, 225, 226 white, 55, 80, 89, 225 zero-mean Gaussian, 32, 41, 109, 235, 238, 286 Nonisothermal CSTR model, 41, 69, 109 process, 60, 119, 130 Nonlinear behavior, 50, 189 extension, 116 fault detection, 62, 64, 73, 132, 214 mapping, 51, 66, 226 modeling technique, 114 optimization procedure, 50 PCA model, 30, 31 PLS, 64 models, 128 regression, 204 process variables, 64, 73 regression, 66 simulation example, 57, 116, 198 state variables, 225, 226 Nonlinearities, 6, 66, 114, 129, 204, 206, 259 process, 5 Normal operating condition (NOC), 136 Numerical data matrix, 140, 191

O Optimized EWMA, 223, 231 statistic, 231 Outlier detection method, 273, 283 Ozone, 24, 28, 29, 168, 276, 278 concentration data, 24 concentrations, 26, 29, 156, 165 data, 24 formation process, 26, 31 levels, 23, 25, 26, 28 measurements, 23–26 modeling, 25 monitoring, 23, 31 photochemical, 28, 29 stratospheric, 29


P Partial least squares (PLS), 5, 12, 38, 43, 63, 80, 136, 170, 175, 263, 270, 272 Particle filtering (PF), 3, 6, 222, 225, 228, 259, 261, 263, 267, 268 PCA model, 15, 16, 22, 29, 136, 143 interval, 144–146, 149, 153 linear, 30, 32, 116 multiscale, 82 nonlinear, 30, 31 Performance EWMA, 231 fault detection, 41–43, 45, 50, 63, 65, 74, 80, 83, 84, 86, 92, 96, 100, 107, 109–111, 114, 136, 187, 214, 246 GLRT, 63, 74 process, 12 results, 212 PF method, 223, 224, 228, 238, 246, 250, 267 Plant model, 269 Plant variables, 16, 267 wastewater treatment, 269 PLS model, 38, 39, 41, 65, 100, 104, 137, 186 linear, 122, 129 Principal component analysis (PCA), 5, 12, 13, 36, 37, 63, 80, 136, 263, 271 Principal components (PC), 15, 197, 198 Process control, 119 data, 21, 32, 80, 94, 119, 120, 259, 272 fault, 2, 22 detection, 65 measurements, 2, 99, 269 model, 2, 5, 12, 21, 36, 221, 250 modeling, 141, 143, 145, 189, 274, 283 monitoring, 1, 2, 12, 13, 32, 82, 100, 136, 140, 141, 189, 221, 271, 276 multiobjective optimization, 231 nonlinearities, 5 performance, 12 residuals, 102 variables, 2, 12, 15, 18, 41, 51, 64, 102, 143, 230, 267–269 wastewater treatment, 269

R Radial Basis Function (RBF), 51 Reactor, 33, 42, 60, 69, 109, 119, 130 temperature, 34 Receiver Operating Characteristic (ROC), 35 Regressor data, 66 Regulatory protein, 249, 255 Residuals, 2, 21, 22, 32, 38, 59, 95, 98, 141, 146, 173, 174, 194, 207, 209, 230, 265, 269 interval, 145, 149, 161, 172–174, 177, 179, 180, 185, 186, 207 monitored, 222, 255, 260 process, 102

S Satisfactory performance, 3, 228 Sensor fault, 1, 29, 45, 60, 63, 65, 74, 119, 130, 153, 157, 167, 222, 268, 269 detection, 274, 283 isolation, 274, 283 Sequential Monte Carlo method (SMC), 3 Shewhart chart, 4, 160, 223, 230 Signal-to-noise ratio, 235, 238 Simulation example, 148, 149, 162, 163, 175, 198, 246 SPE charts, 59, 60, 117, 119, 121 Square-root central difference Kalman filter (SRCDKF), 222 Square-root unscented Kalman filter (SRUKF), 222 State estimation, 3, 221, 225, 250, 270 approach, 230, 240 methods, 222 phase, 229 problem, 225, 228 results, 234, 252 techniques, 3, 6, 221, 222, 226, 230, 233, 255, 261, 263, 267, 268 State variables, 2, 6, 32, 41, 222, 223, 225, 235, 246, 259, 261 CSEC, 250 Statistic detection, 160, 230 EWMA, 4, 160, 161, 223, 224, 231, 233 GLR, 41


GLRT, 72, 86, 87, 96–98, 115, 202, 271, 272 Stirred tank reactor data, 41, 109 Stirred tank reactor process (CSTR), 50, 115 Sum of Squares Double EWMA (SS-DEWMA), 243 Support vector machine (SVM), 5 Synthetic data, 37, 57, 63, 80, 86, 116, 175, 190, 260 example, 50, 115, 126, 175, 224, 245, 255

T TE process, 210 monitoring, 95 Tennessee Eastman Process (TEP), 41, 43, 50, 80, 94, 100, 109, 110, 112, 115, 119–121, 132, 201, 260, 279 benchmark, 190, 214 data, 95, 120 Transport protein, 224, 246, 249, 255

U Uniformly best constant powerful (UBCP), 19 Uniformly most powerful (UMP), 19 Univariate charts, 212, 259 GLRT, 207, 210


chart, 207, 210 statistic chart, 146 weighted statistic chart, 147 Unscented Kalman filter (UKF), 2, 3, 222, 225, 226, 261, 264, 268

V Vertices principal component analysis (VPCA), 140

W Wastewater plant, 270 data, 267 Wastewater treatment, 233, 234, 238, 267, 268 data, 235 plant, 221, 224, 233, 255, 260, 266–269 data, 269 variables, 269 process, 269 Water distribution, 260, 262–265 performances, 265, 266 Water distribution systems (WDS), 260, 266 Water quality, 260, 261, 263, 267, 269 monitoring, 265 Wavelet coefficients, 80, 82, 100, 104, 124, 132, 232, 233 WDS data, 261 Window length (WL), 84