Smart Grid Resilience: Extreme Weather, Cyber-Physical Security, and System Interdependency 3031292898, 9783031292897

This book provides a comprehensive overview and in-depth discussion of smart grid resilience. It covers the three most c


English Pages 285 [286] Year 2023



Table of contents :
Preface
Contents
Part I Extreme Weather and Cascading Failure
1 Cascading Failures Under Extreme Temperatures
1.1 Introduction
1.2 Ambient Temperature in Blackout Modeling
1.2.1 Temperature Disturbance
1.2.2 Load Change Under Temperature Disturbances
1.2.3 Dynamic Line Rating
1.2.4 Probability of Line Tripping
1.2.5 Probability of Generator Tripping
1.3 Modeling Protection and Control Strategies
1.3.1 Undervoltage Load Shedding
1.3.2 Operator Re-dispatch
1.4 Timing of Events
1.5 Voltage Stability Margin Calculation
1.6 Blackout Model Considering Temperature
1.7 Simulation Results
1.7.1 Model Implementation and Parameter Settings
1.7.2 Typical Simulation Run Without Operator Re-dispatch
1.7.3 Typical Simulation Run with Operator Re-dispatch
1.7.4 Number of Simulations
1.7.5 Impact of Temperature Disturbances and Size of Selected Area
1.7.6 Identifying the Most Vulnerable Buses/Locations
1.7.7 Impact of Control Strategies
References
2 Cascading Failure Interaction Analysis
2.1 Introduction
2.2 Estimating Interactions Between Component Failures
2.3 EM Algorithm
2.3.1 A Coin-Flipping Example
2.3.2 Mathematical Foundation
2.4 Estimating Component Failure Interactions by EM Algorithm
2.5 Determining the Number of Cascades Needed
2.5.1 Lower Bound of M
2.5.2 Lower Bound of Mu
2.6 Results
2.6.1 Number of Cascades Needed
2.6.2 Interaction Matrix and Interaction Network
2.6.3 Identified Key Links and Key Components
2.6.4 Validation of Estimated Interactions
2.6.5 Cascading Failure Mitigation
2.6.6 Efficiency Improvement
References
3 Integrated Preventive and Emergency Responses
3.1 Introduction
3.2 Integrated Resilience Response
3.3 Mathematical Formulation
3.3.1 Preventive Response
3.3.2 Damage From Natural Disasters
3.3.3 Emergency Response
3.3.4 Integration of Preventive and Emergency Responses
3.4 Solution Methodology
3.4.1 NC&CG Decomposition-Based Algorithm
3.4.2 Computational Efficiency Improvement Techniques
3.5 Results on PJM Five-Bus System
3.6 Results on IEEE One-Area RTS-96 System
3.7 Results on IEEE Three-Area RTS-96 System
Appendix 1: PJM Five-Bus System
Appendix 2: IEEE One-Area RTS-96 System
References
Part II Cybersecurity of Smart Grid Monitoring
4 Risk Mitigation against Cyber Attacks Based on Dynamic State Estimation
4.1 Introduction
4.2 Power System Dynamic Model
4.2.1 10th-Order Nonlinear Power System Model
4.2.2 Linearized Power System Model
4.3 Unknown Inputs & Attack-Threat Model
4.3.1 Modeling Unknown Inputs
4.3.2 Modeling Cyber Attacks
4.4 DSE under UIs and CAs
4.4.1 Sliding-Mode Observer for Power Systems
4.4.2 SMO Dynamics & Design Algorithm
4.5 Asymptotic Reconstruction of UIs & CAs
4.5.1 Estimating Unknown Inputs
4.5.2 Estimating CAs
4.5.3 Attack Detection Filter
4.6 Risk Mitigation—A Dynamic Response Model
4.6.1 Weighted Deterministic Threat Level Formulation
4.6.2 Dynamic Risk Mitigation Optimization Problem
4.6.3 Dynamic Risk Mitigation Algorithm
4.7 Case Studies
4.7.1 Scenario I: Dynamic Reconstruction of UI & DSE
4.7.2 Scenario II: DSE Under UIs & CAs
References
5 Comparing Kalman Filters and Observers Against Cyber Attacks
5.1 Introduction
5.2 4th-Order Nonlinear Power System Model
5.3 Model Uncertainty and Cyber Attacks
5.3.1 Model Uncertainty
5.3.2 Cyber Attacks
5.4 DSE Algorithms
5.4.1 Kalman Filters for Power System DSE
5.4.1.1 EKF
5.4.1.2 UKF
5.4.1.3 CKF
5.4.2 Nonlinear Observers for Power System DSE
5.5 Numerical Results
5.5.1 Scenario 1: Data Integrity Attack
5.5.2 Scenario 2: DoS Attack and Scenario 3: Replay Attack
5.5.3 Discussion on Model Uncertainty Estimation
5.5.4 Discussion on Cyber Attack Detection
5.5.5 Non-Gaussian Measurement Noise
5.5.6 Computational Efficiency
5.6 Summary
References
6 Self-Healing PMU Network Against Cyber Attacks
6.1 Introduction
6.2 Motivation for Self-Healing PMU Network
6.3 System Model
6.3.1 Power System Observability
6.3.2 Rules in Network Switches
6.4 Optimization Formulation
6.4.1 PMU Connection Status Constraints (PCSCs)
6.4.2 Power System Observability Constraints (PSOCs)
6.4.3 PDC Connection Space Constraints (PSCs)
6.4.4 PMU Reconnection Constraints (PRCs)
6.4.5 Switch Rule Space Capacity Constraints (SRSCCs)
6.4.6 Routing Policy Constraints (RPCs)
6.4.7 Endpoint Policy Constraints (EPCs)
6.4.8 Optimization for Self-Healing Mechanism
6.4.8.1 Stage 1: Recover System Observability
6.4.8.2 Maximize System Observability
6.5 Greedy Heuristic Algorithm
6.6 Case Study on IEEE 30-Bus System
6.7 Performance Evaluation of Stage 1
6.7.1 Impact of Scale of Attacks
6.7.2 Impact of Hardware Resources
6.8 Performance Evaluation of Stage 2
References
Part III Cyber-Physical Security for Distributed Energy Resources
7 Cyber-Physical Security Research Framework for Distributed Energy Resources
7.1 Introduction
7.2 Cyber-Physical Power System with Large-Scale DER Deployments
7.2.1 Generic Architecture of Power Systems with DERs
7.2.2 Challenges of Maintaining DER Cybersecurity
7.3 Overview of DER Cyber-Physical Security Research Framework
7.4 Potential Cyber Attacks on Cyber-Physical Power System with DERs
7.4.1 Cyber-Physical-Threat Modeling
7.4.2 Threat Scenarios Targeting DER
7.4.3 Attack Threat Ranking
7.5 Attack Impact Analysis and Metrics
7.6 DER Cyber-Physical Security Design Principles
7.7 Attack Resilience at Cyber, Physical Device, and Utility Layers
7.7.1 Cyber Layer Attack Resilience
7.7.1.1 Cyber Layer Attack Prevention
7.7.1.2 Cyber Layer Attack Detection
7.7.1.3 Cyber Layer Attack Response
7.7.2 Physical Device Layer Attack Resilience
7.7.2.1 Physical Device Layer Attack Prevention
7.7.2.2 Physical Device Layer Attack Detection
7.7.2.3 Physical Device Layer Attack Response
7.7.3 Utility Layer Attack Resilience
7.7.3.1 Utility Layer Attack Prevention
7.7.3.2 Utility Layer Attack Detection
7.7.3.3 Utility Layer Attack Response
References
8 Distributed Load Sharing Under Cyber Attacks
8.1 Introduction
8.2 Inverter-Based Microgrid Structure
8.2.1 Physical Layer
8.2.2 Cyber Layer
8.3 System Dynamic Model
8.3.1 Small-Signal Model
8.3.2 Active Power Reference
8.4 System Performance Under Attack
8.4.1 FDI Attack Against Distributed Load Sharing Control
8.4.2 Effects of FDI Attack on Microgrid Performance
8.5 Case Studies
8.5.1 Stable Region
8.5.2 System Performance Under Attack Strategy 1
8.5.3 System Performance under Attack Strategy 2
References
9 Deep Learning Based Attack Detection for Microgrid Control
9.1 Introduction
9.2 Distributed Control and FDI Attack
9.2.1 Cyber-Physical Representation of AC Microgrids
9.2.2 Secondary Control Problem Formulation
9.2.3 Distributed Control Algorithm
9.2.4 FDI Attack Against Distributed Control
9.3 Deep Learning Based Multi-label Attack Detection
9.3.1 Multi-label Classification Problem Formation
9.3.2 Data Preparation and Preprocessing
9.3.3 Deep Learning Models
9.4 Performance Evaluation
9.4.1 Test System and Control Performance
9.4.2 Deep Learning Performance Metrics
9.4.3 FDI Attack Detection Results
References
Part IV Smart Grid Resilience Under System Interdependency
10 Interdependency Between Power System Outages by Branching Process
10.1 Introduction
10.2 Estimating Multi-type Branching Process Parameters
10.3 Estimating Joint Probability Distribution of Total Outages
10.3.1 n-Type Branching Process
10.3.2 Two-Type Branching Process
10.3.3 Validation of Estimated Joint Distribution
10.4 Number of Cascades Needed
10.4.1 Determining Lower Bound for M
10.4.2 Determining Lower Bound for Mu
10.5 Estimated Parameters of Branching Processes
10.6 Estimated Joint Distribution of Total Outages
10.7 Predicted Joint Distribution from One Type of Outage
10.8 Estimated Propagation of Three Types of Outages
References
11 Interdependency Between Power System Outages by Coupled Interaction Model
11.1 Introduction
11.2 Coupled Interaction Matrix
11.2.1 Definition of Coupled Interaction Matrix
11.2.2 Definition of an Auxiliary Matrix
11.3 Estimating Coupled Interaction Matrix by EM Algorithm
11.4 Coupled Interaction Model for Cascading Failure Simulation
11.5 Critical Link Identification
11.6 Coupled Interaction Network of IEEE 300-Bus System
11.7 Validation of Coupled Interaction Model
11.8 Choosing Critical Links for Mitigation
11.9 Cascading Failure Mitigation
Appendix: Discretization Unit for Each Load Bus
References
12 Interdependency Between Smart Grid and Transportation Network
12.1 Introduction
12.2 Mathematical Modeling
12.2.1 Renewable Energy Investor Modeling
12.2.2 Conventional Generators
12.2.3 ISO Modeling
12.2.4 Driver Modeling
12.2.5 Market Clearing Conditions
12.3 Computational Approach
12.4 Results on Three-Node Test System
12.4.1 Effects on Equilibrium Prices
12.4.2 Effects on Renewable Investment
12.4.3 Effects on System Costs
12.4.4 Effects on Flow Distribution
12.5 Results on Sioux Falls Road Network and IEEE 39-Bus Test System
Appendix: Proofs
References
Index


Junjian Qi

Smart Grid Resilience

Extreme Weather, Cyber-Physical Security, and System Interdependency


Junjian Qi Electrical and Computer Engineering Stevens Institute of Technology Hoboken, NJ, USA

ISBN 978-3-031-29289-7
ISBN 978-3-031-29290-3 (eBook)
https://doi.org/10.1007/978-3-031-29290-3

© Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Yaming, Noah, and Nolan

Preface

Smart grid resilience is an emerging topic that is of great concern in academia, industry, and society. This is largely due to several major new challenges faced by smart grid monitoring, protection, and control. These challenges include but are not limited to the increasing frequency of extreme weather conditions, the growing risk of cyber-physical security with more control and communication integrated into the system, and the increasingly tightening system interdependency, both within the smart grid itself and between smart grid and other critical infrastructure systems. One or a combination of these challenges could lead to extremely low probability yet very high impact extreme events in smart grid.

This book is dedicated to better understanding these new challenges and hopefully providing some useful tools or methods to help analyze, evaluate, and finally enhance smart grid resilience. Specifically, this book has four parts and each focuses on a particular area related to smart grid resilience.

• Part I covers extreme weather and cascading failure, in which Chap. 1 presents a blackout model that explicitly considers extreme temperature, Chap. 2 estimates the component interactions in cascading failure, and Chap. 3 introduces an integrated resilience response framework against extreme weather conditions.
• Part II covers cybersecurity of smart grid monitoring, in which Chap. 4 introduces a dynamic state estimation based risk mitigation strategy, Chap. 5 compares Kalman filters and observers against cyber attacks, and Chap. 6 presents a self-healing attack-resilient phasor measurement unit (PMU) network.
• Part III covers cyber-physical security for distributed energy resources (DERs), in which Chap. 7 provides an overview of a DER cyber-physical security research framework, Chap. 8 discusses distributed load sharing under false data injection attacks, and Chap. 9 introduces a deep learning based attack detection method for microgrid control.
• Part IV covers smart grid resilience under system interdependency, in which Chaps. 10 and 11 discuss the interdependency between power system outages, respectively, by multi-type branching process and a coupled interaction model, and Chap. 12 further discusses the interdependency between smart grid and transportation network.

I would like to extend my thanks to my collaborators (Drs. Kai Sun, Jianhui Wang, Ahmad F. Taha, Adam Hahn, Chen-Ching Liu, Gang Huang, Hui Lin, Chen Chen, Xiaonan Lu, Zhaomiao Guo, Heng Zhang, Wenchao Meng, Yufei Tang, Wenyun Ju, Leibao Wang, Bo Hu, Kaigui Xie, and many others) and my previous students (Drs. Sheik M. Mohiuddin and Seyyed Rashid Khazeiynasab) that have greatly contributed to the research covered in this book. I also wish to thank the Energy, Power, Control, and Networks (EPCN) program at National Science Foundation and U.S. Department of Energy’s Office of Electricity Delivery and Energy Reliability for their support of my research.

Hoboken, NJ, USA

Junjian Qi

Contents

Part I Extreme Weather and Cascading Failure
1 Cascading Failures Under Extreme Temperatures
1.1 Introduction
1.2 Ambient Temperature in Blackout Modeling
1.2.1 Temperature Disturbance
1.2.2 Load Change Under Temperature Disturbances
1.2.3 Dynamic Line Rating
1.2.4 Probability of Line Tripping
1.2.5 Probability of Generator Tripping
1.3 Modeling Protection and Control Strategies
1.3.1 Undervoltage Load Shedding
1.3.2 Operator Re-dispatch
1.4 Timing of Events
1.5 Voltage Stability Margin Calculation
1.6 Blackout Model Considering Temperature
1.7 Simulation Results
1.7.1 Model Implementation and Parameter Settings
1.7.2 Typical Simulation Run Without Operator Re-dispatch
1.7.3 Typical Simulation Run with Operator Re-dispatch
1.7.4 Number of Simulations
1.7.5 Impact of Temperature Disturbances and Size of Selected Area
1.7.6 Identifying the Most Vulnerable Buses/Locations
1.7.7 Impact of Control Strategies
References
2 Cascading Failure Interaction Analysis
2.1 Introduction
2.2 Estimating Interactions Between Component Failures
2.3 EM Algorithm
2.3.1 A Coin-Flipping Example
2.3.2 Mathematical Foundation
2.4 Estimating Component Failure Interactions by EM Algorithm
2.5 Determining the Number of Cascades Needed
2.5.1 Lower Bound of M
2.5.2 Lower Bound of Mu
2.6 Results
2.6.1 Number of Cascades Needed
2.6.2 Interaction Matrix and Interaction Network
2.6.3 Identified Key Links and Key Components
2.6.4 Validation of Estimated Interactions
2.6.5 Cascading Failure Mitigation
2.6.6 Efficiency Improvement
References
3 Integrated Preventive and Emergency Responses
3.1 Introduction
3.2 Integrated Resilience Response
3.3 Mathematical Formulation
3.3.1 Preventive Response
3.3.2 Damage From Natural Disasters
3.3.3 Emergency Response
3.3.4 Integration of Preventive and Emergency Responses
3.4 Solution Methodology
3.4.1 NC&CG Decomposition-Based Algorithm
3.4.2 Computational Efficiency Improvement Techniques
3.5 Results on PJM Five-Bus System
3.6 Results on IEEE One-Area RTS-96 System
3.7 Results on IEEE Three-Area RTS-96 System
References
Part II Cybersecurity of Smart Grid Monitoring
4 Risk Mitigation against Cyber Attacks Based on Dynamic State Estimation
4.1 Introduction
4.2 Power System Dynamic Model
4.2.1 10th-Order Nonlinear Power System Model
4.2.2 Linearized Power System Model
4.3 Unknown Inputs & Attack-Threat Model
4.3.1 Modeling Unknown Inputs
4.3.2 Modeling Cyber Attacks
4.4 DSE under UIs and CAs
4.4.1 Sliding-Mode Observer for Power Systems
4.4.2 SMO Dynamics & Design Algorithm
4.5 Asymptotic Reconstruction of UIs & CAs
4.5.1 Estimating Unknown Inputs
4.5.2 Estimating CAs
4.5.3 Attack Detection Filter
4.6 Risk Mitigation—A Dynamic Response Model
4.6.1 Weighted Deterministic Threat Level Formulation
4.6.2 Dynamic Risk Mitigation Optimization Problem
4.6.3 Dynamic Risk Mitigation Algorithm
4.7 Case Studies
4.7.1 Scenario I: Dynamic Reconstruction of UI & DSE
4.7.2 Scenario II: DSE Under UIs & CAs
References
5 Comparing Kalman Filters and Observers Against Cyber Attacks
5.1 Introduction
5.2 4th-Order Nonlinear Power System Model
5.3 Model Uncertainty and Cyber Attacks
5.3.1 Model Uncertainty
5.3.2 Cyber Attacks
5.4 DSE Algorithms
5.4.1 Kalman Filters for Power System DSE
5.4.2 Nonlinear Observers for Power System DSE
5.5 Numerical Results
5.5.1 Scenario 1: Data Integrity Attack
5.5.2 Scenario 2: DoS Attack and Scenario 3: Replay Attack
5.5.3 Discussion on Model Uncertainty Estimation
5.5.4 Discussion on Cyber Attack Detection
5.5.5 Non-Gaussian Measurement Noise
5.5.6 Computational Efficiency
5.6 Summary
References
6 Self-Healing PMU Network Against Cyber Attacks
6.1 Introduction
6.2 Motivation for Self-Healing PMU Network
6.3 System Model
6.3.1 Power System Observability
6.3.2 Rules in Network Switches
6.4 Optimization Formulation
6.4.1 PMU Connection Status Constraints (PCSCs)
6.4.2 Power System Observability Constraints (PSOCs)
6.4.3 PDC Connection Space Constraints (PSCs)
6.4.4 PMU Reconnection Constraints (PRCs)
6.4.5 Switch Rule Space Capacity Constraints (SRSCCs)
6.4.6 Routing Policy Constraints (RPCs)
6.4.7 Endpoint Policy Constraints (EPCs)
6.4.8 Optimization for Self-Healing Mechanism
6.5 Greedy Heuristic Algorithm
6.6 Case Study on IEEE 30-Bus System
6.7 Performance Evaluation of Stage 1
6.7.1 Impact of Scale of Attacks
6.7.2 Impact of Hardware Resources
6.8 Performance Evaluation of Stage 2
References
Part III Cyber-Physical Security for Distributed Energy Resources
7 Cyber-Physical Security Research Framework for Distributed Energy Resources
7.1 Introduction
7.2 Cyber-Physical Power System with Large-Scale DER Deployments
7.2.1 Generic Architecture of Power Systems with DERs
7.2.2 Challenges of Maintaining DER Cybersecurity
7.3 Overview of DER Cyber-Physical Security Research Framework
7.4 Potential Cyber Attacks on Cyber-Physical Power System with DERs
7.4.1 Cyber-Physical-Threat Modeling
7.4.2 Threat Scenarios Targeting DER
7.4.3 Attack Threat Ranking
7.5 Attack Impact Analysis and Metrics
7.6 DER Cyber-Physical Security Design Principles
7.7 Attack Resilience at Cyber, Physical Device, and Utility Layers
7.7.1 Cyber Layer Attack Resilience
7.7.2 Physical Device Layer Attack Resilience
7.7.3 Utility Layer Attack Resilience
References
8 Distributed Load Sharing Under Cyber Attacks
8.1 Introduction
8.2 Inverter-Based Microgrid Structure
8.2.1 Physical Layer
8.2.2 Cyber Layer
8.3 System Dynamic Model
8.3.1 Small-Signal Model
8.3.2 Active Power Reference
8.4 System Performance Under Attack
8.4.1 FDI Attack Against Distributed Load Sharing Control
8.4.2 Effects of FDI Attack on Microgrid Performance
8.5 Case Studies
8.5.1 Stable Region
8.5.2 System Performance Under Attack Strategy 1
8.5.3 System Performance under Attack Strategy 2
References
9 Deep Learning Based Attack Detection for Microgrid Control
9.1 Introduction
9.2 Distributed Control and FDI Attack
9.2.1 Cyber-Physical Representation of AC Microgrids
9.2.2 Secondary Control Problem Formulation
9.2.3 Distributed Control Algorithm
9.2.4 FDI Attack Against Distributed Control
9.3 Deep Learning Based Multi-label Attack Detection
9.3.1 Multi-label Classification Problem Formation
9.3.2 Data Preparation and Preprocessing
9.3.3 Deep Learning Models
9.4 Performance Evaluation
9.4.1 Test System and Control Performance
9.4.2 Deep Learning Performance Metrics
9.4.3 FDI Attack Detection Results
References
Part IV Smart Grid Resilience Under System Interdependency
10 Interdependency Between Power System Outages by Branching Process
10.1 Introduction
10.2 Estimating Multi-type Branching Process Parameters
10.3 Estimating Joint Probability Distribution of Total Outages
10.3.1 n-Type Branching Process
10.3.2 Two-Type Branching Process
10.3.3 Validation of Estimated Joint Distribution
10.4 Number of Cascades Needed
10.4.1 Determining Lower Bound for M
10.4.2 Determining Lower Bound for Mu
10.5 Estimated Parameters of Branching Processes
10.6 Estimated Joint Distribution of Total Outages
10.7 Predicted Joint Distribution from One Type of Outage
10.8 Estimated Propagation of Three Types of Outages
References
11 Interdependency Between Power System Outages by Coupled Interaction Model
11.1 Introduction
11.2 Coupled Interaction Matrix
11.2.1 Definition of Coupled Interaction Matrix
11.2.2 Definition of an Auxiliary Matrix

237 237 238 239 240

xiv

Contents

11.3 11.4

12

Estimating Coupled Interaction Matrix by EM Algorithm . . . . . . . . . Coupled Interaction Model for Cascading Failure Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.5 Critical Link Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.6 Coupled Interaction Network of IEEE 300-Bus System . . . . . . . . . . . 11.7 Validation of Coupled Interaction Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.8 Choosing Critical Links for Mitigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.9 Cascading Failure Mitigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

241

Interdependency Between Smart Grid and Transportation Network 12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.2 Mathematical Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.2.1 Renewable Energy Investor Modeling . . . . . . . . . . . . . . . . . . . . . 12.2.2 Conventional Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.2.3 ISO Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.2.4 Driver Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.2.5 Market Clearing Conditions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.3 Computational Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4 Results on Three-Node Test System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4.1 Effects on Equilibrium Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4.2 Effects on Renewable Investment . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4.3 Effects on System Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4.4 Effects on Flow Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.5 Results on Sioux Falls Road Network and IEEE 39-Bus Test System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

261 261 264 264 266 266 267 269 270 273 274 275 276 276

244 246 249 251 252 254 258

277 280

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

Part I

Extreme Weather and Cascading Failure

Chapter 1

Cascading Failures Under Extreme Temperatures

1.1 Introduction

Cascading failure is a common phenomenon in both natural and engineered systems, such as electric power systems [1], natural gas systems [2], transportation networks [3], disease transmission networks [4], and interdependent networks [5]. For example, there have been several large-scale blackouts, such as the 2003 U.S.-Canadian blackout [6], the 2011 Arizona-Southern California blackout [7], and the 2012 Indian blackout [8], which have led to many component failures, extensive outage propagation, and significant economic losses and social impacts. It is thus critical to understand why and how cascading failures and blackouts can happen and to propose effective prevention and mitigation measures that greatly reduce the cascading risk and enhance power grid resilience.

In order to simulate and analyze cascading failures, many models with different levels of detail have been developed [1], such as the Manchester model [9], hidden failure model [10, 11], CASCADE model [12], OPA model [13], AC OPA model [14], dynamic model [15], cascading failure model with detailed protection systems [16], sandpile model [17], branching process model [18, 19], multi-type branching process model [20], and the interaction model [21–24].

However, most existing models focus mainly on the system itself and usually ignore the interactions between the system and various external factors, such as extreme weather conditions. These factors are important for both the initiation and the propagation of cascading failures. For the U.S.-Canadian blackout on August 14, 2003, the temperature was high (31 °C), causing load increase in FirstEnergy's control area, transmission line tripping due to tree contact, and generator tripping due to increased reactive power outputs [6]. Another blackout occurred partly because of a temperature disturbance on July 2, 1996.
High loads in Southern Idaho and Utah due to high temperature (around 38 °C) [25, 26] led to high demands and highly loaded transmission lines.

© Springer Nature Switzerland AG 2023 J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_1

In recent years, a few papers have investigated the impact of temperature on cascading failure risks. In [27], a stochastic model is proposed in which random line failures are generated at constant failure rates and overloaded-line failures occur when the line temperature reaches the equilibrium temperature. In [28], an OPA model with slow process is proposed in which the line temperature evolution is modeled for calculating the line length and sag changes in order to evaluate the possibility of tree contact or damage. In [29], a PRA model is developed to consider the impact of wind speed and the evolution of line temperature. In [30], risk assessment of weather-related cascading outages is presented based on weather-dependent outage rates. In [31], historical outage data is used to estimate the effects of weather on cascading failure, and bulk statistics of historical initial line outages are provided.

However, most existing models have major limitations. First, the initiating events are usually generated by random sampling, which does not consider the important geographical correlations of the initiating events due to external weather conditions such as temperature disturbance. Also, the ambient temperature disturbance at various geographical locations in the system is not explicitly modeled, and neither are the consequent demand changes and dynamic rating changes, although these are critical for understanding the initiation and propagation of cascading failures, especially those due to loss of voltage stability. Therefore, to better understand cascading failure, a model is needed that explicitly considers ambient temperature disturbances and their impacts on system operation and cascading failure risks.

In this chapter, a cascading failure model is presented that explicitly considers ambient temperature disturbances and the subsequent demand and dynamic line rating changes, and that takes into account the correlations between different events such as line outage, generator tripping, and undervoltage of load buses [32].
Based on this cascading failure model, an explanation is provided of why a failure can still be initiated and propagated even when the power system is initially $N-1$ secure, by considering the impact of ambient temperature disturbances and the correlations between different events. Risk assessment is performed for power systems based on the cascading failure model to investigate the critical temperature change and the critical area with temperature disturbance that could lead to significantly increased risk of cascading, to identify the buses most vulnerable to temperature disturbances, and to evaluate the effectiveness of different control strategies. The notations used in this chapter are listed in Table 1.1.

1.2 Ambient Temperature in Blackout Modeling

Assume there are $n$ buses in a power system, including a slack bus that is numbered as $i_s$. The vector of ambient temperatures of all buses is $\boldsymbol{T} = [T_1, T_2, \ldots, T_n]$. The vector of ambient temperatures of the load buses is denoted by $\boldsymbol{T}_{S_L}$. For a transmission line $l: i \to j$ that connects bus $i$ and bus $j$ and crosses $M$ areas, its ambient temperature is assumed to be dependent on the temperatures of the $M$ areas.

Table 1.1 Summary of notations

| Notation | Description |
| --- | --- |
| $\phi_c$ / $\lambda_c$ | Latitude/longitude of the selected load bus |
| $\underline{\phi}$ / $\overline{\phi}$ | Lower/upper bound for latitude in a selected area |
| $\phi^{\min}$ / $\phi^{\max}$ | Minimum/maximum latitude among all buses |
| $\underline{\lambda}$ / $\overline{\lambda}$ | Lower/upper bound for longitude in a selected area |
| $\lambda^{\min}$ / $\lambda^{\max}$ | Minimum/maximum longitude among all buses |
| $\gamma$ | A constant that defines the size of the selected area, $0 < \gamma \le 1$ |
| $T_i$ | Ambient temperature where bus $i$ is located |
| $T^0$ | Initial ambient temperature for all load buses |
| $T_{ij}$ / $T_{ij}^0$ | Temperature/initial temperature of line $l: i \to j$ |
| $D_{ij}$ | Distance between buses $i$ and $j$ |
| $R$ | Radius of the earth |
| $\Delta\phi^d_{i,j}$ / $\Delta\lambda^d_{i,j}$ | Difference between the latitude/longitude of buses $i$ and $j$ |
| $P_i$ / $Q_i$ | Active/reactive power of load bus $i$ |
| $P_i^0$ | Initial active power at bus $i$ |
| $\overline{P}_i$ | Active power capacity of generator $i$ |
| $\underline{Q}_i$ / $\overline{Q}_i$ | Lower/upper reactive power capacity for generator $i$ |
| $\mathrm{pf}_i$ / $\mathrm{pf}_i^0$ | Power factor of bus $i$/initial power factor of bus $i$ under initial temperature |
| $F_{ij}$ | Power flow of line $l: i \to j$ |
| $\overline{F}^0_{ij}$ / $\overline{F}^d_{ij}$ | Initial/dynamic rating of line $l: i \to j$ |
| $V_{ij}$ / $V^0_{ij}$ / $V^{\text{rated}}_{ij}$ | Per unit voltage/initial per unit voltage/rated voltage of line $l: i \to j$ |
| $P^{\text{trip}}_{ij}$ | Tripping probability of line $l: i \to j$ |
| $P^{\text{trip}}_{i}$ | Tripping probability of generator $i$ |
| $\Delta P^{\text{sh}}_i$ / $\Delta Q^{\text{sh}}_i$ | Active/reactive power to be shed at bus $i$ |
| $K_{\text{sh}}$ | Load shedding constant |
| $P^0_{P_i}$ / $P_{P_i}$ | Real power output of generator $G_{P_i}$ before/after re-dispatch |
| $\eta$ | A constant that compensates the flow adjustment error due to the nonlinear nature of the power flow |
| $\bar{o}_{ij}$ / $\bar{o}_g$ | Overload limit of line $l: i \to j$/generator $g$ |
| $\Delta o_{ij,m}$ / $\Delta o_{g,m}$ | Accumulated overload in iteration $m$ (between $t_{m-1}$ and $t_m$) for line $l: i \to j$/generator $g$ |
| $S_B$ / $S_L$ / $S_L^A$ / $S_G$ / $S_{\text{line}}$ | Set of buses, load buses, load buses in the selected area, generator buses, and transmission lines |

1.2.1 Temperature Disturbance

An ambient temperature disturbance is applied to an area $A = [\underline{\phi}, \overline{\phi}] \times [\underline{\lambda}, \overline{\lambda}]$. In order to make sure at least one load bus is inside the chosen area, one of the load buses is randomly selected and an area around this bus is chosen. The chosen area is set as $\underline{\phi} = \phi_c - \Delta\phi$, $\overline{\phi} = \phi_c + \Delta\phi$, $\underline{\lambda} = \lambda_c - \Delta\lambda$, and $\overline{\lambda} = \lambda_c + \Delta\lambda$, where $\Delta\phi > 0$ and $\Delta\lambda > 0$ determine how widespread the disturbance is and are chosen as

$$\Delta\phi = \gamma\,(\phi^{\max} - \phi^{\min}), \tag{1.1}$$

$$\Delta\lambda = \gamma\,(\lambda^{\max} - \lambda^{\min}). \tag{1.2}$$

As a disturbance, the ambient temperature of the load buses in the selected subsystem is changed by $\Delta T$. The ambient temperature of the transmission lines in the selected subsystem will therefore also change by $\Delta T$. For a line $l: i \to j$ that crosses the boundary of the selected area and lies in $M$ areas, its length in area $k$ with ambient temperature $T_k$ is $d_k$ and its temperature is determined by

$$T_{ij} = \frac{1}{D_{ij}} \sum_{k=1}^{M} T_k d_k, \tag{1.3}$$

where bus $i$ is assumed to be inside the selected area while bus $j$ is not, and $D_{ij}$ is the distance between buses $i$ and $j$. As a simple case, for a line $l: i \to j$ that only crosses two areas, its temperature is given as

$$T_{ij} = \frac{d_1}{D_{ij}} T_i + \frac{d_2}{D_{ij}} T_j, \tag{1.4}$$

where $d_1$ is the length of the line in the selected area and $d_2$ is the length of the line outside the selected area. Then with a $\Delta T$ change for bus $i$ the ambient temperature of the line will change by

$$\Delta T_{ij} = \frac{d_1}{D_{ij}} \Delta T. \tag{1.5}$$
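The length-weighted averaging in (1.3) and the resulting shift in (1.5) can be sketched in a few lines of Python. This is only an illustration: the function names and the example line (30 km inside the disturbed area, 70 km outside, a 10 °C disturbance) are assumptions made here, not values from the chapter.

```python
def line_temperature(segments, total_length):
    """Length-weighted ambient temperature of a line, following (1.3).

    segments: list of (area_temperature_degC, length_in_area_km) pairs.
    total_length: D_ij, the distance between the two end buses in km.
    """
    return sum(t * d for t, d in segments) / total_length


def line_temperature_change(d_inside, total_length, delta_t):
    """Shift of the line temperature when only the selected area changes, following (1.5)."""
    return d_inside / total_length * delta_t


# Hypothetical line: 30 km inside the disturbed area, 70 km outside it.
t_before = line_temperature([(24.0, 30.0), (24.0, 70.0)], 100.0)
t_after = line_temperature([(34.0, 30.0), (24.0, 70.0)], 100.0)
shift = line_temperature_change(30.0, 100.0, 10.0)
```

With these assumed numbers the difference `t_after - t_before` agrees with `shift`, which is the consistency between (1.4) and (1.5) that the text relies on.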

For calculating the distance between two buses with specific latitude and longitude, assume that the earth is a sphere with a radius of 6378 km. The distance is calculated by using the haversine formula in (1.6)–(1.9) [33]. Let the central angle $\Theta$ between two buses $i$ and $j$ be

$$\Theta = \frac{D_{ij}}{R}. \tag{1.6}$$

The haversine formula ($\operatorname{hav}$ of $\Theta$) is

$$\operatorname{hav}(\Theta) = \operatorname{hav}(\Delta\phi^d_{i,j}) + \cos\phi_i \cos\phi_j \operatorname{hav}(\Delta\lambda^d_{i,j}), \tag{1.7}$$

and the haversine function of an angle $\theta$ is

$$\operatorname{hav}(\theta) = \sin^2\!\left(\frac{\theta}{2}\right). \tag{1.8}$$

Finally, by applying the inverse haversine $\operatorname{hav}^{-1}$ to the central angle $\Theta$, the distance $D_{ij}$ is calculated as

$$D_{ij} = 2R \sin^{-1}\!\left(\sqrt{\sin^2\!\left(\frac{\Delta\phi^d_{i,j}}{2}\right) + \cos\phi_i \cos\phi_j \sin^2\!\left(\frac{\Delta\lambda^d_{i,j}}{2}\right)}\right).$$
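As a minimal sketch of the distance calculation above, the haversine formula can be written with the Python standard library alone. The function name and the choice of degrees as input units are assumptions made for illustration; the radius is the 6378 km value quoted in the text.

```python
import math

EARTH_RADIUS_KM = 6378.0  # spherical-earth radius used in the text


def haversine_distance(lat_i, lon_i, lat_j, lon_j, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two buses given in degrees."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    d_phi = phi_j - phi_i
    d_lam = math.radians(lon_j - lon_i)
    # hav(Theta) from (1.7), with hav(x) = sin^2(x / 2) from (1.8)
    hav = (math.sin(d_phi / 2) ** 2
           + math.cos(phi_i) * math.cos(phi_j) * math.sin(d_lam / 2) ** 2)
    # Invert the haversine to obtain the central angle, then scale by R
    return 2 * radius * math.asin(math.sqrt(hav))


# Two points on the equator a quarter of the way around the globe
quarter_circle = haversine_distance(0.0, 0.0, 0.0, 90.0)
```

A quick sanity check is that two points on the equator separated by 90° of longitude should be a quarter of the circumference apart, that is $\pi R / 2$.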

Note that a relatively large ambient temperature change could take a few hours during which some protections may operate. However, from the modeling perspective it may be too complicated if the temporal behavior of the ambient temperature change and its impact on the risk of cascading failures are considered. Therefore, a simplified scenario is considered in which the ambient temperature change happens immediately and the focus is more on what impact it will have on system operation and cascading failure risks.

1.2.2 Load Change Under Temperature Disturbances

In power systems, load forecasting is used for day-ahead generation purchases and reactive power management. However, due to uncertainties the actual load may be different from the forecasted load. For example, several large operators in the Midwest consistently under-forecasted the load levels between August 11 and 14, 2003 [6]. Assume $T^0$ is used for day-ahead load forecasting while the actual ambient temperature for a subsystem is $T^0 + \Delta T$, which will lead to a deviation of the actual load from the forecasted load. The real power of a load bus $i$ changes with its ambient temperature $T_i$ as

$$P_i = L_{P_i}(T_i)\, P_i^0. \tag{1.9}$$

For simplicity and without loss of generality, assume $L_{P_i}(T_i) = L(T_i)$, where the same function $L$ is used for all buses. According to [34–36], $L(T_i)$ can be represented by a polynomial function as $L(T_i) = a_3 T_i^3 + a_2 T_i^2 + a_1 T_i + a_0$. Figure 1.1 shows the fitted $L$ function for the Greek interconnected power system based on data between January 1, 1993 and December 31, 2003 [35, 36]. It is seen that the variation of load with temperature is nonlinear and asymmetric, increasing for both decreased and increased temperatures, with a minimum at around $T_{\min} = 18.5$ °C [35, 36]. The load is more sensitive to temperature increases than to temperature decreases, mainly because several energy sources such as diesel, natural gas, and electricity can be used for heating while practically only electricity can be used for cooling [35, 36]. A similar curve can be found in [34]. Assume the initial load corresponds to an ambient temperature at which $L(T) = 1$. For example, based on the curve in Fig. 1.1 there are two such positive ambient temperatures, which are $T^0_{\text{low}} = 9.91$ °C and $T^0_{\text{high}} = 24.21$ °C, respectively.


Fig. 1.1 Relationship between normalized electricity demand and temperature for the Greek power system based on daily data for the period 1/1/1993–12/31/2003 [35, 36]

In order to explore the effect of temperature increase (decrease), it is assumed that the ambient temperature corresponding to the initial load is $T^0 = T^0_{\text{high}}$ ($T^0 = T^0_{\text{low}}$). Under high/low temperatures, there will be more air conditioning loads, which consume more reactive power and have lower power factors than other types of loads [6]. In order to take this into account, assume that the power factor decreases linearly with the temperature increase (decrease) as

$$\mathrm{pf}_i = \begin{cases} \mathrm{pf}_i^0 - k_i^{\mathrm{pf}}\,(T_i - T^0), & \text{if } T^0 = T^0_{\text{high}},\ T_i \ge T_{\min} \\ \mathrm{pf}_i^0 + k_i^{\mathrm{pf}}\,(T_i - T^0), & \text{if } T^0 = T^0_{\text{low}},\ T_i \le T_{\min}. \end{cases}$$

Then the reactive power of load bus $i$ under temperature $T_i$ is obtained as

$$Q_i = P_i \tan\!\left(\cos^{-1}(\mathrm{pf}_i)\right). \tag{1.10}$$
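The power factor rule and (1.10) can be illustrated with a short Python sketch. The slope value, the initial power factor, and the 100 MW bus are hypothetical inputs chosen only for illustration, and the `scenario` argument is an assumed way of encoding whether $T^0$ equals the high or the low initial temperature. The active power is held fixed here to isolate the power factor effect, whereas in the model it also changes through (1.9).

```python
import math

T_MIN = 18.5  # degC, temperature of minimum demand quoted in the text


def power_factor(pf0, k_pf, t_i, t0, scenario):
    """Power factor of a load bus at ambient temperature t_i.

    scenario is "high" when T0 is the high initial temperature and "low"
    when T0 is the low one. k_pf is the slope per degC (hypothetical input).
    """
    if scenario == "high" and t_i >= T_MIN:
        return pf0 - k_pf * (t_i - t0)
    if scenario == "low" and t_i <= T_MIN:
        return pf0 + k_pf * (t_i - t0)
    raise ValueError("temperature is outside the range covered by the rule")


def reactive_power(p_i, pf_i):
    """Reactive load implied by the active load and power factor, as in (1.10)."""
    return p_i * math.tan(math.acos(pf_i))


# Hypothetical bus: 100 MW, initial power factor 0.95, 10 degC temperature rise.
pf_hot = power_factor(0.95, 0.002, 34.21, 24.21, "high")
q_before = reactive_power(100.0, 0.95)
q_after = reactive_power(100.0, pf_hot)
```

Under these assumed numbers the power factor falls from 0.95 to about 0.93, so the reactive demand rises even before the active load increase is applied.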

1.2.3 Dynamic Line Rating

Assume the initial rating of a transmission line $l: i \to j$ is determined for $T_{ij}^0 = (T_i^0 + T_j^0)/2$. When studying the high-temperature scenario, $T_{ij}^0 = T^0_{\text{high}} = 24.21$ °C. This is consistent with the fact that the approximate current carrying capacity is usually given for 25 °C [37]. The dynamic rating of a line can depend on ambient temperature [38] and utility vegetation management [28]. According to [38], the effect of ambient temperature on the dynamic rating expressed in amperes is quasi-linear. Therefore, for the dynamic rating expressed in apparent power (MVA) there is

$$\overline{F}^d_{ij} = V^{\text{rated}}_{ij} V_{ij} \left(-k_{ij} T_{ij} + c_{ij}\right). \tag{1.11}$$

Different conductors may have different slopes $k_{ij}$. For example, the slope for the AMS570 conductor is approximately 0.02 kA/°C [38]. For simplicity, the same slope $k_{ij} = 0.02$ kA/°C is used for all lines. Then $c_{ij}$ can be easily obtained as $c_{ij} = \overline{F}^0_{ij}/(V^{\text{rated}}_{ij} V^0_{ij}) + k_{ij} T^0_{ij}$. To consider utility vegetation management and the corresponding risk of line tripping due to a slow process involving transmission line temperature evolution, sag increase, and tree contact [28], (1.11) is modified to be

$$\overline{F}^d_{ij} = \alpha_{ij} V^{\text{rated}}_{ij} V_{ij} \left(-k_{ij} T_{ij} + c_{ij}\right), \tag{1.12}$$

where $\alpha_{ij}$ is uniformly sampled in $[\alpha, 1]$ with $0 < \alpha \le 1$. When $\alpha_{ij} = 1$, the dynamic rating is only determined by ambient temperature. Otherwise, the dynamic rating will decrease.
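A small Python sketch may help to see how the offset $c_{ij}$ anchors the rating at the initial temperature and how the slope then reduces it. The 500 MVA initial rating, the 345 kV rated voltage, and the lower bound 0.8 for the sampled factor are hypothetical values chosen for illustration; only the 0.02 kA/°C slope and the 24.21 °C initial temperature come from the text.

```python
import random


def rating_offset(f0, v_rated, v0, k, t0):
    """Offset c_ij chosen so that the rating at temperature t0 equals f0."""
    return f0 / (v_rated * v0) + k * t0


def dynamic_rating(v_rated, v, k, t, c, alpha=1.0):
    """Dynamic rating in MVA as in (1.12); alpha = 1 recovers (1.11)."""
    return alpha * v_rated * v * (-k * t + c)


# Hypothetical 345 kV line rated 500 MVA at the initial temperature.
c = rating_offset(500.0, 345.0, 1.0, 0.02, 24.21)
rating_initial = dynamic_rating(345.0, 1.0, 0.02, 24.21, c)
rating_hot = dynamic_rating(345.0, 1.0, 0.02, 34.21, c)

# Vegetation factor sampled uniformly; the lower bound 0.8 is an assumption.
alpha = random.uniform(0.8, 1.0)
rating_hot_veg = dynamic_rating(345.0, 1.0, 0.02, 34.21, c, alpha=alpha)
```

With these assumed inputs a 10 °C rise removes 345 × 0.02 × 10 = 69 MVA from the rating, and any sampled factor below 1 lowers it further.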

1.2.4 Probability of Line Tripping

For a line $l: i \to j$, let $R_{ij}(\boldsymbol{T}) = F_{ij}(\boldsymbol{T}_{S_L}) / \overline{F}^d_{ij}(T_{ij})$. Note that $F_{ij}$ is a function of the ambient temperatures of the load buses and will change when there is a load change at any load bus due to ambient temperature change. By contrast, the dynamic line rating $\overline{F}^d_{ij}$ is only a function of the local ambient temperature of the line $l: i \to j$. The tripping probability of the line can be written as a function of $R_{ij}$:

$$P^{\text{trip}}_{ij} = f_t\!\left(R_{ij}(\boldsymbol{T})\right). \tag{1.13}$$

The function shown in Fig. 1.2 is used for the tripping probability of the line, which can be written as

$$f_t = \begin{cases} p_1, & \text{if } R_{ij} \le 1 \\ a_1 e^{b_1 R_{ij}}, & \text{if } 1 < R_{ij} \le 1 + \epsilon \\ a_2 e^{b_2 R_{ij}}, & \text{if } 1 + \epsilon < R_{ij} \le K \\ p_3, & \text{if } R_{ij} > K, \end{cases} \tag{1.14}$$

where $b_1 = (\ln p_1 - \ln p_2)/(-\epsilon)$, $a_1 = p_1/e^{b_1}$, $b_2 = (\ln p_2 - \ln p_3)/(1 + \epsilon - K)$, and $a_2 = p_2/e^{b_2(1+\epsilon)}$.

Fig. 1.2 Probability of line tripping as a function of $R_{ij}$

When $R_{ij} \le 1$, although there is no slow process involved, a line may still be tripped with a small probability. In this way the factors that could lead to line tripping even if the dynamic line rating is not reached can be taken into account. For example, even if $F_{ij}$ is far below $\overline{F}^d_{ij}$ the line may be tripped by the protection due to lightning strikes. The probability of line tripping is therefore not zero but equal to a constant low value, which describes the probability of tripping for a line exposed to a hidden failure. Once $R_{ij} > 1$ it becomes possible for the line to be tripped, for example due to tree contact or overheating, and the probability of tripping thus quickly grows to a much higher value $p_2$ at $R_{ij} = 1 + \epsilon$. Then between $1 + \epsilon$ and $K$ the probability of tripping increases exponentially with parameters $a_2$ and $b_2$. Finally, it reaches a high probability $p_3 \le 1$ when $R_{ij}$ is greater than $K$. Here $p_3$ is set to 1.
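The piecewise function in (1.14) can be sketched as follows. The numerical values of $p_1$, $p_2$, $p_3$, $K$, and the small offset above 1 (written `eps` here) are hypothetical and serve only to show that the two exponential segments join the constant levels continuously.

```python
import math


def line_trip_probability(r, p1, p2, p3, eps, k_max):
    """Tripping probability of a line as a function of the loading ratio r.

    r is the flow divided by the dynamic rating; eps is the small offset
    above 1 and k_max is the ratio K beyond which the probability is p3.
    """
    b1 = (math.log(p1) - math.log(p2)) / (-eps)
    a1 = p1 / math.exp(b1)
    b2 = (math.log(p2) - math.log(p3)) / (1 + eps - k_max)
    a2 = p2 / math.exp(b2 * (1 + eps))
    if r <= 1:
        return p1                      # hidden-failure level below the rating
    if r <= 1 + eps:
        return a1 * math.exp(b1 * r)   # fast rise from p1 to p2
    if r <= k_max:
        return a2 * math.exp(b2 * r)   # exponential growth from p2 to p3
    return p3


# Hypothetical parameters for illustration only.
params = dict(p1=0.001, p2=0.1, p3=1.0, eps=0.05, k_max=1.5)
prob_light = line_trip_probability(0.5, **params)
prob_knee = line_trip_probability(1.05, **params)
prob_limit = line_trip_probability(1.5, **params)
```

Evaluating the sketch at $R_{ij} = 1 + \epsilon$ and $R_{ij} = K$ reproduces $p_2$ and $p_3$, which is what the definitions of $a_1$, $b_1$, $a_2$, and $b_2$ are designed to guarantee.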

1.2.5 Probability of Generator Tripping

The active power change due to the ambient temperature change is $\Delta P = \sum_{i \in S_L^A} (P_i - P_i^0)$. This change will be supplied by all generators in proportion to their active power reserve. Specifically, for $i \in S_G \setminus i_s$ there is

$$P_i = P_i^0 + \frac{\overline{P}_i - P_i^0}{\sum_{j \in S_G} \left(\overline{P}_j - P_j^0\right)}\, \Delta P. \tag{1.15}$$

When reactive load is increased, the generators nearby will have to provide more reactive power. For example, in the 2003 U.S.-Canadian blackout, Eastlake unit 5 in FirstEnergy's Northern Ohio service area was generating high reactive power, because there were significant reactive power supply problems in the states of Indiana and Ohio. Due to high reactive output and over-excitation, this unit was tripped [6]. Each generator is assumed to be equipped with an automatic voltage regulator to hold its terminal voltage. Since normally there is no automatic control action limiting the reactive power output of generators [39], the reactive power of a generator can be beyond the allowed capacity range $[\underline{Q}_i, \overline{Q}_i]$ due to voltage regulation to relieve the overvoltage or undervoltage violations close to the generator. The increased possibility of generator tripping due to over-excitation limiter operation is considered. The tripping probability of generator $i$ can be written as a function of $Q_i$ as

$$P^{\text{trip}}_{i} = f_g\!\left(Q_i\right). \tag{1.16}$$

In particular, the following function, shown in Fig. 1.3, is used:

$$f_g = \begin{cases} p_6, & \text{if } Q_i \le \underline{K}\,\underline{Q}_i \\ a_3 e^{b_3 Q_i}, & \text{if } \underline{K}\,\underline{Q}_i < Q_i \le \underline{Q}_i - \epsilon \\ a_4 e^{b_4 Q_i}, & \text{if } \underline{Q}_i - \epsilon < Q_i \le \underline{Q}_i \\ p_4, & \text{if } \underline{Q}_i < Q_i \le \overline{Q}_i \\ a_5 e^{b_5 Q_i}, & \text{if } \overline{Q}_i < Q_i \le \overline{Q}_i + \epsilon \\ a_6 e^{b_6 Q_i}, & \text{if } \overline{Q}_i + \epsilon < Q_i \le \overline{K}\,\overline{Q}_i \\ p_6, & \text{if } Q_i > \overline{K}\,\overline{Q}_i, \end{cases} \tag{1.17}$$

where $b_3 = (\ln p_6 - \ln p_5)/(\underline{K}\,\underline{Q}_i - \underline{Q}_i + \epsilon)$, $a_3 = p_6/e^{b_3 \underline{K}\,\underline{Q}_i}$, $b_4 = (\ln p_5 - \ln p_4)/(-\epsilon)$, $a_4 = p_5/e^{b_4(\underline{Q}_i - \epsilon)}$, $b_5 = (\ln p_4 - \ln p_5)/(-\epsilon)$, $a_5 = p_4/e^{b_5 \overline{Q}_i}$, $b_6 = (\ln p_5 - \ln p_6)/(\overline{Q}_i + \epsilon - \overline{K}\,\overline{Q}_i)$, and $a_6 = p_5/e^{b_6(\overline{Q}_i + \epsilon)}$.

Fig. 1.3 Probability of generator tripping as a function of $Q_i$

If the reactive power of any generator $i$ lies in $[\underline{Q}_i, \overline{Q}_i]$, it fails only with a very small probability $p_4$ in order to model any accidental failure. When $Q_i$ falls out of $[\underline{Q}_i, \overline{Q}_i]$, the probability of generator tripping quickly grows to a much higher value $p_5$ at $\underline{Q}_i - \epsilon$ or $\overline{Q}_i + \epsilon$. Then from $\underline{Q}_i - \epsilon$ to $\underline{K}\,\underline{Q}_i$ or from $\overline{Q}_i + \epsilon$ to $\overline{K}\,\overline{Q}_i$, the probability of tripping increases exponentially with parameters $a_3$, $b_3$ and $a_6$, $b_6$, respectively. When $Q_i \le \underline{K}\,\underline{Q}_i$ or $Q_i > \overline{K}\,\overline{Q}_i$ it reaches a high probability $p_6$ for the most abnormal cases.
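The seven-segment function in (1.17) can be sketched in Python as below. All numerical values are hypothetical, and the two multipliers that define the outermost thresholds are passed as separate arguments `k_lo` and `k_hi`, which is an assumption about how the lower and upper constants are distinguished. Each exponential segment is written in the equivalent anchored form $p\,e^{b(Q - Q_{\text{ref}})}$ rather than $a\,e^{bQ}$, which gives the same values while avoiding very large intermediate exponentials.

```python
import math


def generator_trip_probability(q, q_min, q_max, p4, p5, p6, eps, k_lo, k_hi):
    """Tripping probability of a generator as a function of reactive output q."""
    lo_far, hi_far = k_lo * q_min, k_hi * q_max
    if q <= lo_far:
        return p6
    if q <= q_min - eps:
        b3 = (math.log(p6) - math.log(p5)) / (lo_far - q_min + eps)
        return p6 * math.exp(b3 * (q - lo_far))
    if q <= q_min:
        b4 = (math.log(p5) - math.log(p4)) / (-eps)
        return p5 * math.exp(b4 * (q - (q_min - eps)))
    if q <= q_max:
        return p4  # accidental failure inside the capability range
    if q <= q_max + eps:
        b5 = (math.log(p4) - math.log(p5)) / (-eps)
        return p4 * math.exp(b5 * (q - q_max))
    if q <= hi_far:
        b6 = (math.log(p5) - math.log(p6)) / (q_max + eps - hi_far)
        return p5 * math.exp(b6 * (q - (q_max + eps)))
    return p6


# Hypothetical generator: limits of -50 and 100 Mvar, offset of 2 Mvar.
args = dict(q_min=-50.0, q_max=100.0, p4=0.001, p5=0.1, p6=0.9,
            eps=2.0, k_lo=1.5, k_hi=1.5)
prob_normal = generator_trip_probability(0.0, **args)
prob_upper_knee = generator_trip_probability(102.0, **args)
prob_lower_knee = generator_trip_probability(-52.0, **args)
prob_upper_far = generator_trip_probability(150.0, **args)
```

The sketch assumes a negative lower limit so that the lower threshold `k_lo * q_min` lies below `q_min - eps`; other sign conventions would need the thresholds to be defined accordingly.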


1.3 Modeling Protection and Control Strategies

1.3.1 Undervoltage Load Shedding

When the voltage of a load bus $i$ is below a threshold $V_{\text{th}}$ for more than $\tau$ seconds, a portion of the real power load will be shed [40]. The amount of active power to be shed is determined as

$$\Delta P^{\text{sh}}_i = \min\!\left(K_{\text{sh}} \Delta V_i,\ P_i\right), \tag{1.18}$$

where $\Delta V_i = V_{\text{th}} - V_i > 0$. The parameters are chosen as $V_{\text{th}} = 0.9$ pu, $K_{\text{sh}} = 600$ MW/pu, and $\tau = 3$ seconds. In order to preserve the power factor, the reactive power to be shed is calculated as [41]

$$\Delta Q^{\text{sh}}_i = Q_i \frac{\Delta P^{\text{sh}}_i}{P_i}. \tag{1.19}$$
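The shedding rule in (1.18) and (1.19) translates almost directly into code. The sketch below uses the threshold and shedding constant quoted in the text as defaults; the bus values in the example are hypothetical, and the $\tau$-second delay is not modeled here.

```python
def undervoltage_load_shedding(v, p, q, v_th=0.9, k_sh=600.0):
    """Active and reactive load to shed at a bus with voltage v (pu).

    p and q are the present active (MW) and reactive (Mvar) loads.
    Returns (dp, dq); both are zero when the voltage is not below v_th.
    """
    dv = v_th - v
    if dv <= 0 or p <= 0:
        return 0.0, 0.0
    dp = min(k_sh * dv, p)   # (1.18): never shed more than the bus load
    dq = q * dp / p          # (1.19): keep the power factor unchanged
    return dp, dq


# Hypothetical bus with 100 MW and 40 Mvar of load.
mild = undervoltage_load_shedding(0.85, 100.0, 40.0)
severe = undervoltage_load_shedding(0.50, 100.0, 40.0)
healthy = undervoltage_load_shedding(0.95, 100.0, 40.0)
```

In the mild case about 30 MW and 12 Mvar are shed, while in the severe case the `min` in (1.18) caps the shedding at the full bus load.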

1.3.2 Operator Re-dispatch

Note that the operator only has the initial branch flow capacity $\overline{F}^0_{ij}$. If the branch flow is greater than the dynamic rating but smaller than $\overline{F}^0_{ij}$, the operator will not perform any re-dispatch. Only when the line flow is higher than $\overline{F}^0_{ij}$ will the operator perform a shift-factor ($S$) based re-dispatch, which reflects the actual operator behavior well.

Assume branch $i \to j$ is overloaded, i.e., $F_{ij} > \overline{F}^0_{ij}$. Select $n^+$ generators $\{G_{P_1}, G_{P_2}, \ldots, G_{P_{n^+}}\}$ with positive shift factors and $n^-$ generators $\{G_{N_1}, G_{N_2}, \ldots, G_{N_{n^-}}\}$ with negative shift factors. For the generators with positive shift factors, without loss of generality, assume

$$S_{P_1} \ge S_{P_2} \ge \cdots \ge S_{P_{n^+}}. \tag{1.20}$$

To reduce the line overloading most effectively, the generators with positive shift factors should be dispatched (i.e., decreasing their outputs) in the order of $G_{P_1}, G_{P_2}, \ldots, G_{P_{n^+}}$. Specifically, for generator $G_{P_i}$, $i = 1, \ldots, n^+$, its real power output is re-dispatched as

$$P_{P_i} = P^0_{P_i} + \frac{\eta\left(\overline{F}^0_{ij} - F_{ij}\right)}{S_{P_i}}. \tag{1.21}$$

If the generators with positive shift factors cannot eliminate the overloading, the generators with negative shift factors will be re-dispatched. Without loss of generality, assume

$$S_{N_1} \le S_{N_2} \le \cdots \le S_{N_{n^-}}. \tag{1.22}$$

These generators are dispatched (i.e., increasing their outputs) in the order of $G_{N_1}, G_{N_2}, \ldots, G_{N_{n^-}}$ by a similar approach to that for the generators with positive shift factors. If there are multiple overloaded branches, the above re-dispatch will be applied to each of them according to $F_{ij}/\overline{F}^0_{ij}$. The larger $F_{ij}/\overline{F}^0_{ij}$ of a branch is, the earlier the overloading of this branch will be dealt with by the re-dispatch of generators. If necessary, multiple rounds of re-dispatch will be executed until the overloading of all branches is eliminated or the number of rounds reaches a limit.
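One possible reading of the positive-shift-factor step can be sketched as follows. The text gives the adjustment in (1.21) but does not spell out how generator limits are handled, so the lower output limit `p_min`, the clipping of each adjustment to that limit, and the linear bookkeeping of the remaining overload are all assumptions made here for illustration.

```python
def redispatch_positive(overload, generators, eta=1.0):
    """Reduce outputs of generators with positive shift factors.

    overload: line flow minus initial rating (MW), assumed positive.
    generators: list of dicts with keys "name", "s" (shift factor > 0),
                "p0" (present output) and "p_min" (assumed lower limit).
    Returns the new outputs and the overload that is still unresolved.
    """
    remaining = overload
    new_outputs = {}
    # Largest shift factor first, following the ordering in (1.20)
    for g in sorted(generators, key=lambda g: g["s"], reverse=True):
        if remaining <= 0:
            break
        wanted = eta * remaining / g["s"]            # magnitude from (1.21)
        applied = min(wanted, g["p0"] - g["p_min"])  # assumed limit handling
        new_outputs[g["name"]] = g["p0"] - applied
        remaining -= applied * g["s"] / eta          # linear relief estimate
    return new_outputs, max(remaining, 0.0)


# Hypothetical case: 20 MW overload and two candidate generators.
gens = [
    {"name": "G2", "s": 0.25, "p0": 200.0, "p_min": 50.0},
    {"name": "G1", "s": 0.50, "p0": 100.0, "p_min": 80.0},
]
outputs, unresolved = redispatch_positive(20.0, gens)
```

In this example the generator with the larger shift factor hits its assumed lower limit after relieving half of the overload, and the second generator covers the rest. The generators with negative shift factors would be handled analogously with increasing outputs.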

1.4 Timing of Events

Define a set of events for the $(k+1)$th iteration as $E = \{e_1, e_2, \ldots, e_m\}$, where $m$ is the number of potential events that could happen in iteration $k+1$. The events could be low voltage of a load bus, tripping of a line whose $F_{ij}$ could be smaller or greater than its dynamic rating $\overline{F}^d_{ij}$, or tripping of a generator whose reactive power output is within or outside its lower and upper limits. Each type of event occurs after a specific amount of time, which is decided as follows.

As mentioned in Sect. 1.3.1, when the voltage of a load bus $i$ is below a pre-defined threshold for more than $\tau = 3$ s, $\Delta P^{\text{sh}}_i$ and $\Delta Q^{\text{sh}}_i$ of load are shed at bus $i$. The re-dispatch in Sect. 1.3.2 is assumed to be finished in 1 minute.

If a line whose $F_{ij}$ is smaller than its dynamic rating $\overline{F}^d_{ij}$ is tripped, there is no slow process involved and the line is disconnected by the protective relay after a very short time, which may include the relay operating time and the breaker operating time. This time is taken as 0.2 s [37, 42].

When $F_{ij} \ge \overline{F}^d_{ij}$, the line may be tripped for different reasons, such as tree contact caused by a slow process [6, 28], overheating, or mis-operation of zone 2 and zone 3 distance relays [6]. The time of line tripping under different mechanisms can vary significantly. For example, in the 2003 U.S.-Canadian blackout the Stuart-Atlanta 345-kV line tripping took 31 minutes due to tree contact. The backup zone 2 and zone 3 relays, however, can operate in a few seconds. The probability that a line with $F_{ij} \ge \overline{F}^d_{ij}$ fails can be determined based on Sect. 1.2.4. If such a line $l: i \to j$ is to fail, it fails when its total accumulated overload exceeds a limit $\bar{o}_{ij}$, which represents the condition required for line tripping due to a number of processes such as the overheating of a transmission line or the sagging of the line to vegetation [28, 43]. Let $\Delta o_{ij,0} = 0$. The time for line $l: i \to j$ whose $F_{ij}$ is greater than $\overline{F}^d_{ij}$ in iteration $k+1$ can be calculated as

$$\Delta t_{ij,k+1} = \frac{\bar{o}_{ij} - \sum_{m=0}^{k} \Delta o_{ij,m}}{F_{ij}(t_k) - \overline{F}^d_{ij}(t_k)}, \tag{1.23}$$

where the limit $\bar{o}_{ij}$ is chosen so that a branch will trip after 20 seconds of being 50% above the branch flow limit.

According to [39], generators usually have about 10–20% overload capability for up to 30 minutes. A similar approach is used to determine the time for generator tripping in every interval. When the reactive power of a generator $g$ is in $[\underline{Q}_g, \overline{Q}_g]$, it fails with a very small probability $p_4$ because of accidental failure. For these types of failures, the time is assumed to be 0.2 s [44]. If the reactive power of generator $g$ moves out of $[\underline{Q}_g, \overline{Q}_g]$, the probability that the generator fails can also be determined by Sect. 1.2.5. If the generator is to fail, the time that is required for this generator to fail in iteration $k+1$ can be calculated as

$$\Delta t_{g,k+1} = \begin{cases} \dfrac{\bar{o}_g - \sum_{m=0}^{k} \Delta o_{g,m}}{\underline{Q}_g - Q_g(t_k)}, & \text{if } Q_g < \underline{Q}_g \\[2ex] \dfrac{\bar{o}_g - \sum_{m=0}^{k} \Delta o_{g,m}}{Q_g(t_k) - \overline{Q}_g}, & \text{if } Q_g > \overline{Q}_g, \end{cases} \tag{1.24}$$

where the threshold $\bar{o}_g$ is chosen so that a generator will trip after 30 minutes of being 20% above/below the upper/lower reactive power limit and $\Delta o_{g,0} = 0$.

Let $\Delta t^{\min}_{k+1}$ be the minimum time of all events in iteration $k+1$, which can be calculated as

$$\Delta t^{\min}_{k+1} = \min\left\{ t(e_i),\ i = 1, \ldots, m \right\}, \tag{1.25}$$

where $t(e_i)$ is the time for event $e_i$. The time corresponding to the next event is thus $t_{k+1} = t_k + \Delta t^{\min}_{k+1}$.

The $\Delta o_{ij}$ at iteration $k+1$ can be obtained by [43]

$$\Delta o_{ij,k+1} = \max\!\left(F_{ij}(t_k) - \overline{F}^d_{ij}(t_k),\ 0\right) \Delta t^{\min}_{k+1},$$

and $\Delta o_g$ at iteration $k+1$ can be calculated as

$$\Delta o_{g,k+1} = \begin{cases} \left(\underline{Q}_g - Q_g(t_k)\right) \Delta t^{\min}_{k+1}, & \text{if } Q_g < \underline{Q}_g \\ \left(Q_g(t_k) - \overline{Q}_g\right) \Delta t^{\min}_{k+1}, & \text{if } Q_g > \overline{Q}_g. \end{cases} \tag{1.26}$$
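The accumulated-overload bookkeeping in (1.23), (1.25), and the line overload update can be followed in a short Python sketch. The 100 MVA rating and the flows are hypothetical, and the overload limit is calibrated from the statement that a branch trips after 20 seconds at 50% above its limit, which is one direct reading of that sentence.

```python
def time_to_trip(limit, accumulated, excess):
    """Remaining time before an overloaded element trips, as in (1.23)."""
    if excess <= 0:
        return float("inf")  # no overload, so no overload-driven trip
    return (limit - accumulated) / excess


def accumulate(excess, dt_min):
    """Overload accumulated during an interval of length dt_min."""
    return max(excess, 0.0) * dt_min


# Hypothetical line with a 100 MVA dynamic rating.
rating = 100.0
line_limit = 0.5 * rating * 20.0      # MVA*s: 20 s at 50% above the rating

# Iteration 1: flow of 130 MVA, competing with a load-shedding event at 3 s.
dt_line = time_to_trip(line_limit, 0.0, 130.0 - rating)
dt_min = min(dt_line, 3.0)            # (1.25): the earliest event happens
accumulated = accumulate(130.0 - rating, dt_min)

# Iteration 2: the flow has risen to 150 MVA after the first event.
dt_line_next = time_to_trip(line_limit, accumulated, 150.0 - rating)
```

The point of the sketch is that overload accumulated in earlier iterations shortens the time to trip later on, which is how the model carries the slow process across successive events.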


1.5 Voltage Stability Margin Calculation

Voltage instability has been responsible for several major blackouts, such as the New York Power Pool disturbance on September 22, 1970 and the Western Systems Coordinating Council (WSCC) transmission system disturbance on July 2, 1996. A system enters a state of voltage instability when a disturbance, such as a load increase or change in system conditions, causes a progressive and uncontrollable decline in voltage. In blackouts, two effects contribute to voltage instability: the load increase due to the temperature disturbance, and the decrease in reactive power supply caused by the increased tripping probability of the lines and generators that are geographically close to the load increase area. More importantly, the load increase and the increase of line tripping probability are correlated, since both are related to the temperature disturbance, which may greatly increase the risk of cascading.

After each change in the operating condition, the voltage stability margin is calculated based on the QV index in [45]. Specifically, for a power flow model

$$\begin{bmatrix} \Delta P \\ \Delta Q \end{bmatrix} = \begin{bmatrix} \mathbf{J}_{P\theta} & \mathbf{J}_{PV} \\ \mathbf{J}_{Q\theta} & \mathbf{J}_{QV} \end{bmatrix} \begin{bmatrix} \Delta\theta \\ \Delta V \end{bmatrix}, \tag{1.27}$$

letting $\Delta P = 0$ gives

$$\Delta\theta = -\mathbf{J}_{P\theta}^{-1} \mathbf{J}_{PV}\, \Delta V. \tag{1.28}$$

Substituting (1.28) into the $\Delta Q$ equations in (1.27) leads to

$$\Delta Q = \left( \mathbf{J}_{QV} - \mathbf{J}_{Q\theta} \mathbf{J}_{P\theta}^{-1} \mathbf{J}_{PV} \right) \Delta V. \tag{1.29}$$

Let $\mathbf{J}_R = \mathbf{J}_{QV} - \mathbf{J}_{Q\theta} \mathbf{J}_{P\theta}^{-1} \mathbf{J}_{PV}$. A voltage stability index (VSI) for the whole system can then be defined as

$$\mathrm{VSI} = \min\left\{ \left| \frac{\det(\mathbf{J}_R)}{[\operatorname{adj}(\mathbf{J}_R)]_{ii}} \right|,\ i = 1, \ldots, N \right\}, \tag{1.30}$$

where $N$ is the number of buses, $\operatorname{adj}(\mathbf{A}) = \det(\mathbf{A})\mathbf{A}^{-1}$, and $\det(\mathbf{A})$ is the determinant of $\mathbf{A}$. The VSI indicates how close the system is to voltage instability: the larger the VSI, the more stable the system. When the VSI approaches zero the system will lose voltage stability [45].
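A minimal NumPy sketch of (1.27)–(1.30) follows. It assumes the four Jacobian blocks are already available as dense arrays (the function name and the toy matrices are hypothetical). Because $\operatorname{adj}(\mathbf{A}) = \det(\mathbf{A})\mathbf{A}^{-1}$, the ratio $\det(\mathbf{J}_R)/[\operatorname{adj}(\mathbf{J}_R)]_{ii}$ equals $1/[\mathbf{J}_R^{-1}]_{ii}$, which is what the code evaluates; the absolute value is taken here as an assumption.

```python
import numpy as np


def voltage_stability_index(J_Ptheta, J_PV, J_Qtheta, J_QV):
    """VSI from the reduced Jacobian J_R = J_QV - J_Qtheta J_Ptheta^{-1} J_PV."""
    # Solve J_Ptheta X = J_PV instead of forming the inverse explicitly.
    J_R = J_QV - J_Qtheta @ np.linalg.solve(J_Ptheta, J_PV)
    # det(J_R) / adj(J_R)_ii = 1 / (J_R^{-1})_ii
    inv_diag = np.diag(np.linalg.inv(J_R))
    return float(np.min(np.abs(1.0 / inv_diag)))


# Toy 2-bus blocks (illustrative numbers only, not from the RTS-96 system).
J_Ptheta = np.array([[2.0, 0.0], [0.0, 2.0]])
J_PV = np.zeros((2, 2))
J_Qtheta = np.zeros((2, 2))
J_QV = np.array([[4.0, 1.0], [1.0, 3.0]])
vsi = voltage_stability_index(J_Ptheta, J_PV, J_Qtheta, J_QV)
```

Near voltage collapse $\mathbf{J}_R$ becomes close to singular, so a practical implementation would need to guard the inversion; that handling is left out of this sketch.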


1.6 Blackout Model Considering Temperature

Figure 1.4 illustrates the cascading failure model, which can be implemented in the following ten steps.

1. Randomly select a load bus and an area around the selected load bus based on Sect. 1.2.1. Increase the ambient temperature of the selected area by $\Delta T$.
2. Increase the loads of the buses inside the selected area based on Sect. 1.2.2 and adjust the generators' real power setpoints based on (1.15) in Sect. 1.2.5.
3. Calculate dynamic ratings of the lines inside the selected area and the boundary lines by (1.12) in Sect. 1.2.3.
4. Run power flow. If the power flow cannot converge, go to Step 10; if the power flow converges but there is no voltage stability margin as defined in (1.30), go to Step 10. Otherwise, go to Step 5.
5. Check the voltages of the load buses. The load at buses whose voltages are less than $V_{th}$ is to be shed based on Sect. 1.3.1.
6. Check line flows. If there is any line whose line flow $F_{ij}$ is greater than its initial rating $\overline{F}^{0}_{ij}$, the generators are to be re-dispatched based on Sect. 1.3.2.
7. Find the probabilities of line tripping from Sect. 1.2.4 and generator tripping from Sect. 1.2.5 and decide the lines and generators that will be tripped.
8. Calculate the time for each event according to Sect. 1.4. The event from Steps 5–7 that has the minimum time will actually happen.
9. If any event occurs, go back to Step 4; otherwise, go to Step 10.
10. Stop the simulation.

Fig. 1.4 Flowchart of the blackout model

By utilizing this model, cascading failures can be realistically simulated to capture what has happened in previous blackouts. Even though the system is $N-1$ secure, a cascading failure can still be initiated and then propagate over a large area of the system. This is mainly due to the following reasons.

1. Although load forecasting is used for day-ahead generation purchases and reactive power management, due to uncertainties such as unexpected temperature disturbances the actual load may differ from the forecasted load, thus changing system operating conditions.
2. Under different weather conditions the actual line rating could change significantly due to a number of processes such as the overheating of a transmission line or the sagging of the line into vegetation. A system that is $N-1$ secure under the initial line ratings may no longer be $N-1$ secure under reduced dynamic line ratings, for example those caused by temperature increases.
3. The initiating events have important geographical correlations due to external weather conditions such as temperature disturbance. A temperature increase raises line flows through the load increase and lowers line ratings at the same time, greatly increasing the probability of tripping of lines inside or on the boundary of an area with temperature disturbances. Generators inside the area with temperature disturbance also have an increased chance of being disconnected due to the load increase and the reduction of reactive supply from the outside system after tie line disconnection.

The cascading failure model can take into account the correlations between different events such as line outage, generator tripping, and undervoltage of load buses, which may lead to extensive outage propagation even if the system is initially $N-1$ secure without any temperature disturbance.

As the penetration of renewable generation is quickly increasing, the future power system will be even more impacted by external factors such as weather conditions. This is because, compared with conventional generation, renewable generation is mostly interfaced through power electronics, depends more on weather conditions, and is more sensitive to system disturbances such as voltage disturbances caused by transmission outages due to lightning or wildfire [46–48]. The blackout model discussed in this chapter can be further extended for the future power system with high penetration of renewable generation to evaluate the cascading failure risk, identify critical components that play important roles in outage initiation and propagation, and develop effective mitigation strategies to significantly reduce the cascading risk.
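The control flow of the ten steps in this section can be sketched as a small event loop. This is a hedged skeleton, not the chapter's Matlab implementation: every entry of `steps` is a hypothetical callable standing in for the models of Sects. 1.2–1.5, events are represented as dictionaries with a delay `time`, a `name`, and an `apply` function, and the tolerance `vsi_tol` used to decide that no voltage stability margin is left is an assumption.

```python
def run_cascade(state, steps, vsi_tol=1e-6, max_iter=1000):
    """Skeleton of the blackout model loop; `steps` holds hypothetical callables."""
    steps["apply_temperature_disturbance"](state)            # Steps 1-3
    history = []
    for _ in range(max_iter):
        if not steps["power_flow_converges"](state):         # Step 4
            break
        if steps["vsi"](state) <= vsi_tol:                   # Step 4: no margin left
            break
        candidates = []                                      # Steps 5-7
        candidates += steps["undervoltage_shedding_events"](state)
        candidates += steps["redispatch_events"](state)
        candidates += steps["tripping_events"](state)
        if not candidates:                                   # Step 9: nothing happens
            break
        event = min(candidates, key=lambda e: e["time"])     # Step 8: earliest event
        state["time"] += event["time"]
        event["apply"](state)
        history.append((state["time"], event["name"]))
    return history                                           # Step 10
```

Passing the physical models in as callables keeps the loop itself independent of the power flow solver, so the same skeleton could be driven by stubs for testing or by a full AC power flow.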

1.7 Simulation Results

1.7.1 Model Implementation and Parameter Settings

The blackout model in Sect. 1.6 is implemented in Matlab for the RTS-96 3-area system [49] based on MATPOWER [50]. There are 73 buses and 120 branches in the system, and the total load is 8550 MW. Compared to the initial model, each reactor of 100 MVar at buses 106, 206, and 306 is split into two, with 50 MVar at each extremity of the line. These reactors are considered to be automatically disconnected in case of the outage of the corresponding line. The pre-contingency steady state is based on a Preventive-Security-Constrained Optimal Power Flow so that the system is $N-1$ secure¹ [51]. All tests are carried out on a 3.20 GHz Intel(R) Core(TM) i7-8700 based desktop.

Set $k_{pf} = 0.001$ for all buses [6], $K = 1.5$ in Sect. 1.2.4 for lines [52], $\underline{K}_{Q_i}$ in Sect. 1.2.5 as $1.5\underline{Q}_i$ if $\underline{Q}_i < 0$ and as $-0.5$ if $\underline{Q}_i = 0$, and $\overline{K}_{Q_i}$ as $1.5\overline{Q}_i$. Besides, . is chosen as $0.01$ for lines [52], and . $= -0.01Q_i$ and . $= 0.01Q_i$ for generators. Choose $p_1 = 0.001$, $p_2 = 0.3$, and $p_3 = 1$ [52]. For generators, $p_4 = 0.001$ to account for hidden failures, $p_5 = 0.3$, and $p_6 = 1$. For re-dispatch set $\eta = 1.05$ [53]. For calculating dynamic line rating, set $\alpha_{ij} = 1$.

1.7.2 Typical Simulation Run Without Operator Re-dispatch

Buses 207 and 208 are selected as the internal buses and their initial temperatures are set to be $T^{0}_{high} = 24.21\,^{\circ}$C. As a temperature disturbance, the temperature of the selected area is increased to $T = T^{0}_{high} + 10\,^{\circ}$C. As in Table 1.2, after the temperature increase the active and reactive powers at these two buses also increase. The line flows and the dynamic line ratings of the boundary and internal branches are listed in Table 1.3. After the temperature rise the line flows increase while $\overline{F}^{d}_{ij}$ decreases compared to $\overline{F}^{0}_{ij}$, and branches 207–208 and 208–209 become overloaded.

1 http://homepages.ulb.ac.be/~phenneau/CFWG_Benchmark.HTML.


Table 1.2 Internal buses with their active and reactive powers in the typical case without operator re-dispatch

| Inside buses | $P(T_i^0)$ (MW) | $P(T_i)$ (MW) | $Q(T_i^0)$ (MVAr) | $Q(T_i)$ (MVAr) |
|---|---|---|---|---|
| 207 | 125 | 189.79 | 25 | 48.08 |
| 208 | 171 | 259.63 | 35 | 66.76 |

Table 1.3 Initial flows, flow after temperature rises, and dynamic line rating of boundary and internal branches in the typical case without operator re-dispatch

| Branch | $F_{ij}(T_{ij}^0)$ | $\overline{F}^{0}_{ij}$ | $F_{ij}(T_{ij})$ | $\overline{F}^{d}_{ij}(T_{ij})$ |
|---|---|---|---|---|
| (207, 208) | 53.24 | 175 | 148.89 | 147.40 |
| (208, 209) | 96.41 | 190 | 175.07 | 173.21 |
| (208, 210) | 82.77 | 190 | 161.81 | 173.45 |
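As a quick check of Table 1.3, the overload test can be written as a comparison of the post-disturbance flow against either rating. The dictionary layout and function name below are hypothetical; the numbers are those of the table.

```python
# Rows of Table 1.3: flow after the temperature rise and the two ratings.
table_1_3 = {
    (207, 208): {"flow": 148.89, "initial_rating": 175.0, "dynamic_rating": 147.40},
    (208, 209): {"flow": 175.07, "initial_rating": 190.0, "dynamic_rating": 173.21},
    (208, 210): {"flow": 161.81, "initial_rating": 190.0, "dynamic_rating": 173.45},
}


def overloaded_branches(rows, rating_key):
    """Branches whose flow exceeds the chosen rating."""
    return sorted(branch for branch, row in rows.items() if row["flow"] > row[rating_key])


with_dynamic_rating = overloaded_branches(table_1_3, "dynamic_rating")
with_initial_rating = overloaded_branches(table_1_3, "initial_rating")
```

With the dynamic rating, branches 207–208 and 208–209 are flagged; with the initial rating, no branch is, which is why the choice of rating changes the tripping probabilities in this case.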

Fig. 1.5 Process of a typical blackout in RTS-96 system simulated by the model without operator re-dispatch

If dynamic line rating is not considered, $F_{ij}(T_{ij})$ is less than $\overline{F}^{0}_{ij}$ for all lines, so the probability of line tripping is as low as $p_1$ and the probability of generator tripping is equal to $p_4$. However, if dynamic line rating is considered, the outcome is very different. When operator re-dispatch is not modeled, the event sequence simulated from the model in Sect. 1.6 is shown in Fig. 1.5, in which line tripping is indicated by red dashed lines, generator tripping is indicated by green dashed circles, and the number next to the tripped line or generator indicates the sequence of the event. After the temperature of the selected area increases, branch 207–208 will be tripped with probability $0.3$. In the simulation it is tripped after $991.40$ s. This leads to undervoltage of bus 208 and the islanding of the generators at bus 207. Then the cascade gradually propagates to the other parts of the system, leading to a total of 35 line outages and the tripping of 19 generators at 12 generator buses. The line outages, generator outages, and undervoltage buses during the blackout are shown in Fig. 1.6. The total load during the blackout is shown in Fig. 1.7a. Three seconds after the first line tripping, part of the load at bus 208 is shed due to undervoltage. Due to islanding and undervoltage, load shedding becomes much faster after 8036 s.

Fig. 1.6 Events for the typical case without operator re-dispatch (For generator outages the involved generator buses are shown. Since one generator bus can have several generators connected to it, it may appear more than once, such as generator bus 201)

Fig. 1.7 Total load, VSI, and voltage at vulnerable buses for the typical case without operator re-dispatch: (a) Total load; (b) VSI; (c) Voltage at vulnerable buses


The VSI during the cascading event is shown in Fig. 1.7b. Under normal operating conditions VSI is 14.68, and when the blackout propagates it gradually decreases. At 11,454.2 s, VSI decreases from 2.02 to 1.39. Figure 1.7c shows the voltages of four vulnerable buses (buses 303, 305, 306, and 324) over time. It is seen that after the VSI reduces to 1.39 the voltages at the vulnerable buses begin to drop significantly before voltage collapse finally occurs at 12,657 s.

1.7.3 Typical Simulation Run with Operator Re-dispatch

In this case, buses 304, 305, 306, 309, 310, and 314 are chosen as the internal buses and their initial temperatures are $T^{0}_{high} = 24.21\,^{\circ}$C. The temperature of the selected area is raised to $T = T^{0}_{high} + 10\,^{\circ}$C to simulate a temperature disturbance. As shown in Table 1.4, the active and reactive powers at the internal buses increase due to the temperature rise. The line flows and dynamic line ratings of the boundary and internal branches are reported in Table 1.5. The line flow of branch 314–316 becomes higher than its dynamic line rating, so the branch is to be tripped with probability $0.32$. In the simulation, it is tripped after 798.34 s and initiates a cascading failure.

Table 1.4 Internal buses with their active and reactive powers in the typical case with operator re-dispatch

| Inside buses | $P(T_i^0)$ (MW) | $P(T_i)$ (MW) | $Q(T_i^0)$ (MVAr) | $Q(T_i)$ (MVAr) |
|---|---|---|---|---|
| 304 | 74 | 112.4 | 15 | 29.9 |
| 305 | 71 | 107.8 | 14 | 27.4 |
| 306 | 141 | 214.1 | 28 | 54.7 |
| 309 | 175 | 265.7 | 36 | 69.5 |
| 310 | 195 | 296.1 | 40 | 77.3 |
| 314 | 194 | 294.6 | 39 | 75.9 |

Table 1.5 Initial flows, flow after temperature rises, and dynamic line rating of boundary and internal branches in the typical case with operator re-dispatch

| Branch | $F_{ij}(T_{ij}^0)$ | $\overline{F}^{0}_{ij}$ | $F_{ij}(T_{ij})$ | $\overline{F}^{d}_{ij}(T_{ij})$ |
|---|---|---|---|---|
| (305, 301) | 48.3 | 175 | 88.1 | 167.7 |
| (304, 302) | 39.4 | 175 | 67.3 | 172.5 |
| (306, 302) | 41.1 | 175 | 66.5 | 160.2 |
| (309, 303) | 20.8 | 175 | 40.9 | 158.4 |
| (304, 309) | 49.2 | 175 | 82.7 | 147.4 |
| (305, 310) | 25.3 | 175 | 47.4 | 147.4 |
| (306, 310) | 150.4 | 180 | 151.6 | 152.4 |
| (309, 308) | 97.4 | 190 | 81.8 | 174.3 |
| (309, 311) | 146.5 | 400 | 191.1 | 352.5 |
| (309, 312) | 165.1 | 400 | 211.2 | 353.7 |
| (310, 311) | 193.4 | 400 | 266.3 | 351.3 |
| (310, 312) | 216.5 | 400 | 288.2 | 342.5 |
| (314, 316) | 354.7 | 500 | 498.5 | 492.4 |

The event sequence simulated from the model with operator re-dispatch is illustrated in Fig. 1.8, in which the selected area is denoted by blue dashed lines, line tripping is denoted by red dashed lines, generator tripping is specified by green dashed circles, and the number next to the tripped line or generator indicates the sequence of the event. The tripping of line 314–316 leads to the overloading of line 312–323, which is tripped after 54.2 s. These events result in the undervoltage of buses 303, 309, and 324 at 855.54 s and the overloading of line 313–323 at 915.54 s. The operator re-dispatch at 975.54 s eliminates the overloading of line 313–323, but the generator at bus 314 within the selected area is tripped at 2415.75 s due to overexcitation, and the cascading failure spreads to the other parts of the system, causing a total of 19 line outages and the tripping of 14 generators at 11 generator buses. The operator re-dispatch, line outages, generator outages, and undervoltage buses during the blackout are illustrated in Fig. 1.9.

Fig. 1.8 Process of a typical blackout in RTS-96 system when operator re-dispatch is modeled

Fig. 1.9 Events for the typical case with operator re-dispatch (For generator outages the involved generator buses are shown)

Fig. 1.10 Total load, VSI, and voltage at vulnerable buses for the typical case with operator re-dispatch: (a) Total load; (b) VSI; (c) Voltage at vulnerable buses

The total load during the blackout is shown in Fig. 1.10a. After two line trippings, undervoltage happens at buses 303, 309, and 324, and part of the load at these buses is shed due to undervoltage. The VSI during the cascading event is shown in Fig. 1.10b. At 5378.5 s, VSI drops from 2.72 to 0.97. Figure 1.10c illustrates the voltages of eight vulnerable buses (buses 303, 308, 309, 310, 311, 312, 313, and 324), some of which are inside the selected area. It is noticed that after VSI declines to 0.97 the voltages at the vulnerable buses begin to drop remarkably before voltage eventually collapses at 5381.5 s.

1.7.4 Number of Simulations

Since many random factors could affect cascading failure simulation, set $\gamma = 0.07$ and $\Delta T = 11\,^{\circ}$C and run the model for different numbers of simulations in order to decide a number for which the variance of the simulation results is small enough. Figure 1.11 shows the average value and the standard deviation of the number of outages for different numbers of simulations. It is seen that after 10,000 simulations the average number of outages (both line outages and generator outages) stabilizes and the standard deviation of the number of outages decreases to a very small value. For other values of $\gamma$ and $\Delta T$ the results are very similar and are thus not given. Therefore, the model is run 10,000 times for each case.
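The convergence check described here can be sketched as follows. The sample data are synthetic (a hypothetical stand-in for the simulated number of outages per run), and the standard deviation is interpreted as the standard error of the running average, which is an assumption about how the quantity in Fig. 1.11b is computed.

```python
import random
from statistics import mean, stdev


def running_statistics(samples, checkpoints):
    """Map n -> (mean, standard error of the mean) over the first n samples."""
    stats = {}
    for n in checkpoints:
        head = samples[:n]
        stats[n] = (mean(head), stdev(head) / n ** 0.5)
    return stats


rng = random.Random(0)
# Synthetic outage counts per run; a real study would use simulation output.
samples = [rng.choice([0, 0, 0, 1, 2, 5]) for _ in range(10_000)]
stats = running_statistics(samples, [10, 100, 1_000, 10_000])
```

The number of runs would be chosen as the smallest checkpoint beyond which the average no longer moves appreciably and the standard error falls below a tolerance picked by the analyst.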

Fig. 1.11 Average value and standard deviations of the number of outages for different number of simulations: (a) Average value; (b) Standard deviation

Fig. 1.12 Average value of the number of outages under different temperature disturbances: (a) Increasing temperature; (b) Decreasing temperature

1.7.5 Impact of Temperature Disturbances and Size of Selected Area

Set $\gamma = 0.07$ and run the model in Sect. 1.6 10,000 times with randomly selected areas, in which the ambient temperature is increased from $T^{0}_{high}$ or decreased from $T^{0}_{low}$ by $\Delta T$. Figure 1.12 shows the average value of the number of outages under different temperature disturbances. When the temperature is increased, the total number of line and generator outages is low for $\Delta T \le 11\,^{\circ}$C and grows quickly for $\Delta T > 11\,^{\circ}$C. Therefore, $\Delta T = 11\,^{\circ}$C can be inferred as the critical temperature disturbance. As presented in Fig. 1.12b, if the temperature is decreased, the load increases but the dynamic line rating also increases. Therefore, the total number of line and generator outages is always low. Figure 1.13 shows the total load shed for three different temperature increase disturbances; the total load shed increases under larger temperature disturbances.

Fig. 1.13 Distribution of the load shed under different temperature increase disturbances

Fig. 1.14 Average value of the number of outages under different areas

Fig. 1.15 Critical temperature versus critical area size

The impact of the size of the selected area is also analyzed for temperature increase cases with $\Delta T = 11\,^{\circ}$C, and a random area is selected with a different value of $\gamma$. As the results in Fig. 1.14 indicate, by enlarging the selected area, the number of line and generator outages increases. The number of line and generator outages significantly increases for $\gamma > 0.07$. Therefore, $\gamma = 0.07$ can be inferred as the critical size. Figure 1.15 shows the pairs of critical temperature increase disturbances and critical area sizes. It can be seen that as the critical size of the selected area increases, the critical temperature disturbance decreases.

Fig. 1.16 Identification of vulnerable buses: (a) Average value of the number of outages for load buses; (b) Ratio between the load shed and the total load for load buses

1.7.6 Identifying the Most Vulnerable Buses/Locations

The vulnerability of the buses/locations depends on the temperature disturbance and the size of the selected area. To identify the most vulnerable locations effectively, we run the model $10{,}000$ times for all combinations of $\Delta T \in \{8\,^{\circ}\text{C}, 10\,^{\circ}\text{C}, 11\,^{\circ}\text{C}, 15\,^{\circ}\text{C}\}$ and $\gamma \in \{0.05, 0.06, 0.07, 0.08\}$ around every load bus. By doing so the vulnerable buses can be identified under very diverse failure scenarios. Figure 1.16a shows the average value of the number of line and generator outages and Fig. 1.16b shows the average value of the ratio between the load shed and the total load. It is seen that load buses 208, 308, 305, 210, 209, 306, 309, and 310 are more vulnerable than the other buses, and the temperature disturbances around these buses lead to many more line/generator outages and much more load shed.
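The scenario sweep behind this ranking can be sketched as below. The `simulate` argument is a hypothetical callable that runs the blackout model once around a given load bus and returns the number of outages; only the grids of $\Delta T$ and $\gamma$ come from the text.

```python
from itertools import product
from statistics import mean

DELTA_T = [8, 10, 11, 15]            # temperature disturbances in deg C (from the text)
GAMMA = [0.05, 0.06, 0.07, 0.08]     # area sizes (from the text)


def rank_vulnerable_buses(load_buses, simulate, runs_per_scenario):
    """Average outages per bus over all (delta_T, gamma) scenarios, highest first."""
    average = {}
    for bus in load_buses:
        outcomes = [
            simulate(bus, delta_t, gamma)
            for delta_t, gamma in product(DELTA_T, GAMMA)
            for _ in range(runs_per_scenario)
        ]
        average[bus] = mean(outcomes)
    return sorted(average.items(), key=lambda item: item[1], reverse=True)
```

The same loop could accumulate the ratio of load shed to total load instead of the outage count, giving the second ranking criterion used in Fig. 1.16b.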

1.7.7 Impact of Control Strategies

For the typical case presented in Sect. 1.7.2, if the operator re-dispatch is modeled by considering the initial line rating, the re-dispatch is performed at 8972 s, because the power flow of some branches exceeds their initial capacities and the potential outage of line 209–211 would take 104.4 s, which is longer than the 60 s required for operator re-dispatch. After the operator re-dispatch the cascading failure stops. In addition, set $\Delta T = 11\,^{\circ}$C and $\gamma = 0.07$ and run 10,000 simulations for the cases with and without operator re-dispatch. Figure 1.17 shows the distributions of the number of line and generator outages. It is clear that by implementing the re-dispatch strategy using $\overline{F}^{0}_{ij}$, the number of line and generator outages is decreased. Moreover, by considering dynamic line rating for re-dispatch, the risk can be reduced much more significantly.

Fig. 1.17 Distribution of the number of line and generator outages with or without re-dispatch. (a) Line outage (b) Generator outage

References

1. K. Sun, Y. Hou, W. Sun, J. Qi, Power System Control Under Cascading Failures: Understanding, Mitigation, and System Restoration (Wiley-IEEE Press, New York, 2019)
2. P. Praks, V. Kopustinskas, M. Masera, Monte-Carlo-based reliability and vulnerability assessment of a natural gas transmission system due to random network component failures. Sustain. Resilient Infrastruct. 2(3), 97–107 (2017)
3. M. Theoharidou, M. Kandias, D. Gritzalis, Securing transportation-critical infrastructures: Trends and perspectives, in Global Security, Safety and Sustainability & e-Democracy (Springer, Berlin, 2011), pp. 171–178
4. S. Hong, H. Yang, T. Zhao, X. Ma, Epidemic spreading model of complex dynamical network with the heterogeneity of nodes. Int. J. Syst. Sci. 47(11), 2745–2752 (2016)
5. S.V. Buldyrev, R. Parshani, G. Paul, H.E. Stanley, S. Havlin, Catastrophic cascade of failures in interdependent networks. Nature 464(7291), 1025–1028 (2010)
6. B. Liscouski, W. Elliot, Final report on the August 14, 2003 blackout in the United States and Canada: causes and recommendations. Rep. US Depart. Energy 40(4), 86 (2004)
7. F. E. R. Commission et al., Arizona-southern California outages on September 8, 2011: Causes and recommendations. FERC/NERC Staff Report on the September 8, 2011 Blackout (2012)
8. L.L. Lai, H.T. Zhang, C.S. Lai, F.Y. Xu, S. Mishra, Investigation on July 2012 Indian blackout, in International Conference on Machine Learning and Cybernetics, vol. 1 (IEEE, Piscataway, 2013), pp. 92–97
9. D.S. Kirschen, D. Jayaweera, D.P. Nedic, R.N. Allan, A probabilistic indicator of system stress. IEEE Trans. Power Syst. 19(3), 1650–1657 (2004)
10. A. Phadke, J.S. Thorp, Expose hidden failures to prevent cascading outages [in power systems]. IEEE Comput. Appl. Power 9(3), 20–23 (1996)
11. J. Chen, J.S. Thorp, I. Dobson, Cascading dynamics and mitigation assessment in power system disturbances via a hidden failure model. Int. J. Elect. Power Energy Syst. 27(4), 318–326 (2005)
12. I. Dobson, B.A. Carreras, D.E. Newman, A loading-dependent model of probabilistic cascading failure. Probab. Eng. Inform. Sci. 19(1), 15–32 (2005)
13. B.A. Carreras, D.E. Newman, I. Dobson, N.S. Degala, Validating OPA with WECC data, in 46th Hawaii International Conference on System Sciences (IEEE, Piscataway, 2013), pp. 2197–2204
14. S. Mei, Y. Ni, G. Wang, S. Wu, A study of self-organized criticality of power system under cascading failures based on AC-OPF with voltage stability margin. IEEE Trans. Power Syst. 23(4), 1719–1726 (2008)
15. J. Song, E. Cotilla-Sanchez, G. Ghanavati, P.D. Hines, Dynamic modeling of cascading failure in power systems. IEEE Trans. Power Syst. 31(3), 2085–2095 (2015)
16. I. Dobson, A. Flueck, S. Aquiles-Perez, S. Abhyankar, J. Qi, Towards incorporating protection and uncertainty into cascading failure simulation and analysis, in IEEE International Conference Probabilistic Methods Applied to Power Systems (PMAPS) (2018), pp. 1–5
17. J. Qi, S. Pfenninger, Controlling the self-organizing dynamics in a sandpile model on complex networks by failure tolerance. EPL (Europhys. Lett.) 111(3), 38006 (2015)
18. J. Qi, I. Dobson, S. Mei, Towards estimating the statistics of simulated cascades of outages with branching processes. IEEE Trans. Power Syst. 28(3), 3410–3419 (2013)
19. I. Dobson, Estimating the propagation and extent of cascading line outages from utility data with a branching process. IEEE Trans. Power Syst. 27(4), 2146–2155 (2012)
20. J. Qi, W. Ju, K. Sun, Estimating the propagation of interdependent cascading outages with multi-type branching processes. IEEE Trans. Power Syst. 32(2), 1212–1223 (2017)
21. J. Qi, K. Sun, S. Mei, An interaction model for simulation and mitigation of cascading failures. IEEE Trans. Power Syst. 30(2), 804–819 (2015)
22. W. Ju, J. Qi, K. Sun, Simulation and analysis of cascading failures on an NPCC power system test bed, in IEEE PES General Meeting (2015), pp. 1–5
23. J. Qi, J. Wang, K. Sun, Efficient estimation of component interactions for cascading failure analysis by EM algorithm. IEEE Trans. Power Syst. 33(3), 3153–3161 (2018)
24. W. Ju, K. Sun, J. Qi, Multi-layer interaction graph for analysis and mitigation of cascading outages. IEEE J. Emerg. Sel. Top. Circ. Syst. 7(2), 239–249 (2017)
25. D.N. Kosterev, C.W. Taylor, W.A. Mittelstadt, Model validation for the August 10, 1996 WSCC system outage. IEEE Trans. Power Syst. 14(3), 967–979 (1999)
26. C.W. Taylor, D.C. Erickson, Recording and analyzing the July 2 cascading outage [Western USA power system]. IEEE Comput. Appl. Power 10(1), 26–30 (1997)
27. M. Anghel, K.A. Werley, A.E. Motter, Stochastic model for power grid dynamics, in 2007 40th Annual Hawaii International Conference on System Sciences (HICSS'07) (IEEE, Piscataway, 2007), pp. 113–113
28. J. Qi, S. Mei, F. Liu, Blackout model considering slow process. IEEE Trans. Power Syst. 28(3), 3274–3282 (2013)
29. P. Henneaux, P.-E. Labeau, J.-C. Maun, Blackout probabilistic risk assessment and thermal effects: impacts of changes in generation. IEEE Trans. Power Syst. 28(4), 4722–4731 (2013)
30. R. Yao, K. Sun, Towards simulation and risk assessment of weather-related outages. IEEE Trans. Smart Grid 10(4), 4391–4400 (2019)
31. I. Dobson, N.K. Carrington, K. Zhou, Z. Wang, B.A. Carreras, J.M. Reynolds-Barredo, Exploring cascading outages and weather via processing historic data (2017). Preprint arXiv:1709.09079
32. S.R. Khazeiynasab, J. Qi, Resilience analysis and cascading failure modeling of power systems under extreme temperatures. J. Modern Power Syst. Clean Energy 9(6), 1446–1457 (2021)
33. D. Rick, Deriving the haversine formula, in The Math Forum (1999)
34. P.J. Robinson, Modeling utility load and temperature relationships for use with long-lead forecasts. J. Appl. Meteorol. 36(5), 591–598 (1997)
35. S. Mirasgedis, Y. Sarafidis, E. Georgopoulou, D. Lalas, M. Moschovits, F. Karagiannis, D. Papakonstantinou, Models for mid-term electricity demand forecasting incorporating weather influences. Energy 31(2–3), 208–227 (2006)
36. S. Mirasgedis, Y. Sarafidis, E. Georgopoulou, V. Kotroni, K. Lagouvardos, D. Lalas, Modeling framework for estimating impacts of climate change on electricity demand at regional level: case of Greece. Energy Convers. Manag. 48(5), 1737–1750 (2007)
37. J.D. Glover, M.S. Sarma, T. Overbye, Power System Analysis & Design, SI Version (Cengage Learning, Boston, 2012)
38. A. Michiorri, H.-M. Nguyen, S. Alessandrini, J.B. Bremnes, S. Dierer, E. Ferrero, B.-E. Nygaard, P. Pinson, N. Thomaidis, S. Uski, Forecasting for dynamic line rating. Renew. Sustain. Energy Rev. 52, 1713–1730 (2015)
39. P. Kundur, N.J. Balu, M.G. Lauby, Power System Stability and Control, vol. 7 (McGraw-Hill, New York, 1994)
40. B. Otomega, T. Van Cutsem, Undervoltage load shedding using distributed controllers. IEEE Trans. Power Syst. 22(4), 1898–1907 (2007)
41. T. Van Cutsem, M. Glavic, W. Rosehart, J. Andrade dos Santos, C. Cañizares, M. Kanatas, L. Lima, F. Milano, L. Papangelis, R. Andrade Ramos et al., Test systems for voltage stability analysis and security assessment. IEEE Technical Report (2015)
42. Y. Xue, M. Thakhar, J.C. Theron, D.P. Erwin, Review of the breaker failure protection practices in utilities, in 65th Annual Conference for Protective Relay Engineers (2012), pp. 260–268
43. M.J. Eppstein, P.D. Hines, A "random chemistry" algorithm for identifying collections of multiple contingencies that initiate cascading failure. IEEE Trans. Power Syst. 27(3), 1698–1705 (2012)
44. V. Doifode, M.V. Aranke, S.M. Choudhary, V.V. Dhengre, A.M. Thawkar, An overview of classes & grouping in generator protection at VIPL (Nagpur), in International Conference on Science and Engineering for Sustainable Development (ICSESD-2017) (2016)
45. H. Li, A. Bose, V.M. Venkatasubramanian, Wide-area voltage monitoring and optimization. IEEE Trans. Smart Grid 7(2), 785–793 (2015)
46. North American Electric Reliability Council, 1,200 MW fault induced solar photovoltaic resource interruption disturbance report (2017)
47. R. Yan, T.K. Saha, F. Bai, H. Gu et al., The anatomy of the 2016 south Australia blackout: a catastrophic event in a high renewable network. IEEE Trans. Power Syst. 33(5), 5374–5388 (2018)
48. N. G. ESO, Technical report on the events of 9 August 2019 (2019)
49. R. Force, The IEEE reliability test system-1996. IEEE Trans. Power Syst. 14(3), 1010–1020 (1999)
50. R.D. Zimmerman, C.E. Murillo-Sanchez, R.J. Thomas, MATPOWER: steady-state operations, planning and analysis tools for power systems research and education. IEEE Trans. Power Syst. 26(1), 12–19 (2011)
51. P. Henneaux, E. Ciapessoni, D. Cirio, E. Cotilla-Sanchez, R. Diao, I. Dobson, A. Gaikwad, S. Miller, M. Papic, A. Pitto et al., Benchmarking quasi-steady state cascading outage analysis methodologies, in 2018 IEEE International Conference on Probabilistic Methods Applied to Power Systems (PMAPS) (IEEE, Piscataway, 2018), pp. 1–6
52. S.T. Lee, Estimating the probability of cascading outages in a power grid, in Proceedings of the 16th PSCC, Glasgow, Scotland (2008), pp. 14–18
53. R. Yao, S. Huang, K. Sun, F. Liu, X. Zhang, S. Mei, A multi-timescale quasi-dynamic model for simulation of cascading outages. IEEE Trans. Power Syst. 31(4), 3189–3201 (2015)

Chapter 2

Cascading Failure Interaction Analysis

2.1 Introduction

Cascading failure simulations based on various models, such as the one in Chap. 1, can produce many samples of cascades [1]. The branching process can extract high-level statistical information from these cascades and can quantify the extent of outage propagation by a simple parameter called average propagation [2–5]. However, the branching process cannot be used to study in detail how the outages propagate in the system from one component to another, because it does not retain any information about the network topology or the power flow.

Recent studies on the influence graph [6] and the interaction network [7–9] provide another, more useful way to extract propagation patterns from the original cascades. In [7] an explicit study of the interactions between component failures obtained from detailed cascading failure simulation helps better understand the mechanisms of cascading failures. In [8] an interaction network is built for a Northeast Power Coordinating Council (NPCC) power system testbed, which represents the northeastern region of the Eastern Interconnection (EI) system. Then in [9] a multi-layer interaction graph is proposed as an extension of a single-layer interaction network. In this multi-layer graph, each layer focuses on one of several aspects that are critical for the system operators' decision support, such as the number of line outages, the amount of load shedding, and the electrical distance of the outage propagation.

The interactions between component failures may vary under different operating conditions. For online cascading failure mitigation, the interactions should be quantified at least every 15 minutes for the most recent system operating conditions in order to capture the propagation pattern of cascading failures well, which in turn requires that the simulation of a detailed cascading failure model provide enough original cascades within that time. Therefore, an interaction estimation method that requires a small number of original cascades is critical for the practical implementation of the online application.

© Springer Nature Switzerland AG 2023 J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_2


In this chapter, we discuss an interaction estimation method based on the Expectation Maximization (EM) algorithm [10], which requires a much smaller number of original cascades to accurately estimate the interactions and identify the key links and key components, when compared with the method in [7].

2.2 Estimating Interactions Between Component Failures In power systems, the components can be branches such as the transmission lines or transformers. In order to estimate the interaction between component failures, we need to have a large amount of data that records the processes of cascading failures. The data, which comes either from utility outage records or cascading failure simulation models, can be grouped into different cascades (one cascade corresponds to one cascading failure process) and generations (one generation corresponds to one stage in a cascade). Assume we have a total of M cascades as

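\[
\begin{array}{lcccc}
 & \text{Generation 0} & \text{Generation 1} & \text{Generation 2} & \cdots \\
\text{Cascade 1} & F_0^{(1)} & F_1^{(1)} & F_2^{(1)} & \cdots \\
\text{Cascade 2} & F_0^{(2)} & F_1^{(2)} & F_2^{(2)} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
\text{Cascade } M & F_0^{(M)} & F_1^{(M)} & F_2^{(M)} & \cdots
\end{array}
\]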

Here $F_g^{(m)}$ is the set of failed components in generation $g$ of cascade $m$. After obtaining the original cascades, we can estimate the component interactions, which are defined as the interaction matrix $\mathbf{B} \in \mathbb{R}^{n \times n}$, where $n$ is the number of components in the system. The element $b_{ij}$ of $\mathbf{B}$ is the empirical probability that the failure of component $i$ causes the failure of component $j$. In order to estimate $\mathbf{B}$, we need to obtain a matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ whose element $a_{ij}$ is the expected number of times that the failure of component $i$ causes the failure of component $j$ among all successive generations of all cascades, because

\[
b_{ij} = \frac{a_{ij}}{N_i}, \tag{2.1}
\]

where $N_i$ is the number of times that component $i$ fails. The interaction matrix $\mathbf{B}$ describes how the components in the system interact with each other. Its nonzero elements are called links. For example, for a nonzero element $b_{ij}$, there is a link $l : i \to j$, representing that the failure of the source component $i$ causes the failure of the destination component $j$ with a positive probability. All of the links form a directed interaction network $\mathcal{G}(\mathcal{C}, \mathcal{L})$ with the set of vertices $\mathcal{C}$ and the set of links $\mathcal{L}$. This estimation of the $\mathbf{B}$ matrix is very challenging mainly due to the following reasons.

Fig. 2.1 Illustration for finding the cause of a component failure: components A and B fail in generation $g$ and component C fails in generation $g + 1$, with candidate links $b_{AC}$ and $b_{BC}$. © [2018] IEEE. Reprinted, with permission, from [10]

1. If we know matrix $\mathbf{A}$, then matrix $\mathbf{B}$ can be very easily estimated by (2.1).
2. If the interaction matrix $\mathbf{B}$ is known, for two consecutive generations in each cascade we can infer how probable one component failure causes another component failure, based on which we can get the matrix $\mathbf{A}$. For the very simple example shown in Fig. 2.1, if $b_{AC} \approx 1$ while $b_{BC} \approx 0$, it is much more probable that the failure of component C is caused by component A failure. In these two consecutive generations, the probability that component A failure causes component C failure is approximately 1 while that for component B failure causing component C failure is around 0.
3. However, neither $\mathbf{A}$ nor $\mathbf{B}$ is initially known.

Therefore, the estimation of the interactions between component failures is actually a typical parameter estimation problem with incomplete data. In [7], the interaction matrix $\mathbf{B}$ is directly estimated from an approximated matrix $\mathbf{A}$ that is not statistically inferred from $\mathbf{B}$ but is only estimated based on a simple assumption that the component $j$ failure in generation $g + 1$ is caused only by the component in generation $g$ that is in the previous generation of component $j$ for the largest number of times. This assumption will inevitably ignore some interactions and thus can only get approximated estimations of $\mathbf{A}$ and $\mathbf{B}$.

2.3 EM Algorithm

Here the EM algorithm [11, 12] is introduced by a coin-flipping example. It is a simple yet effective method for performing maximum likelihood estimation of parameters when there is incomplete data.

2.3.1 A Coin-Flipping Example

The coin-flipping experiment in [12] is adapted to illustrate how the EM algorithm works. Assume there are $n$ coins denoted by $c_1, \dots, c_n$, and coin $c_i$ lands on heads and tails with probability $\theta_i$ and $1 - \theta_i$, respectively. We repeat the following experiment $m$ times, and based on the results we want to estimate $\theta = (\theta_1, \dots, \theta_n)$.


1. Exactly one coin is selected from the $n$ coins, and each coin is selected with the same probability.
2. A total of $K$ tosses are independently performed for the chosen coin.

During the experiments we record $\mathbf{x} = [x_1, \dots, x_m]$, where $x_j = i$ if $c_i$ is chosen in the $j$th set of tosses, and $\mathbf{y} = [\mathbf{y}_1^\top, \dots, \mathbf{y}_m^\top]^\top$, where $\mathbf{y}_j = [y_{j1}, \dots, y_{jK}]$ and $y_{jk} = 1$ if the selected coin in the $j$th set of tosses lands on heads for the $k$th toss and $y_{jk} = 0$ otherwise. Note that in this parameter estimation problem there is complete data. This is because both the type of the coin used for each toss and the result of each coin toss are known. The parameters $\theta$ can be estimated by the maximum likelihood estimation as

\[
\hat{\theta}_i = \frac{\sum_{j \mid x_j = i} \sum_{k=1}^{K} y_{jk}}{K \sum_{j=1}^{m} \mathbb{1}(x_j = i)}, \tag{2.2}
\]

where $j \mid x_j = i$ indicates the sets of tosses performed with coin $c_i$, and the indicator $\mathbb{1}(x_j = i)$ equals one if $x_j = i$ and zero otherwise. The estimated parameters $\hat{\theta} = (\hat{\theta}_1, \dots, \hat{\theta}_n)$ from (2.2) maximize $\log P(\mathbf{x}, \mathbf{y}; \theta)$, which is the logarithm of the joint probability of having the coin types $\mathbf{x}$ and the observed result $\mathbf{y}$.

Then we change the problem setting by recording only the results of the coin flips $\mathbf{y}$ but not the types of the coins $\mathbf{x}$. We refer to $\mathbf{x}$ as hidden variables. Because we do not have the data for the types of the coins, in this parameter estimation problem there is incomplete data. The EM algorithm can perform parameter estimation under this new setting.

Specifically, starting from some initial parameters $\hat{\theta}^{(0)} = (\hat{\theta}_1^{(0)}, \dots, \hat{\theta}_n^{(0)})$, the EM algorithm uses the parameters in the current iteration to calculate the probabilities of each possible completion of the incomplete data. Then the classic maximum likelihood estimation method is modified to take these probabilities into account, based on which the updated parameter estimates $\hat{\theta}^{(t+1)}$ can be obtained. The EM algorithm iterates between the E-step and M-step as follows until convergence:

1. E-step: Estimate a probability distribution of the incomplete data based on the parameters in the current iteration.
2. M-step: Estimate the parameters using the completions obtained in the E-step.

For the above coin-flipping example, when we consider the incomplete data case the EM algorithm can be formulated as follows.

1. E-step: For the $j$th set of tosses, the number of heads is $n_j^{\mathrm{head}} = \sum_{k=1}^{K} y_{jk}$ and that for tails is $n_j^{\mathrm{tail}} = K - n_j^{\mathrm{head}}$. Then the probability distribution of $x_j$ is

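\[
P(x_j = i) = \frac{\left(\hat{\theta}_i^{(t)}\right)^{n_j^{\mathrm{head}}} \left(1 - \hat{\theta}_i^{(t)}\right)^{n_j^{\mathrm{tail}}}}{\sum_{l=1}^{n} \left(\hat{\theta}_l^{(t)}\right)^{n_j^{\mathrm{head}}} \left(1 - \hat{\theta}_l^{(t)}\right)^{n_j^{\mathrm{tail}}}}, \tag{2.3}
\]

and the corresponding expected number of heads for coin $c_i$ is $E_i^{\mathrm{head}} = n_j^{\mathrm{head}} P(x_j = i)$.

2. M-step: The parameters are updated as

\[
\hat{\theta}_i^{(t+1)} = \frac{\sum_{j=1}^{m} n_j^{\mathrm{head}} P(x_j = i)}{K \sum_{j=1}^{m} P(x_j = i)}. \tag{2.4}
\]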

For example, we assume there are two coins, $c_1$ and $c_2$, with $\theta_1 = 0.7$ and $\theta_2 = 0.4$. We set $m = 5$, $K = 10$ and get the following data (here $x_j = 1$ denotes coin $c_1$ and $x_j = 0$ denotes coin $c_2$, matching the columns of Table 2.1):

\[
\mathbf{x} = [\,1 \;\; 0 \;\; 0 \;\; 1 \;\; 1\,],
\]

\[
\mathbf{y} = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 0
\end{bmatrix}.
\]

For the EM algorithm, the initial parameter is assumed to be $\hat{\theta}^{(0)} = (0.6, 0.5)$. Then in the E-step the number of heads, the probability that a specific coin is chosen, and the expected numbers of heads for each coin in each set of tosses are listed in Table 2.1. In the M-step we update the parameters as $\hat{\theta}_1^{(1)} = 0.596$ and $\hat{\theta}_2^{(1)} = 0.453$. After 9 iterations we get $\hat{\theta}_1 = 0.702$ and $\hat{\theta}_2 = 0.383$, both of which are very close to the real parameters.

Table 2.1 Quantities in E-step

j       n_j^head   P(x_j = 1)   P(x_j = 0)   E_1^head   E_2^head
1       3          0.266        0.734        0.798      2.202
2       4          0.352        0.648        1.409      2.591
3       4          0.352        0.648        1.409      2.591
4       8          0.733        0.267        5.868      2.132
5       7          0.647        0.353        4.531      2.470
Total   –          –            –            14.014     11.987
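The E-step (2.3) and M-step (2.4) for this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `em_coin_step` and `log_likelihood` are made up here, the coins are assumed to be chosen with equal probability as in the experiment description, and the data are the matrix $\mathbf{y}$ given above. The first iteration should land close to the entries of Table 2.1 and the updated parameters quoted in the text.

```python
import math

def em_coin_step(theta, heads, K):
    """One EM iteration for the coin example; returns (posteriors, new_theta)."""
    post = []  # post[j][i] = P(x_j = i) under the current theta, Eq. (2.3)
    for h in heads:
        weights = [t ** h * (1 - t) ** (K - h) for t in theta]
        total = sum(weights)
        post.append([w / total for w in weights])
    # M-step, Eq. (2.4): expected heads of coin i over K times its expected use
    new_theta = [
        sum(h * p[i] for h, p in zip(heads, post)) / (K * sum(p[i] for p in post))
        for i in range(len(theta))
    ]
    return post, new_theta

def log_likelihood(theta, heads, K):
    """log P(y; theta) with the hidden coin identity summed out (uniform choice)."""
    n = len(theta)
    return sum(
        math.log(sum(t ** h * (1 - t) ** (K - h) for t in theta) / n)
        for h in heads
    )

# Observed tosses: one row per set of K = 10 tosses (the matrix y above).
y = [
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1, 1, 0],
]
K = 10
heads = [sum(row) for row in y]   # number of heads per set of tosses

theta = [0.6, 0.5]                # initial guess from the text
post, theta_1 = em_coin_step(theta, heads, K)
print("posteriors after first E-step:", post)
print("parameters after first M-step:", theta_1)

# Keep iterating and track the incomplete-data log-likelihood.
history = [log_likelihood(theta, heads, K)]
for _ in range(20):
    _, theta = em_coin_step(theta, heads, K)
    history.append(log_likelihood(theta, heads, K))
print("parameters after 20 iterations:", theta)
```

The `history` list is kept so that the non-decreasing behaviour of the log-likelihood can be checked numerically on this small data set.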


2.3.2 Mathematical Foundation

With complete data, the objective function of the maximum likelihood estimation, $\log P(\mathbf{x}, \mathbf{y}; \theta)$, usually has only one global optimum, which can often be obtained in closed form, such as by (2.2) in the coin-flipping example [12]. However, with incomplete data, the modified maximum likelihood estimation has to find the $\hat{\theta}$ that maximizes $\log P(\mathbf{y}; \theta)$, which usually has multiple local optima, of which only one may be the global optimum. In order to solve this problem, the EM algorithm converts the single optimization problem of $\log P(\mathbf{y}; \theta)$ into a series of subproblems, each of which has an objective function with a unique global optimum. In the E-step, it chooses a function $g_t$ that lower bounds $\log P(\mathbf{y}; \theta)$ and satisfies $g_t(\hat{\theta}^{(t)}) = \log P(\mathbf{y}; \hat{\theta}^{(t)})$. In the M-step, it determines the updated parameter $\hat{\theta}^{(t+1)}$ that maximizes $g_t$. Since $g_t$ matches $\log P(\mathbf{y}; \theta)$ at $\hat{\theta}^{(t)}$ and lower bounds it elsewhere, there is $\log P(\mathbf{y}; \hat{\theta}^{(t)}) = g_t(\hat{\theta}^{(t)}) \le g_t(\hat{\theta}^{(t+1)}) \le \log P(\mathbf{y}; \hat{\theta}^{(t+1)})$, meaning that the objective function does not decrease from one iteration to the next [12]. The estimated parameter increases the likelihood function after each iteration until a local maximum is achieved. Similar to most optimization methods for nonconvex functions, there is no guarantee that the EM algorithm will converge to a global maximum. Starting from different initial parameters may lead to different solutions. Running the algorithm multiple times with different initial parameters may help to find the global optimum. Although other numerical optimization methods, such as gradient descent or Newton's method [13], can in theory be used to solve the optimization problem, the EM algorithm provides a simple, robust, and easy-to-implement tool for parameter estimation in models with incomplete data [12].
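The chapter does not give the explicit form of $g_t$. One standard construction, shown here only as a sketch under that assumption, takes the distribution of the hidden variables under the current parameters, $q_t(\mathbf{x}) = P(\mathbf{x} \mid \mathbf{y}; \hat{\theta}^{(t)})$, and applies Jensen's inequality to the logarithm:

```latex
\begin{align*}
\log P(\mathbf{y};\theta)
  &= \log \sum_{\mathbf{x}} q_t(\mathbf{x})\,
     \frac{P(\mathbf{x},\mathbf{y};\theta)}{q_t(\mathbf{x})} \\
  &\ge \sum_{\mathbf{x}} q_t(\mathbf{x})
     \log \frac{P(\mathbf{x},\mathbf{y};\theta)}{q_t(\mathbf{x})}
  \;=:\; g_t(\theta),
  \qquad q_t(\mathbf{x}) = P\bigl(\mathbf{x}\mid\mathbf{y};\hat{\theta}^{(t)}\bigr).
\end{align*}
```

At $\theta = \hat{\theta}^{(t)}$ the ratio inside the logarithm equals $P(\mathbf{y}; \hat{\theta}^{(t)})$ for every $\mathbf{x}$, so the inequality holds with equality there, which is the matching property used in the argument above.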

2.4 Estimating Component Failure Interactions by EM Algorithm

Assume $M_u \le M$ original cascades are utilized to estimate the interactions between component failures. Here we discuss how to estimate the interaction matrix by applying the EM algorithm introduced in Sect. 2.3. The corresponding maximum likelihood estimation problem is to estimate the parameters $\mathbf{B}$ in order to maximize $\log P(\mathbf{A}, \mathbf{y}; \mathbf{B})$, which is the logarithm of the joint probability of having the specific interactions between components in any two successive generations among all used cascades that are represented in $\mathbf{A}$ and the observed result $\mathbf{y}$ as the $M_u$ original cascades. Specifically, the EM algorithm can be implemented in the following four steps.


1. Initialization: We set the initial interaction matrix as $\mathbf{B}^{(0)}$. In order to avoid ignoring any useful information, we assume that every failed component in generation $g$ is a cause of the component failures in generation $g + 1$. Then the initial matrix $\mathbf{A}^{(0)} \in \mathbb{Z}^{n \times n}$ can be obtained from all original cascades, and $\mathbf{B}^{(0)}$ can be calculated from $\mathbf{A}^{(0)}$ by using (2.1).
   Note that the assumption here tends to overestimate the component interactions. This is because a component that fails before the failure of another component may not be the cause of that particular failure. However, it is appropriate to use $\mathbf{A}^{(0)}$ to get the initial guess for $\mathbf{B}$ since in this way we will not miss any interaction between component failures. As mentioned in Sect. 2.3.2, running the algorithm multiple times with different initial parameters may help to find the global optimum. However, we do not need to do so because the chosen $\mathbf{A}^{(0)}$ has a clear physical meaning and is a good initial parameter. This will be validated by the test results in Sect. 2.6.
2. E-step: Estimate $\mathbf{A}^{(k+1)}$ based on $\mathbf{B}^{(k)}$. For any two successive nonzero generations $g$, $g + 1$ of any cascade $m$, under the condition that component $j$ has failed, the component $j$ failure in generation $g + 1$ is caused by component $i \in F_g^{(m)}$ in generation $g$ with probability

\[
p_{ij}^{(k+1)m,g} = \frac{b_{ij}^{(k)}}{1 - \prod_{l \in F_g^{(m)}} \left(1 - b_{lj}^{(k)}\right)}. \tag{2.5}
\]

   If $i \notin F_g^{(m)}$, $p_{ij}^{(k+1)m,g} = 0$. For example, for the two consecutive generations shown in Fig. 2.2, there is

\[
p_{AD} = \frac{b_{AD}}{1 - (1 - b_{AD})(1 - b_{BD})(1 - b_{CD})}, \tag{2.6}
\]

   and $p_{AE}$, $p_{BD}$, $p_{BE}$, $p_{CD}$, and $p_{CE}$ can also be written in a similar manner. The updated entry of $\mathbf{A}^{(k+1)}$ can be obtained as the summation over all consecutive nonzero generations for all cascades

Fig. 2.2 Illustration for inferring the probability of one component failure in generation $g + 1$ caused by a specific component failure in generation $g$: components A, B, and C fail in generation $g$ and components D and E fail in generation $g + 1$, with candidate links $b_{AD}$, $b_{AE}$, $b_{BD}$, $b_{BE}$, $b_{CD}$, and $b_{CE}$. © [2018] IEEE. Reprinted, with permission, from [10]

\[
a_{ij}^{(k+1)} = \sum_{m=1}^{M_u} \sum_{g=0}^{G^m - 2} p_{ij}^{(k+1)m,g}, \tag{2.7}
\]

   where $G^m$ is the number of generations with a nonzero number of outages in cascade $m$.
3. M-step: Estimate $\mathbf{B}^{(k+1)}$ based on $\mathbf{A}^{(k+1)}$. After $\mathbf{A}^{(k+1)}$ is obtained, the updated interaction matrix $\mathbf{B}^{(k+1)}$ can be calculated by using (2.1).
4. End: Iterate the E-step and M-step until

\[
\frac{\left\| \mathbf{B}^{(k+1)} - \mathbf{B}^{(k)} \right\|_F}{\sqrt{N}} < \epsilon, \tag{2.8}
\]

   where $\|\mathbf{X}\|_F$ is the Frobenius norm of a $u \times v$ matrix $\mathbf{X}$ defined as

\[
\|\mathbf{X}\|_F = \sqrt{\sum_{i=1}^{u} \sum_{j=1}^{v} |X_{ij}|^2}, \tag{2.9}
\]

   $N = N_{\neq 0}$ if $N_{\neq 0}$, the number of nonzero elements in $\mathbf{B}^{(k+1)} - \mathbf{B}^{(k)}$, is greater than 0, otherwise $N = 1$, $\epsilon$ is the tolerance, and the $\mathbf{B}^{(k+1)}$ that satisfies (2.8) will be the estimated interaction matrix.
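The four steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation of [10]: the data layout (each cascade is a list of sets of component indices, one set per nonzero generation), the sparse dictionary storage, and all function names are assumptions made for this sketch, and the toy cascades at the end are invented.

```python
from collections import defaultdict
from math import sqrt

def count_failures(cascades):
    """N_i: number of times component i fails over all generations and cascades."""
    counts = defaultdict(int)
    for cascade in cascades:
        for generation in cascade:
            for i in generation:
                counts[i] += 1
    return counts

def b_from_a(a, counts):
    """Eq. (2.1): b_ij = a_ij / N_i, stored sparsely as {(i, j): value}."""
    return {(i, j): v / counts[i] for (i, j), v in a.items() if v > 0}

def initial_a(cascades):
    """Initialization: every failure in generation g counts as a cause in g + 1."""
    a = defaultdict(float)
    for cascade in cascades:
        for src, dst in zip(cascade, cascade[1:]):
            for i in src:
                for j in dst:
                    a[(i, j)] += 1.0
    return a

def e_step(cascades, b):
    """Eqs. (2.5) and (2.7): accumulate expected causes over consecutive generations."""
    a = defaultdict(float)
    for cascade in cascades:
        for src, dst in zip(cascade, cascade[1:]):
            for j in dst:
                none_causes = 1.0
                for l in src:
                    none_causes *= 1.0 - b.get((l, j), 0.0)
                denom = 1.0 - none_causes
                if denom <= 0.0:
                    continue  # no candidate cause with positive probability
                for i in src:
                    a[(i, j)] += b.get((i, j), 0.0) / denom
    return a

def scaled_change(b_new, b_old):
    """Left-hand side of (2.8): Frobenius norm of the change divided by sqrt(N)."""
    diffs = [b_new.get(k, 0.0) - b_old.get(k, 0.0) for k in set(b_new) | set(b_old)]
    nonzero = [d for d in diffs if d != 0.0]
    n_nonzero = len(nonzero) if nonzero else 1
    return sqrt(sum(d * d for d in nonzero)) / sqrt(n_nonzero)

def estimate_interactions(cascades, tol=0.01, max_iter=100):
    """Iterate E-step and M-step until (2.8) holds; returns (B, iterations used)."""
    counts = count_failures(cascades)
    b = b_from_a(initial_a(cascades), counts)
    for iteration in range(1, max_iter + 1):
        b_new = b_from_a(e_step(cascades, b), counts)   # M-step via (2.1)
        if scaled_change(b_new, b) < tol:
            return b_new, iteration
        b = b_new
    return b, max_iter

# Invented toy data: component 0 is always followed by 2, component 1 mostly by 3.
toy_cascades = [
    [{0, 1}, {2}],
    [{0}, {2}],
    [{1}, {3}],
    [{0}, {2}],
    [{1}, {3}],
]
b_est, n_iter = estimate_interactions(toy_cascades)
print(n_iter, sorted(b_est.items()))
```

In the toy data the failure of component 2 in the first cascade is ambiguous (both 0 and 1 failed just before it). The initialization credits both, and the iterations then shift the credit toward component 0, whose link to component 2 is supported by the unambiguous cascades.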

2.5 Determining the Number of Cascades Needed

The minimum numbers of cascades are determined to guarantee that (1) we lose almost no interactions and (2) we get all of the dominant interactions that can produce cascades with similar statistics to the original cascades. They are the lower bounds of $M$ and $M_u$ and are denoted by $M^{\min}$ and $M_u^{\min}$, respectively.

2.5.1 Lower Bound of $M$

More original cascades tend to contain more information about the property of cascading failures of a system, or more specifically the interactions between cascading outages of the components. The added information brought from the added cascades will make the number of identified links increase. However, the number of links will not always grow with the increase of the number of cascades but will saturate after the number of cascades is greater than some number $M^{\min}$, which can be determined by gradually increasing the number of cascades, recording the number of identified links, and finding the smallest number of cascades that can lead to the saturated number of links.

Assume there are a total of $N_M$ different $M$ ranging from a very small number to a very large number, which are denoted by $M_i$, $i = 1, 2, \dots, N_M$. The number of links for $M_i$ cascades is denoted by $\mathrm{card}(\mathcal{L}(M_i))$, where $\mathrm{card}(\cdot)$ denotes the cardinality of a set, which is a measure of the number of elements of the set. For $i = 1, 2, \dots, N_M - 2$ we define

\[
\sigma_i = \sigma\left(\mathrm{card}(\mathcal{L})_i\right), \tag{2.10}
\]

where $\mathrm{card}(\mathcal{L})_i = [\mathrm{card}(\mathcal{L}(M_i)) \; \cdots \; \mathrm{card}(\mathcal{L}(M_{N_M}))]$ and $\sigma(\cdot)$ is the standard deviation of a vector. The $\sigma_i$ for $i = N_M - 1$ and $i = N_M$ are not calculated since we would like to calculate the standard deviation for at least three data points. Very small and slightly fluctuating $\sigma_i$ can be used to indicate that the number of links begins to saturate after $M_i$ and thus this $M_i$ is determined as $M^{\min}$. The $M^{\min}$ original cascades can guarantee that the accuracy on statistical values of interest is good and thus can provide a reference solution.
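The saturation test based on (2.10) can be sketched as below. The chapter does not state which standard deviation is used or what counts as "very small", so the sample standard deviation and the threshold `sigma_tol` are assumptions of this sketch, as are the function names and the invented link counts.

```python
from statistics import stdev

def tail_sigmas(link_counts):
    """sigma_i of Eq. (2.10): std of [card(L(M_i)), ..., card(L(M_NM))], i = 1..NM-2."""
    return [stdev(link_counts[i:]) for i in range(len(link_counts) - 2)]

def find_m_min(ms, link_counts, sigma_tol):
    """Smallest M_i whose tail standard deviation is at most sigma_tol (hypothetical rule)."""
    for m, sigma in zip(ms, tail_sigmas(link_counts)):
        if sigma <= sigma_tol:
            return m
    return None

# Invented numbers for illustration only; they are not the data behind Fig. 2.3.
ms = [100, 200, 400, 800, 1600, 3200, 6400]
link_counts = [50, 80, 95, 99, 100, 100, 101]
print(tail_sigmas(link_counts))
print(find_m_min(ms, link_counts, sigma_tol=1.0))
```

Each tail always contains at least three points, which mirrors the reason given above for not computing $\sigma_i$ at $i = N_M - 1$ and $i = N_M$.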

2.5.2 Lower Bound of $M_u$

The propagation capacity of the obtained interaction network $\mathcal{G}(\mathcal{C}, \mathcal{L})$, $PC^{\mathcal{G}}$, and that of the original cascades, $PC^{\mathrm{ori}}$, are defined as [7]:

\[
PC^{\mathrm{ori}}(M_u) = \frac{\sum_{m=1}^{M_u} \sum_{g=1}^{\infty} \mathrm{card}\left(F_g^{(m)}\right)}{M_u}, \tag{2.11}
\]

\[
PC^{\mathcal{G}}(M_u) = \frac{\sum_{l \in \mathcal{L}} I_l(M_u)}{M_u}, \tag{2.12}
\]

where $I_l$ is the expected value of the number of failures propagated through link $l : i \to j$ with its source vertex $i$ as generation 0 failures, which can be calculated based on the interaction network [7].

The selected $M_u^{\min}$ should make sure that the mismatch between $PC^{\mathrm{ori}}(M_u)$ and $PC^{\mathcal{G}}(M_u)$ is acceptably small and at the same time that $PC^{\mathrm{ori}}(M_u)$ has almost stabilized. Therefore, we want to find the $M_u^{\min}$ satisfying the following two conditions

\[
|\Delta PC(M_u)| \le \epsilon_1 \, PC^{\mathrm{ori}}(M_u),
\]

\[
\left| PC^{\mathrm{ori}}(M_u) - PC^{\mathrm{ori}}(M_u - \Delta M) \right| \le \epsilon_2 \, PC^{\mathrm{ori}}(M_u - \Delta M),
\]

where $\Delta PC(M_u) = PC^{\mathcal{G}}(M_u) - PC^{\mathrm{ori}}(M_u)$, and $\epsilon_1$ and $\epsilon_2$ are the acceptable tolerances. We start from a very small value for $M_u$, such as $M_u^0 = 100$, and then increase it by $\Delta M$ at each step until the conditions listed here are satisfied.
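The search for $M_u^{\min}$ can be sketched as a simple loop. Several things here are assumptions of the sketch rather than details from the chapter: cascades are stored as lists of sets (one set per generation), `pc_g` is a hypothetical caller-supplied function that returns $PC^{\mathcal{G}}$ for the cascades used (its computation through $I_l$ follows [7] and is not reproduced), and the first check is made at $M_u^0 + \Delta M$ so that $PC^{\mathrm{ori}}(M_u - \Delta M)$ is defined.

```python
def pc_ori(cascades):
    """Eq. (2.11): failures in generations g >= 1, averaged over the cascades used."""
    propagated = sum(len(gen) for cascade in cascades for gen in cascade[1:])
    return propagated / len(cascades)

def find_mu_min(cascades, pc_g, mu0, delta_m, eps1, eps2):
    """Increase M_u by delta_m until both conditions of Sect. 2.5.2 hold."""
    mu = mu0 + delta_m                      # assumption: first check after one step
    while mu <= len(cascades):
        current = pc_ori(cascades[:mu])
        previous = pc_ori(cascades[:mu - delta_m])
        mismatch_ok = abs(pc_g(cascades[:mu]) - current) <= eps1 * current
        stable_ok = abs(current - previous) <= eps2 * previous
        if mismatch_ok and stable_ok:
            return mu
        mu += delta_m
    return None                             # not enough cascades to satisfy both

# Invented toy cascades; pc_ori is used as a stand-in for pc_g, so the mismatch
# condition holds trivially and only the stabilization condition is exercised.
toy = [[{0}, {1, 2, 3}], [{0}]] + [[{0}, {1}] for _ in range(5)]
print(find_mu_min(toy, pc_g=pc_ori, mu0=1, delta_m=1, eps1=0.05, eps2=0.05))
```

With a real `pc_g`, the first condition would compare the interaction network estimated from the first $M_u$ cascades against those same cascades.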

2.6 Results

All tests are performed on a desktop computer with a 3.2 GHz Intel(R) Core(TM) i7-4790S. In order to test the EM based interaction estimation method, a total of $M$ = 50,000 cascades are obtained by open-loop AC OPA simulation [14] on the IEEE 118-bus system [2, 4, 7].

2.6.1 Number of Cascades Needed

The $\epsilon$ in (2.8) is chosen as 0.01. The number of links, $\mathrm{card}(\mathcal{L}(M_i))$, obtained by using $M_i$'s from 100 to 50,000 is shown in Fig. 2.3. The results for both the EM based method and that in [7] are shown. When $M$ is small, the number of links increases with the increase of $M$, indicating that more cascades can provide more information about the cascading outage propagation. However, when $M$ reaches a threshold, the number of links will not grow anymore. Although the numbers of links for the two methods are different, both saturate at $M_i$ = 41,000, which is chosen as $M^{\min}$. The numbers of identified links under $M^{\min}$ are 419 and 715, respectively, for the method in [7] and the EM based estimation method. Note that the number of identified links by the EM based method is much greater than that by the method in [7]. This is mainly because for the latter method the component $j$ failure in generation $g + 1$ is only considered to be caused by the component failure in generation $g$ that has caused the component $j$ failure the largest number of times among all cascades. If the numbers of times for two components in generation $g$ are very close, ignoring the slightly smaller one will inevitably ignore important interactions.

Fig. 2.3 Number of links for different $M$'s (triangles denote the number of links from EM algorithm and dots denote that by the method in [7]). © [2018] IEEE. Reprinted, with permission, from [10]

For determining $M_u^{\min}$, we set $M_u^0$, $\epsilon_1$, $\epsilon_2$, and $\Delta M$ in Sect. 2.5.2 as 100, 0.05, 0.05, and 100, respectively. After four $\Delta M$ iterations $M_u^{\min}$ is determined as 400, which only accounts for 0.98% of $M^{\min}$. By contrast, for the interaction estimation method in [7], $M_u^{\min}$ is determined as 3700, which is almost one order of magnitude more, indicating the greatly improved efficiency of the EM based method. In Fig. 2.4 we present the propagation capacity calculated from the original cascades and the interaction network for different $M_u$'s. The mismatch is always small even when the number of cascades is very small and the propagation capacity of the original cascades can stabilize around $M_u$ = 400.

Fig. 2.4 Propagation capacity for different $M_u$'s (curves: $PC^{\mathrm{ori}}$ and $PC^{\mathcal{G}}$; horizontal axis: $M_u$ from $10^2$ to $10^4$; vertical axis: $PC$ from 2 to 3.5). © [2018] IEEE. Reprinted, with permission, from [10]

Table 2.2 Number of estimated links

Method            M_u      n     card(L)   r
EM based method   41,000   186   715       0.0207
EM based method   400      186   170       0.0049
Method in [7]     400      186   77        0.0022

2.6.2 Interaction Matrix and Interaction Network

The component interactions (links) are estimated by the EM based method in this chapter and the method in [7] based on $M_u$ = 400 cascades. For the EM based method, the EM algorithm converges after 5 iterations. Table 2.2 lists the number of identified links ($\mathrm{card}(\mathcal{L})$). The ratio of nonzero elements in the interaction matrix $\mathbf{B}$ is defined as $r = \mathrm{card}(\mathcal{L})/n^2$, where $n$ is the number of components. The very small value of $r$ suggests that $\mathbf{B}$ is a sparse matrix. Based on the interaction matrix, an interaction network can be constructed, which provides a graphical representation of the interactions between the components. Using the interaction network, the key links and key components that play important roles in failure propagation can also be identified, which will be discussed in detail below.
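As a small illustration of how links and the ratio $r$ follow from an interaction matrix, the sketch below extracts the nonzero entries of a toy matrix and recomputes $r$ from the link counts in Table 2.2. The function names and the 3-by-3 matrix are invented for this example; only $n = 186$ and the link counts come from the table.

```python
def links(b):
    """Links l: i -> j are the nonzero entries b_ij of the interaction matrix."""
    return [(i, j) for i, row in enumerate(b) for j, v in enumerate(row) if v != 0]

def sparsity_ratio(num_links, n):
    """r = card(L) / n^2."""
    return num_links / n ** 2

# Invented 3-component interaction matrix (dense nested lists for readability).
b_toy = [
    [0.0, 0.8, 0.0],
    [0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0],
]
print(links(b_toy))

# Recompute r for the three rows of Table 2.2 (n = 186 components).
for num_links in (715, 170, 77):
    print(num_links, round(sparsity_ratio(num_links, 186), 4))
```

For a system of this size a sparse representation, storing only the links, is the natural choice, since about 98% or more of the entries are zero.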

Table 2.3 Identified key links († marks values printed in bold, i.e., identified key links; rankings are given in parentheses)

i → j       Line pairs                 I_l (41,000)   I_l (400)    I_l* (400)
74 → 73     (53, 54) → (52, 53)        12,490†        122† (2)     123† (3)
74 → 72     (53, 54) → (51, 52)        12,436†        126† (1)     127† (2)
40 → 34     (29, 31) → (27, 28)        11,187†        109† (4)     112† (5)
74 → 82     (53, 54) → (56, 58)        11,125†        111† (3)     110† (7)
40 → 35     (29, 31) → (28, 29)        10,864†        98† (5)      100† (8)
62 → 68     (45, 46) → (45, 49)        9922†          91† (6)      92.6† (10)
121 → 122   (77, 78) → (78, 79)        9581†          90† (7)      90† (14)
121 → 125   (77, 78) → (79, 80)        9579†          90† (8)      90† (15)
40 → 182    (29, 31) → (114, 115)      7396†          85† (9)      80† (16)
12 → 18     (11, 12) → (13, 15)        5307†          41† (10)     42† (20)
68 → 59     (45, 49) → (43, 44)        3682†          32† (12)     33† (21)
182 → 43    (114, 115) → (27, 32)      3317†          27† (13)     10 (39)
46 → 47     (35, 36) → (35, 37)        2167†          17 (17)      47† (18)
40 → 43     (29, 31) → (27, 32)        2154†          34† (11)     48† (17)
82 → 74     (56, 58) → (53, 54)        2085†          21.1† (15)   21 (26)
182 → 40    (114, 115) → (29, 31)      1458 (19)      23† (14)     26 (25)
155 → 151   (94, 100) → (80, 97)       918 (22)       20.9† (16)   21 (29)
102 → 82    (65, 66) → (56, 58)        164 (52)       0.51 (74)    127.2† (1)
102 → 73    (65, 66) → (52, 53)        134 (57)       0.49 (77)    123† (4)
102 → 34    (65, 66) → (27, 28)        395 (29)       1.38 (48)    112† (6)
102 → 35    (65, 66) → (28, 29)        382 (30)       1.24 (52)    100† (9)

2.6.3 Identified Key Links and Key Components

Key links and key components that are most critical for cascading outage propagation are identified by the method in Section III of [7], in which $I_l$ (the expected value of the number of failures propagated through the link) is defined for a link $l$ to indicate its importance. The $\epsilon_l$ in [7] is set to be 0.15 to make sure that all key links have their $I_l$ in the same order of magnitude. In Table 2.3 we list the identified key links and their $I_l$. Note that $I_l$ corresponds to the links obtained from the EM based interaction estimation method and $I_l^*$ to the method in [7]. The numbers highlighted in bold font correspond to the identified key links and the numbers in the parentheses are the ranking of the links. When 41,000 cascades are used, 15 links are identified as key links. Although the key links only account for 2.1% of all links, their total $I_l$ is 79.9% of the total $I_l$ of all links. In Table 2.3 we can see that the identified key links by using the EM algorithm based only on 400 cascades (denoted by a set $\mathcal{L}_1$) are almost the same as those from 41,000 cascades. By contrast, when also using the same 400 cascades, the identified key links from the method in [7] (denoted by a set $\mathcal{L}_2$) are very different.

The seven components with the largest out-strengths are identified as key components. The tripping of these lines could produce extensive outage propagation in the system and should be avoided as much as possible. These key components, the corresponding branches, and their out-strengths are presented in Table 2.4. Note that $s_i^{\mathrm{out}}$ is the out-strength obtained from the EM based interaction estimation method and $s_i^{\mathrm{out}*}$ that for the method in [7]. The numbers highlighted in bold font correspond to the identified key components and the numbers in the parentheses represent the ranking of the components in terms of the out-strength. When 41,000 cascades are used, the number of key components is 3.76% of all of the components and the summation of their out-strengths accounts for 83.41% of the total out-strengths of the components. It is clearly seen from Table 2.4 that by using the EM based interaction method we can identify the same key components based on only 400 cascades as those using 41,000 cascades. By contrast, when only using 400 cascades, the interaction estimation method in [7] identifies very different key components and several important components cannot be identified.

Table 2.4 Identified key components († marks values printed in bold, i.e., identified key components; rankings are given in parentheses)

Key component   Line         s_i^out (41,000)   s_i^out (400)   s_i^out* (400)
74              (53, 54)     38,230†            384† (1)        380† (2)
40              (29, 31)     32,317†            328† (2)        342† (3)
121             (77, 78)     19,805†            185† (3)        181† (4)
62              (45, 46)     10,300†            92† (4)         93† (6)
12              (11, 12)     6561†              50† (7)         42 (8)
68              (45, 49)     6250†              59† (5)         61† (7)
182             (114, 115)   4831†              52† (6)         39 (9)
102             (65, 66)     4221 (9)           22 (11)         908† (1)
46              (35, 36)     4751 (8)           27 (9)          137† (5)

2.6.4 Validation of Estimated Interactions

We validate the estimated interactions between component failures by the interaction estimation method based on EM algorithm. In Fig. 2.5 we show the probability distributions of the line outages for the 41,000 original cascades and the 41,000 cascades obtained by the interaction model in [7] based on the interactions estimated from the method in [7] and the EM based method. Note that the interaction model uses the distribution of the initial outages and the interactions obtained from 400 original cascades to generate 41,000 cascades. The simulated cascades by using the interactions obtained from 400 original cascades with the EM based estimation method have very similar statistical features to the original 41,000 cascades. By contrast, the distribution of the simulated 41,000 cascades based on the interactions obtained from the same number of cascades using the estimation method in [7] is very different from the original cascades, indicating that the EM based estimation method can more accurately estimate the interactions of component failures.


Fig. 2.5 Probability distributions of the number of line outages (“Open Square” and “Open Triangle” denote initial outages and total outages of the original cascades; “Blue Circle” and “Red Inverted Triangle” denote total outages of the simulated cascades using 400 cascades, respectively, for the method in [7] and the EM based method). © [2018] IEEE. Reprinted, with permission, from [10]

2.6.5 Cascading Failure Mitigation

Similar to [7], removing some key links can mitigate the cascading outage propagation. In power systems this can be realized by blocking the operation of zone 3 relays [15]. In order to test the identified key links in Sect. 2.6.3, we first obtain the distributions of initial outages and the interaction matrix $\mathbf{B}$ by using 41,000 original cascades and then remove the top 10 key links in $\mathcal{L}_1$ or $\mathcal{L}_2$ by setting the corresponding elements in $\mathbf{B}$ to be zero. By doing this we get $\mathbf{B}_1^{\mathrm{int}}$ and $\mathbf{B}_2^{\mathrm{int}}$, which are further used to simulate 41,000 cascades by using the interaction model [7, 8]. By comparing the probability distribution of the total number of line outages of the 41,000 original cascades and those of the 41,000 simulated cascades with the key-link based mitigation, we will be able to figure out (1) if the key-link based mitigation is effective in reducing the cascading failure risk, and (2) if the key links in $\mathcal{L}_1$ identified based on the EM based interaction estimation method are better than those in $\mathcal{L}_2$ from the simple interaction estimation method in [7].

Figure 2.6 shows the comparison of the probability distributions of the number of line outages before and after mitigation. The probability of having a cascading blackout with a large number of line outages is greatly reduced by removing only 10 key links and removing the same number of key links in $\mathcal{L}_1$ can better mitigate the cascading risks than removing those in $\mathcal{L}_2$. The estimated average propagation of the branching process [2–5] for the original cascades before mitigation is 0.40, which can be reduced to 0.23 and 0.12, respectively, after mitigation based on $\mathcal{L}_1$ key links and $\mathcal{L}_2$ key links, indicating that the EM based interaction estimation method can more accurately identify key links.

Fig. 2.6 Probability distributions of the number of line outages under key-link based mitigation ("Open Square" and "Open Triangle" denote initial outages and total outages of the original cascades; "Blue Circle" and "Red Inverted Triangle" denote total outages of the simulated cascades with mitigation based on $\mathbf{B}_1^{\mathrm{int}}$ and $\mathbf{B}_2^{\mathrm{int}}$). © [2018] IEEE. Reprinted, with permission, from [10]

Table 2.5 Efficiency improvement

Model         Method            M_u    T_1 (second)   T (second)   T_2 (second)
AC OPA        –                        205,200        0            0
Interaction   EM based method   400    2002           8.59         29
Interaction   Method in [7]     3700   18,518         5.07         29

2.6.6 Efficiency Improvement

As in Sect. 2.6.1, the $M_u^{\min}$ for the EM based interaction estimation method and that for the method in [7] are, respectively, 400 and 3700. We list the time used to simulate $M_u^{\min}$ cascades by AC OPA ($T_1$) and that used to estimate the interactions ($T$) for both methods in Table 2.5. Compared with the method in [7], for getting reliable interactions the EM based method can achieve a speedup of $(18{,}518 + 5.07)/(2002 + 8.59) \approx 9.21$. In Table 2.5, we also list $T_2$, which is the time for simulating 41,000 cascades by using the interaction model. Compared with purely relying on the AC OPA simulation, the interaction model simulation based on the EM based interaction estimation method can achieve a speedup of $205{,}200/(2002 + 8.59 + 29) \approx 100.61$. The time efficiency can be significantly improved by first estimating the component interactions using $M_u^{\min} \ll M^{\min}$ original cascades and then conducting highly probabilistic interaction model simulation.

It is seen in Table 2.5 that when using the EM based estimation method and the interaction model to perform simulation, the majority of the time is for simulating the original cascades by detailed simulation models such as AC OPA, although the number of required original cascades has been significantly reduced by the EM based estimation method. Therefore, in order to further reduce the total simulation time and finally make possible the online application, it is also needed to improve the computational efficiency of the detailed cascading failure simulation, such as by parallel computation or other high performance computing techniques.
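The speedup figures quoted above follow directly from the timings in Table 2.5, as the short check below shows. The variable names are chosen for this example only; the last line also computes the share of the EM based workflow spent in the detailed AC OPA simulation, which is the point made in the preceding paragraph.

```python
# Timings in seconds, taken from Table 2.5.
t1_em, t_em = 2002.0, 8.59        # EM based method: simulate 400 cascades, estimate B
t1_ref, t_ref = 18518.0, 5.07     # method in [7]: simulate 3700 cascades, estimate B
t_ac_opa = 205200.0               # AC OPA simulation time listed for the reference run
t2 = 29.0                         # interaction model: simulate 41,000 cascades

speedup_estimation = (t1_ref + t_ref) / (t1_em + t_em)
speedup_simulation = t_ac_opa / (t1_em + t_em + t2)
ac_opa_share = t1_em / (t1_em + t_em + t2)

print(round(speedup_estimation, 2))
print(round(speedup_simulation, 2))
print(round(ac_opa_share, 2))
```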

References

1. K. Sun, Y. Hou, W. Sun, J. Qi, Power System Control Under Cascading Failures: Understanding, Mitigation, and System Restoration (Wiley-IEEE Press, New York, 2019)
2. I. Dobson, J. Kim, K.R. Wierzbicki, Testing branching process estimators of cascading failure with data from a simulation of transmission line outages. Risk Anal. 30(4), 650–662 (2010)
3. J. Kim, K.R. Wierzbicki, I. Dobson, R.C. Hardiman, Estimating propagation and distribution of load shed in simulations of cascading blackouts. IEEE Syst. J. 6(3), 548–557 (2012)
4. J. Qi, I. Dobson, S. Mei, Towards estimating the statistics of simulated cascades of outages with branching processes. IEEE Trans. Power Syst. 28(3), 3410–3419 (2013)
5. J. Qi, W. Ju, K. Sun, Estimating the propagation of interdependent cascading outages with multi-type branching processes. IEEE Trans. Power Syst. 32(2), 1212–1223 (2017)
6. P.D.H. Hines, I. Dobson, P. Rezaei, Cascading power outages propagate locally in an influence graph that is not the actual grid topology. IEEE Trans. Power Syst. 32(2), 958–967 (2017)
7. J. Qi, K. Sun, S. Mei, An interaction model for simulation and mitigation of cascading failures. IEEE Trans. Power Syst. 30(2), 804–819 (2015)
8. W. Ju, J. Qi, K. Sun, Simulation and analysis of cascading failures on an NPCC power system test bed, in 2015 IEEE Power Energy Society General Meeting (2015), pp. 1–5
9. W. Ju, K. Sun, J. Qi, Multi-layer interaction graph for analysis and mitigation of cascading outages. IEEE J. Emerg. Sel. Topics Circuits Syst. 7(2), 239–249 (2017)
10. J. Qi, J. Wang, K. Sun, Efficient estimation of component interactions for cascading failure analysis by EM algorithm. IEEE Trans. Power Syst. 33(3), 3153–3161 (2018)
11. A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol. 39(1), 1–22 (1977)
12. C.B. Do, S. Batzoglou, What is the expectation maximization algorithm? Nat. Biotechnol. 26(8), 897 (2008)
13. C.T. Kelley, Iterative Methods for Optimization (SIAM, Philadelphia, 1999)
14. S. Mei, Y. Ni, G. Wang, S. Wu, A study of self-organized criticality of power system under cascading failures based on AC-OPF with voltage stability margin. IEEE Trans. Power Syst. 23(4), 1719–1726 (2008)
15. S.-I. Lim, C.-C. Liu, S.-J. Lee, M.-S. Choi, S.-J. Rim, Blocking of zone 3 relays to prevent cascaded events. IEEE Trans. Power Syst. 23(2), 747–754 (2008)

Chapter 3

Integrated Preventive and Emergency Responses

3.1 Introduction

Natural disasters are the leading cause of blackouts (e.g., 80% of all major power outages between 2003 and 2012 were caused by natural disasters [1]). To make the power grid stronger against natural disasters, the concept of resilience was recently introduced [2–6]. It differs from the concept of reliability: reliability focuses on relatively higher-probability events, while resilience focuses on low-probability, high-impact events.

Currently, there is no agreement on the definition of resilience. Over seventy definitions can be found in different disciplines [7], and some of them are completely different from others. As [7] has pointed out, these definitions vary between two features: adaptation and recovery. Here, “adaptation” means the process of changing in order to make the system suitable for a new situation, and “recovery” means the process of returning to a normal condition after a disturbance. These two features can also be found in power systems [3–6], and power grid resilience can be defined as the ability of power grids to adapt to disaster scenarios and recover to pre-disaster states.

Based on the above definition, resilience enhancement strategies can be categorized into two groups: enhancing the adaptation ability and enhancing the recovery ability. Following the traditional classification of power system practices, these strategies can also be categorized from the perspective of planning and from the perspective of operation, and the operation strategies can be further divided according to the three operating states that power grids will go through (preventive state, emergency state, and restorative state [8]). As a result, there are four types of power grid resilience enhancement strategies (i.e., resilience planning, preventive response, emergency response, and resilience restoration), as shown in Fig. 3.1.
This chapter focuses on the resilience adaptation ability of power grids from the perspective of operation, and only preventive response and emergency response, which can be collectively called resilience response, will be discussed. Preventive response includes the actions available before disaster scenarios unfold, and emergency response comprises the actions taken in the aftermath of a disaster.

© Springer Nature Switzerland AG 2023 J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_3

Fig. 3.1 The milestones of resilience enhancement strategies. © [2017] IEEE. Reprinted, with permission, from [16]

For simplicity, only the indispensable components of power grids (i.e., generators, transmission lines, and loads) are considered in this chapter; other resources, such as energy storage systems [9], are not. Accordingly, the specific strategies include generator re-dispatch, topology switching, and load shedding. As power grids in the preventive state are operated to satisfy all demands without violating any operating constraints [8], load shedding is considered only in the emergency state. Generator re-dispatch and topology switching can be utilized in both preventive and emergency states, but evidently the emergency state will have fewer resources than the preventive state.

Although both preventive response and emergency response have resources to enhance power grid resilience, they are usually deployed independently. Independent preventive response is often not adequate to keep the generation-demand balance after natural disasters unfold. Models that boost power grid resilience from the perspective of operation mainly utilize only independent emergency response [10–15]. To further enhance power grid resilience, this chapter integrates preventive response and emergency response [16].

In addition, the widely studied form of topology switching is switching off in-service lines (ISLs) [17–22]. With the development of new facilities against natural disasters, switching on out-of-service lines (OSLs), e.g., backup transmission lines [23], has also emerged, and this chapter considers it along with ISL switch-off. Other possible implementations of OSL switch-on in the preventive state include: (1) restoring the outage-scheduled transmission lines [24] and (2) rescheduling or canceling the outage schedules [25].

3.2 Integrated Resilience Response

Based on situational awareness, an integrated resilience response (IRR) framework can be used to enhance power grid resilience. The flow chart of the framework is shown in Fig. 3.2, and the response process is summarized as follows.

1. Situational awareness: Activate the IRR framework when the disaster is brewing. Update the disaster forecast information from national meteorological services or local weather forecast offices and obtain the power outage prediction. Gather the obtained power outage prediction results and the latest information of power grids (including the network topology and system parameters), which will serve as inputs to the next stage.
2. Preventive response: Run the two-stage robust mixed-integer optimization (RoMIO) model, whose mathematical formulation will be presented in Sect. 3.3, to derive effective preventive and emergency response strategies. Promptly carry out the derived preventive response once it is available, and retain the emergency response strategies corresponding to the possible worst disaster scenarios for the next stage.
3. Emergency response: Check whether the realized disaster scenario was retained during the preventive response stage. If so, immediately perform the corresponding emergency response strategy previously derived from the RoMIO model; otherwise, run the RoMIO-E model, which is the emergency response module of the RoMIO model and will be presented in Sect. 3.3, and then perform the derived emergency response.

Fig. 3.2 Integrated resilience response framework. © [2017] IEEE. Reprinted, with permission, from [16]

Given the data provided by supervisory control and data acquisition (SCADA) and wide-area measurement systems (WAMS), the following system parameters are considered: loads at each bus, initial generator outputs, generator capacities, generator ramp-up limits, transmission line capacities, and transmission line parameters. The network topology data includes the bus locations and the transmission line status.

Some natural disasters can sweep across a large area for a relatively long period of time. To capture these spatial and temporal dynamics of natural disasters, a time horizon should be considered in the IRR framework, and the above response process could then work in a rolling manner.
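Step 3 of the response process is essentially a lookup with a fallback. The following is a minimal sketch of that logic, assuming a disaster scenario is encoded as the set of damaged lines and a strategy as a mapping from load name to shed MW. All names here, including `solve_romio_e`, are hypothetical placeholders and do not come from [16].

```python
from typing import Callable, Dict, FrozenSet

Scenario = FrozenSet[str]      # assumed encoding: names of damaged lines
Strategy = Dict[str, float]    # assumed encoding: load name -> MW shed


def emergency_response(
    realized: Scenario,
    retained: Dict[Scenario, Strategy],
    solve_romio_e: Callable[[Scenario], Strategy],
) -> Strategy:
    """Reuse a retained strategy if possible, otherwise solve RoMIO-E."""
    if realized in retained:
        # Scenario was anticipated in the preventive stage: act immediately.
        return retained[realized]
    # Unanticipated scenario: fall back to a fresh emergency-state solve.
    return solve_romio_e(realized)


# Hypothetical data for illustration only.
retained_strategies = {frozenset({"line_1_2"}): {"load_2": 39.0}}


def dummy_solver(scenario: Scenario) -> Strategy:
    # Placeholder for a real MILP solve; sheds a fixed amount per damaged line.
    return {"load_2": 100.0 * len(scenario)}
```

In a rolling implementation, the `retained` dictionary would presumably be refreshed each time the preventive stage is re-run with an updated forecast.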

3.3 Mathematical Formulation

The mathematical formulation of the RoMIO model is presented in this section.

3.3.1 Preventive Response

Power grids in the preventive state are operated to satisfy all demands without violating any operating constraints. From an economic point of view, the minimum-cost operating point of the system is optimal. To enhance power grid resilience, however, the load shedding cost in the emergency state should also be considered in the preventive state. Hence the objective of the preventive response in the RoMIO model is:

$$\min \sum_{g\in\mathcal{G}} c_g p_g^a + \sum_{d\in\mathcal{D}} c_d p_{d,s}^c, \tag{3.1}$$

where $c_g$ is the offer price of generator $g$, $c_d$ is the shedding price of load $d$, $p_g^a$ is the power output of generator $g$ in the preventive state, and $p_{d,s}^c$ is the power shed of load $d$ in the emergency state.

Based on the traditional DC power flow model, the power balance within power grids is described as follows:

$$\sum_{g\in\mathcal{G}_i} p_g^a - \sum_{(i,j)\in\mathcal{L}_i} p_{ij}^a = \sum_{d\in\mathcal{D}_i} P_d, \quad i\in\mathcal{B}. \tag{3.2}$$

$$(s_{ij}^a - 1)M_{ij} \le p_{ij}^a + B_{ij}\theta_{ij}^a \le (1 - s_{ij}^a)M_{ij}, \quad (i,j)\in\mathcal{L}, \tag{3.3}$$

where (3.2) represents power balance constraints for each bus and (3.3) represents power balance constraints for each transmission line. Note that the modeling of topology switching has already been included in (3.3): when $s_{ij}^a = 1$ the two inequalities force $p_{ij}^a = -B_{ij}\theta_{ij}^a$, and when $s_{ij}^a = 0$ they are relaxed by the big-M constant. Here, generator output $p_g^a$ and transmission line status $s_{ij}^a$ are independent variables, and line flow $p_{ij}^a$ and voltage angle $\theta_i^a$ are dependent variables, with $\theta_{ij}^a$ denoting the angle difference $\theta_i^a - \theta_j^a$ across line $(i,j)$. $P_d$ is the power demand of load $d$, $B_{ij} = -1/X_{ij}$, where $X_{ij}$ is the reactance of transmission line $(i,j)$, and $M_{ij}$ is a big-M constant for transmission line $(i,j)$, which can be easily chosen, for example, as $M_{ij} = P_{ij}^{\max} - B_{ij}(\Theta_i^{\max} - \Theta_j^{\min})$.

Following the convention in [19, 20, 26], each generator capacity is limited as:

$$p_g^a \le P_g^{\max}, \quad g\in\mathcal{G}. \tag{3.4}$$
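The role of the big-M constant in (3.3) can be checked numerically. The sketch below, with purely illustrative per-unit values that are not taken from the chapter's test systems, computes $M_{ij}$ from the suggested formula and tests whether a given flow, angle difference, and line status satisfy (3.3). The function names are assumptions made for this example.

```python
def big_m(p_max: float, b_ij: float, theta_i_max: float, theta_j_min: float) -> float:
    """M_ij = P_ij^max - B_ij (Theta_i^max - Theta_j^min), with B_ij = -1/X_ij < 0."""
    return p_max - b_ij * (theta_i_max - theta_j_min)


def flow_constraint_holds(s: int, p: float, theta_ij: float,
                          b_ij: float, m_ij: float, tol: float = 1e-9) -> bool:
    """Check (s - 1) M <= p + B * theta_ij <= (1 - s) M, i.e., constraint (3.3)."""
    lower = (s - 1) * m_ij
    upper = (1 - s) * m_ij
    return lower - tol <= p + b_ij * theta_ij <= upper + tol


# Illustrative values (assumed): X = 0.1 p.u., so B = -10; P^max = 2 p.u.;
# bus angles limited to [-0.5, 0.5] rad.
B_IJ = -10.0
M_IJ = big_m(p_max=2.0, b_ij=B_IJ, theta_i_max=0.5, theta_j_min=-0.5)
```

With the line in service ($s = 1$), only the DC flow $p = -B\theta_{ij}$ passes the check; with the line switched off ($s = 0$) and zero flow, even the largest admissible angle difference passes, which is what the choice of $M_{ij}$ is meant to guarantee.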


In addition, the generators are subject to ramp-up constraints:

$$p_g^a - p_g^0 \le P_g^{a,\max}, \quad g\in\mathcal{G}, \tag{3.5}$$

where $P_g^{\max}$ is the generator capacity, $p_g^0$ is the initial generator output, and $P_g^{a,\max}$ is the generator ramp-up capacity in the preventive state. The ramp-down constraints are ignored, as the focus is on power grid damage induced by natural disasters, which cause capacity deficiencies, not surpluses [26].

Each transmission line is limited as follows:

$$-P_{ij}^{\max} s_{ij}^a \le p_{ij}^a \le P_{ij}^{\max} s_{ij}^a, \quad (i,j)\in\mathcal{L}, \tag{3.6}$$

and each bus should meet the phase angle limit:

$$\Theta_i^{\min} \le \theta_i^a \le \Theta_i^{\max}, \quad i\in\mathcal{B}. \tag{3.7}$$

Furthermore, although resources are usually less limited in the preventive state than in the emergency state, the number of ISLs that can be switched off and the number of OSLs that can be switched on in preparation for natural disasters are still limited. This gives:

$$\sum_{(i,j)\in\mathcal{L}} s_{ij}^0 (1 - s_{ij}^a) \le K_I^a. \tag{3.8}$$

$$\sum_{(i,j)\in\mathcal{L}} (1 - s_{ij}^0) s_{ij}^a \le K_O^a, \tag{3.9}$$

where $s_{ij}^0$ represents the initial state of transmission line $(i,j)$, and $K_I^a$ and $K_O^a$ are the quantity limits for ISL switch-off and OSL switch-on, respectively. After the preventive response has been applied, transmission line $(i,j)$ is in service if $s_{ij}^a = 1$ and out of service if $s_{ij}^a = 0$.

3.3.2 Damage From Natural Disasters

Damage from natural disasters can cause load shedding, and the worst disaster scenario corresponds to:

$$\max \sum_{d\in\mathcal{D}} p_{d,s}^c. \tag{3.10}$$

Based on the natural disaster information, statistical models and simulation models can be utilized to predict power outages. The strength of storms can be reflected by the number of damaged transmission lines [27], and many existing works (e.g., [28]) that aim to assess the impact of storms on power grids using statistical methods also estimate the aggregated number of damages within a given area. The strength of storms is assumed to be predictable and the estimated maximum number of damaged transmission lines is assumed to be known. Accordingly, the damage from storms can be modeled by an uncertainty set based on the estimated maximum number of damaged transmission lines as follows:

$$\sum_{(i,j)\in\mathcal{L}} (1 - s_{ij}^b) \le K^b, \tag{3.11}$$

where $K^b$ is the estimated maximum number of damaged transmission lines, $s_{ij}^b = 0$ if line $(i,j)$ will be damaged, and $s_{ij}^b = 1$ otherwise. The uncertainty set based on $K^b$ is used because it is tractable, given that statistical information on damaged transmission lines is difficult to obtain. Transmission lines are modeled here because they are the electricity infrastructure most commonly damaged by storms [29].

The damage from other natural disasters could be modeled similarly. Specifically, for natural disasters whose effects resemble those of storms and which damage transmission lines (e.g., icing), a similar uncertainty set based on the estimated maximum number of damaged transmission lines could be formulated; for natural disasters that affect other components (e.g., flooding mainly affects generators rather than transmission lines), an uncertainty set based on the estimated maximum number of other damaged components could be utilized. In the latter case, status variables for the other components may need to be introduced, and the RoMIO model should be modified accordingly.
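To give a feel for the size of the uncertainty set in (3.11), the following sketch enumerates every line-status vector $s^b$ with at most $K^b$ damaged lines. The line names are hypothetical, and explicit enumeration is used here only for illustration: the solution methodology of this chapter does not enumerate scenarios, since the set grows combinatorially with the number of lines.

```python
from itertools import combinations
from math import comb
from typing import Dict, Iterator, List


def damage_scenarios(lines: List[str], k_b: int) -> Iterator[Dict[str, int]]:
    """Yield every status vector s^b with sum(1 - s^b) <= K^b, as in (3.11)."""
    for n_damaged in range(k_b + 1):
        for damaged in combinations(lines, n_damaged):
            # s^b = 0 for a damaged line and 1 otherwise.
            yield {line: 0 if line in damaged else 1 for line in lines}


def scenario_count(n_lines: int, k_b: int) -> int:
    """Closed-form size of the uncertainty set: sum of binomial coefficients."""
    return sum(comb(n_lines, n) for n in range(k_b + 1))


# Hypothetical seven-line system.
LINES = [f"L{n}" for n in range(1, 8)]
```

For seven lines the set is small, but `scenario_count(110, 10)` is already astronomically large, which is one reason a robust formulation is solved by decomposition rather than by scenario enumeration.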

3.3.3 Emergency Response

After natural disasters unfold, the power grid enters the emergency state and should be operated to supply as much load as possible. Thus, the objective of the emergency response is:

$$\min \sum_{d\in\mathcal{D}} p_{d,s}^c. \tag{3.12}$$

The power balance constraints of the system are similar to those in the preventive state:

$$\sum_{g\in\mathcal{G}_i} p_g^c - \sum_{(i,j)\in\mathcal{L}_i} p_{ij}^c + \sum_{d\in\mathcal{D}_i} p_{d,s}^c = \sum_{d\in\mathcal{D}_i} P_d, \quad i\in\mathcal{B}. \tag{3.13}$$

$$(s_{ij}^c - 1)M_{ij} \le p_{ij}^c + B_{ij}\theta_{ij}^c \le (1 - s_{ij}^c)M_{ij}, \quad (i,j)\in\mathcal{L}, \tag{3.14}$$

where generator output $p_g^c$, transmission line status $s_{ij}^c$, and power shed $p_{d,s}^c$ are independent variables, and line flow $p_{ij}^c$ and phase angle $\theta_i^c$ are dependent variables.

Similar to the preventive state, each generator capacity is limited as:

$$p_g^c \le P_g^{\max}, \quad g\in\mathcal{G}, \tag{3.15}$$

and the generators are subject to emergency state ramp-up constraints:

$$p_g^c - p_g^a \le P_g^{c,\max}, \quad g\in\mathcal{G}. \tag{3.16}$$

Note that the ramp-up constraints for the emergency state are not the same as those for the preventive state, because generators in the emergency state will usually have a shorter time to ramp up. Generally, $P_g^{c,\max} \le P_g^{a,\max}$.

Each transmission line and each bus are limited similarly to the preventive state:

$$-P_{ij}^{\max} s_{ij}^c \le p_{ij}^c \le P_{ij}^{\max} s_{ij}^c, \quad (i,j)\in\mathcal{L}. \tag{3.17}$$

$$\Theta_i^{\min} \le \theta_i^c \le \Theta_i^{\max}, \quad i\in\mathcal{B}. \tag{3.18}$$

In addition, the following constraints are involved when load shedding is performed:

$$0 \le p_{d,s}^c \le P_d, \quad d\in\mathcal{D}. \tag{3.19}$$

The constraints for ISL switch-off and OSL switch-on in the emergency state are given as follows:

$$\sum_{(i,j)\in\mathcal{L}} s_{ij}^a s_{ij}^b (1 - s_{ij}^c) \le K_I^c. \tag{3.20}$$

$$\sum_{(i,j)\in\mathcal{L}} (1 - s_{ij}^a) s_{ij}^b s_{ij}^c \le K_O^c. \tag{3.21}$$

$$s_{ij}^c \le s_{ij}^b, \quad (i,j)\in\mathcal{L}, \tag{3.22}$$

where $K_I^c$ and $K_O^c$ are the quantity limits for ISL switch-off and OSL switch-on in the emergency state. After the emergency response has been applied, transmission line $(i,j)$ is in service if $s_{ij}^c = 1$ and out of service if $s_{ij}^c = 0$.

As the emergency state has fewer resilience resources than the preventive state, the emergency state may have tighter ramp-up capacities for generators (i.e., $P_g^{c,\max} \le P_g^{a,\max}$) and tighter switching limits for transmission lines. To reflect this difference in topology switching limits, assume that the OSLs in the preventive state are unavailable to the emergency state, so that emergency response can only switch on lines that were previously switched off by the preventive response. Another reason behind this assumption is that some sources of OSLs in the preventive state, for example, restoring the outage-scheduled transmission lines [24], need a relatively long time that the emergency state cannot afford. Thus:

$$s_{ij}^c \le s_{ij}^0 + s_{ij}^a, \quad (i,j)\in\mathcal{L}. \tag{3.23}$$
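The switching constraints (3.20)–(3.23) are easy to misread, so a small feasibility checker may help. The sketch below evaluates them directly for given status vectors; the dictionary encoding and the function name are assumptions made for this example.

```python
from typing import Dict


def emergency_switching_feasible(
    s0: Dict[str, int],  # initial line status s^0
    sa: Dict[str, int],  # status after preventive response s^a
    sb: Dict[str, int],  # damage indicator s^b (0 = damaged)
    sc: Dict[str, int],  # status after emergency response s^c
    k_i_c: int,          # limit on ISL switch-off in the emergency state
    k_o_c: int,          # limit on OSL switch-on in the emergency state
) -> bool:
    lines = list(s0)
    switched_off = sum(sa[l] * sb[l] * (1 - sc[l]) for l in lines)   # (3.20)
    switched_on = sum((1 - sa[l]) * sb[l] * sc[l] for l in lines)    # (3.21)
    undamaged_only = all(sc[l] <= sb[l] for l in lines)              # (3.22)
    not_original_osl = all(sc[l] <= s0[l] + sa[l] for l in lines)    # (3.23)
    return (switched_off <= k_i_c and switched_on <= k_o_c
            and undamaged_only and not_original_osl)


# Hypothetical three-line case: A stays in service, B is switched off
# preventively, C is an OSL that was never switched on. Line A is damaged.
S0 = {"A": 1, "B": 1, "C": 0}
SA = {"A": 1, "B": 0, "C": 0}
SB = {"A": 0, "B": 1, "C": 1}
```

In this case, switching line B back on is allowed, while energizing the damaged line A violates (3.22) and energizing the original OSL C violates (3.23).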

3.3.4 Integration of Preventive and Emergency Responses

Based on Sects. 3.3.1–3.3.3, the entire RoMIO model can be given as:

$$
\begin{aligned}
\min\ & (3.1)\\
\text{s.t.}\ & (3.2)\text{–}(3.9)\\
& \max\ (3.10)\\
& \text{s.t.}\ (3.11)\\
& \qquad \min\ (3.12)\\
& \qquad \text{s.t.}\ (3.13)\text{–}(3.23).
\end{aligned}
\tag{3.24}
$$

It is difficult to directly solve the above model, because there are quadratic terms and cubic terms in (3.20)–(3.21). To deal with this issue, binary variables $s_{ij,1} = s_{ij}^a s_{ij}^b$, $s_{ij,2} = s_{ij}^b s_{ij}^c$, and $s_{ij,3} = s_{ij}^a s_{ij}^b s_{ij}^c$ are introduced. Then (3.20)–(3.21) become:

$$\sum_{(i,j)\in\mathcal{L}} (s_{ij,1} - s_{ij,3}) \le K_I^c. \tag{3.25}$$

$$\sum_{(i,j)\in\mathcal{L}} (s_{ij,2} - s_{ij,3}) \le K_O^c, \tag{3.26}$$

where $s_{ij,1}$, $s_{ij,2}$, and $s_{ij,3}$ are all products of multiple binary variables, and they can be linearized by:

$$
s = \prod_{k\in\mathcal{N}} s_k
\iff
\begin{cases}
s \le s_k, \quad k\in\mathcal{N}\\[2pt]
s \ge \sum_{k\in\mathcal{N}} s_k - \operatorname{card}(\mathcal{N}) + 1,
\end{cases}
\tag{3.27}
$$

where $\operatorname{card}(\cdot)$ is the number of elements in a set, and $s_k$ ($k\in\mathcal{N}$) and $s$ are binary variables.
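The equivalence in (3.27) can be confirmed by brute force for small sets. The sketch below enumerates all binary assignments of the factors and checks that the linear constraints leave exactly one feasible value of $s$, namely the product. The function names are introduced only for this illustration.

```python
from itertools import product
from math import prod
from typing import Sequence


def linear_constraints_hold(s: int, factors: Sequence[int]) -> bool:
    """Right-hand side of (3.27): s <= s_k for all k, and s >= sum(s_k) - card(N) + 1."""
    n = len(factors)
    return all(s <= s_k for s_k in factors) and s >= sum(factors) - n + 1


def linearization_is_exact(n: int) -> bool:
    """True if, for every binary assignment, the only feasible s is the product."""
    for factors in product((0, 1), repeat=n):
        feasible = [s for s in (0, 1) if linear_constraints_hold(s, factors)]
        if feasible != [prod(factors)]:
            return False
    return True
```

The check covers the two cases needed here: `linearization_is_exact(2)` for $s_{ij,1}$ and $s_{ij,2}$, and `linearization_is_exact(3)` for $s_{ij,3}$.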


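After the above linearization, the objective functions and constraint functions of the RoMIO model become either linear or mixed-integer linear. Its abstract mathematical formulation can be given as:

$$
\begin{aligned}
\min_{\mathbf{s}_1, \mathbf{x}_1}\quad & \mathbf{c}_1^T \mathbf{x}_1 + \mathbf{c}_3^T \mathbf{x}_3\\
\text{s.t.}\quad & \mathbf{A}_1 \begin{bmatrix} \mathbf{s}_1^T & \mathbf{x}_1^T \end{bmatrix}^T \le \mathbf{b}_1\\
& \max_{\mathbf{s}_2}\quad \mathbf{c}_3^T \mathbf{x}_3\\
& \text{s.t.}\quad \mathbf{A}_2 \mathbf{s}_2 \le \mathbf{b}_2\\
& \qquad \min_{\mathbf{s}_3, \mathbf{x}_3}\quad \mathbf{c}_3^T \mathbf{x}_3\\
& \qquad \text{s.t.}\quad \mathbf{A}_3 \begin{bmatrix} \mathbf{s}_1^T & \mathbf{x}_1^T & \mathbf{s}_2^T & \mathbf{s}_3^T & \mathbf{x}_3^T \end{bmatrix}^T \le \mathbf{b}_3,
\end{aligned}
\tag{3.28}
$$

where $\mathbf{s}_1$, $\mathbf{x}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{x}_3$ are made up of $s_{ij}^a$, $\{p_g^a, p_{ij}^a, \theta_i^a\}$, $s_{ij}^b$, $\{s_{ij}^c, s_{ij,1}, s_{ij,2}, s_{ij,3}\}$, and $\{p_g^c, p_{d,s}^c, p_{ij}^c, \theta_i^c\}$, respectively.

When natural disasters unfold, the first two levels of the tri-level problem in (3.28) are determined, and only $\mathbf{s}_3$ and $\mathbf{x}_3$ remain undecided. This corresponds to the emergency response module of the RoMIO model, which is referred to as the RoMIO-E model. Note that the RoMIO-E model is a single-level mixed-integer linear optimization model, which can be efficiently solved using state-of-the-art mixed-integer linear programming (MIP) solvers.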

3.4 Solution Methodology

The RoMIO model is a tri-level mixed-integer linear optimization problem, and the nested column-and-constraint generation (NC&CG) decomposition framework proposed in [30] is able to solve this type of problem. However, a general algorithm based on the NC&CG decomposition can be very time-consuming, which would limit the practical application of the IRR framework against natural disasters.

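3.4.1 NC&CG Decomposition-Based Algorithm

The tri-level problem (3.28) can be decomposed into a single-level $\min$ problem as the master problem and a bi-level $\max$-$\min$ problem as the subproblem as follows:

$$
\begin{aligned}
\min_{\mathbf{s}_1, \mathbf{x}_1, \mathbf{s}_3^i, \mathbf{x}_3^i, \eta}\quad & \mathbf{c}_1^T \mathbf{x}_1 + \eta\\
\text{s.t.}\quad & \mathbf{c}_3^T \mathbf{x}_3^i \le \eta\\
& \mathbf{A}_1 \begin{bmatrix} \mathbf{s}_1^T & \mathbf{x}_1^T \end{bmatrix}^T \le \mathbf{b}_1\\
& \mathbf{A}_3 \begin{bmatrix} \mathbf{s}_1^T & \mathbf{x}_1^T & \mathbf{s}_2^{*iT} & \mathbf{s}_3^{iT} & \mathbf{x}_3^{iT} \end{bmatrix}^T \le \mathbf{b}_3\\
& \forall i \in \{1, \cdots, k\}
\end{aligned}
\tag{MP-I}
$$

$$
\begin{aligned}
\max_{\mathbf{s}_2} \min_{\mathbf{s}_3, \mathbf{x}_3}\quad & \mathbf{c}_1^T \mathbf{x}_1^* + \mathbf{c}_3^T \mathbf{x}_3\\
\text{s.t.}\quad & \mathbf{A}_2 \mathbf{s}_2 \le \mathbf{b}_2\\
& \mathbf{A}_3 \begin{bmatrix} \mathbf{s}_1^{*T} & \mathbf{x}_1^{*T} & \mathbf{s}_2^T & \mathbf{s}_3^T & \mathbf{x}_3^T \end{bmatrix}^T \le \mathbf{b}_3,
\end{aligned}
\tag{SP-I}
$$

where (MP-I) and (SP-I) will be iteratively solved to provide the lower bound and upper bound for (3.28), respectively. Here, $\eta$ is a newly introduced variable, $k$ is the iteration number, and variables denoted by an asterisk ($*$) in (MP-I) or (SP-I) represent the optimal values obtained from the other problem. After each iteration, $k$ is increased by 1, and new variables and constraints are generated and added to (MP-I). It has been proved that the above iteration will terminate in finite steps and the optimal value can be achieved [30].

Here, (MP-I) is a classical MIP problem and can be directly solved with state-of-the-art MIP solvers, but (SP-I) is a bi-level problem with integers in either level and cannot be straightforwardly solved. Thus, rewrite (SP-I) in a tri-level form as follows:

$$
\begin{aligned}
\max_{\mathbf{s}_2} \min_{\mathbf{s}_3} \min_{\mathbf{x}_3}\quad & \mathbf{c}_1^T \mathbf{x}_1^* + \mathbf{c}_3^T \mathbf{x}_3\\
\text{s.t.}\quad & \mathbf{A}_2 \mathbf{s}_2 \le \mathbf{b}_2\\
& \mathbf{A}_3 \begin{bmatrix} \mathbf{s}_1^{*T} & \mathbf{x}_1^{*T} & \mathbf{s}_2^T & \mathbf{s}_3^T & \mathbf{x}_3^T \end{bmatrix}^T \le \mathbf{b}_3.
\end{aligned}
\tag{3.29}
$$

Then it can be decomposed into the following master problem and subproblem, which will also be iteratively solved to provide the upper bound and lower bound for (SP-I), respectively:

$$
\begin{aligned}
\max_{\mathbf{s}_2, \mathbf{x}_3^j, \boldsymbol{\lambda}^j, \tau}\quad & \mathbf{c}_1^T \mathbf{x}_1^* + \tau\\
\text{s.t.}\quad & \tau \le \mathbf{c}_3^T \mathbf{x}_3^j\\
& \mathbf{A}_2 \mathbf{s}_2 \le \mathbf{b}_2\\
& \mathbf{A}_3 \begin{bmatrix} \mathbf{s}_1^{*T} & \mathbf{x}_1^{*T} & \mathbf{s}_2^T & \mathbf{s}_3^{*jT} & \mathbf{x}_3^{jT} \end{bmatrix}^T \le \mathbf{b}_3\\
& \mathbf{A}_{3,5}^T \boldsymbol{\lambda}^j = -\mathbf{c}_3\\
& \mathbf{c}_3^T \mathbf{x}_3^j = \left( \mathbf{A}_{3,1-4} \begin{bmatrix} \mathbf{s}_1^{*T} & \mathbf{x}_1^{*T} & \mathbf{s}_2^T & \mathbf{s}_3^{*jT} \end{bmatrix}^T - \mathbf{b}_3 \right)^T \boldsymbol{\lambda}^j\\
& \boldsymbol{\lambda}^j \ge \mathbf{0}\\
& \forall j \in \{1, \cdots, k'\}
\end{aligned}
\tag{MP-II}
$$

$$
\begin{aligned}
\min_{\mathbf{s}_3, \mathbf{x}_3}\quad & \mathbf{c}_1^T \mathbf{x}_1^* + \mathbf{c}_3^T \mathbf{x}_3\\
\text{s.t.}\quad & \mathbf{A}_3 \begin{bmatrix} \mathbf{s}_1^{*T} & \mathbf{x}_1^{*T} & \mathbf{s}_2^{*T} & \mathbf{s}_3^T & \mathbf{x}_3^T \end{bmatrix}^T \le \mathbf{b}_3,
\end{aligned}
\tag{SP-II}
$$

where $\boldsymbol{\lambda}^j$ includes the dual variables of the constraint $\mathbf{A}_3 \begin{bmatrix} \mathbf{s}_1^{*T} & \mathbf{x}_1^{*T} & \mathbf{s}_2^T & \mathbf{s}_3^{*jT} & \mathbf{x}_3^{jT} \end{bmatrix}^T \le \mathbf{b}_3$ in the (SP-II) problem, $\tau$ is a newly introduced variable, $k'$ is the iteration number, and new variables and constraints will be generated and added to (MP-II) after each iteration. Note that $\begin{bmatrix} \mathbf{A}_{3,1-4} & \mathbf{A}_{3,5} \end{bmatrix} = \mathbf{A}_3$.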
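The interplay of lower and upper bounds between (MP-I) and (SP-I) can be illustrated on a toy problem small enough to solve by enumeration. The sketch below is not the NC&CG algorithm of [30]: it replaces every MIP solve with a brute-force search over finite sets and folds the inner loop, (MP-II) and (SP-II), into a direct enumeration. The toy data and all function names are hypothetical, and the recourse cost is assumed to be non-negative, as load shed is.

```python
def column_and_constraint_generation(first_stage, scenarios, first_cost, recourse):
    """Return (best first-stage decision, optimal value, retained scenarios)."""
    retained = []                       # scenarios added to the master so far
    upper, best_x = float("inf"), None
    while True:
        # Master problem: only retained scenarios are considered, which
        # yields a lower bound on the true min-max value.
        def master_value(x):
            worst = max((recourse(x, s) for s in retained), default=0)
            return first_cost(x) + worst

        x_star = min(first_stage, key=master_value)
        lower = master_value(x_star)

        # Subproblem: worst scenario for the fixed x_star (an upper bound).
        s_star = max(scenarios, key=lambda s: recourse(x_star, s))
        value = first_cost(x_star) + recourse(x_star, s_star)
        if value < upper:
            upper, best_x = value, x_star
        if upper <= lower:
            return best_x, upper, retained
        retained.append(s_star)         # new scenario = new master constraints


# Toy data (hypothetical): x is a preventive margin, each scenario removes
# some supply, and the third stage may apply one corrective action y.
HIT = {"none": 0, "line_a": 4, "line_b": 2}
RELIEF = {"none": 0, "line_a": 1, "line_b": 0}


def toy_first_cost(x):
    return 2 * x


def toy_recourse(x, s):
    # Inner minimization over the corrective action y in {0, 1}.
    return min(5 * max(0, HIT[s] - x - y * RELIEF[s]) for y in (0, 1))
```

Calling `column_and_constraint_generation([0, 1, 2], list(HIT), toy_first_cost, toy_recourse)` needs only one retained scenario before the bounds meet, which is the practical appeal of generating scenarios on demand instead of enumerating the whole uncertainty set.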

3.4.2 Computational Efficiency Improvement Techniques

Theoretically, both Karush–Kuhn–Tucker conditions (KKT) and strong duality theory (SDT) can be used in the NC&CG decomposition. But they will both introduce bilinear terms, and it is fundamentally more difficult to deal with bilinear terms than with linear ones. As a result, the algorithm can be effective, but not efficient, especially when a large number of bilinear terms are introduced.

When applying the KKT, bilinear terms are introduced due to the following constraint:

$$\left( \mathbf{b}_3 - \mathbf{A}_3 \begin{bmatrix} \mathbf{s}_1^{*T} & \mathbf{x}_1^{*T} & \mathbf{s}_2^T & \mathbf{s}_3^{*jT} & \mathbf{x}_3^{jT} \end{bmatrix}^T \right) \circ \boldsymbol{\lambda}^j = \mathbf{0}, \tag{3.30}$$

where “$\circ$” denotes the component-wise multiplication.


When the SDT is applied, the constraint that introduces bilinear terms becomes:

$$\mathbf{c}_3^T \mathbf{x}_3^j = \left( \mathbf{A}_{3,1-4} \begin{bmatrix} \mathbf{s}_1^{*T} & \mathbf{x}_1^{*T} & \mathbf{s}_2^T & \mathbf{s}_3^{*jT} \end{bmatrix}^T - \mathbf{b}_3 \right)^T \boldsymbol{\lambda}^j. \tag{3.31}$$

We have $\mathbf{A}_3 = \begin{bmatrix} \mathbf{A}_{3,1-4} & \mathbf{A}_{3,5} \end{bmatrix}$, and $\mathbf{A}_{3,1-4}$ can be written as $\begin{bmatrix} \mathbf{A}_{3,1} & \mathbf{A}_{3,2} & \mathbf{A}_{3,3} & \mathbf{A}_{3,4} \end{bmatrix}$. Note that $\mathbf{A}_{3,1}$, $\mathbf{A}_{3,2}$, $\mathbf{A}_{3,3}$, $\mathbf{A}_{3,4}$, and $\mathbf{A}_{3,5}$ represent the block matrices of $\mathbf{A}_3$, corresponding to $\mathbf{s}_1$, $\mathbf{x}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{x}_3$, respectively. In (3.30) and (3.31), $\mathbf{s}_2$, $\mathbf{x}_3^j$, and $\boldsymbol{\lambda}^j$ are the variables. Thus, the number of bilinear terms in (3.30) will be determined by the number of nonzero terms in $\mathbf{A}_{3,3}$ and $\mathbf{A}_{3,5}$, while the number of bilinear terms in (3.31) will be determined by the number of nonzero terms in $\mathbf{A}_{3,3}$ only. $\mathbf{A}_{3,3}$ is relatively sparse, and according to (3.13)–(3.19), (3.22)–(3.23), and (3.25)–(3.27), the number of nonzero terms in $\mathbf{A}_{3,3}$ and $\mathbf{A}_{3,5}$ can be derived:

$$N_{A_{3,3}} = 4 \operatorname{card}(\mathcal{L}). \tag{3.32}$$

$$N_{A_{3,5}} = 12 \operatorname{card}(\mathcal{L}) + 5 \operatorname{card}(\mathcal{G}) + 6 \operatorname{card}(\mathcal{B}), \tag{3.33}$$

where $\operatorname{card}(\mathcal{L})$, $\operatorname{card}(\mathcal{G})$, and $\operatorname{card}(\mathcal{B})$ represent the number of transmission lines, generators, and buses, respectively.

The number of bilinear terms in (3.30) is $N_{A_{3,3}} + N_{A_{3,5}}$ while the number in (3.31) is $N_{A_{3,3}}$, which means that far fewer bilinear terms will be introduced if SDT is applied to the RoMIO model. Furthermore, the above difference is only for one calculation of (MP-II), and this calculation will be repeated several times for one calculation of (SP-I), while several calculations of (SP-I) will be required to solve the whole problem. Due to this cumulative effect, the difference in the number of bilinear terms will be significant. Therefore, SDT can provide higher computational efficiency for the RoMIO model and should be used in the solution methodology.

A linearization technique to deal with the bilinear terms induced by SDT is provided as follows:

$$
\begin{cases}
y = \lambda s\\
\lambda \in \mathbb{R}_{\ge 0}\\
s \in \{0, 1\}
\end{cases}
\iff
\begin{cases}
y \ge 0\\
y \le \lambda\\
y \le M s\\
y \ge \lambda - M(1 - s)\\
\lambda \in \mathbb{R}_{\ge 0}\\
s \in \{0, 1\},
\end{cases}
\tag{3.34}
$$

where $y$ is a newly introduced variable, $M$ is a large number (at least as large as any value $\lambda$ can take), and $\lambda \in \mathbb{R}_{\ge 0}$ and $s \in \{0, 1\}$ are the elements of decision variables $\boldsymbol{\lambda}^j$ and $\mathbf{s}_2$, respectively.

The effectiveness of the integrated resilience response and the efficiency of the SDT-based solution methodology are verified on three different systems: a modified version of the PJM five-bus system, the IEEE one-area RTS-96 system, and the IEEE three-area RTS-96 system. Since the emergency state decision making (i.e., the RoMIO-E model) is a classical mixed-integer linear programming problem that can be directly and efficiently solved using state-of-the-art MIP solvers, we mainly focus on verifying the efficiency of the SDT-based solution methodology for solving the entire RoMIO model for the preventive state decision making. All tests are performed on a laptop with a 2.90-GHz Intel(R) Core(TM) i7-4600M, and Gurobi 6.5 [31] is used as the MIP solver with optimality tolerance $10^{-3}$. The maximum time limit for preventive state decision making is set as 1 h, and “NA” indicates that the methodology fails to solve the problem within the maximum time limit.
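Returning to the linearization in (3.34), its exactness can be checked by computing the interval of $y$ values that the four linear inequalities allow. The sketch below does so for sample values; the function name and the choice $M = 10$ are assumptions made for this illustration.

```python
from typing import Tuple


def implied_interval(lam: float, s: int, big_m: float) -> Tuple[float, float]:
    """Interval [lower, upper] of y allowed by the linear constraints in (3.34)."""
    lower = max(0.0, lam - big_m * (1 - s))   # y >= 0 and y >= lam - M(1 - s)
    upper = min(lam, big_m * s)               # y <= lam and y <= M s
    return lower, upper


def linearization_is_exact(lam: float, s: int, big_m: float) -> bool:
    """True if the linear constraints pin y to the bilinear value lam * s."""
    lower, upper = implied_interval(lam, s, big_m)
    return lower == upper == lam * s
```

The check also shows why $M$ must bound $\lambda$ from above: with $\lambda = 12$, $s = 0$, and $M = 10$, the lower end of the interval exceeds the upper end, so the linear system has no feasible $y$ at all.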

3.5 Results on PJM Five-Bus System

The PJM five-bus system [32] shown in Fig. 3.3 has four generators, three loads, and seven transmission lines (including one OSL in the preventive state). The system data is given in Appendix 1. The ramp-up capacity in the emergency state is set to be 25% of that in the preventive state. The quantity limits for both ISL switch-off and OSL switch-on in the preventive and emergency states are set to be the number of OSLs in the preventive state.

Fig. 3.3 PJM five-bus system (dashed line indicates the OSL in preventive state). © [2017] IEEE. Reprinted, with permission, from [16]

Independent preventive response (referred to as “PR”) includes generator re-dispatch in the preventive state, and independent emergency response (referred to as “ER”) includes generator re-dispatch and load shedding in the emergency state. IRR includes generator re-dispatch in the preventive state, and generator re-dispatch and load shedding in the emergency state. The benefits of the IRR with topology switching in both preventive and emergency states (referred to as “IRR-TS”) will also be demonstrated.

Table 3.1 lists the load shed for PR, ER, IRR, and IRR-TS when the estimated maximum number of damaged transmission lines (i.e., $K^b$) varies, in which “✕” means that the response strategy cannot keep the generation-demand balance after disasters unfold. It is seen that PR fails to keep power balance in all seven cases. This is because independent preventive response aims to withstand contingencies with only pre-disturbance strategies, and this will not work under severe disturbances such as natural disasters. It can also be observed that while ER can keep the power balance against natural disasters, much more load has to be shed compared to IRR or IRR-TS. For example, when $K^b = 1$, ER has to shed 189 MW to keep the power balance, but IRR only needs to shed 39 MW and IRR-TS does not need to shed any load. This shows that integrated resilience response can better enhance power grid resilience than independent preventive response or independent emergency response, and that power grid resilience can be further enhanced by utilizing topology switching in the integrated resilience response.

Table 3.1 Load shed (MW) of the PJM five-bus system under different $K^b$’s

| $K^b$ | PR | ER | IRR | IRR-TS |
|---|---|---|---|---|
| 1 | ✕ | 189 | 39 | 0 |
| 2 | ✕ | 429 | 300 | 300 |
| 3 | ✕ | 639 | 489 | 300 |
| 4 | ✕ | 639 | 489 | 489 |
| 5 | ✕ | 688 | 638 | 489 |
| 6 | ✕ | 688 | 638 | 638 |
| 7 | ✕ | 688 | 638 | 638 |

The total cost of ER, IRR, and IRR-TS under different $K^b$’s and the corresponding operating cost in the preventive state are given in Table 3.2. The total cost could be greatly reduced if we utilize IRR or IRR-TS rather than ER. For example, the total cost of ER is $206,520 when $K^b = 1$, but the total cost of IRR is $59,945 and the total cost of IRR-TS is $16,463. The operating cost of IRR or IRR-TS does not increase much compared to ER. In fact, the operating cost could even decrease if IRR-TS is applied. For example, when $K^b = 2$, the operating cost is reduced from $17,520 to $16,163 by utilizing IRR-TS rather than ER. This confirms the cost-effectiveness of the integrated resilience response. While IRR and IRR-TS in some cases have the same load shed, IRR-TS could have lower operating cost because the topology switching provides more flexibility to the system and the system could therefore be operated with lower cost. The operating cost of ER stays the same across all $K^b$’s because the generator re-dispatch in the emergency state does not influence the operating cost in the preventive state.

Table 3.2 Total cost and operating cost of the PJM five-bus system under different $K^b$’s

| $K^b$ | Total cost ($), ER | Total cost ($), IRR | Total cost ($), IRR-TS | Operating cost ($), ER | Operating cost ($), IRR | Operating cost ($), IRR-TS |
|---|---|---|---|---|---|---|
| 1 | 206,520 | 59,945 | 16,463 | 17,520 | 20,945 | 16,463 |
| 2 | 446,520 | 320,315 | 316,163 | 17,520 | 20,315 | 16,163 |
| 3 | 656,520 | 509,077 | 320,315 | 17,520 | 20,077 | 20,315 |
| 4 | 656,520 | 509,637 | 508,970 | 17,520 | 20,637 | 19,970 |
| 5 | 705,520 | 655,675 | 509,637 | 17,520 | 17,675 | 20,637 |
| 6 | 705,520 | 655,675 | 655,375 | 17,520 | 17,675 | 17,375 |
| 7 | 705,520 | 655,675 | 655,375 | 17,520 | 17,675 | 17,375 |

The influence of emergency state ramp-up capacity (i.e., $P_g^{c,\max}$) on power grid resilience is also investigated. The base values of the capacities are kept the same as those in Appendix 1, and Table 3.3 gives the results of the system with ER, IRR, and IRR-TS when the emergency state ramp-up capacity varies from 20% to 180% of the base value. Here, $K^b$ is assumed to be 3. From Table 3.3, it can be seen that larger ramp-up capacities reduce the load that must be shed to maintain power balance. When IRR-TS is applied, load shed is no longer reduced once $P_g^{c,\max}$ reaches 60% of the base value. This is because the load at Bus 2 is 300 MW, and when $K^b = 3$ this load cannot be served as long as Lines 1–2 and 2–3 are damaged. In other words, IRR-TS cannot alleviate this situation with larger $P_g^{c,\max}$ here because the load shed is not caused by the output limit of generators.

Table 3.3 Load shed (MW) of the PJM five-bus system under different $P_g^{c,\max}$’s

| $P_g^{c,\max}$ | ER | IRR | IRR-TS |
|---|---|---|---|
| 20% | 669 | 519 | 309 |
| 40% | 662 | 512 | 302 |
| 60% | 654 | 504 | 300 |
| 80% | 647 | 497 | 300 |
| 100% | 639 | 489 | 300 |
| 120% | 632 | 482 | 300 |
| 140% | 624 | 474 | 300 |
| 160% | 617 | 467 | 300 |
| 180% | 609 | 459 | 300 |

Table 3.4 lists the computational time for solving the IRR and IRR-TS problems with the KKT-based solution methodology and the SDT-based solution methodology. The SDT-based methodology is always faster, but the difference is not large here because the PJM five-bus system is small.

Table 3.4 Computational time (s) for IRR and IRR-TS of the PJM five-bus system under different $K^b$’s

| $K^b$ | IRR, KKT | IRR, SDT | IRR-TS, KKT | IRR-TS, SDT |
|---|---|---|---|---|
| 1 | 0.576 | 0.514 | 2.470 | 1.119 |
| 2 | 1.281 | 1.016 | 1.837 | 1.405 |
| 3 | 0.691 | 0.516 | 1.936 | 1.340 |
| 4 | 1.172 | 1.034 | 0.936 | 0.721 |
| 5 | 0.892 | 0.827 | 1.588 | 1.362 |
| 6 | 0.451 | 0.421 | 0.957 | 0.546 |
| 7 | 0.452 | 0.420 | 0.482 | 0.407 |
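The figures in Tables 3.1 and 3.2 can be cross-checked against the objective (3.1), which sums the preventive operating cost and the load shedding cost. The sketch below assumes a uniform shedding price of $1000 per MW; this value is inferred from the tabulated numbers and is not stated in this section (the system data is in Appendix 1), so it should be read as an assumption.

```python
SHED_PRICE = 1000  # $/MW, inferred from the tables; an assumption

# Values transcribed from Tables 3.1 and 3.2 for K^b = 1, ..., 7.
LOAD_SHED = {
    "ER": [189, 429, 639, 639, 688, 688, 688],
    "IRR": [39, 300, 489, 489, 638, 638, 638],
    "IRR-TS": [0, 300, 300, 489, 489, 638, 638],
}
OPERATING = {
    "ER": [17520] * 7,
    "IRR": [20945, 20315, 20077, 20637, 17675, 17675, 17675],
    "IRR-TS": [16463, 16163, 20315, 19970, 20637, 17375, 17375],
}
TOTAL = {
    "ER": [206520, 446520, 656520, 656520, 705520, 705520, 705520],
    "IRR": [59945, 320315, 509077, 509637, 655675, 655675, 655675],
    "IRR-TS": [16463, 316163, 320315, 508970, 509637, 655375, 655375],
}


def cost_residuals() -> dict:
    """Total cost minus (operating cost + shedding price * load shed), per entry."""
    return {
        name: [t - o - SHED_PRICE * shed
               for t, o, shed in zip(TOTAL[name], OPERATING[name], LOAD_SHED[name])]
        for name in TOTAL
    }
```

Under that assumed price every residual is zero, which suggests that the total cost column is simply the operating cost plus the priced load shed, so the cost savings of IRR and IRR-TS come almost entirely from avoided load shedding.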


3.6 Results on IEEE One-Area RTS-96 System

The IEEE one-area RTS-96 system is the first version of the IEEE Reliability Test System, which was developed to be a reference system to test and compare results from different power grid operation strategies. An updated version of the system can be found in [33]. We apply a slight modification to the transmission line data, which can be found in Appendix 2. The modified system has 12 generators, 36 transmission lines (including 34 ISLs and 2 OSLs in the preventive state), and 17 loads.

Table 3.5 lists the load shed of the system with PR, ER, IRR, and IRR-TS when $K^b$ varies from 1 to 10. It is seen in Table 3.5 that independent preventive response cannot keep power balance even when $K^b = 1$. This emphasizes the necessity of applying emergency response against natural disasters. Also, although independent emergency response could keep power balance by shedding load, a large portion of the load shed by the emergency response could be saved if we apply integrated resilience response, and the load shed could be further reduced if topology switching is utilized in the integrated resilience response. For example, when $K^b = 2$, ER has to shed 1139 MW to keep power balance, while IRR and IRR-TS only need to shed 356 MW and 289 MW, respectively.

Table 3.5 Load shed (MW) of the IEEE one-area RTS-96 system under different $K^b$’s

| $K^b$ | PR | ER | IRR | IRR-TS |
|---|---|---|---|---|
| 1 | ✕ | 739 | 64 | 0 |
| 2 | ✕ | 1139 | 356 | 289 |
| 3 | ✕ | 1242 | 590 | 356 |
| 4 | ✕ | 1522 | 766 | 590 |
| 5 | ✕ | 1772 | 1016 | 766 |
| 6 | ✕ | 1772 | 1250 | 1016 |
| 7 | ✕ | 1834 | 1303 | 1250 |
| 8 | ✕ | 1834 | 1312 | 1303 |
| 9 | ✕ | 1834 | 1366 | 1303 |
| 10 | ✕ | 1850 | 1366 | 1312 |

The total cost and operating cost of ER, IRR, and IRR-TS follow the same trend as in Table 3.2 and are thus not presented. For the same reason, the load shed under different $P_g^{c,\max}$’s is also not presented.

The computational time to solve IRR and IRR-TS under different $K^b$’s is provided in Table 3.6. The KKT-based solution methodology fails to solve any of the problems within the maximum time limit, while the SDT-based methodology could efficiently solve all cases for the IRR problem and the IRR-TS problem. In addition, the computational time tends to increase with $K^b$, and although the topology switching in the integrated resilience response further enhances the power grid resilience, it also increases the computational time. However, both the IRR problem and the IRR-TS problem can be efficiently solved by the methodology in Sect. 3.4, and the maximum time needed is 62.47 s.

Table 3.6 Computational time (s) for IRR and IRR-TS of the IEEE one-area RTS-96 system under different $K^b$’s

| $K^b$ | IRR, KKT | IRR, SDT | IRR-TS, KKT | IRR-TS, SDT |
|---|---|---|---|---|
| 1 | NA | 1.12 | NA | 3.07 |
| 2 | NA | 1.78 | NA | 2.95 |
| 3 | NA | 4.30 | NA | 8.51 |
| 4 | NA | 9.34 | NA | 29.65 |
| 5 | NA | 8.82 | NA | 35.97 |
| 6 | NA | 8.18 | NA | 30.94 |
| 7 | NA | 19.87 | NA | 32.72 |
| 8 | NA | 19.12 | NA | 44.59 |
| 9 | NA | 22.88 | NA | 62.47 |
| 10 | NA | 40.56 | NA | 56.01 |

3.7 Results on IEEE Three-Area RTS-96 System

To test the computational efficiency of the solution methodology in Sect. 3.4 on larger systems, we perform a case study on the IEEE three-area RTS-96 system. Each area has the same parameters as those in Sect. 3.6, where we assume there are two OSLs in the preventive state in Area-A. The data for the interconnections between areas can be found in [34]. The modified system has 36 generators, 110 transmission lines, and 51 loads. Table 3.7 lists the load shed of the system with PR, ER, IRR, and IRR-TS when $K^b$ varies from 1 to 10. Note that “NA” indicates that we fail to get the optimal solution within the maximum time limit.

The observations made in previous case studies can also be seen in this system. Compared with the independent preventive response and independent emergency response, the integrated resilience response can greatly enhance power grid resilience. With topology switching in the integrated resilience response, the load shed could be further reduced. For example, when $K^b = 3$, we need to shed 2006 MW load to keep power balance with ER, but we only need to shed 308 MW if IRR is applied and 289 MW if IRR-TS is applied. The resilience benefit of topology switching is smaller than that in the IEEE one-area RTS-96 system, because we have the same number of OSLs in the preventive state but the system is bigger now. This indicates that the benefits of topology switching could be limited when applied to large systems, as the number of OSLs in the preventive state will not increase with the growth of system size due to practical considerations such as financial constraints.


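Table 3.7 Load shed of the IEEE three-area RTS-96 system under different values of $K^b$

| $K^b$ | PR (MW) | ER (MW) | IRR (MW) | IRR-TS (MW) |
|---|---|---|---|---|
| 1 | ✕ | 1648 | 0 | 0 |
| 2 | ✕ | 1941 | 289 | 289 |
| 3 | ✕ | 2006 | 308 | 289 |
| 4 | ✕ | 2275 | 603 | NA |
| 5 | ✕ | 2376 | 866 | NA |
| 6 | ✕ | 2437 | 1042 | NA |
| 7 | ✕ | 2470 | 1303 | NA |
| 8 | ✕ | 2499 | 1303 | 1303 |
| 9 | ✕ | 2532 | 1366 | 1303 |
| 10 | ✕ | 2532 | 1366 | 1303 |

Table 3.8 Computational time for IRR and IRR-TS of the IEEE three-area RTS-96 system under different values of $K^b$

| $K^b$ | IRR, KKT (s) | IRR, SDT (s) | IRR-TS, KKT (s) | IRR-TS, SDT (s) |
|---|---|---|---|---|
| 1 | NA | 3.87 | NA | 32.01 |
| 2 | NA | 5.70 | NA | 44.30 |
| 3 | NA | 20.51 | NA | 2657.27 |
| 4 | NA | 22.93 | NA | NA |
| 5 | NA | 27.44 | NA | NA |
| 6 | NA | 25.32 | NA | NA |
| 7 | NA | 17.14 | NA | NA |
| 8 | NA | 33.10 | NA | 131.66 |
| 9 | NA | 41.53 | NA | 466.17 |
| 10 | NA | 55.77 | NA | 1849.56 |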

While optimal solutions are obtained for the IRR problem in all cases, they cannot be obtained for the IRR-TS problem in four of the ten cases. This indicates that topology switching adds a heavy computational burden to the RoMIO model. To better analyze this problem, we provide the computational time to solve IRR and IRR-TS in Table 3.8. Table 3.8 again shows the computational advantage of the SDT-based methodology over the KKT-based methodology. In addition, although the SDT-based solution methodology solves the IRR problem efficiently (in no more than 1 min), it needs a much longer time to solve the IRR-TS problem. In fact, the SDT-based solution methodology cannot always derive the optimal solution for the IRR-TS problem within the maximum time limit. This indicates that with topology switching considered, the integrated resilience response problem is much more difficult to solve.


Appendix 1: PJM Five-Bus System

The data of the PJM five-bus system are listed in Tables 3.9, 3.10, and 3.11. Among the transmission lines, Line 4–5B is the OSL in the preventive state.

Table 3.9 Generator data of the PJM five-bus system

| Bus | $P_g^0$ (MW) | $P_g^{\max}$ (MW) | $P_g^{a,\max}$ (MW) | $c_g$ (\$/MW) |
|---|---|---|---|---|
| 1 | 210 | 210 | 60 | 15 |
| 3 | 323.49 | 520 | 100 | 30 |
| 4 | 0 | 200 | 50 | 40 |
| 5 | 466.51 | 600 | 150 | 10 |

Table 3.10 Load data of the PJM five-bus system

| Bus | $P_d$ (MW) | $c_d$ (\$/MW) |
|---|---|---|
| 2 | 300 | 1000 |
| 3 | 300 | 1000 |
| 4 | 400 | 1000 |

Table 3.11 Transmission line data of the PJM five-bus system

| Line | $X_{ij}$ (p.u.) | $P_{ij}^{\max}$ (MW) |
|---|---|---|
| 1–2 | 0.0281 | 400 |
| 1–4 | 0.0304 | 300 |
| 1–5 | 0.0064 | 300 |
| 2–3 | 0.0108 | 300 |
| 3–4 | 0.0297 | 300 |
| 4–5A | 0.0297 | 240 |
| 4–5B | 0.0297 | 240 |

Appendix 2: IEEE One-Area RTS-96 System

The transmission line data of the IEEE one-area RTS-96 system are given in Table 3.12. Line 14–16B and Line 15–21B are the OSLs in the preventive state.

Table 3.12 Transmission line data of the IEEE one-area RTS-96 system

| Line | $X_{ij}$ (p.u.) | $P_{ij}^{\max}$ (MW) | Line | $X_{ij}$ (p.u.) | $P_{ij}^{\max}$ (MW) |
|---|---|---|---|---|---|
| 1–2 | 0.0146 | 175 | 11–14 | 0.0426 | 500 |
| 1–3 | 0.2253 | 175 | 12–13 | 0.0488 | 500 |
| 1–5 | 0.0907 | 350 | 12–23 | 0.0985 | 500 |
| 2–4 | 0.1356 | 175 | 13–23 | 0.0884 | 250 |
| 2–6 | 0.205 | 175 | 14–16A | 0.0594 | 250 |
| 3–9 | 0.1271 | 175 | 14–16B | 0.0594 | 500 |
| 3–24 | 0.084 | 400 | 15–16 | 0.0172 | 500 |
| 4–9 | 0.111 | 175 | 15–21A | 0.0249 | 400 |
| 5–10 | 0.094 | 350 | 15–21B | 0.0249 | 600 |
| 6–10 | 0.0642 | 175 | 15–24 | 0.0529 | 500 |
| 7–8 | 0.0652 | 350 | 16–17 | 0.0263 | 500 |
| 8–9 | 0.1762 | 175 | 16–19 | 0.0234 | 500 |
| 8–10 | 0.1762 | 175 | 17–18 | 0.0143 | 500 |
| 9–11 | 0.084 | 400 | 17–22 | 0.1069 | 500 |
| 9–12 | 0.084 | 400 | 18–21 | 0.0132 | 1000 |
| 10–11 | 0.084 | 400 | 19–20 | 0.0203 | 1000 |
| 10–12 | 0.084 | 400 | 20–23 | 0.0112 | 1000 |
| 11–13 | 0.0488 | 500 | 21–22 | 0.0692 | 500 |

References

1. A. Kenward, U. Raja, Blackout: extreme weather, climate change and power outages. http://www.ourenergypolicy.org/wp-content/uploads/2014/04/climate-central.pdf. Tech. Rep. (2014)
2. Executive Office of the President, Economic Benefits of Increasing Electric Grid Resilience to Weather Outages. http://energy.gov/sites/prod/files/2013/08/f2/Grid. Tech. Rep. (2013)
3. National Research Council, The Resilience of the Electric Power Delivery System in Response to Terrorism and Natural Disasters: Summary of a Workshop (The National Academies Press, Washington, DC, 2013)
4. M. Panteli, P. Mancarella, Influence of extreme weather and climate change on the resilience of power systems: Impacts and possible mitigation strategies. Electric Power Syst. Res. 127, 259–270 (2015)
5. M. Panteli, P. Mancarella, The grid: stronger, bigger, smarter? Presenting a conceptual framework of power system resilience. IEEE Power Energy Mag. 13(3), 58–66 (2015)
6. Y. Wang, C. Chen, J. Wang, R. Baldick, Research on resilience of power systems under natural disasters: a review. IEEE Trans. Power Syst. 31(2), 1604–1613 (2016)
7. L. Fisher, Disaster responses: more than 70 ways to show resilience. Nature 518(7537), 35–35 (2015)
8. T. Liacco, The adaptive reliability control system. IEEE Trans. Power App. Syst. 5(PAS-86), 517–531 (1967)
9. Y. Wen, W. Li, G. Huang, X. Liu, Frequency dynamics constrained unit commitment with battery energy storage. IEEE Trans. Power Syst. PP(99), 1–11 (2016)
10. N. Fan, D. Izraelevitz, F. Pan, P. Pardalos, J. Wang, A mixed integer programming approach for optimal power grid intentional islanding. Energy Syst. 3(1), 77–93 (2012)
11. M. Golari, N. Fan, J. Wang, Two-stage stochastic optimal islanding operations under severe multiple contingencies in power grids. Electric Power Syst. Res. 114, 68–77 (2014)
12. M. Golari, N. Fan, J. Wang, Large-scale stochastic power grid islanding operations by line switching and controlled load shedding. Energy Syst. 8, 601–621 (2017)
13. P. Trodden, W. Bukhsh, A. Grothey, K. McKinnon, MILP formulation for controlled islanding of power networks. Int. J. Electr. Power Energy Syst. 45(1), 501–508 (2013)
14. P. Trodden, W. Bukhsh, A. Grothey, K. McKinnon, Optimization-based islanding of power networks using piecewise linear ac power flow. IEEE Trans. Power Syst. 29(3), 1212–1220 (2014)
15. M. Panteli, D. Trakas, P. Mancarella, N. Hatziargyriou, Boosting the power grid resilience to extreme weather events using defensive islanding. IEEE Trans. Smart Grid PP(99), 1–10 (2016)
16. G. Huang, J. Wang, C. Chen, J. Qi, C. Guo, Integration of preventive and emergency responses for power grid resilience enhancement. IEEE Trans. Power Syst. 32(6), 4451–4463 (2017)
17. E. Fisher, R. O’Neill, M. Ferris, Optimal transmission switching. IEEE Trans. Power Syst. 23(3), 1346–1355 (2008)
18. K. Hedman, R. O’Neill, E. Fisher, S. Oren, Optimal transmission switching with contingency analysis. IEEE Trans. Power Syst. 24(3), 1577–1586 (2009)
19. A. Delgadillo, J. Arroyo, N. Alguacil, Analysis of electric grid interdiction with line switching. IEEE Trans. Power Syst. 25(2), 633–641 (2010)
20. L. Zhao, B. Zeng, Vulnerability analysis of power grids with line switching. IEEE Trans. Power Syst. 28(3), 2727–2736 (2013)
21. F. Qiu, J. Wang, Chance-constrained transmission switching with guaranteed wind power utilization. IEEE Trans. Power Syst. 30(3), 1270–1278 (2015)
22. M. Jabarnejad, J. Wang, J. Valenzuela, A decomposition approach for solving seasonal transmission switching. IEEE Trans. Power Syst. 30(3), 1203–1211 (2015)
23. A. Wagaman, PPL Nearing Completion of Backup Transmission Line in Emmaus, Upper Milford (2016). http://www.mcall.com/news/local/eastpenn/mc-emmaus-new-ppl-transmissionline-20160119-story.html
24. NYISO Energy Market Operations, Outage Scheduling Manual. http://www.nyiso.com/public/webdocs/markets_operations/documents/Manuals_and_Guides/Manuals/Operations/outage_sched_mnl.pdf. Tech. Rep. (2015)
25. PJM System Operations Division, PJM Manual 13: Emergency Operations. http://www.pjm.com/~/media/documents/manuals/m13.ashx, Tech. Rep. (2016)
26. P. Ruiz, Reserve Valuation in Electric Power Systems, PhD. Dissertation, University of Illinois Urbana-Champaign, Ann Arbor, MI, 2008
27. W. Yuan, J. Wang, F. Qiu, C. Chen, C. Kang, B. Zeng, Robust optimization-based resilient distribution network planning against natural disasters. IEEE Trans. Smart Grid PP(99), 1–10 (2016)
28. G. Tonn, S. Guikema, C. Ferreira, S. Quiring, Hurricane Isaac: a longitudinal analysis of storm characteristics and power outage risk. Risk Anal. 36(10), 1936–1947 (2016)
29. J. Simonoff, C. Restrepo, R. Zimmerman, Risk-management and risk-analysis-based decision tools for attacks on electric power. Risk Anal. 27(3), 547–570 (2007)
30. L. Zhao, B. Zeng, An Exact Algorithm for Two-stage Robust Optimization With Mixed Integer Recourse Problems. http://www.optimization-online.org/DB_FILE/2012/01/3310.pdf. Tech. Rep. (2012)
31. Gurobi Optimization, Inc., Gurobi Optimizer Reference Manual (2016). http://www.gurobi.com
32. F. Li, R. Bo, Small test systems for power system economic studies, in Proc. 2010 IEEE PES General Meeting (2010), pp. 1–4
33. C. Ordoudis, P. Pinson, J. Morales, M. Zugno, An updated version of the IEEE RTS 24-bus system for electricity market and power system operation studies. http://orbit.dtu.dk/files/120568114/An_Updated_Version_of_the_IEEE_RTS_24Bus_System_for_Electricty_Market_an....pdf. Tech. Rep. (2016)
34. C. Grigg, P. Wong, P. Albrecht, R. Allan, M. Bhavaraju, R. Billinton, Q. Chen, C. Fong, S. Haddad, S. Kuruganty, W. Li, R. Mukerji, D. Patton, N. Rau, D. Reppen, A. Schneider, M. Shahidehpour, C. Singh, The IEEE reliability test system-1996. A report prepared by the reliability test system task force of the application of probability methods subcommittee. IEEE Trans. Power Syst. 14(3), 1010–1020 (1999)

Part II

Cybersecurity of Smart Grid Monitoring

Chapter 4

Risk Mitigation against Cyber Attacks Based on Dynamic State Estimation

4.1 Introduction

State estimation is a crucial application in the energy management system (EMS). The well-known static state estimation (SSE) [1–6] assumes that the power system is operating in quasi-steady state, based on which the static states (the voltage magnitudes and phase angles of the buses) are estimated by using supervisory control and data acquisition (SCADA) measurements. SSE is critical for power system monitoring and it provides inputs for other EMS applications such as automatic generation control and optimal power flow. However, SSE may not be sufficient for desirable situational awareness as the system states evolve more rapidly due to an increasing penetration of renewable generation and distributed energy resources. Therefore, dynamic state estimation (DSE) [7, 8], which estimates the dynamic states (i.e., the internal states of generators) by using highly synchronized phasor measurement unit (PMU) measurements with high sampling rates, will be critical for the wide-area monitoring, protection, and control of power systems. DSE requires a reliable dynamic model of the power system, which can be based on post-validation of the dynamic model and calibration of the generator parameters, as in [9–11]; however, there is still a gap between the model and actual power system physics. Assuming that the dynamical models are perfectly accurate can generate sub-optimal estimation laws.

Detecting and isolating cyber attacks (CAs) in cyber-physical systems generally, and smart grids specifically, has received immense attention. Liu et al. present a new class of attacks, called false data injection attacks, targeted against SSE in power networks [12], and show that an attacker can launch successful attacks to alter the state estimates. In [13, 14], the authors propose a generic framework for attack detection, metrics on controllability and observability, and centralized and distributed attack detection monitors, for a linear time-invariant representation of power systems. The reader is referred to [15] for a survey on different types of CAs and attack detection and identification methods that are mainly based on control-theoretic foundations and to [16] for a survey on cybersecurity in smart grids.

In [17], a probabilistic risk mitigation model is presented for CAs against PMU networks, in which a mixed integer linear program (MILP) is formulated that incorporates the derived threat levels of PMUs into a risk mitigation technique. In this MILP, the binary variables determine whether a certain PMU shall be kept connected to the PMU network or removed, while minimizing the maximum threat level for all connected PMUs [17]. However, the estimation problem with PMUs is not considered: there is no connection between the real-time states of the power system and the threat levels.

In this chapter, a risk mitigation strategy [18] is presented to enhance the cybersecurity of power system monitoring based on dynamic state estimation. The goal is to eliminate threats from unknown inputs (UIs) and potential CAs and enhance the resilience of power system monitoring. A sliding-mode observer is utilized to estimate the dynamic states and the unknown inputs. Then the estimates of CAs are obtained through an attack detection algorithm. The estimation and detection components are combined in an optimization framework to determine the most impacted PMU measurements. Finally, a risk mitigation strategy is presented that can guarantee the elimination of threats from attacks, ensuring the observability of the power system through available, safe measurements.

© Springer Nature Switzerland AG 2023 J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_4

4.2 Power System Dynamic Model

The notations used in this section are listed in Table 4.1.

Table 4.1 Summary of notations

| Symbol | Description |
|---|---|
| $\delta$ | Rotor angle in rad |
| $\omega, \omega_0, \omega_f$ | Rotor speed, rated rotor speed, and rotor speed set point in rad/s |
| $\omega_e$ | Rotor speed deviation in per unit |
| $E_{fd}, E_{fd}^0$ | Internal field voltage and its initial value in per unit |
| $E_t$ | Terminal voltage phasor |
| $E_T^0$ | Initial machine terminal voltage |
| $e_q, e_d$ | Terminal voltage at q axis and d axis in per unit |
| $e'_q, e'_d$ | Transient voltage at q axis and d axis in per unit |
| $e_R, e_I$ | Real and imaginary part of the terminal voltage phasor |
| $\mathrm{exc}_{1,2,3}$ | Internally set exciter constants |
| $\mathcal{G}_P$ | Set of generators where PMUs are installed |
| $H$ | Generator inertia constant in second |
| $I_t$ | Terminal current phasor |
| $i_q, i_d$ | Current at q and d axes in per unit |
| $i_R, i_I$ | Real and imaginary part of the terminal current phasor in per unit |
| $K_A$ | Voltage regulator gain |
| $K_D$ | Damping factor in per unit |
| $K_E$ | Exciter constant |
| $K_F$ | Stabilizer gain |
| $P_e$ | Electric power in per unit |
| $P_m^0$ | Initial mechanical input power |
| $R_f$ | Stabilizing transformer state variable |
| $S_B, S_N$ | System base and generator base MVA |
| $tg_1, tg_2, tg_3$ | Governor, servo, and reheater variables |
| $T_A, T_E, T_F$ | Voltage regulator, exciter, and stabilizer time constants |
| $T_m, T_e$ | Mechanical torque and electric air-gap torque in per unit |
| $Y_i$ | The $i$th row of the admittance matrix of the reduced network $Y$ |
| $T^{\max}$ | Maximum power order |
| $T'_{q0}, T'_{d0}$ | Open-circuit time constants for q and d axes in seconds |
| $T_s, T_c$ | Servo and HP turbine time constants |
| $T_3, T_4, T_5$ | Transient gain time constant, time constant to set HP ratio, and reheater time constant |
| $V_A, V_R$ | Regulator output voltage in per unit |
| $V_{FB}$ | Feedback from stabilizing transformer |
| $V_{TR}$ | Voltage transducer output in per unit |
| $x_q, x_d$ | Synchronous reactance at q and d axes in per unit |
| $x'_q, x'_d$ | Transient reactance at q and d axes in per unit |
| $1/r$ | Steady state gain |
| $\mathrm{sgn}(\cdot)$ | Signum function |

4.2.1 10th-Order Nonlinear Power System Model

Fast sub-transient dynamics and saturation effects are ignored and each of the $n_g$ generators is described by the two-axis transient model with an IEEE Type DC1 excitation system and a simplified turbine-governor system [19]:

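$$
\begin{cases}
\dot{\delta}_i = \omega_i - \omega_0 \\[2pt]
\dot{\omega}_i = \dfrac{\omega_0}{2H_i}\left(T_{mi} - T_{ei} - \dfrac{K_{Di}}{\omega_0}(\omega_i - \omega_0)\right) \\[6pt]
\dot{e}'_{qi} = \dfrac{1}{T'_{d0i}}\left(E_{fdi} - e'_{qi} - (x_{di} - x'_{di})\, i_{di}\right) \\[6pt]
\dot{e}'_{di} = \dfrac{1}{T'_{q0i}}\left(-e'_{di} + (x_{qi} - x'_{qi})\, i_{qi}\right) \\[6pt]
\dot{V}_{Ri} = \dfrac{1}{T_{Ai}}\left(-V_{Ri} + K_{Ai} V_{Ai}\right) \\[6pt]
\dot{E}_{fdi} = \dfrac{1}{T_{Ei}}\left(V_{Ri} - K_{Ei} E_{fdi} - S_{Ei}\right) \\[6pt]
\dot{R}_{fi} = \dfrac{1}{T_{Fi}}\left(-R_{fi} + E_{fdi}\right) \\[6pt]
\dot{tg}_{1i} = \dfrac{1}{T_{si}}\left(D_i - tg_{1i}\right) \\[6pt]
\dot{tg}_{2i} = \dfrac{1}{T_{ci}}\left(\left(1 - \dfrac{T_{3i}}{T_{ci}}\right) tg_{1i} - tg_{2i}\right) \\[6pt]
\dot{tg}_{3i} = \dfrac{1}{T_{5i}}\left(\left(\dfrac{T_{3i}}{T_{ci}}\, tg_{1i} + tg_{2i}\right)\left(1 - \dfrac{T_{4i}}{T_{5i}}\right) - tg_{3i}\right),
\end{cases}
\tag{4.1}
$$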

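where $i$ is the generator index. For generator $i$, the terminal voltage phasor $E_{ti} = e_{Ri} + j e_{Ii}$ and the terminal current phasor $I_{ti} = i_{Ri} + j i_{Ii}$ can be measured and used as outputs from actual PMU measurements. $T_{mi}$, $T_{ei}$, $i_{di}$, $i_{qi}$, $V_{Ai}$, $S_{Ei}$, and $D_i$ in (4.1) can be written as functions of the states:

$$
\begin{align}
T_{mi} &= \frac{T_{4i}}{T_{5i}}\left(\frac{T_{3i}}{T_{ci}}\, tg_{1i} + tg_{2i}\right) + tg_{3i} \tag{4.2a} \\
\Psi_{Ri} &= e'_{di} \sin\delta_i + e'_{qi} \cos\delta_i \tag{4.2b} \\
\Psi_{Ii} &= e'_{qi} \sin\delta_i - e'_{di} \cos\delta_i \tag{4.2c} \\
I_{ti} &= Y_i\,(\Psi_R + j\,\Psi_I) \tag{4.2d} \\
i_{Ri} &= \mathrm{Re}(I_{ti}) \tag{4.2e} \\
i_{Ii} &= \mathrm{Im}(I_{ti}) \tag{4.2f} \\
i_{qi} &= \frac{S_B}{S_{Ni}}\left(i_{Ii} \sin\delta_i + i_{Ri} \cos\delta_i\right) \tag{4.2g} \\
i_{di} &= \frac{S_B}{S_{Ni}}\left(i_{Ri} \sin\delta_i - i_{Ii} \cos\delta_i\right) \tag{4.2h} \\
e_{qi} &= e'_{qi} - x'_{di}\, i_{di} \tag{4.2i} \\
e_{di} &= e'_{di} + x'_{qi}\, i_{qi} \tag{4.2j} \\
P_{ei} &= e_{qi}\, i_{qi} + e_{di}\, i_{di} \tag{4.2k} \\
T_{ei} &= \frac{S_B}{S_{Ni}}\, P_{ei} \tag{4.2l} \\
V_{FBi} &= \frac{K_{Fi}}{T_{Fi}}\left(E_{fdi} - R_{fi}\right) \tag{4.2m} \\
V_{TRi} &= \sqrt{e_{qi}^2 + e_{di}^2} \tag{4.2n} \\
V_{Ai} &= -V_{FBi} - V_{TRi} + \mathrm{exc}_{i3} \tag{4.2o} \\
S_{Ei} &= \mathrm{exc}_{i1}\, e^{\mathrm{exc}_{i2} |E_{fdi}|}\, \mathrm{sgn}(E_{fdi}) \tag{4.2p} \\
\omega_{ei} &= \frac{1}{\omega_0}\left(\omega_{fi} - \omega_i\right) \tag{4.2q} \\
d_i &= P_{mi}^0 + \frac{1}{r_i}\, \omega_{ei} \tag{4.2r} \\
D_i &= \begin{cases} 0, & d_i \le 0 \\ d_i, & 0 < d_i \le T_i^{\max} \\ T_i^{\max}, & d_i > T_i^{\max}. \end{cases} \tag{4.2s}
\end{align}
$$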
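The algebraic relations (4.2p) to (4.2s) are the only non-smooth parts of the model, so it can help to see them as code. The following is a minimal sketch of the exciter saturation term and the governor power-order limiter; the function names and all numerical values are hypothetical and are not parameters from the chapter.

```python
import math


def exciter_saturation(e_fd, exc1, exc2):
    """S_E as in (4.2p): exc1 * exp(exc2 * |E_fd|) * sgn(E_fd)."""
    sign = (e_fd > 0) - (e_fd < 0)  # signum function, returns -1, 0, or 1
    return exc1 * math.exp(exc2 * abs(e_fd)) * sign


def governor_demand(omega, omega_f, omega_0, p_m0, r_inv, t_max):
    """Limited power order D_i as in (4.2q)-(4.2s)."""
    omega_e = (omega_f - omega) / omega_0  # (4.2q): speed deviation in per unit
    d = p_m0 + r_inv * omega_e             # (4.2r): unlimited power order
    return min(max(d, 0.0), t_max)         # (4.2s): clip to [0, T_max]


# Hypothetical values: 60 Hz system, 1/r = 25, initial power 0.8 p.u., limit 1.0 p.u.
w0 = 2.0 * math.pi * 60.0
nominal = governor_demand(w0, w0, w0, p_m0=0.8, r_inv=25.0, t_max=1.0)
underspeed = governor_demand(0.99 * w0, w0, w0, p_m0=0.8, r_inv=25.0, t_max=1.0)
overspeed = governor_demand(1.04 * w0, w0, w0, p_m0=0.8, r_inv=25.0, t_max=1.0)
```

With these illustrative numbers the three operating points land in the three branches of (4.2s): the nominal case passes $d_i$ through unchanged, the underspeed case saturates at $T_i^{\max}$, and the overspeed case is clipped to zero.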

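The state vector $x$ and output vector $y$ are

$$
x = \begin{bmatrix} \delta^\top & \omega^\top & e_q'^\top & e_d'^\top & V_R^\top & E_{fd}^\top & R_f^\top & tg_1^\top & tg_2^\top & tg_3^\top \end{bmatrix}^\top, \qquad
y = \begin{bmatrix} e_R^\top & e_I^\top & i_R^\top & i_I^\top \end{bmatrix}^\top,
$$

and the power system dynamics can be written as:

$$
\begin{cases}
\dot{x}(t) = f(x) \\
y(t) = h(x).
\end{cases}
\tag{4.3}
$$

In (4.2) the outputs $i_{Ri}$ and $i_{Ii}$ are written as functions of $x$. Similarly, the outputs $e_{Ri}$ and $e_{Ii}$ can also be written as functions of $x$:

$$
\begin{align}
e_{Ri} &= e_{di} \sin\delta_i + e_{qi} \cos\delta_i \tag{4.4a} \\
e_{Ii} &= e_{qi} \sin\delta_i - e_{di} \cos\delta_i. \tag{4.4b}
\end{align}
$$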

For the above power system model, the exciter and governor control system variables are treated as state variables and thus there are no control inputs in the system model.


4.2.2 Linearized Power System Model

For a large-scale power system, the nonlinear model can be difficult to analyze, necessitating a simpler, linear time-invariant (LTI) representation of the system [20]. The power system dynamics can be linearized by considering a small perturbation over an existing equilibrium point. The following assumption is needed to construct the small-signal, linearized model of the nonlinear power system.

Assumption 1 For the nonlinear dynamical system in (4.1) there exists an equilibrium point denoted as:

$$
x^* = \begin{bmatrix} \delta^{*\top} & \omega^{*\top} & e_q'^{*\top} & e_d'^{*\top} & V_R^{*\top} & E_{fd}^{*\top} & R_f^{*\top} & tg_1^{*\top} & tg_2^{*\top} & tg_3^{*\top} \end{bmatrix}^\top.
$$

The above assumption is typical in transient analysis studies for power systems [21]. Denote by $\tilde{x} \in \mathbb{R}^{10 n_g}$ the deviations of the state from the equilibrium point and by $\tilde{y}_q \in \mathbb{R}^{4q}$ the deviations of the outputs from the outputs at the equilibrium point, where $q$ is the number of PMUs with four measurements each. The small-signal dynamics can be written as:

$$
\begin{cases}
\dot{\tilde{x}}(t) = A\, \tilde{x}(t) \\
\tilde{y}_q(t) = C_q\, \tilde{x}(t),
\end{cases}
\tag{4.5}
$$

where the system matrix $A \in \mathbb{R}^{10 n_g \times 10 n_g}$ is defined by the parameters of the generators, loads, transmission lines, and the topology of the power network, and $C_q \in \mathbb{R}^{4q \times 10 n_g}$ depends on the specific PMU placement. In what follows, we use the notations $x$ and $y_q$ instead of $\tilde{x}$ and $\tilde{y}_q$ for simplicity.
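The linearization step can be illustrated on a much smaller model than (4.1). The sketch below is an assumption-laden toy: it uses a classical single-machine swing equation (not the chapter's 10th-order model), hypothetical parameter values, and a central-difference Jacobian evaluated at the equilibrium to obtain a system matrix in the spirit of $A$ in (4.5).

```python
import math


def swing(x, p):
    """Toy two-state machine model: x = [delta, omega]. Not the 10th-order model."""
    delta, omega = x
    d_delta = omega - p["w0"]
    d_omega = p["w0"] / (2.0 * p["H"]) * (
        p["Tm"] - p["Pmax"] * math.sin(delta) - p["KD"] / p["w0"] * (omega - p["w0"])
    )
    return [d_delta, d_omega]


def jacobian(f, x_eq, p, h=1e-6):
    """Central-difference Jacobian of f at x_eq (numerical linearization)."""
    n = len(x_eq)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus, x_minus = list(x_eq), list(x_eq)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus, p), f(x_minus, p)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return jac


# Hypothetical parameters for illustration only.
params = {"w0": 2.0 * math.pi * 60.0, "H": 5.0, "KD": 2.0, "Tm": 0.8, "Pmax": 1.6}
# Equilibrium: rated speed and Tm = Pmax * sin(delta*).
x_star = [math.asin(params["Tm"] / params["Pmax"]), params["w0"]]
A = jacobian(swing, x_star, params)

# Closed-form entries of the same Jacobian, for comparison.
a21 = -params["w0"] / (2.0 * params["H"]) * params["Pmax"] * math.cos(x_star[0])
a22 = -params["KD"] / (2.0 * params["H"])
```

For the full model the same idea applies with $f$ taken from (4.1) and (4.2), although in practice the Jacobian would usually be derived analytically or by automatic differentiation rather than by finite differences.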

4.3 Unknown Inputs & Attack-Threat Model

Although the modeling of the power system dynamics has been the subject of extensive research studies, a gap still exists between our mathematical understanding of the power system physics and the actual dynamic processes. Therefore, assuming that the developed dynamical models are perfectly accurate can generate suboptimal control or estimation laws. Consequently, various control and estimation theory studies have investigated methods that address this discrepancy between the models and the actual physics, for power systems and other dynamical systems. Here, we discuss how these discrepancies can be systematically incorporated into the power system dynamics and present physical interpretations of UIs and potential CAs, which exemplify these discrepancies. We consider UIs, denoted by $w(t)$, and CAs, denoted by $v_q(t)$, to be unknown quantities that affect the process dynamics and PMU output measurements, respectively.


4.3.1 Modeling Unknown Inputs

The nominal system dynamics for a controlled power system can be given by $\dot{x}(t) = f(x, u) = A x(t) + B_u u(t)$. For the 10th-order power system model, the controls, $u(t)$, are incorporated with the power system dynamics and states. In that case, $B_u$ and $u(t)$ are both zero, unless there are other power system controls to be considered. Consider the nominal system dynamics to be a function of $w(t)$, or $\dot{x}(t) = \tilde{f}(x, u, w)$. For power systems, the UIs affecting the system dynamics can include $u_d$ (representing the unknown plant disturbances), $u_u$ (denoting the unknown control inputs), and $u_a$ (depicting potential actuator faults). For simplicity, we can combine $u_d$, $u_u$, $u_a$ into one UI quantity, $w(t)$, defined as

$$
w(t) = \begin{bmatrix} u_d^\top(t) & u_u^\top(t) & u_a^\top(t) \end{bmatrix}^\top \in \mathbb{R}^{n_w},
$$

where $n_w$ is the number of UIs, and then write the process dynamics under UIs as

$$
\dot{x}(t) = \tilde{f}(x, u, w) = A x(t) + B_u u(t) + B_w w(t),
\tag{4.6}
$$

where $B_w$ is a known weight distribution matrix that defines the distribution of UIs with respect to each state equation $\dot{x}_i$. For the dynamical system in (4.1), matrix $B_w \in \mathbb{R}^{10 n_g \times n_w}$. The term $B_w w(t)$ models a general class of UIs such as uncertainties related to variable loads, nonlinearities, modeling uncertainties and unknown parameters, noise, parameter variations, unmeasurable system inputs, model reduction errors, and actuator faults [22, 23]. For example, the equation $\dot{x}_1 = \dot{\delta}_1 = x_2 - \omega_0 = \omega_1 - \omega_0$ most likely has no UIs, as there is no modeling uncertainty related to that process. Also, actuator faults on that equation are not conceivable. Hence, the first row of $B_w$ can be identically zero. If one of the parameters in (4.1) is unknown, this unknown parameter can be augmented to $w(t)$. Furthermore, the unknown inputs we are considering influence all the buses in the power system. Precisely, for any state evolution $x_i(t)$, we have

$$
\dot{x}_i(t) = A_i x(t) + B_{w_i} w(t) = A_i x(t) + \sum_{j=1}^{n_w} B_{w_{ij}} w_j(t), \quad \forall\, i = 1, 2, \ldots, n,
$$

where $A_i$ is the $i$th row of the $A$ matrix and $B_{w_i}$ is the $i$th row of the $B_w$ matrix, which entails that each bus can be potentially influenced by a combination of UIs. Hence, however many such variations exist, they can cause disturbances to all the states.

Remark 1 Note that for a large-scale system it can be a daunting task to determine $B_w$. Hence, state estimators should ideally consider worst case scenarios with UIs, process noise, and measurement noise. As a result, assuming a random $B_w$ matrix and then designing an estimator based on that would consequently lead to a more robust estimator/observer design.


4.3.2 Modeling Cyber Attacks

Classical SCADA systems have become insufficient to guarantee real-time protection of power systems’ assets. Consequently, the research and development of wide-area measurement systems (WAMS) have significantly increased. By utilizing PMUs, WAMS technologies enable near real-time monitoring of the system, hence empowering a more accurate depiction of the power grid’s cyber-physical status and improved grid control. Recently, the National Electric Sector Cybersecurity Organization Resource (NESCOR) investigated many cybersecurity failure scenarios, which are defined as “realistic event in which the failure to maintain confidentiality, integrity, and/or availability of sector cyber assets creates a negative impact on the generation, transmission, and/or delivery of power” [24]. Among these failure scenarios the following two wide-area monitoring, protection, and control (WAMPAC) scenarios motivate the research in this chapter:

• WAMPAC.4: Measurement Data Compromised due to PDC¹ Authentication Compromise.
• WAMPAC.6: Communications Compromised between PMUs and Control Center.

PMU measurements can be attacked by compromising the signals sent to the control center. The two aforementioned scenarios are related in the sense that compromising the communication between PMUs, PDCs, and the control center can include alteration of PMU data. Relevant to the physical meaning of CAs (or attack vectors), we define $v_q(t) \in \mathbb{R}^{4q}$ as a CA that is a function of time, used to depict the aforementioned WAMPAC failure scenarios. Note that many entries in this vector are zero as an attacker might not have the ability to attack all measurements simultaneously. Under a wide class of attacks, the output measurement equation can be written as:

$$
y_q(t) = C_q x(t) + v_q(t).
\tag{4.7}
$$

While we define $v_q(t)$ to be a cyber attack or attack vector, this definition encapsulates different types of cyber attacks, such as data integrity attacks [25, 26], denial of service (DoS) attacks [27], or replay attacks [28]. Furthermore, another physical meaning for $v_q(t)$ is bad data. Bad data occurs when (1) a redundant measurement is erroneous, which can be detected by statistical tests based on measurement residuals, (2) observations are corrupted with abnormally large measurement errors, (3) large unexpected meter and communication errors occur, or (4) sensors malfunction; see [29] for bad data definitions. Hence, bad data can be different from CAs, since CAs attempt to adversely influence the estimates. However, both bad data and CAs can lead to negative consequences and can share mathematically equivalent meaning with varying threat levels.

An attacker wants to cause significant changes to the transmitted PMU data. Since this data is bound to be used for real-time control in smart grids (see U.S. Department of Energy (DOE) and NASPI mandates [24, 30, 31]), a change in these estimates/measurements can cause significant alterations to the corresponding feedback control signals. In fact, the executive summary from a recently published U.S. DOE report highlights the inevitable usability of PMU measurements.²

¹ A single PMU transmits measurements to a phasor data concentrator (PDC), and then to a super PDC, through a wireless communication network based on the NASPInet architecture [16].
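As a small illustration of the measurement model (4.7), the sketch below builds an output vector with and without a sparse attack vector. The matrix, state, and attack values are made up for the example; only the structure $y_q = C_q x + v_q$ comes from the text.

```python
def measure(C, x, v):
    """Output equation (4.7): y_q = C_q x + v_q, for plain Python lists."""
    return [sum(c_ij * x_j for c_ij, x_j in zip(row, x)) + v_i for row, v_i in zip(C, v)]


# Hypothetical 4-measurement, 2-state example (one PMU with four channels).
C_q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
x = [1.0, 2.0]

no_attack = [0.0, 0.0, 0.0, 0.0]
sparse_attack = [0.0, 0.0, 0.5, 0.0]  # only the third channel is compromised

y_clean = measure(C_q, x, no_attack)
y_attacked = measure(C_q, x, sparse_attack)
deviation = [ya - yc for ya, yc in zip(y_attacked, y_clean)]
```

The deviation equals the attack vector itself, which is why an estimator that only sees `y_attacked` needs additional structure (such as the observer and detection filter discussed later in the chapter) to separate $v_q$ from the effect of the state.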

² From a DOE report: The Western Electricity Coordinating Council has determined that it can increase the energy flow along the California-Oregon Intertie by 100 MW or more using synchrophasor data for real-time control, reducing energy costs by an estimated \$35 million to \$75 million over 40 years without any new high-voltage capital investments [30].

4.4 DSE under UIs and CAs

With the integration of PMUs, an observer or a DSE method can be utilized to estimate the internal state of the generators. Observers can be viewed as computer programs running online simulations; they can be easily programmed and integrated into control centers. Observers differ from Kalman filter (KF)-based estimators in the sense that no assumptions are made on the distribution of measurement and process noise, i.e., statistical information related to noise distribution is not available.

4.4.1 Sliding-Mode Observer for Power Systems

Variable structure control, or sliding-mode control, is a nonlinear control method whose structure depends on the current state of the system. Similar to sliding-mode controllers, sliding-mode observers (SMOs) are nonlinear observers that possess the ability to drive the estimation error, the difference between the actual and estimated states, to zero or to a bounded neighborhood in finite time. Similar to some Kalman filter-based methods, SMOs have high resilience to measurement noise. In [32], approaches for effective sliding-mode control in electro-mechanical systems are discussed. Here we present a succinct representation of the SMO architecture in [33]. For simplicity, we use $x$ as the state vector of the linearized power system, rather than $\tilde{x}$, and $y$ as the outputs from PMUs, rather than $\tilde{y}$. The linearized power system dynamics under UIs and CAs can be written as:

$$
\begin{cases}
\dot{x}(t) = A x(t) + B_w w(t) \\
y_q(t) = C_q x(t) + v_q(t),
\end{cases}
\tag{4.8}
$$

where for the system described in (4.3) there are $10 n_g$ states, $n_w$ unknown plant inputs, and $4q$ measurements.

Assumption 2 The above dynamical system is observable, that is, the observability matrix $\mathcal{O}$, defined as

$$
\mathcal{O} = \begin{bmatrix} C_q^\top & (C_q A)^\top & \cdots & (C_q A^{10 n_g - 1})^\top \end{bmatrix}^\top,
$$

has full rank.

The full-rank condition on the system implies that a matrix $L_q \in \mathbb{R}^{10 n_g \times 4q}$ can be found such that matrix $(A - L_q C_q)$ is asymptotically stable with eigenvalues having strictly negative real parts. While this assumption might be very restrictive, it is not a necessary condition for the estimator we discuss next. This assumption is relaxed to the detectability of the pair $(A, C_q)$. The power system is detectable if all the unstable modes are observable, which is verified via the PBH test:

$$
\operatorname{rank} \begin{bmatrix} \lambda_i I - A \\ C_q \end{bmatrix} = 10 n_g, \quad \forall\, \lambda_i \text{ with } \operatorname{Re}(\lambda_i) \ge 0,
$$

where $\lambda_i$ belongs to the set of eigenvalues of $A$. Also, the observer rank-matching condition is assumed to be satisfied, that is, $\operatorname{rank}(C_q B_w) = \operatorname{rank}(B_w) = \zeta$.

The objective of an observer design is to drive the estimation error to zero within a reasonable amount of time. Accurate state estimates can be utilized to design local or global state feedback control laws, steering the system response towards a desirable behavior. Let $\hat{x}(t)$ and $e(t) = x(t) - \hat{x}(t)$ denote the estimated states and the estimation error.
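The difference between observability (Assumption 2) and the relaxed detectability condition can be seen on tiny examples. The sketch below uses exact rational arithmetic from the standard library to compute the rank of the observability matrix and of the PBH matrix; the two 2-state systems are hypothetical and chosen only so that the ranks are easy to verify by hand.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) row-wise."""
    blocks, block = [], C
    for _ in range(len(A)):
        blocks.extend(block)
        block = matmul(block, A)
    return blocks


def pbh_rank(A, C, lam):
    """Rank of [lam*I - A; C] for a real eigenvalue lam."""
    n = len(A)
    top = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return rank(top + C)


C = [[1, 0]]
A_obs = [[0, 1], [-2, -3]]    # observable through the first state
A_diag = [[-1, 0], [0, -2]]   # second mode is invisible to C but stable

rank_obs = rank(observability_matrix(A_obs, C))
rank_diag = rank(observability_matrix(A_diag, C))
pbh_hidden = pbh_rank(A_diag, C, -2)
pbh_seen = pbh_rank(A_diag, C, -1)
```

In the second system the observability matrix loses rank because the mode at $-2$ cannot be seen from the output, yet that mode is stable, so the pair is still detectable in the sense used above. For a system of the size of (4.8) one would use a numerical linear algebra library instead of this exact elimination.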

4.4.2 SMO Dynamics & Design Algorithm The SMO for the linearized power system in (4.8) can be written as:  ˙ˆ ˆ + Lq (y q (t) − yˆ q (t)) − B w E(yˆ q , y q , η) x(t) = Ax(t) . ˆ yˆ q (t) = C q x(t),

(4.9)

where .y q is readily available signals for the observer, and .E(·) is defined as: ⎧ F q (yˆ q − y q ) ⎪ ⎨η , if F q (yˆ q − y q ) = 0 F q (yˆ q − y q )2 + ν .E(·) = ⎪ ⎩0, if F (yˆ − y ) = 0, q

q

q

where:
• $\eta > 1$ is the SMO gain and $\nu$ is a smoothing parameter (small positive number).
• $F_q \in \mathbb{R}^{n_w \times 4q}$ satisfies the matrix equality $F_q C_q = B_w^\top P$.
• $L_q \in \mathbb{R}^{10n_g \times 4q}$ is chosen to guarantee the asymptotic stability of $A - L_q C_q$.


Hence, for any positive definite symmetric matrix $Q$, there is a unique symmetric positive definite matrix $P \in \mathbb{R}^{10n_g \times 10n_g}$ that satisfies the Lyapunov matrix equation

$$
(A - L_q C_q)^\top P + P (A - L_q C_q) = -Q, \quad P = P^\top \succ 0.
\tag{4.10}
$$
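As an illustration only, the smoothed switching term $E(\cdot)$ used in the SMO can be sketched in plain Python. The function name `switching_term` and the list-of-rows matrix layout are assumptions made for this sketch; the default values of `eta` and `nu` are the ones reported later in the case study.

```python
import math

def switching_term(F, y_hat, y, eta=8.0, nu=0.01):
    """Sketch of the smoothed SMO switching term E(.).

    F      : matrix F_q as a list of rows (hypothetical layout)
    y_hat  : estimated outputs, y : measured outputs
    eta    : SMO gain (> 1), nu : small positive smoothing parameter
    """
    diff = [yh - yi for yh, yi in zip(y_hat, y)]
    # s = F_q (y_hat - y)
    s = [sum(f * d for f, d in zip(row, diff)) for row in F]
    norm = math.sqrt(sum(v * v for v in s))
    if norm == 0.0:
        return [0.0] * len(s)
    # eta * s / (||s||_2 + nu); nu keeps the term continuous near s = 0
    return [eta * v / (norm + nu) for v in s]

# Toy usage with a 2x2 identity in place of F_q
E = switching_term([[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0], [0.0, 0.0])
```

The smoothing parameter bounds the magnitude of the term strictly below `eta`, which is why it is often preferred over a discontinuous sign function in numerical simulation.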

The nonlinear vector function $E(\cdot)$ guarantees that the estimation error is insensitive to the UI $w(t)$ and that the estimation error converges asymptotically to zero. If, for the chosen $Q$, no matrix $F_q$ satisfies the above equality, another matrix $Q$ can be chosen. Note that the SMO can deal with a wide range of unknown parameters and inputs (affecting the evolution of the states), yet it cannot tolerate a severe CA against the PMU measurements. This limitation will be addressed through the dynamic risk mitigation algorithm that utilizes CA estimation and a detection filter (Sects. 4.5 and 4.6).

A design algorithm for the aforementioned SMO can be found in [33], which presents a systematic way of obtaining the gain matrices for reduced-order observers. Here, a simple solution to the observer design problem is presented. The equations in (4.10), together with the equality $F_q C_q = B_w^\top P$, are the main matrix equalities needed to solve for the observer matrices $F_q$, $P$, and $L_q$, guaranteeing the asymptotic stability of the estimation error and the convergence of the state estimates to the actual ones. However, these equalities are bilinear matrix equalities, due to the presence of the $P L_q C_q$ term in the Lyapunov matrix equation. Using the linear matrix inequality (LMI) trick of setting $Y = P L_q$, the above system of matrix equations can be rewritten as:

$$
\begin{aligned}
A^\top P + P A - C_q^\top Y^\top - Y C_q &= -Q \\
P &= P^\top \\
F_q C_q &= B_w^\top P.
\end{aligned}
\tag{4.11}
$$

After obtaining $P$, $F_q$, and $Y$, and computing $L_q = P^{-1} Y$ (see footnote 3), the SMO can be implemented via a numerical simulation. The above system of equations can be solved with any semidefinite programming solver, such as CVX [34, 35], YALMIP [36], or MATLAB's LMI solver.
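For readers who want the intermediate step, the following short derivation sketches why the substitution $Y = P L_q$ removes the bilinear term. It uses only the symmetry of $P$ and the Lyapunov equation in (4.10), and is offered as a worked check rather than as part of the original design procedure.

```latex
\begin{aligned}
(A - L_q C_q)^\top P + P (A - L_q C_q)
  &= A^\top P - C_q^\top L_q^\top P + P A - P L_q C_q \\
  &= A^\top P + P A - C_q^\top (P L_q)^\top - (P L_q) C_q
     && \text{(since } P = P^\top \text{)} \\
  &= A^\top P + P A - C_q^\top Y^\top - Y C_q,
     && Y := P L_q .
\end{aligned}
```

The resulting expression is linear in the unknowns $P$ and $Y$, and $L_q$ is recovered afterwards from $L_q = P^{-1} Y$ because $P$ is positive definite and therefore invertible.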

3 The computation of these matrices is performed offline, i.e., the observer is designed a priori. In Sect. 4.7, we present the number of free and linear variables, as well as the offline running time of the observer design problem for the considered power system.


4.5 Asymptotic Reconstruction of UIs & CAs

Here, we present estimation methods for the vectors of UIs, $w(t)$, and potential attacks, $v_q(t)$. This approach does not provide strict guarantees on the convergence of the estimates of these quantities, yet it is significant in the developed risk mitigation strategy. To guarantee the detection of CAs and compromised PMU measurements, we also discuss an attack detection algorithm with performance guarantees.

4.5.1 Estimating Unknown Inputs

As discussed earlier, the designed SMO guarantees the asymptotic convergence of the state estimates to the actual ones. Substituting the differential equations governing the dynamics of the power system (4.8) and the SMO (4.9) into the estimation error dynamics, we obtain

\begin{align}
\dot{e}(t) &= \dot{x}(t) - \dot{\hat{x}}(t) \notag \\
&= (A - L_q C_q)\big(x(t) - \hat{x}(t)\big) + B_w w(t) - B_w E(\hat{y}_q, y_q, \eta) \tag{4.12} \\
&= (A - L_q C_q)\, e(t) + B_w w(t) - B_w E(e, \eta). \tag{4.13}
\end{align}

This SMO is designed to guarantee that $\hat{x}(t)$ is the asymptotic estimate of $x(t)$. Since it is assumed that $B_w$ is a full-rank matrix, the following UI approximation holds:

$$
\hat{w}(t) \approx E(\hat{y}_q(t), y_q(t), \eta).
\tag{4.14}
$$

However, the above estimate, as reported in [37], requires further low-pass filtering, which can be very heuristic. Here, an alternative to this UI estimation is presented, assuming that the state estimates converge to the actual ones asymptotically. First, the discretized version of the power system dynamics is written as:

$$
x(k+1) = \tilde{A} x(k) + \tilde{B}_u u(k) + \tilde{B}_w w(k),
$$

where $\tilde{A} = e^{Ah}$, $\tilde{B}_u = \int_0^h e^{A\tau} B_u \, d\tau$, and $\tilde{B}_w = \int_0^h e^{A\tau} B_w \, d\tau$ are the discrete versions of the state-space matrices, with $h$ the sampling period. Since the observer design guarantees the convergence of the state estimates, $\hat{x}(t)$ or $\hat{x}(k)$, and $\hat{x}(k)$ is available for all $k$, the vector of UIs $w(k)$ can be approximated as follows. Substituting $x(k)$ by $\hat{x}(k)$ in the discretized dynamics of the power system, we obtain:

$$
\hat{x}(k+1) = \tilde{A} \hat{x}(k) + \tilde{B}_u u(k) + \tilde{B}_w \hat{w}(k).
$$


Then, another estimate for the UI vector can be generated as:

$$
\hat{w}(k) = \tilde{B}_w^{\dagger} \big( \hat{x}(k+1) - \tilde{A} \hat{x}(k) - \tilde{B}_u u(k) \big),
\tag{4.15}
$$

assuming that $\tilde{B}_w$ has full column rank and its left pseudo-inverse exists. Note that this estimation of the UI vector uses the generated estimates of one subsequent time period ($\hat{x}(k+1)$) and the actual control (if the latter exists in the model). This assumption is not restrictive, as observers/estimators are computer programs that run in parallel with the plants or dynamic processes.
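The reconstruction in (4.15) can be illustrated on a scalar toy system, where the left pseudo-inverse of a nonzero scalar $\tilde{B}_w$ is simply its reciprocal. The sketch below is an assumption-laden illustration: the system parameters `a`, `b`, the test input, and the helper names are hypothetical, and the true states stand in for converged observer estimates.

```python
import math

def discretize_scalar(a, b, h):
    """Exact discretization of dx/dt = a*x + b*w for scalar a != 0."""
    a_d = math.exp(a * h)            # A~ = exp(A h)
    b_d = (a_d - 1.0) / a * b        # B~ = integral_0^h exp(a*tau) * b dtau
    return a_d, b_d

def estimate_ui(x_hat_next, x_hat, a_d, b_d, bu_d=0.0, u=0.0):
    """Scalar form of (4.15): w_hat(k) = (x_hat(k+1) - A~ x_hat(k) - B~u u(k)) / B~w."""
    return (x_hat_next - a_d * x_hat - bu_d * u) / b_d

# Hypothetical plant and a 60-samples-per-second grid (h = 1/60 s)
a, b, h = -2.0, 0.5, 1.0 / 60.0
a_d, b_d = discretize_scalar(a, b, h)

w_true = [math.sin(5.0 * k * h) for k in range(120)]
states = [1.0]
for w in w_true:
    states.append(a_d * states[-1] + b_d * w)

# Assumption: the observer has converged, so x_hat(k) equals x(k)
w_hat = [estimate_ui(states[k + 1], states[k], a_d, b_d) for k in range(120)]
max_err = max(abs(p - q) for p, q in zip(w_hat, w_true))
```

With perfect state estimates the reconstruction error is at the level of floating-point rounding; with a real observer, the error in $\hat{w}(k)$ follows from the error in $\hat{x}(k)$ and $\hat{x}(k+1)$.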

4.5.2 Estimating CAs

Attacks against synchrophasor measurements can be modeled in various scenarios. One possible scenario is the injection of malicious signals that alter the values of the measurements in the data packets sent from PMUs to PDCs and control centers, in addition to PMU malfunctions. As in (4.8), a real-time CA $v_q(t)$ is included to alter the PMU measurements. An attack detection technique is applied based on the estimation of CAs. Assuming an identical SMO architecture to the one presented earlier, an estimate of the CA, $\hat{v}_q(t)$, is derived in [37] and its dynamics take the following form:

\begin{align}
\hat{v}_q(t) = {} & -(F_q C_q L_q)^{\dagger} (F_q C_q B_w) \big( \bar{E}(t) - \hat{w}(t) \big) \notag \\
& + (F_q C_q L_q)^{\dagger} F_q \, \dot{\hat{v}}_q(t), \tag{4.16}
\end{align}

where $\hat{w}(t)$ is given in (4.14), $F_q$ and $L_q$ are SMO design parameters, and $\bar{E}(t)$ is selected such that the system is in sliding mode along $F_q C_q e(t) = 0$. In [37], the authors assume that $\dot{\hat{v}}_q(t) \approx 0$, which might not be a reasonable assumption in our application since a CA can be designed such that $\dot{v}_q(t) \neq 0$. Rearranging (4.16), we obtain

$$
\dot{\hat{v}}_q(t) = V_1^{-1} \hat{v}_q(t) + V_1^{-1} V_2 \, m(t),
$$

where

$$
\begin{aligned}
V_1 &= (F_q C_q L_q)^{\dagger} F_q \in \mathbb{R}^{4q \times 4q}, \qquad m(t) = \begin{bmatrix} \bar{E}^\top(t) & \hat{w}^\top(t) \end{bmatrix}^\top, \\
V_2 &= \begin{bmatrix} (F_q C_q L_q)^{\dagger} (F_q C_q B_w) & -(F_q C_q L_q)^{\dagger} (F_q C_q B_w) \end{bmatrix} \in \mathbb{R}^{4q \times (n_w + 10n_g)}.
\end{aligned}
\tag{4.17}
$$


Note that $V_1$ is invertible. A more accurate estimate for the CA can further be obtained as

$$
\hat{v}_q(t) = e^{V_1^{-1}(t - t_0)} \hat{v}_q(t_0) + \int_{t_0}^{t} e^{V_1^{-1}(t - \varphi)} V_1^{-1} V_2 \, m(\varphi) \, d\varphi.
$$

4.5.3 Attack Detection Filter

While the CA estimates generated from the methods discussed above can instantly identify compromised measurements for a few time instances after the detection, the attack can propagate and influence the estimation of other measurements. In the case of lower sampling rates or limited computational power, another attack detector can be used. In [14], a robust attack identification filter is proposed to detect the compromised nodes for longer time periods. This filter is adapted to the dynamical representation of the power system; it is itself a dynamical system and takes the following form:

$$
\begin{aligned}
\dot{l}(t) &= (A + A C_q^\top C_q)\, l(t) + A C_q^\top y_q(t) \\
r(t) &= y_q(t) - C_q l(t),
\end{aligned}
\tag{4.18}
$$

where $l(t) \in \mathbb{R}^{10n_g}$ is the state of the filter and $r(t) \in \mathbb{R}^{4q}$ is the residual vector that determines the compromised measurements. The initial state of the filter, $l(t_0)$, is by definition equal to the initial state of the plant $x(t_0)$. Since the initial conditions might not be available, the SMO discussed in Sect. 4.4.1 is utilized to generate $x(t_0) \approx \hat{x}(t_0)$. Hence, the SMO is necessary for the detection of the attack, i.e., we assume that the SMO is utilized for an initial period of time when measurements are not compromised. After generating the converging estimates of the states and UIs, the filter (4.18) generates real-time residuals $r(t)$. These residuals are then compared with a threshold to determine the most infected/attacked nodes. The residuals here are analogous to the estimates of the CAs, $\hat{v}_q(t)$. It is crucial for the attack detection filter and the CA estimators to compute the residuals and estimates online, because the attacked measurements might adversely influence the estimation as the attacks can propagate in many networks.

4.6 Risk Mitigation: A Dynamic Response Model

A risk mitigation strategy is formulated given the measured and estimated outputs and the reconstructed UIs and CAs. The formulation uniquely integrates dynamic state estimation, considering attacks and UIs, with an integer linear programming formulation. It utilizes $r(t)$, $\hat{v}_q(t)$, and $\hat{w}(t)$ to determine the authenticity of PMU measurements and to identify the to-be-diagnosed measurements, while guaranteeing the observability of the power system through available measurements.

4.6.1 Weighted Deterministic Threat Level Formulation

Definition 1 Given a dynamic system simulation for $\tau \in [kT, (k+1)T]$, where $T$ is any simulation time period, the weighted deterministic threat level (WDTL) vector $z$ for all PMU measurements is defined as

$$
z = \int_{kT}^{(k+1)T} \Big( Y \big( y_q(\tau) - \hat{y}_q(\tau) \big)^2 + W \big( \hat{w}(\tau) \big)^2 \Big) \, d\tau,
\tag{4.19}
$$

where $Y \in \mathbb{R}^{4q \times 4q}$ and $W \in \mathbb{R}^{4q \times n_w}$ are constant weight matrices that assign weights for the estimation error $(y_q - \hat{y}_q)$ and the UI approximation $\hat{w}$. Note that $\big(\hat{w}(\tau)\big)^2$ is equivalent to the square of individual entries.

The scalar quantity $z_i$, the $i$th WDTL, depicts the threat level present in the $i$th PMU signal. Ideally, if $z_i$ is large, the associated PMU must be isolated until the attack is physically mitigated. The quantity $\big( y_q(\tau) - \hat{y}_q(\tau) \big)$ can be replaced with either $\hat{v}_q(t)$ or $r(t)$.
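A numerical version of (4.19) can be sketched with the trapezoidal rule over sampled signals. This is a minimal illustration under stated assumptions: the function name `wdtl`, the list-based data layout, the elementwise squaring of the output error, and the toy data are all choices made here for the example.

```python
def wdtl(times, y, y_hat, w_hat, Y, W):
    """Trapezoidal approximation of the WDTL vector z in (4.19).

    times        : sample instants covering [kT, (k+1)T]
    y, y_hat     : measured and estimated output vectors, one per instant
    w_hat        : UI estimate vectors, one per instant
    Y, W         : weight matrices as lists of rows (4q x 4q and 4q x n_w)
    """
    def integrand(k):
        err_sq = [(a - b) ** 2 for a, b in zip(y[k], y_hat[k])]  # elementwise square
        w_sq = [v ** 2 for v in w_hat[k]]                        # elementwise square
        return [sum(yr[j] * err_sq[j] for j in range(len(err_sq)))
                + sum(wr[j] * w_sq[j] for j in range(len(w_sq)))
                for yr, wr in zip(Y, W)]

    z = [0.0] * len(Y)
    prev = integrand(0)
    for k in range(1, len(times)):
        cur = integrand(k)
        dt = times[k] - times[k - 1]
        z = [zi + 0.5 * dt * (p + c) for zi, p, c in zip(z, prev, cur)]
        prev = cur
    return z

# Toy data: two measurements, one UI, a constant error of 3 on measurement 2 for 1 s
times = [20.0 + k / 60.0 for k in range(61)]
y = [[0.0, 3.0] for _ in times]
y_hat = [[0.0, 0.0] for _ in times]
w_hat = [[0.0] for _ in times]
z = wdtl(times, y, y_hat, w_hat, Y=[[1.0, 0.0], [0.0, 1.0]], W=[[0.0], [0.0]])
```

In this toy case the second entry of `z` is close to 9 (a squared error of 9 integrated over one second), while the first entry stays at zero.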

4.6.2 Dynamic Risk Mitigation Optimization Problem

Deactivating a PMU may lead to a failure in dynamic state estimation, as explained in Remark 2 below. Hence, an optimization-based framework can be used to solve the problem with occasionally conflicting objectives.

Remark 2 Recall that to design a dynamic state estimator under UIs and CAs, the power system defined in (4.8) should satisfy certain rank conditions on the state-space matrices. For example, for the SMO, the following condition has to be satisfied:

$$
\operatorname{rank}(C_q B_w) = \operatorname{rank}(B_w) = \zeta,
$$

in addition to the detectability condition (Assumption 2). Deactivating a PMU causes a change in the $C_q$ matrix and might render the observer design infeasible.

Definition 2 Let $\pi_i$ be a binary decision variable that determines the connectivity of the $i$th PMU measurement in the next time period (i.e., $\tau \in [kT, (k+1)T]$):

$$
\pi_i =
\begin{cases}
0 \leftrightarrow z_i - \gamma_i \geq 0 \\
1 \leftrightarrow z_i - \gamma_i < 0.
\end{cases}
$$

If the WDTL for the $i$th measurement is smaller than a certain threshold $\gamma_i$, the corresponding measurement qualifies to stay activated in the subsequent time period. This combinatorial condition can be represented as

\begin{align}
z_i - \gamma_i + \pi_i M &\geq 0 \tag{4.20} \\
z_i - \gamma_i - (1 - \pi_i) M &< 0, \tag{4.21}
\end{align}

where $M$ is a large positive constant [38]. We now formulate the dynamic risk mitigation optimization problem (DRMOP):

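\begin{align}
\max_{\pi} \quad & \sum_{i=1}^{4q} \alpha_i \pi_i \tag{4.22} \\
\text{s.t.} \quad & \pi_i \in \{0, 1\}, \quad \forall i = 1, 2, \ldots, 4q \tag{4.23} \\
& z_i - \gamma_i + \pi_i M \geq 0 \tag{4.24} \\
& z_i - \gamma_i - (1 - \pi_i) M < 0 \tag{4.25} \\
& \sum_{i=1}^{4q} \beta_i \pi_i \leq Z \tag{4.26} \\
& \operatorname{rank}\big(C_q(\pi) B_w\big) = \zeta \tag{4.27} \\
& \operatorname{rank}\begin{bmatrix} \lambda_i I - A \\ C_q(\pi) \end{bmatrix} = 10n_g, \quad \forall\, \lambda_i > 0. \tag{4.28}
\end{align}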

To increase the observability of a power system, the formulated optimization problem maximizes the weighted number of active PMU measurements in the next time period, finding the PMU measurements that have to be disabled for some period of time while ensuring the feasibility of dynamic state estimation. Although there are at most $q$ PMUs, we assume that there are $4q$ $\pi_i$'s, one per measurement. The two big-M constraints depict the logical representation of the binary variable $\pi_i$ in terms of the WDTL and the threshold. The constraint $\sum_{i} \beta_i \pi_i \leq Z$ assigns a weight to each PMU measurement. For example, if the $i$th PMU measurement is from a significantly important substation, the system operator can choose the corresponding weight $\beta_i$ to be greater than other weights. The two rank constraints (4.27)–(4.28) ensure that the dynamic state estimation formulated above is still feasible for the next time period; see Assumption 2. Note that this problem is different from the optimal PMU placement problem [39, 40], in the sense that we already know the location of the PMUs.

The DRMOP (4.22)–(4.28) is a highly nonlinear integer programming problem that cannot be solved efficiently, due to the two rank constraints. When the two rank constraints (4.27)–(4.28) are excluded, the DRMOP (4.22)–(4.26) is an integer linear programming (ILP) problem.
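To make the ILP without rank constraints concrete, the sketch below enumerates all binary assignments for a handful of measurements and keeps the best feasible one. Brute-force enumeration is used here only because it is easy to verify on a toy instance; it is not the solver used in the chapter, and the function name `solve_drmop`, the value of `M`, and the toy data are assumptions.

```python
from itertools import product

def solve_drmop(z, gamma, alpha, beta, Z, M=1e6):
    """Brute-force sketch of the DRMOP without the rank constraints.

    Maximizes sum(alpha_i * pi_i) over pi in {0,1}^n subject to the two
    big-M constraints and the budget constraint sum(beta_i * pi_i) <= Z.
    Returns (None, -inf) if no assignment is feasible.
    """
    n = len(z)
    best, best_val = None, float("-inf")
    for pi in product((0, 1), repeat=n):
        big_m_ok = all(
            z[i] - gamma[i] + pi[i] * M >= 0
            and z[i] - gamma[i] - (1 - pi[i]) * M < 0
            for i in range(n)
        )
        budget_ok = sum(b * p for b, p in zip(beta, pi)) <= Z
        if big_m_ok and budget_ok:
            val = sum(a * p for a, p in zip(alpha, pi))
            if val > best_val:
                best, best_val = pi, val
    return best, best_val

# Toy instance: six measurements, two of which exceed the threshold gamma = 10
z = [0.2, 0.1, 35.0, 50.0, 0.3, 0.05]
gamma = [10.0] * 6
pi_opt, value = solve_drmop(z, gamma, alpha=[1.0] * 6, beta=[1.0] * 6, Z=6.0)
```

In this toy instance the big-M constraints leave a single feasible assignment, which deactivates the third and fourth measurements. Enumeration grows as $2^n$, so a real instance with $4q = 48$ measurements calls for a dedicated ILP solver.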

4.6.3 Dynamic Risk Mitigation Algorithm

In Sects. 4.6.1 and 4.6.2, two related problems for different time-scales are discussed: the estimation problem is executed in real time, whereas the DRMOP is solved after generating the estimates in the former problem. Here, an algorithm is presented to jointly integrate these two problems, without including the rank constraints in the computation of the DRMOP solution, and hence guaranteeing fast solutions for the optimization problem.

Algorithm 4.1 Dynamic risk mitigation algorithm (DRMA)
1: compute small-signal system matrices $A$, $B_w$, $C_q$
2: obtain SMO matrices $L_q$, $F_q$ by solving (4.11)
3: formulate the SMO dynamics as in (4.9)
4: set $k := 0$
5: for $\tau \in [kT + \xi, (k+1)T]$
6:   measure the PMU output $y(\tau)$
7:   compute $r(\tau)$, $\hat{y}_q(\tau)$, $\hat{w}(\tau)$ from (4.18), (4.9), (4.15)
8:   compute WDTL $z$ from (4.19), given $Y$, $W$
9:   solve the DRMOP (4.22)–(4.26) for $\pi = [\pi_1, \cdots, \pi_{4q}]$
10:  update $C_q = C_q(\pi)$
11:  if (4.27) and (4.28) are satisfied
12:    go to Step 17
13:  else
14:    solve the DRMOP (4.22)–(4.26) with relaxed conditions on some $\pi_i$'s and update $C_q$
15:  end if
16: end for
17: set $k := k + 1$; go to Step 5

Algorithm 4.1 illustrates the dynamic risk mitigation algorithm (DRMA). First, the small-signal matrices are computed given the nonlinear power system model (see footnote 4). The sliding-mode observer is then designed to ensure accurate state estimation, as in Sect. 4.4.1. Since the rank constraints are computationally challenging, they are not included in the optimization problem solved in Algorithm 4.1. Then, for $\tau \in [kT + \xi, (k+1)T]$, the quantities $r$, $\hat{y}_q$, and $\hat{w}$ are all computed. We assume that the computational time to solve the DRMOP (4.22)–(4.26) is $\xi$. After solving the ILP, the output matrix $C_q$ is updated, depending on the solution of the optimization problem, as the entries in $C_q$ reflect the location of active and inactive PMUs. The matrix update might render the state estimation problem for $\tau \in [(k+1)T + \xi, (k+2)T]$ infeasible, as the rank conditions might not be satisfied. To ensure feasibility, these conditions can either be made a constraint in the optimization problem or a condition in the mitigation algorithm. If these rank conditions are not satisfied, some $\pi_i$'s can be reset and the DRMOP can be solved again. The counter $k$ is then incremented and the algorithm is applied for the following time periods.

4 Note that for the 10th-order model, the controls are incorporated in the power system dynamics, and hence $B_u$ and $u(t)$ are zeros, yet the algorithm provided here is for the case when known controls are considered.

Remark 3 In Algorithm 4.1, it is assumed that the observer from [41] is applied to generate dynamic estimates, given that the power system is subjected to UIs and CAs. However, this assumption is not restrictive. In fact, any other robust observer/estimator may be used for state estimation, and hence, the algorithm can be changed to reflect that update in the observer design. Subsequently, the matching rank condition can be replaced by other conditions that guarantee a fast reconstruction of state estimates.

Remark 4 The DRMOP assumes an initial power system configuration, i.e., PMUs are placed in certain locations. Since $C_q$ reflects the latter, the observer design would differ for various configurations of PMUs. This influences the state and UI estimation, and hence the generation of the real-time weighted deterministic threat levels for all PMUs. Thus, the solution to the DRMOP varies for different PMU configurations, while guaranteeing the real-time observability of the power system through available measurements.

The high-level detail of this scheme is illustrated in Fig. 4.1. The presented solution scheme requires two essential inputs:
(a) The potentially incomplete knowledge of the power system model and parameters (Sect. 4.2).
(b) Real-time PMU measurements from a subset of the power network model (Sect. 4.2).

Fig. 4.1 A flow chart depicting a high-level representation of the risk mitigation strategy. Blocks shown: Knowledge of Power Network Model; Real-Time PMU Data from Some Substations; Real-Time Depiction of the Nominal System (System Verification); Estimation of Unknown System Parameters & Inputs; Detection of Malfunctions, Cyber-Attacks, Disturbances; Identification of Attack Locations or Faulty Channels; Power System Reconfiguration & Diagnostics; Grid Still Observable? (Yes/No); Relax Some Constraints to Ensure Observability. © [2018] IEEE. Reprinted, with permission, from [18]


Note that (a) and (b) are related in the sense that if the knowledge of a generator’s parameters is available, it is possible to associate this knowledge to specific PMU measurements. Given these two inputs—(a) is static knowledge, while (b) is continuously updated—we construct a real-time depiction of the nominal system, i.e., the power system experiencing no CAs or major disturbances. This step is important as it verifies PMU measurements and the system model. Using the latter and real-time PMU data, we estimate unknown power system parameters and UIs (Sect. 4.5). Given that we have more accurate parameters, the detection of malfunctions, CAs, and major disturbances becomes possible. However, the detection of a CA does not necessarily imply the knowledge of the source of attacks. Hence, the identification of PMU channels with faulty measurements is needed after the detection of such events (Sect. 4.6). The faulty or attacked power system components are then diagnosed and reconfigured. The reconfiguration/diagnostics of the grid should guarantee the observability of the grid (Sect. 4.6.3).

4.7 Case Studies

The developed methods are tested on a 16-machine, 68-bus system extracted from Power System Toolbox (PST) [42]. This system is a reduced-order, equivalent version of the interconnected New England test system and New York power system [43]. The model discussed in Sect. 4.2 is used and there are 160 state variables. A total of $q = 12$ PMUs are installed at the terminal buses of generators 1, 3, 4, 5, 6, 8, 9, 10, 12, 13, 15, and 16. This PMU placement is randomly chosen and is not optimized for the best observability of the system dynamic states. More details on optimal PMU placement can be found in [39, 40]. The sampling rate of the measurements is set to 60 frames per second to mimic the PMU sampling rate. Results of two scenarios are presented. For Scenario I, dynamic state estimation only under UIs is performed and an illustration of the estimation of UIs and states via the methods discussed in Sect. 4.5 is provided. For Scenario II, we add CAs to some PMU measurements and show how the DRMOP can be utilized to estimate, detect, and filter out the presence of these attacks by leveraging the generated estimates from Scenario I.

4.7.1 Scenario I: Dynamic Reconstruction of UI & DSE

We show the performance of the SMO in Sect. 4.4.1 in regard to (a) the estimation of the states of the 16 generators (160 states) and (b) the UI reconstruction method (developed in Sect. 4.5). DSE is performed over a time period of 20 s. This experiment is considered as a baseline for Scenario II. The simultaneous estimation of states and UIs can then be utilized to determine the generators that are subject to the most disturbances through available PMU measurements. After the estimation of states and UIs, these quantities are then used to detect a CA against PMU measurements.

As discussed in Sect. 4.3, the UIs model a wide range of process uncertainties, such as load deviations, bounded nonlinearities, and unmodeled dynamics, which can significantly influence the evolution of states due to their nature. However, UIs are not physically analogous to malicious CAs, i.e., UIs exist due to phenomena related to the physics of the power system modeling.

For many dynamical systems, it can be hard to determine the impact of UIs. Hence, an ideal scenario would be to use different forms of time-varying UI functions, and a randomly generated $B_w$ matrix with significant magnitude. Here, assume the power system is subject to six different UI functions with different variations, magnitudes, and frequencies. The considered vector of UIs is as follows:

$$
w(t) =
\begin{bmatrix}
w_1(t) \\ w_2(t) \\ w_3(t) \\ w_4(t) \\ w_5(t) \\ w_6(t)
\end{bmatrix}
=
\begin{bmatrix}
k_1 \cos(\psi_1 t) + e^{-2t} + \max\!\left(0,\, 1 - \frac{|t - 5|}{3}\right) \\
k_1 \sin(\psi_1 t) \\
k_1 \cos(\psi_1 t) \\
k_2 \operatorname{square}(\psi_2 t) \\
k_2 \operatorname{sawtooth}(\psi_2 t) \\
k_2 \sin(\psi_2 t) + e^{-5t}
\end{bmatrix},
$$

where $k_1, k_2$ and $\psi_1, \psi_2$ are different magnitudes and frequencies of the UI signals, respectively. We choose $\psi_1 = 5$, $\psi_2 = 10$. To test the SMO and UI estimator, we use small and large values for $k_1$ and $k_2$. Specifically, we choose two sets of values for the magnitudes: $k_1 = 0.01, k_2 = 0.02$ and $k_1 = 1, k_2 = 2$. The $B_w \in \mathbb{R}^{160 \times 6}$ matrix is randomly chosen using the randn function in MATLAB. The Euclidean norm of $B_w$ is $\|B_w\| = 13.8857$, which is significant in magnitude. Consequently, since $B_w$ is not sparse, the six chosen UIs influence the dynamics of the 160 states of the power system, i.e., $\dot{x}_1(t) = a_1 x(t) + \sum_{i=1}^{6} B_{w_{1,i}} w_i(t)$, where $a_1$ is the first row of the $A$ matrix. The above UI setup is used in this experiment as an extreme scenario, as this allows us to test the robustness of the estimator in this chapter.

Remark 5 Using large magnitudes for the UIs (i.e., large $k_1$ and $k_2$) leads to unrealistic behavior of the power system, as each differential equation is adversely influenced by an unknown, exogenous quantity as described above. This scenario is less likely to occur in real applications, yet the result is included to show the robustness of the simultaneous estimation of the states and the UI estimation scheme.

After computing the linearized state-space matrices for the system ($A$ and $C_q$) and given $B_w$, the LMIs in (4.11) are solved using CVX [34] in MATLAB. The SMO parameters are $\eta = 8$ and $\nu = 0.01$. The numbers of linear and free variables involved in the semidefinite programming are 25,760 and 7968 (see footnote 5), with 13,840 constraints. The number of variables can be computed by counting the number of unique entries of the LMI in (4.11). This optimization problem is solved offline, as most observer gain matrices are computed before the actual dynamic simulation. The simulations are performed on a 64-bit operating system, with an Intel Core i7-4770S CPU at 3.10 GHz, equipped with 8 GB of RAM. The execution time for the offline SMO design (4.11) is 5 min and 39 s (CVX converges after 42 iterations). The dynamic simulations for the power system and the observer dynamics are performed simultaneously using the ode15s solver with a computational time of nearly 6 s.

After finding a solution for (4.11), we simulate the power system and generate estimates of the states $x(t)$ and the UIs $w(t)$ via the SMO design (4.9) and UI estimate (4.15). Figure 4.2 shows the norm of the state estimation error for the above two sets of values of the $k$'s. The estimation error norm is $\|e_x(t)\|_2 = \|x(t) - \hat{x}(t)\|_2, \ \forall\, t \in [0, T]$, which indicates the performance of the SMO for all time instants and all generators. It is clearly seen that the estimation error converges to nearly zero, even for high magnitudes of UIs. This demonstrates that the state estimates for all generators are converging to the actual states. Moreover, Fig. 4.3 shows the estimation of the six UIs given above with $k_1 = 1, k_2 = 2$. While the six UIs vary in terms of magnitude, frequency, and shape, the estimates generated by (4.15) are all very close to the actual UIs.

Fig. 4.2 Norm of the state estimation error for different magnitudes of UIs (A logarithmic scale is used for the y-axis, as initial values for $\|e_x(t)\|_2$ are much higher than subsequent ones. For larger magnitudes of UIs, the norm converges to a larger value, although it is still very small.) © [2018] IEEE. Reprinted, with permission, from [18]

5 The number of linear and free variables in (4.11) is equal to the number of entries of the symmetric positive definite matrix $P$ (linear variables), and of $L_q$ and $F_q$ (free variables). Since $P \in \mathbb{R}^{10n_g \times 10n_g}$ and $n_g = 16$, the number of linear variables is $(160 + 1) \times 160 = 25{,}760$, while the number of free variables is equal to $n_w \cdot 4q + 10n_g \cdot 4q = 6 \times 48 + 160 \times 48 = 7968$.


Fig. 4.3 Converging estimation for the 6 UIs (The UI estimator successfully tracks the UIs for $k_1 = 1, k_2 = 2$ for different shapes of UIs. The results for the UI reconstruction for $k_1 = 0.01, k_2 = 0.02$ are omitted; however, the results are similar to the case here.) © [2018] IEEE. Reprinted, with permission, from [18]

4.7.2 Scenario II: DSE Under UIs & CAs

Here we present the case when some PMU measurements are compromised by a CA. The attacker's objective is to drastically alter the PMU measurements, thus influencing the decisions that could be made by the system operator. We present a hypothetical CA vector on four PMU measurements, which are the fifth to eighth measurements, i.e., $\begin{bmatrix} e_{R6}(t) & e_{R8}(t) & e_{R9}(t) & e_{R10}(t) \end{bmatrix}^\top$. Note that these four measurements come from the PMUs installed at the terminal buses of Generators 6, 8, 9, and 10, respectively. Since a total of $4q = 48$ measurements are available, the CA $v_q(t) \in \mathbb{R}^{48}$ can be constructed in terms of different unknown signal structures, as follows:

$$
v_q(t) = \begin{bmatrix} 0_4^\top & \cos(t) & 2\operatorname{sawtooth}(t) & 3\operatorname{square}(t) & 4\sin(t) & 0_{40}^\top \end{bmatrix}^\top,
$$

where the cosine, sawtooth, square, and sine signals are the attacks against the four PMU measurements with different magnitudes and variations. Under the same UIs from Scenario I, a CA is artificially added after $t = 20$ s. Figure 4.4 shows the generation of the residual vector, $r(t)$, from (4.18). It is seen that the residuals of measurements 5–8 with artificially added CAs are significantly higher than those of the other measurements without CAs.
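The attack vector above can be reproduced with a few lines of plain Python. The helpers `square` and `sawtooth` below are stand-ins written for this sketch and are assumed to follow the usual convention of a $2\pi$ period (square wave at $+1$ on the first half-period, sawtooth ramping from $-1$ to $1$); the function name `attack_vector` and the zero-based indexing are likewise illustrative choices.

```python
import math

TWO_PI = 2.0 * math.pi

def square(t):
    """Square wave with period 2*pi: +1 on the first half-period, -1 on the second."""
    return 1.0 if (t % TWO_PI) < math.pi else -1.0

def sawtooth(t):
    """Sawtooth wave with period 2*pi, ramping linearly from -1 to 1."""
    return 2.0 * ((t / TWO_PI) % 1.0) - 1.0

def attack_vector(t, n_meas=48, t_start=20.0):
    """CA vector v_q(t): nonzero only on measurements 5-8 (indices 4-7) after t_start."""
    v = [0.0] * n_meas
    if t > t_start:
        v[4] = math.cos(t)
        v[5] = 2.0 * sawtooth(t)
        v[6] = 3.0 * square(t)
        v[7] = 4.0 * math.sin(t)
    return v

v = attack_vector(20.05)
```

At $t = 20.05$ s the seventh entry evaluates to 3, which matches the value of $3\operatorname{square}(t)$ quoted in the caption of Fig. 4.4.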


Fig. 4.4 Residuals of the 48 measurements generated by the attack detection filter (4.18) (The residuals are notably similar to the actual attacks. For example, for $t = 20.05$ s, $r_7(t) = 2.986$, while $v_7(t) = 3\operatorname{square}(t) = 3$.) © [2018] IEEE. Reprinted, with permission, from [18]

After designing the SMO for the power system, achieving desirable state and UI estimates (Scenario I), and generating residuals that are estimates of CAs, we simulate the DRMOP. Assume all PMU measurements have the same weight in the objective function, i.e., $\alpha_i = 1, \forall i = 1, \ldots, 48$. The WDTL vector $z$ is computed for the 1-s time horizon (for $t \in [20, 21]$), and a generic threshold is chosen as $\gamma = 10$. Given $z(t)$ and the parameters of the DRMOP, the ILP is solved via YALMIP [36]. The optimal solution for the ILP yields $\pi = \begin{bmatrix} 1_4^\top & 0_4^\top & 1_{40}^\top \end{bmatrix}^\top$, hence PMU measurements 5–8 are the most infected among the available 48. This result confirms the findings of the attack detection filter in Fig. 4.4.

Following Algorithm 4.1, we check whether the solution generated by YALMIP violates the rank condition (Assumption 2). As measurements 5–8 are removed from the estimation process for diagnosis, the updated $C_q$ matrix, now a function of $\pi$, is obtained: $C_q$ now has 44 rows instead of 48. The system is detectable and the rank-matching condition is still satisfied. Hence, no extra constraints need to be reimposed on the ILP, as illustrated in Algorithm 4.1. After guaranteeing the necessary conditions on the existence of the dynamic state estimator and updating $C_q$, simulations are performed again to regenerate the state estimates and weighted residual threat levels.

Figure 4.5 illustrates the impact of CAs on the state estimation process before, during, and after the attack is detected and isolated. Following the removal of the attacked measurements (and not the attack, as the attack cannot be physically controlled) at $t = 21$ s due to the risk mitigation strategy, the estimation error norm converges again to small values. Figure 4.6 shows the impact of this strategy on DSE for Generator 1. During the short-lived CA, state estimates diverge. However, the risk mitigation strategy restores the estimates to their nominal status under UIs and CAs (see footnote 6).

6 While the CAs are still targeting the four PMU measurements after $t = 21$ s, the attacks become futile. Consequently, their impact on state estimation becomes nonexistent, as the four attacked measurements are isolated from the estimation process.


Fig. 4.5 Norm of estimation error before, during, and after the CA is detected and isolated (For $20\ \text{s} \leq t \leq 21\ \text{s}$, the norm increases exponentially, signifying the occurrence of an attack or a large disturbance. After the removal of the artificial attack due to the outcome of the DRMOP, the estimation error norm converges again to small values.) © [2018] IEEE. Reprinted, with permission, from [18]

Fig. 4.6 Estimation of the states of Generator 1 before, during, and after the detection and isolation of the CA. After the DRMA succeeds in detecting the compromised measurements and isolating them from the estimation process after $t = 21$ s, the state estimates converge again to the actual ones. © [2018] IEEE. Reprinted, with permission, from [18]


Remark 6 The DRMA requires the redesign of the SMO immediately after the detection of the compromised PMU measurements. Since $C_q$ has fewer rows once some measurements are isolated, the SMO is designed again to obtain updated observer gain matrices $L_q$ and $F_q$. For a large-scale system, the solution of the LMI in (4.11) can take a significant amount of time. Hence, a database of the most likely PMU measurement configurations (different $C_q$'s) with corresponding SMO LMI solutions (different $L_q$'s and $F_q$'s) can be obtained offline and stored for use when needed, to guarantee a minimal off-time. Note that for a different time period, the power system and the PMUs might encounter a different set of UIs or attack vectors. Furthermore, the optimization problem can be redesigned to allow for the inclusion of measurements that may have become safe again. The optimal solution to the DRMOP is a tradeoff between keeping the power system observable through the possible measurements, which enables state estimation and real-time monitoring, and guaranteeing that the system and the observer are robust to UIs and CAs.

References

1. F. Schweppe, J. Wildes, Power system static-state estimation, part I: exact model. IEEE Trans. Power App. Syst. PAS-89(1), 120–125 (1970)
2. A. Abur, A. Expósito, Power System State Estimation: Theory and Implementation, ser. Power Engineering (Willis) (CRC Press, Boca Raton, 2004)
3. A. Monticelli, Electric power system state estimation. Proc. IEEE 88(2), 262–282 (2000)
4. G. He, S. Dong, J. Qi, Y. Wang, Robust state estimator based on maximum normal measurement rate. IEEE Trans. Power Syst. 26(4), 2058–2065 (2011)
5. J. Qi, G. He, S. Mei, Z. Gu, A review of power system robust state estimation. Adv. Technol. Electr. Eng. Energy 30(3), 59–64 (2011)
6. J. Qi, G. He, S. Mei, F. Liu, Power system set membership state estimation, in IEEE Power and Energy Society General Meeting (2012), pp. 1–7
7. J. Zhao, A. Gómez-Expósito, M. Netto, L. Mili, A. Abur, V. Terzija, I. Kamwa, B. Pal, A.K. Singh, J. Qi et al., Power system dynamic state estimation: motivations, definitions, methodologies, and future work. IEEE Trans. Power Syst. 34(4), 3188–3198 (2019)
8. J. Qi, K. Sun, J. Wang, H. Liu, Dynamic state estimation for multi-machine power system by unscented Kalman filter with enhanced numerical stability. IEEE Trans. Smart Grid 9(2), 1184–1196 (2018)
9. Z. Huang, P. Du, D. Kosterev, S. Yang, Generator dynamic model validation and parameter calibration using phasor measurements at the point of connection. IEEE Trans. Power Syst. 28(2), 1939–1949 (2013)
10. A. Hajnoroozi, F. Aminifar, H. Ayoubzadeh et al., Generating unit model validation and calibration through synchrophasor measurements. IEEE Trans. Smart Grid 6(1), 441–449 (2015)
11. M. Ariff, B. Pal, A. Singh, Estimating dynamic model parameters for adaptive protection and control in power system. IEEE Trans. Power Syst. 30(2), 829–839 (2015)
12. Y. Liu, P. Ning, M.K. Reiter, False data injection attacks against state estimation in electric power grids, in Proceedings of the 16th ACM Conference on Computer and Communications Security, ser. CCS ’09 (ACM, New York, 2009), pp. 21–32


13. F. Pasqualetti, F. Dörfler, F. Bullo, Cyber-physical attacks in power networks: models, fundamental limitations and monitor design, in 2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC) (2011), pp. 2195–2201 14. F. Pasqualetti, F. Dörfler, F. Bullo, Attack detection and identification in cyber-physical systems. IEEE Trans. Autom. Control 58(11), 2715–2729 (2013) 15. F. Pasqualetti, F. Dörfler, F. Bullo, Control-theoretic methods for cyberphysical security: geometric principles for optimal cross-layer resilient control systems. IEEE Control Syst. 35(1), 110–127 (2015) 16. W. Wang, Z. Lu, Cyber security in the smart grid: survey and challenges. Comput. Netw. 57(5), 1344–1371. http://www.sciencedirect.com/science/article/pii/S1389128613000042 17. S. Mousavian, J. Valenzuela, J. Wang, A probabilistic risk mitigation model for cyber-attacks to PMU networks. IEEE Trans. Power Syst. 30(1), 156–165 (2015) 18. A.F. Taha, J. Qi, J. Wang, J.H. Panchal, Risk mitigation for dynamic state estimation against cyber attacks and unknown inputs. IEEE Trans. Smart Grid 9(2), 886–899 (2018) 19. P. Sauer, M. Pai, Power System Dynamics and Stability (Prentice Hall, Englewood Cliffs, 1998) 20. A. Chakrabortty, P. Khargonekar, Introduction to wide-area control of power systems, in American Control Conference (ACC), 2013 (2013), pp. 6758–6770 21. D. Hill, On the equilibria of power systems with nonlinear loads. IEEE Trans. Circuits Syst. 36(11), 1458–1463 (1989) 22. J. Chen, R. Patton, Robust Model-Based Fault Diagnosis for Dynamic Systems (Springer, Berlin, 2012) 23. A. Pertew, H. Marquezz, Q. Zhao, Design of unknown input observers for Lipschitz nonlinear systems, in Proc. American Control Conf. (2005), pp. 4198–4203 24. Electric Sector Failure Scenarios and Impact Analyses, Electric Power Research Institute (EPRI), Tech. Rep., 2014 25. S. Sridhar, G. 
Manimaran, Data integrity attacks and their impacts on SCADA control system, in IEEE Power and Energy Society General Meeting (2010), pp. 1–6 26. A. Giani, E. Bitar, M. Garcia, M. McQueen, P. Khargonekar, K. Poolla, Smart grid data integrity attacks. IEEE Trans. Smart Grid 4(3), 1244–1253 (2013) 27. S. Liu, X. P. Liu, A. E. Saddik, Denial-of-service (DoS) attacks on load frequency control in smart grids, in Innovative Smart Grid Technologies (ISGT), 2013 IEEE PES (2013), pp. 1–6 28. T.T. Tran, O.S. Shin, J.H. Lee, Detection of replay attacks in smart grid systems, in Int. Conf. Computing, Management and Telecommunications (ComManTel) (2013), pp. 298–302 29. W. Chen, M. Saif, Unknown input observer design for a class of nonlinear systems: an LMI approach, in Proc. 2006 American Control Conf. (2006) 30. Smart Grid System Report to Congress, United States Department of Energy (DOE), Tech. Rep., 2014 31. A.P. Meliopoulos, V. Madani, D. Novosel et al., Synchrophasor Measurement Accuracy Characterization. North American SynchroPhasor Initiative (NASPI), Tech. Rep., 2007 32. V. Utkin, J. Guldner, J. Shi, Sliding Mode Control in Electro-Mechanical Systems, 2nd edn. ser. Automation and Control Engineering (CRC Press, Boca Raton, 2009). http://books.google. com/books?id=8IrLBQAAQBAJ ˙ 33. S. Hui, S. Zak, “Observer design for systems with unknown inputs. Int. J. Appl. Math. Comput. Sci. 15, 431–446 (2005) 34. M. Grant, S. Boyd, CVX: Matlab software for disciplined convex programming, Tech. Rep., 2013 35. M. Grant, S. Boyd, Graph implementations for nonsmooth convex programs, in Recent Advances in Learning and Control, ser. Lecture Notes in Control and Information Sciences (Springer, London, 2008), pp. 95–110 36. J. Löfberg, Yalmip: a toolbox for modeling and optimization in MATLAB, in Proc. CACSD Conf., Taipei, Taiwan, 2004. http://users.isy.liu.se/johanl/yalmip ˙ 37. K. Kalsi, S. Hui, S. 
Zak, Unknown input and sensor fault estimation using sliding-mode observers, in American Control Conference (ACC), 2011 (2011), pp. 1364–1369


38. A. Bemporad, M. Morari, Control of systems integrating logic, dynamics, and constraints. Automatica 35(3), 407–427 (1999). http://www.sciencedirect.com/science/article/pii/ S0005109898001782 39. J. Qi, K. Sun, W. Kang, Optimal PMU placement for power system dynamic state estimation by using empirical observability Gramian. IEEE Trans. Power Syst. 30(4), 2041–2054 (2015) 40. J. Qi, K. Sun, W. Kang, Adaptive optimal PMU placement based on empirical observability Gramian, in 10th IFAC Symposium on Nonlinear Control Systems (NOLCOS) (Monterey, 2016) 41. K. Kalsi, Decentralized observer-based control of uncertain dynamic systems. PhD Dissertation, Purdue University, 2010 42. J.H. Chow, K.W. Cheung, A toolbox for power system dynamics and control engineering education and research. IEEE Trans. Power Syst. 7(4), 1559–1564 (1992) 43. A.K. Singh, B.C. Pal, IEEE PES Task Force on Benchmark Systems for Stability Controls– Report on the 68-Bus, 16-Machine, 5-Area System (2013)

Chapter 5

Comparing Kalman Filters and Observers Against Cyber Attacks

5.1 Introduction

For both static state estimation (SSE) [1–6] and dynamic state estimation (DSE) [7], two major challenges make practical application difficult. First, the system model and parameters used for estimation can be inaccurate, which is often called model uncertainty [8] and can deteriorate estimation in some scenarios. Second, the measurements used for estimation are vulnerable to cyber attacks, and the resulting compromised measurements can greatly mislead the estimation. For the first challenge, there are recent efforts on validating the dynamic model of the generator and calibrating its parameters [9–13], on which DSE can be based. However, model validation itself can be very challenging. Hence, a more viable solution is to make the estimators more robust to model uncertainty. For the second challenge, false data injection (FDI) attacks against SSE are proposed in [14]. Since then, how to mitigate this type of attack and further secure the monitoring and control of power grids has been widely studied [15–17]. In [18] an extended distributed state estimation is proposed to tolerate FDI attacks on SSE. In [19] a defense scheme based on optimal phasor measurement unit (PMU) placement is proposed against a least-effort data integrity attack on DC SSE. In [20] FDI attacks are designed to bypass the anomaly detection of Kalman filtering in DSE, and an enhancement of Kalman filtering and a temporal-based detection algorithm are proposed as countermeasures. In Chap. 4 a risk mitigation strategy is used to eliminate the threat levels from the power grid's unknown inputs and potential cyber attacks based on a sliding-mode observer and an attack detection filter [21].
In this chapter, a cubature Kalman filter (CKF) [22] is introduced, which uses a more accurate cubature approach and possesses the important virtue of mathematical rigor rooted in the third-degree spherical-radial cubature rule for numerically computing Gaussian-weighted integrals. Then, a nonlinear observer is used for the power system DSE problem that only requires computing a Luenberger-like gain matrix. This computation can be performed offline, and hence the presented observer is scalable to large-scale power networks. Finally, a realistic power system DSE problem is designed by developing the system and measurement models and considering various practical scenarios: unknown initial conditions; model uncertainties including process noise, unknown and unavailable inputs, and inaccurate parameters; and different types of measurement noise and cyber attacks against measurements. Thorough numerical experiments are presented to showcase the performance of the nonlinear observer and CKF in comparison with three other methods that have recently been applied to DSE. The conceptual strengths and limitations of the different methods under significant model uncertainty and cyber attacks are also discussed [23].

© Springer Nature Switzerland AG 2023 J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_5

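5.2 4th-Order Nonlinear Power System Model

Each of the $G$ generators is described by the fourth-order transient model in the local $d$-$q$ reference frame [24]:

$$
\left\{
\begin{aligned}
\dot{\delta}_i &= \omega_i - \omega_0 \\
\dot{\omega}_i &= \frac{\omega_0}{2H_i}\left(T_{mi} - T_{ei} - \frac{K_{Di}}{\omega_0}(\omega_i - \omega_0)\right) \\
\dot{e}'_{qi} &= \frac{1}{T'_{d0i}}\left(E_{fdi} - e'_{qi} - (x_{di} - x'_{di})\, i_{di}\right) \\
\dot{e}'_{di} &= \frac{1}{T'_{q0i}}\left(-e'_{di} + (x_{qi} - x'_{qi})\, i_{qi}\right),
\end{aligned}
\right.
\tag{5.1}
$$

where $i$ is the generator serial number; $\delta_i$ is the rotor angle; $\omega_i$ is the rotor speed in rad/s; $e'_{qi}$ and $e'_{di}$ are the transient voltages along the $q$ and $d$ axes; $i_{qi}$ and $i_{di}$ are the stator currents at the $q$ and $d$ axes; $T_{mi}$ is the mechanical torque, $T_{ei}$ is the electric air-gap torque, and $E_{fdi}$ is the internal field voltage; $\omega_0$ is the nominal rotor frequency, $H_i$ is the inertia constant, and $K_{Di}$ is the damping factor; $T'_{q0i}$ and $T'_{d0i}$ are the open-circuit time constants for the $q$ and $d$ axes; and $x_{qi}$ and $x_{di}$ are the synchronous reactances and $x'_{qi}$ and $x'_{di}$ are the transient reactances at the $q$ and $d$ axes, respectively.

The $T_{mi}$ and $E_{fdi}$ in (5.1) are considered as inputs. The set of generators where PMUs are installed is denoted by $\mathcal{G}_P$. For generator $i \in \mathcal{G}_P$, the terminal voltage phasor $E_{ti} = e_{Ri} + j e_{Ii}$ and current phasor $I_{ti} = i_{Ri} + j i_{Ii}$ can be measured and are used as the outputs. Correspondingly, the state vector $\boldsymbol{x} \in \mathbb{R}^n$, input vector $\boldsymbol{u} \in \mathbb{R}^v$, and output vector $\boldsymbol{y} \in \mathbb{R}^p$ are

\begin{align}
\boldsymbol{x} &= \left[\boldsymbol{\delta}^{\top}\ \ \boldsymbol{\omega}^{\top}\ \ {\boldsymbol{e}'_q}^{\top}\ \ {\boldsymbol{e}'_d}^{\top}\right]^{\top} \tag{5.2a} \\
\boldsymbol{u} &= \left[\boldsymbol{T}_m^{\top}\ \ \boldsymbol{E}_{fd}^{\top}\right]^{\top} \tag{5.2b} \\
\boldsymbol{y} &= \left[\boldsymbol{e}_R^{\top}\ \ \boldsymbol{e}_I^{\top}\ \ \boldsymbol{i}_R^{\top}\ \ \boldsymbol{i}_I^{\top}\right]^{\top}. \tag{5.2c}
\end{align}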
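The right-hand side of (5.1) can be sketched in a few lines of Python for illustration. This is a minimal sketch, not the author's implementation: the function name, the dictionary keys for the parameters, and the choice to pass $T_{ei}$, $i_{di}$, and $i_{qi}$ in as precomputed arrays (rather than evaluating the network equations of Chap. 4) are all assumptions made here.

```python
import numpy as np

def generator_rhs(x, T_m, E_fd, T_e, i_d, i_q, p):
    """Right-hand side of (5.1) for G generators, vectorized over i.

    x stacks [delta, omega, e_q', e_d'] (each of length G) as in (5.2a).
    T_e, i_d, i_q are supplied by the caller; in the chapter they are
    functions of x through the network equations (not modeled here).
    p is a dict of per-generator parameters (hypothetical key names).
    """
    delta, omega, eq, ed = np.split(np.asarray(x, dtype=float), 4)
    w0 = p["omega0"]
    d_delta = omega - w0
    d_omega = w0 / (2.0 * p["H"]) * (T_m - T_e - p["K_D"] / w0 * (omega - w0))
    d_eq = (E_fd - eq - (p["x_d"] - p["x_d_t"]) * i_d) / p["T_d0"]
    d_ed = (-ed + (p["x_q"] - p["x_q_t"]) * i_q) / p["T_q0"]
    return np.concatenate([d_delta, d_omega, d_eq, d_ed])
```

At a steady state (rotor at nominal speed, $T_{mi} = T_{ei}$, and field and transient voltages balanced against the stator currents) every component of the returned vector is zero, which is a convenient sanity check for parameter data.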


The $T_{ei}$, $i_{di}$, and $i_{qi}$ can be written as functions of $\boldsymbol{x}$ using (4.2b)–(4.2l). The outputs $i_{Ri}$ and $i_{Ii}$ can be written as functions of $\boldsymbol{x}$ using (4.2b)–(4.2f), and the outputs $e_{Ri}$ and $e_{Ii}$ can also be written as functions of $\boldsymbol{x}$ using (4.4a)–(4.4b). The dynamic model (5.1) can then be rewritten in a general state-space form as

$$
\begin{aligned}
\dot{\boldsymbol{x}} &= \boldsymbol{A}\boldsymbol{x} + \boldsymbol{B}\boldsymbol{u} + \boldsymbol{\phi}(\boldsymbol{x}) \\
\boldsymbol{y} &= \boldsymbol{h}(\boldsymbol{x}),
\end{aligned}
\tag{5.3}
$$

where $\boldsymbol{I}_G$ is an identity matrix of dimension $G$, $\boldsymbol{1}_G$ is a vector of all ones with dimension $G$, $\boldsymbol{K}_D = [K_{D1} \cdots K_{DG}]^{\top}$, $\boldsymbol{H} = [H_1 \cdots H_G]^{\top}$, $\boldsymbol{T}'_{d0} = [T'_{d01} \cdots T'_{d0G}]^{\top}$, $\boldsymbol{T}'_{q0} = [T'_{q01} \cdots T'_{q0G}]^{\top}$, $\boldsymbol{T}_e = [T_{e1} \cdots T_{eG}]^{\top}$, $\boldsymbol{x}_d = [x_{d1} \cdots x_{dG}]^{\top}$, $\boldsymbol{x}'_d = [x'_{d1} \cdots x'_{dG}]^{\top}$, $\boldsymbol{x}_q = [x_{q1} \cdots x_{qG}]^{\top}$, $\boldsymbol{x}'_q = [x'_{q1} \cdots x'_{qG}]^{\top}$, $\boldsymbol{i}_d = [i_{d1} \cdots i_{dG}]^{\top}$, $\boldsymbol{i}_q = [i_{q1} \cdots i_{qG}]^{\top}$, $\boldsymbol{h}$ includes functions (4.2e)–(4.2f) and (4.4) for all generators, $\oslash$ and $\odot$ are the Hadamard division/product (elementwise division/product) of two vectors, and $(\boldsymbol{a})^{d}$ gives a square diagonal matrix with the elements of vector $\boldsymbol{a}$ on the main diagonal.

Note that the model presented here is used for DSE, for which the real-time inputs are assumed to be unavailable and $T_{mi}$ and $E_{fdi}$ only take steady-state values, mainly because these inputs are difficult to measure [25, 26]. However, when we simulate the power system to mimic the real system dynamics, we model an IEEE Type DC1 excitation system and a simplified turbine-governor system for each generator, and thus $T_{mi}$ and $E_{fdi}$ change with time due to the governor and the excitation control, which leads to a tenth-order generator model. More details about the model can be found in [21].
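As a worked form of (5.3), collecting the terms of (5.1) that are linear in $\boldsymbol{x}$ into $\boldsymbol{A}$, those multiplying $\boldsymbol{u}$ into $\boldsymbol{B}$, and everything else into $\boldsymbol{\phi}(\boldsymbol{x})$ gives one decomposition that uses exactly the symbols defined above. It is derived here term by term from (5.1) and should be read as a sketch under that assumption, with $\boldsymbol{T}_e$, $\boldsymbol{i}_d$, and $\boldsymbol{i}_q$ kept entirely inside $\boldsymbol{\phi}$:

```latex
\begin{align*}
\boldsymbol{A} &=
\begin{bmatrix}
\boldsymbol{0} & \boldsymbol{I}_G & \boldsymbol{0} & \boldsymbol{0} \\
\boldsymbol{0} & -(\boldsymbol{K}_D \oslash 2\boldsymbol{H})^{d} & \boldsymbol{0} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{0} & -(\boldsymbol{1}_G \oslash \boldsymbol{T}'_{d0})^{d} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & -(\boldsymbol{1}_G \oslash \boldsymbol{T}'_{q0})^{d}
\end{bmatrix},
\qquad
\boldsymbol{B} =
\begin{bmatrix}
\boldsymbol{0} & \boldsymbol{0} \\
(\omega_0 \boldsymbol{1}_G \oslash 2\boldsymbol{H})^{d} & \boldsymbol{0} \\
\boldsymbol{0} & (\boldsymbol{1}_G \oslash \boldsymbol{T}'_{d0})^{d} \\
\boldsymbol{0} & \boldsymbol{0}
\end{bmatrix}, \\[4pt]
\boldsymbol{\phi}(\boldsymbol{x}) &=
\begin{bmatrix}
-\omega_0 \boldsymbol{1}_G \\
(\omega_0 \boldsymbol{1}_G \oslash 2\boldsymbol{H}) \odot (\boldsymbol{K}_D - \boldsymbol{T}_e) \\
-(\boldsymbol{x}_d - \boldsymbol{x}'_d) \oslash \boldsymbol{T}'_{d0} \odot \boldsymbol{i}_d \\
(\boldsymbol{x}_q - \boldsymbol{x}'_q) \oslash \boldsymbol{T}'_{q0} \odot \boldsymbol{i}_q
\end{bmatrix}.
\end{align*}
```

Each block row can be checked against the corresponding line of (5.1); for example, the second row reproduces $\dot{\omega}_i = -\frac{K_{Di}}{2H_i}\omega_i + \frac{\omega_0}{2H_i}T_{mi} + \frac{\omega_0}{2H_i}(K_{Di} - T_{ei})$.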


A more detailed model including the exciter and governor, such as the one in Chap. 4, is not used here. This is mainly because (1) a good model should be simple enough to facilitate design [8], (2) it is harder to validate a detailed model and there are also more parameters that need to be calibrated [9, 10, 27], and (3) the computational burden can be higher for a more detailed model, which may not satisfy the requirement of real-time estimation. The dynamic model of the power system can be written in a general state-space form as

\begin{align}
\dot{\boldsymbol{x}} &= \boldsymbol{f}(\boldsymbol{x}, \boldsymbol{u}) \tag{5.4a} \\
\boldsymbol{y} &= \boldsymbol{h}(\boldsymbol{x}, \boldsymbol{u}), \tag{5.4b}
\end{align}

where $\boldsymbol{x} \in \mathbb{R}^n$, $\boldsymbol{u} \in \mathbb{R}^v$, and $\boldsymbol{y} \in \mathbb{R}^p$ are the vectors of the state, input, and output, and $\boldsymbol{f}$ and $\boldsymbol{h}$ are the nonlinear state transition functions and measurement functions. We rewrite (5.4) by separating the nonlinear term in the state transition functions as

\begin{align}
\dot{\boldsymbol{x}} &= \boldsymbol{A}\boldsymbol{x} + \boldsymbol{B}\boldsymbol{u} + \boldsymbol{\phi}(\boldsymbol{x}) \tag{5.5a} \\
\boldsymbol{y} &= \boldsymbol{h}(\boldsymbol{x}, \boldsymbol{u}), \tag{5.5b}
\end{align}

where $\boldsymbol{\phi}(\boldsymbol{x})$ represents the nonlinear term that models the interconnections in a multi-machine power system.

5.3 Model Uncertainty and Cyber Attacks

5.3.1 Model Uncertainty

The term model uncertainty refers to the differences or errors between models and reality. Various control and estimation theory studies have investigated methods that address the discrepancy between the actual physics and the models. Model uncertainty can arise for the following reasons.

1. Unknown inputs: The unknown inputs acting on the system dynamics include $\boldsymbol{u}_d$ (representing the unknown plant disturbances), $\boldsymbol{u}_u$ (denoting the unknown control inputs), and $\boldsymbol{f}_a$ (depicting potential generator actuator faults). For simplicity, we can combine them into one unknown input quantity $\boldsymbol{w} = \left[\boldsymbol{u}_d^{\top}\ \boldsymbol{u}_u^{\top}\ \boldsymbol{f}_a^{\top}\right]^{\top}$. Define $\boldsymbol{B}_w$ to be the known weight distribution matrix of the unknown inputs with respect to each state equation. The term $\boldsymbol{B}_w \boldsymbol{w}$ models a general class of unknown inputs such as nonlinearities, modeling uncertainties, noise, parameter variations, unmeasurable system inputs, model reduction errors, and actuator faults [28, 29]. The process dynamics under unknown inputs can be written as

$$
\dot{\boldsymbol{x}} = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{B}\boldsymbol{u} + \boldsymbol{B}_w \boldsymbol{w} + \boldsymbol{\phi}(\boldsymbol{x}).
\tag{5.6}
$$

2. Unavailable inputs: Real-time inputs $\boldsymbol{u}$ can be unavailable, in which case the steady-state inputs $\boldsymbol{u}_0$ are used for estimation.

3. Parameter inaccuracy: The parameters in the system model can be inaccurate. For example, the reduced admittance matrix can be inaccurate when a fault or the subsequent topology change is not detected.

5.3.2 Cyber Attacks

The National Electric Sector Cybersecurity Organization Resource (NESCOR) developed cybersecurity failure scenarios with corresponding impact analyses [30]. The wide-area monitoring, protection, and control (WAMPAC) failure scenarios related to DSE based on PMU measurements include (a) measurement data (from PMUs) compromised due to phasor data concentrator (PDC) authentication compromise and (b) communications compromised between PMUs and the control center [30]. Specifically, the following three types of attacks [30, 31] are considered.

1. Data integrity attack: An adversary attempts to corrupt the content of either the measurement or the control signals. A specific example of a data integrity attack is the man-in-the-middle attack, where the adversary intercepts the measurement signals and modifies them in transit. For DSE the PMU measurements can be modified and corrupted.

2. Denial of Service (DoS) attack: An attacker attempts to deny the communication of measurements. The communication of a sensor could be jammed by flooding the network with spurious packets. DoS attacks can happen at a variety of communication layers in a smart grid, such as the physical layer, Medium Access Control (MAC) layer, network and transport layer, and application layer. For DSE the consequence can be that the updated measurements cannot be sent to the control center.

3. Replay attack: A special case of data integrity attacks, where the attacker replays a previous snapshot of a valid communication packet sequence that contains measurements in order to deceive the system. For DSE the PMU measurements can be replaced by measurements from the past.

A data integrity cyber attack can be modeled by adding a vector $\boldsymbol{v}(t)$ to the measurements. Then the measurement model under cyber attacks becomes

$$
\boldsymbol{y}(t) = \boldsymbol{h}\big(\boldsymbol{x}(t), \boldsymbol{u}(t)\big) + \boldsymbol{v}(t).
\tag{5.7}
$$

A DoS attack on output $i$ at $t \in (t_1, t_2]$ can be modeled as

$$
y_i = h_i\big(\boldsymbol{x}(t_1), \boldsymbol{u}(t_1)\big), \quad t \in (t_1, t_2].
\tag{5.8}
$$

A replay attack on output $i$ at $t \in [t_1, t_2]$ can be modeled as

$$
y_i = h_i\big(\boldsymbol{x}(t - \Delta T), \boldsymbol{u}(t - \Delta T)\big), \quad t \in [t_1, t_2],
\tag{5.9}
$$

where $\Delta T = t_2 - t_1$. Apart from cyber attacks against the PMU measurements, the commonly assumed Gaussian distribution of the PMU measurement noise may not hold for real data. Extensive results using field PMU data from the WECC system have revealed that the Gaussian assumption is questionable [32]. Therefore, it would be valuable to evaluate the performance of different DSE methods under non-Gaussian noise.
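The three attack models (5.7)–(5.9) can be sketched on a sampled measurement record as follows. This is an illustrative sketch only: it assumes the clean measurements are stored as an array `y` with one row per time step and one column per output, and the function names and the use of step indices `k1`, `k2` in place of $t_1$, $t_2$ are choices made here, not taken from the source.

```python
import numpy as np

def data_integrity_attack(y, v):
    """(5.7): add an attack vector v (same shape as y) to the clean measurements."""
    return y + v

def dos_attack(y, i, k1, k2):
    """(5.8): output i stays frozen at its step-k1 value for steps k1+1..k2."""
    y_att = y.copy()
    y_att[k1 + 1:k2 + 1, i] = y[k1, i]
    return y_att

def replay_attack(y, i, k1, k2):
    """(5.9): on steps k1..k2, output i repeats its values from dT = k2 - k1 steps earlier."""
    dT = k2 - k1
    if k1 - dT < 0:
        raise ValueError("not enough history to replay")
    y_att = y.copy()
    y_att[k1:k2 + 1, i] = y[k1 - dT:k2 - dT + 1, i]
    return y_att
```

Each function returns a new array, so the clean record stays available for computing estimation errors against the true outputs.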

5.4 DSE Algorithms

Two main classes of methods have been proposed for performing DSE:

1. Stochastic estimators: given a discrete-time representation of a dynamical system, the observed measurements, and the statistical information on process noise and measurement noise, the Kalman filter (KF) and its many derivatives calculate the Kalman gain as a function of the relative certainty of the current state estimate and the measurements [22, 33–36].

2. Deterministic observers: given a continuous- or discrete-time dynamical system described by state-space matrices, a combination of matrix equalities and inequalities is solved while guaranteeing asymptotic (or bounded) estimation error. The solution to these equations is often a set of matrices that are used in an observer to estimate states and other dynamic quantities [37–39]. In Chap. 4 a sliding-mode observer is applied to perform DSE under unknown inputs and cyber attacks [21]. In [40] a robust observer is developed based on the concept of $\mathcal{L}_\infty$ stability, which has a performance guarantee on the state estimation error norm relative to the magnitude of uncertainty from unknown inputs and from process and measurement noises.

5.4.1 Kalman Filters for Power System DSE

Unlike many estimation methods that are computationally unmanageable or require special assumptions about the form of the process and observation models, KF only utilizes the first two moments of the state (mean and covariance) in its update rule [33]. It consists of two steps: in the prediction step, the filter propagates the estimate from the last time step to the current time step; in the update step, the filter updates the estimate using collected measurements. KF was initially developed for linear systems, while for power system DSE the system equations and outputs are strongly nonlinear. Thus variants of KF that can deal with nonlinear systems have been introduced, such as the extended Kalman filter (EKF) [25, 41], unscented Kalman filter (UKF) [26, 42–45], square-root unscented Kalman filter (SR-UKF) [46–49], extended particle filter [50, 51], and ensemble Kalman filter [52]. We consider a nonlinear system (without model uncertainty or attack vectors) in discrete-time form as

\begin{align}
\boldsymbol{x}_k &= \boldsymbol{f}(\boldsymbol{x}_{k-1}, \boldsymbol{u}_{k-1}) + \boldsymbol{q}_{k-1} \tag{5.10a} \\
\boldsymbol{y}_k &= \boldsymbol{h}(\boldsymbol{x}_k, \boldsymbol{u}_k) + \boldsymbol{r}_k, \tag{5.10b}
\end{align}

where $\boldsymbol{x}_k \in \mathbb{R}^n$, $\boldsymbol{u}_k \in \mathbb{R}^v$, and $\boldsymbol{y}_k \in \mathbb{R}^p$ are the states, inputs, and observed measurements at time step $k$; the estimated mean and estimated covariance of the estimation error are $\boldsymbol{m}$ and $\boldsymbol{P}$; $\boldsymbol{f}$ and $\boldsymbol{h}$ are vectors consisting of nonlinear state transition functions and measurement functions; $\boldsymbol{q}_{k-1} \sim N(\boldsymbol{0}, \boldsymbol{Q}_{k-1})$ is the Gaussian process noise at time step $k-1$; $\boldsymbol{r}_k \sim N(\boldsymbol{0}, \boldsymbol{R}_k)$ is the Gaussian measurement noise at time step $k$; and $\boldsymbol{Q}_{k-1}$ and $\boldsymbol{R}_k$ are the covariance matrices of $\boldsymbol{q}_{k-1}$ and $\boldsymbol{r}_k$.
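The two-step structure described above can be made concrete with the textbook linear Kalman filter. This is a minimal sketch for the linear special case of (5.10), assuming $\boldsymbol{f}(\boldsymbol{x}, \boldsymbol{u}) = \boldsymbol{F}\boldsymbol{x}$ and $\boldsymbol{h}(\boldsymbol{x}, \boldsymbol{u}) = \boldsymbol{H}\boldsymbol{x}$; the matrices `F` and `H` and the function names are introduced here for illustration and do not appear in the chapter, whose nonlinear variants replace these two steps with linearization or point-based approximations.

```python
import numpy as np

def kf_predict(m, P, F, Q):
    """Prediction step: propagate mean and covariance one time step ahead."""
    m_pred = F @ m
    P_pred = F @ P @ F.T + Q
    return m_pred, P_pred

def kf_update(m_pred, P_pred, y, H, R):
    """Update step: correct the prediction with the measurement y."""
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T      # Kalman gain P_pred H^T S^{-1}
    m = m_pred + K @ (y - H @ m_pred)
    P = (np.eye(m_pred.size) - K @ H) @ P_pred
    return m, P
```

The gain `K` grows when the predicted covariance is large relative to `R`, which is the "relative certainty" weighting between the current estimate and the measurements.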

5.4.1.1 EKF

Although EKF maintains the elegant and computationally efficient recursive update form of KF, it works well only in a "mild" nonlinear environment, owing to the first-order Taylor series approximation of the nonlinear functions [22]. It is sub-optimal and can easily diverge. Also, the linearization can be applied only if the Jacobian matrix exists, and calculating Jacobian matrices can be difficult and error-prone. For power system DSE, EKF has been discussed in [25, 41].

5.4.1.2 UKF

The unscented transformation (UT) [53] was developed to address the deficiencies of linearization by providing a more direct and explicit mechanism for transforming mean and covariance information. Based on UT, Julier et al. [35, 36] propose UKF as a derivative-free alternative to EKF. The Gaussian distribution is represented by a set of deterministically chosen sample points called sigma points. UKF has been applied to power system DSE, for which no linearization or calculation of Jacobian matrices is needed [26, 42–45]. In UKF, a total of $2n + 1$ sigma points (denoted by $\mathcal{X}$) are calculated from the columns of the matrix $\eta\sqrt{\boldsymbol{P}}$ as

\begin{align}
\mathcal{X}^{(0)} &= \boldsymbol{m} \tag{5.11a} \\
\mathcal{X}^{(i)} &= \boldsymbol{m} + \left(\eta\sqrt{\boldsymbol{P}}\right)_{i}, \quad i = 1, \ldots, n \tag{5.11b} \\
\mathcal{X}^{(i)} &= \boldsymbol{m} - \left(\eta\sqrt{\boldsymbol{P}}\right)_{i-n}, \quad i = n + 1, \ldots, 2n \tag{5.11c}
\end{align}

where $(\cdot)_i$ denotes the $i$th column, with weights

\begin{align}
w_m^{(0)} &= \frac{\lambda}{n + \lambda} \tag{5.12a} \\
w_c^{(0)} &= \frac{\lambda}{n + \lambda} + (1 - \alpha^2 + \beta) \tag{5.12b} \\
w_m^{(i)} &= \frac{1}{2(n + \lambda)}, \quad i = 1, \ldots, 2n \tag{5.12c} \\
w_c^{(i)} &= \frac{1}{2(n + \lambda)}, \quad i = 1, \ldots, 2n, \tag{5.12d}
\end{align}

where the matrix square root of a positive semidefinite matrix $\boldsymbol{P}$ is a matrix $\boldsymbol{S} = \sqrt{\boldsymbol{P}}$ such that $\boldsymbol{P} = \boldsymbol{S}\boldsymbol{S}^{\top}$; $w_m$ and $w_c$ are, respectively, the weights for the mean and the covariance; $\eta = \sqrt{n + \lambda}$; $\lambda$ is a scaling parameter defined as $\lambda = \alpha^2(n + \kappa) - n$; and $\alpha$, $\beta$, and $\kappa$ are constants, with $\alpha$ and $\beta$ nonnegative.

The basic idea of UKF is to choose the sigma point set to capture a number of low-order moments of the prior density of the states as correctly as possible, and then to compute the posterior statistics of the nonlinear functions (either the state transition functions $\boldsymbol{f}$ or the measurement functions $\boldsymbol{h}$) by UT, which approximates the mean and the covariance of the nonlinear function by a weighted sum of projected sigma points. However, for the sigma points, the stem at the center (the mean) is highly significant as it carries more weight, which is usually negative for high-dimensional systems. Therefore, UKF is expected to encounter numerical instability when used in high-dimensional problems. Several techniques including SR-UKF have been proposed to solve this problem [46, 47]. Recently SR-UKF has been applied to DSE in power systems in [48].
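The sigma-point construction (5.11) and the weights (5.12) can be sketched as below. This is an illustration under assumptions made here: a Cholesky factor is used as the matrix square root (any $\boldsymbol{S}$ with $\boldsymbol{P} = \boldsymbol{S}\boldsymbol{S}^{\top}$ would do, but Cholesky requires $\boldsymbol{P}$ to be positive definite), and the default values of `alpha`, `beta`, and `kappa` are placeholders, not the settings used in the chapter.

```python
import numpy as np

def sigma_points(m, P, alpha=1.0, beta=0.0, kappa=0.0):
    """Return the 2n+1 sigma points (rows of X) and the weights of (5.12)."""
    n = m.size
    lam = alpha**2 * (n + kappa) - n
    eta = np.sqrt(n + lam)
    S = np.linalg.cholesky(P)              # lower-triangular S with P = S S^T
    X = np.empty((2 * n + 1, n))
    X[0] = m                               # the "stem" at the center
    X[1:n + 1] = m + eta * S.T             # row i is m + (i-th column of eta*S)
    X[n + 1:] = m - eta * S.T
    w_m = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    w_c = w_m.copy()
    w_m[0] = lam / (n + lam)
    w_c[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    return X, w_m, w_c
```

For a 16-machine system with four states per generator ($n = 64$), a common heuristic such as `kappa = 3 - n` with `alpha = 1` gives $\lambda = 3 - n$ and therefore a negative center weight $w_m^{(0)} = (3 - n)/3$, which is the situation the text associates with numerical instability.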

5.4.1.3 CKF

EKF and UKF can suffer from the curse of dimensionality, with the effect becoming detrimental in high-dimensional state-space models of size twenty or more, especially when the equations that describe the state-space model are highly nonlinear [22, 54], which is exactly the case for power systems. Making use of the spherical-radial cubature rule, Arasaratnam et al. [22] propose CKF, which possesses the important virtue of mathematical rigor rooted in the third-degree spherical-radial cubature rule for numerically computing Gaussian-weighted integrals. Compared with EKF, UKF, and SR-UKF, CKF has the following advantages:

1. Unlike EKF and similar to UKF and SR-UKF, CKF is derivative-free and is easier to apply.

2. Similar to UKF and SR-UKF, CKF uses a weighted set of symmetric points to approximate the Gaussian distribution. But the cubature-point set does not have a stem at the center and thus does not have the numerical instability problem of UKF discussed in Sect. 5.4.1.2.

3. UKF treats the derivation of the sigma point set for the prior density and the computation of posterior statistics as two disjoint problems. By contrast, CKF directly derives the cubature-point set to accurately compute the first two moments of a nonlinear transformation, naturally increasing the accuracy of the numerical estimates for moment integrals [22].

4. As sub-optimal Bayesian filters, EKF, UKF, and CKF all have some robustness to model uncertainties and measurement outliers [55]. The extent of robustness depends on their ability to accurately deal with the nonlinear transformations. EKF is the least robust method due to its first-order Taylor series approximation of the nonlinear functions, while CKF has the highest robustness thanks to its more accurate cubature approach.
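For comparison with the sigma points of UKF, the cubature-point set of the third-degree spherical-radial rule can be sketched as follows. The construction used here (the $2n$ points $\boldsymbol{m} \pm \sqrt{n}\,\boldsymbol{S}\boldsymbol{e}_i$ with equal weights $1/(2n)$) is the standard form of that rule rather than something spelled out in this chapter, and the function name and Cholesky square root are assumptions of this sketch.

```python
import numpy as np

def cubature_points(m, P):
    """Return the 2n cubature points (rows of X) and their equal weights."""
    n = m.size
    S = np.linalg.cholesky(P)              # P = S S^T
    offsets = np.sqrt(n) * S.T             # row i is sqrt(n) * (i-th column of S)
    X = np.vstack([m + offsets, m - offsets])
    w = np.full(2 * n, 1.0 / (2 * n))      # all weights positive, no center point
    return X, w
```

Because there is no center point and every weight equals $1/(2n)$, the negative-weight issue discussed for UKF does not arise, whatever the state dimension.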

5.4.2 Nonlinear Observers for Power System DSE

Dynamic observers have been thoroughly investigated for different classes of systems. To mention a few, they have been developed for linear time-invariant (LTI) systems, nonlinear time-invariant (NLTI) systems, LTI and NLTI systems with unknown inputs, sensor and actuator faults, stochastic dynamical systems, and hybrid systems [37, 38]. Most observers utilize the plant's outputs and inputs to generate real-time estimates of the plant states, unknown inputs, and sensor faults. The cornerstone is the innovation function, sometimes a simple gain matrix designed to nullify the effect of unknown inputs and faults. Linear and nonlinear functional observers, sliding-mode observers, unknown input observers, and observers for fault detection and isolation are all examples of observers developed for different classes of systems under different assumptions [39]. In comparison with KF techniques, nonlinear and robust observers have not been utilized for power system DSE. However, they inherently possess the theoretical, technical, and computational capabilities to estimate the power system's dynamic states well. As for implementation, observers are simpler than KFs. For observers, matrix gains are computed offline to guarantee the asymptotic stability of the estimation error or the boundedness of the estimation error within a neighborhood of the origin.


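Here, we present the observer in [56], which can be applied to DSE in power systems. This observer assumes that the nonlinear function $\boldsymbol{\phi}(\boldsymbol{x})$ in (5.5) satisfies the one-sided Lipschitz condition. Specifically, there exists $\rho \in \mathbb{R}$ such that, for all $\boldsymbol{x}_1, \boldsymbol{x}_2$ in a region $\mathcal{D}$ of the state space that includes the origin,

$$
\big\langle \boldsymbol{\phi}(\boldsymbol{x}_1) - \boldsymbol{\phi}(\boldsymbol{x}_2),\ \boldsymbol{x}_1 - \boldsymbol{x}_2 \big\rangle \le \rho\, \|\boldsymbol{x}_1 - \boldsymbol{x}_2\|^2,
$$

where $\langle \cdot, \cdot \rangle$ is the inner product. Besides, the nonlinear function is also assumed to be quadratically inner-bounded:

$$
\big(\boldsymbol{\phi}(\boldsymbol{x}_1) - \boldsymbol{\phi}(\boldsymbol{x}_2)\big)^{\top}\big(\boldsymbol{\phi}(\boldsymbol{x}_1) - \boldsymbol{\phi}(\boldsymbol{x}_2)\big) \le \mu\, \|\boldsymbol{x}_1 - \boldsymbol{x}_2\|^2 + \varphi\, \big\langle \boldsymbol{\phi}(\boldsymbol{x}_1) - \boldsymbol{\phi}(\boldsymbol{x}_2),\ \boldsymbol{x}_1 - \boldsymbol{x}_2 \big\rangle,
$$

where $\mu$ and $\varphi$ are real numbers. Related results on the dynamics of multi-machine power systems established a similar quadratic bound on the nonlinear component (see [57]). To determine the constants $\rho$, $\mu$, and $\varphi$, a simple offline algorithm can be implemented. For example, we can define a region of interest $\mathcal{D} \subset \mathbb{R}^n$ to be the state-space region where the system operates. For the multi-machine power network, this region is the intersection of all upper and lower bounds of the states, which can be written as

$$
\mathcal{D} = [x_1^{\min}, x_1^{\max}] \times [x_2^{\min}, x_2^{\max}] \times \cdots \times [x_n^{\min}, x_n^{\max}].
$$

This region $\mathcal{D}$ can be obtained by the method discussed in [58]. We sample random points in this region. Denser sampling yields a more realistic Lipschitz constant, while requiring more computational time. Let $n_D$ be the total number of samples inside $\mathcal{D}$. Algorithm 5.1 includes the steps required to obtain $\rho$. Specifically, $\rho$ can be calculated from

$$
\rho = \limsup\ \beta\!\left(\frac{\partial \boldsymbol{\phi}}{\partial \boldsymbol{x}}\right)
$$

for all $\boldsymbol{x} \in \mathcal{D}$, where $\beta(\boldsymbol{H})$ denotes the logarithmic matrix norm of matrix $\boldsymbol{H}$ defined as

$$
\beta(\boldsymbol{H}) = \lim_{\epsilon \to 0} \frac{\|\boldsymbol{I} + \epsilon \boldsymbol{H}\| - 1}{\epsilon},
$$

where $\|\cdot\|$ represents any matrix norm. It is shown in [59] that the logarithmic matrix norm can also be written as

$$
\beta(\boldsymbol{H}) = \lambda_{\max}\!\left(\frac{1}{2}\left(\boldsymbol{H} + \boldsymbol{H}^{\top}\right)\right) \le \|\boldsymbol{H}\|.
$$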


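At each iteration, we obtain the maximum eigenvalue of

$$
\frac{1}{2}\left(\frac{\partial \boldsymbol{\phi}(\boldsymbol{x})}{\partial \boldsymbol{x}} + \left(\frac{\partial \boldsymbol{\phi}(\boldsymbol{x})}{\partial \boldsymbol{x}}\right)^{\top}\right),
\tag{5.13}
$$

where the Jacobian of the nonlinear function is evaluated at the $i$th sampled point. Finally, $\rho$ is computed by finding the maximum value of $\beta(\cdot)$ over $\mathcal{D}$.

Algorithm 5.1 Obtaining one-sided Lipschitz constant $\rho$
  input $\boldsymbol{\phi}(\boldsymbol{x})$ and $\mathcal{D}$
  $\rho_0 \leftarrow -\infty$
  for $i = 1 : n_D$ do
    $\boldsymbol{x} \leftarrow \boldsymbol{x}_i$
    compute $\rho_i = \lambda_{\max}\!\left(\frac{1}{2}\left(\frac{\partial \boldsymbol{\phi}(\boldsymbol{x})}{\partial \boldsymbol{x}} + \left(\frac{\partial \boldsymbol{\phi}(\boldsymbol{x})}{\partial \boldsymbol{x}}\right)^{\top}\right)\right)$
    $\rho_i \leftarrow \max(\rho_{i-1}, \rho_i)$
  end for
  output $\rho \leftarrow \rho_{n_D}$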
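Algorithm 5.1 can be sketched in Python as follows. This is a minimal sketch under assumptions made here: the region $\mathcal{D}$ is given as elementwise lower and upper bounds, the samples are drawn uniformly, and the Jacobian is approximated by central finite differences (the chapter does not say how the Jacobian is obtained, and an analytic Jacobian would be preferable when available). All function names are hypothetical.

```python
import numpy as np

def numerical_jacobian(phi, x, h=1e-6):
    """Central-difference approximation of d(phi)/dx at x (an assumption of this sketch)."""
    n = x.size
    J = np.empty((phi(x).size, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (phi(x + e) - phi(x - e)) / (2.0 * h)
    return J

def one_sided_lipschitz(phi, lower, upper, n_samples=2000, seed=0):
    """Sampled estimate of rho over the box D = [lower, upper], following Algorithm 5.1."""
    rng = np.random.default_rng(seed)
    rho = -np.inf
    for _ in range(n_samples):
        x = rng.uniform(lower, upper)                  # one random point in D
        J = numerical_jacobian(phi, x)
        rho_i = np.linalg.eigvalsh(0.5 * (J + J.T)).max()
        rho = max(rho, rho_i)                          # running maximum
    return rho
```

Since the result is a maximum over finitely many samples, it can underestimate the true constant; this is the trade-off between sampling density and computation time noted above.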

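Algorithm 5.1 is an offline search method to obtain the one-sided Lipschitz constant. The most computationally intensive step of Algorithm 5.1 is finding the eigenvalues of an $n$-by-$n$ matrix (where $n$ is the state dimension, i.e., $\boldsymbol{x} \in \mathbb{R}^n$), followed by finding the maximum eigenvalue. There are many algorithms to find the eigenvalues of a matrix, but the majority rely on matrix decompositions. Classical algorithms relying on the singular value decomposition (SVD) require $O(n^3)$ operations, and since the computation is repeated $n_D$ times, the computational complexity of Algorithm 5.1 is $O(n^3 \cdot n_D)$. To compute the quadratic inner-boundedness constants $\mu$ and $\varphi$, a similar algorithm can be used. In particular, instead of sampling over individual $\boldsymbol{x}_i \in \mathcal{D}$, two state-space samples $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$ can be sampled at each iteration $(i, j)$, and

$$
\big(\boldsymbol{\phi}(\boldsymbol{x}_i) - \boldsymbol{\phi}(\boldsymbol{x}_j)\big)^{\top}\big(\boldsymbol{\phi}(\boldsymbol{x}_i) - \boldsymbol{\phi}(\boldsymbol{x}_j)\big) \le \mu_{i,j}\, \|\boldsymbol{x}_i - \boldsymbol{x}_j\|^2 + \varphi_{i,j}\, \big\langle \boldsymbol{\phi}(\boldsymbol{x}_i) - \boldsymbol{\phi}(\boldsymbol{x}_j),\ \boldsymbol{x}_i - \boldsymbol{x}_j \big\rangle
$$

is evaluated iteratively for all possible pairs $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$ in $\mathcal{D}$ to obtain the maximum values for $\mu$ and $\varphi$ that satisfy the above inequality. Following these assumptions, the dynamics of this observer can be written as

$$
\dot{\hat{\boldsymbol{x}}} = \boldsymbol{A}\hat{\boldsymbol{x}} + \boldsymbol{B}\boldsymbol{u} + \boldsymbol{\phi}(\hat{\boldsymbol{x}}) + \boldsymbol{L}\left(\boldsymbol{y} - \boldsymbol{C}\hat{\boldsymbol{x}}\right),
\tag{5.14}
$$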

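where $\boldsymbol{L}$ is a matrix gain determined by Algorithm 5.2. First, given the constants $\rho$, $\varphi$, and $\mu$, the linear matrix inequality (LMI) in (5.15) is solved for positive constants $\epsilon_1$, $\epsilon_2$, and $\sigma$ and a symmetric positive definite matrix $\boldsymbol{P}$. Utilizing the solution $\boldsymbol{L}$ in (5.16), the state estimates generated from (5.14) are guaranteed to converge to the actual values of the states.

Algorithm 5.2 Observer design algorithm
  solve the LMI

$$
\begin{bmatrix}
\boldsymbol{A}^{\top}\boldsymbol{P} + \boldsymbol{P}\boldsymbol{A} + (\epsilon_1 \rho + \epsilon_2 \mu)\boldsymbol{I} - \sigma \boldsymbol{C}^{\top}\boldsymbol{C} & \boldsymbol{P} + \dfrac{\varphi \epsilon_2 - \epsilon_1}{2}\boldsymbol{I} \\
\left(\boldsymbol{P} + \dfrac{\varphi \epsilon_2 - \epsilon_1}{2}\boldsymbol{I}\right)^{\top} & -\epsilon_2 \boldsymbol{I}
\end{bmatrix} < 0
\tag{5.15}
$$

  for $\epsilon_1, \epsilon_2, \sigma > 0$ and $\boldsymbol{P} = \boldsymbol{P}^{\top} > 0$
  compute the observer gain

$$
\boldsymbol{L} = \frac{\sigma}{2}\boldsymbol{P}^{-1}\boldsymbol{C}^{\top}
\tag{5.16}
$$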
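As an illustration of Algorithm 5.2, the sketch below does not solve the LMI; it only assembles the block matrix of the standard one-sided Lipschitz observer LMI, $\begin{bmatrix} \boldsymbol{A}^{\top}\boldsymbol{P} + \boldsymbol{P}\boldsymbol{A} + (\epsilon_1\rho + \epsilon_2\mu)\boldsymbol{I} - \sigma\boldsymbol{C}^{\top}\boldsymbol{C} & \boldsymbol{P} + \frac{\varphi\epsilon_2 - \epsilon_1}{2}\boldsymbol{I} \\ \ast & -\epsilon_2\boldsymbol{I} \end{bmatrix} < 0$, for a candidate solution, checks negative definiteness through its eigenvalues, and forms the gain $\boldsymbol{L} = \frac{\sigma}{2}\boldsymbol{P}^{-1}\boldsymbol{C}^{\top}$. That this is the intended form of (5.15) and (5.16) is an assumption of the sketch, as are the function names. In practice the LMI would be handed to an SDP solver.

```python
import numpy as np

def observer_lmi(A, C, P, eps1, eps2, sigma, rho, mu, varphi):
    """Assemble the LMI block matrix for a candidate (P, eps1, eps2, sigma)."""
    I = np.eye(A.shape[0])
    top_left = A.T @ P + P @ A + (eps1 * rho + eps2 * mu) * I - sigma * C.T @ C
    off_diag = P + 0.5 * (varphi * eps2 - eps1) * I
    return np.block([[top_left, off_diag], [off_diag.T, -eps2 * I]])

def is_negative_definite(M, tol=1e-9):
    """True if the symmetric part of M has all eigenvalues below -tol."""
    return np.linalg.eigvalsh(0.5 * (M + M.T)).max() < -tol

def observer_gain(P, C, sigma):
    """Gain L = (sigma / 2) P^{-1} C^T, computed without forming P^{-1} explicitly."""
    return 0.5 * sigma * np.linalg.solve(P, C.T)
```

For example, with a hypothetical two-state system `A = [[-1, 1], [-1, -2]]`, `C = [[1, 0]]`, nonlinearity $0.5\sin(\cdot)$ (so $\rho = 0.5$, $\mu = 0.25$, $\varphi = 0$), and the candidate `P = I`, `eps1 = 2`, `eps2 = 1`, `sigma = 1`, the block matrix is diagonal with negative entries and the gain is `[[0.5], [0.0]]`.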

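Note that the observer design utilizes linearized measurement functions $\boldsymbol{C}$, which for power system DSE can be obtained by linearizing the nonlinear measurement functions in (5.5). However, since the measurement functions are highly nonlinear, when performing the estimation we do not use (5.14), as in [56], but choose to directly use the nonlinear measurement functions as

$$
\dot{\hat{\boldsymbol{x}}} = \boldsymbol{A}\hat{\boldsymbol{x}} + \boldsymbol{B}\boldsymbol{u} + \boldsymbol{\phi}(\hat{\boldsymbol{x}}) + \boldsymbol{L}\left(\boldsymbol{y} - \boldsymbol{h}(\hat{\boldsymbol{x}})\right).
\tag{5.17}
$$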
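How the observer (5.17) runs in real time can be sketched with a forward-Euler simulation of a small toy system. Everything in this sketch is hypothetical and chosen only for illustration: the two-state matrices, the nonlinearity $0.5\sin(\cdot)$, the input $\sin t$, the fixed gain `L`, the step size, and the optional additive attack on the measurement. It is not the power system model of Sect. 5.2.

```python
import numpy as np

A = np.array([[-1.0, 1.0], [-1.0, -2.0]])
B = np.array([0.0, 1.0])
L = np.array([0.5, 0.0])          # precomputed offline; fixed during estimation

def phi(x):
    return 0.5 * np.sin(x)        # toy nonlinear term

def h(x):
    return x[0]                   # toy measurement function (first state)

def simulate(x0, xhat0, dt=0.01, T=10.0, attack=None):
    """Euler-integrate the plant and the observer (5.17); return the error norms."""
    x = np.array(x0, dtype=float)
    xh = np.array(xhat0, dtype=float)
    errs = []
    for k in range(int(round(T / dt))):
        t = k * dt
        u = np.sin(t)
        y = h(x)
        if attack is not None:
            y = y + attack(t)     # additive corruption, as in (5.7)
        dx = A @ x + B * u + phi(x)
        dxh = A @ xh + B * u + phi(xh) + L * (y - h(xh))
        x = x + dt * dx
        xh = xh + dt * dxh
        errs.append(np.linalg.norm(x - xh))
    return np.array(errs)
```

Calling `simulate([1.0, -1.0], [0.0, 0.0])` shows the estimation error decaying from an unknown initial condition, while passing, for instance, `attack=lambda t: 0.2 if t >= 5.0 else 0.0` leaves a persistent residual. The only online work per step is the state update; the gain is never recomputed.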

The main principle behind the observer design is to minimize the difference between the estimated measurements (i.e., $\hat{\boldsymbol{y}}(t)$) and the actual ones ($\boldsymbol{y}(t)$) through the innovation term $\boldsymbol{L}\left(\boldsymbol{y} - \boldsymbol{h}(\hat{\boldsymbol{x}})\right)$. The objective of this term is to nullify/minimize the discrepancies due to errors in the estimation, model uncertainties, measurement noise, or attack vectors. The difference between $\boldsymbol{y}(t)$ and $\hat{\boldsymbol{y}}(t)$ yields an estimate for the attack vector. Hence, the state evolution of the observer is indirectly aware of the differences between the measured, potentially corrupt outputs and the estimated ones. Given the solution to the LMI, the estimation error dynamics will be asymptotically stable. Finally, it is important to mention that Algorithm 5.2 can be performed offline, which implies that the observer in real time only requires a state estimate update while all other quantities are given; after finding $\boldsymbol{L}$ one can simulate (5.17).

For Algorithm 5.2, we solve the LMI (5.15). Primal–dual interior-point methods for LMIs/SDPs have a worst-case complexity estimate of $O\!\left(m^{2.75} L^{1.5}\right)$, where $m$ is the number of variables (a function of $n$ and $n_y$, the state and output dimensions) and $L$ is the number of constraints [60]. In various problems arising in estimation/control, it is shown that the complexity estimate is closer to $O\!\left(m^{2.1} L^{1.2}\right)$; see [60] and references therein. Recent advancements in semidefinite programming that utilize the sparse nature of the state-space matrices can be exploited to improve the computational efficiency.

The observer is endowed with the following properties: (a) it assumes that the generators' control inputs are not known to the state estimation method; (b) it tolerates three classes of cyber attacks (data integrity, denial of service, and replay attacks) and other disturbances while accurately reconstructing the power system state within seconds of an attack or large disturbance; (c) it assumes no statistical properties of the noise affecting the process and measurement models; and (d) it requires no major real-time computation, in comparison with other estimation methods that are computationally expensive.

5.5 Numerical Results

EKF, UKF, SR-UKF, CKF, and the nonlinear observer are tested on the 16-machine 68-bus system extracted from Power System Toolbox (PST) [61]. The one-line diagram of the test system is shown in Fig. 5.1. For the DSE we consider both unknown inputs to the system dynamics and cyber attacks against the measurements, including data integrity, DoS, and replay attacks. All tests are performed on a 3.2 GHz Intel(R) Core(TM) i7-4790S desktop.

Fig. 5.1 16-machine 68-bus system

To simulate the power system and mimic the real system dynamics, we model an IEEE Type DC1 excitation system and a simplified turbine-governor system, which leads to the 10th-order generator model in Chap. 4. The simulation data is generated as follows.

1. The simulation data is generated by the detailed 10th-order model. The sampling rate is 60 samples/s.
2. In order to generate a dynamic response, a three-phase fault is applied at bus 6 of branch 6–11 and is cleared at the near and remote ends after 0.05 and 0.1 s, respectively.
3. All generators are equipped with PMUs at their terminal buses. The real and imaginary parts of the voltage phasor and current phasor are considered as measurements.
4. The sampling rate of the measurements is set to be 60 frames/s to mimic the PMU sampling rate.
5. Gaussian process noise is added and the corresponding process noise covariance is a diagonal matrix, whose diagonal entries are the square of 5% of the largest state changes [50].
6. Gaussian noise with variance $0.01^2$ is added to the PMU measurements.
7. Each entry of the unknown input coefficients $B_w$ is a random number that follows a normal distribution with zero mean and variance equal to the square of 50% of the largest state changes. Note that the variance here is much bigger than that of the process noise.
8. The unknown input vector $w$ is set as a function of $t$ as

\[
w(t) = \begin{bmatrix}
0.5\cos(\omega_u t) \\
0.5\sin(\omega_u t) \\
0.5\cos(\omega_u t) \\
0.5\sin(\omega_u t) \\
-e^{-5t} \\
0.2\, e^{-t}\cos(\omega_u t) \\
0.2\cos(\omega_u t) \\
0.1\sin(\omega_u t)
\end{bmatrix},
\]

where $\omega_u = 100$ is the frequency of the given signals. The unknown inputs are manually chosen, showing different scenarios for inaccurate model and parameters without a predetermined distribution.

For DSE we use the 4th-order generator model in Sect. 5.2. The Kalman filters and the observer are set as follows.

1. DSE is performed on the post-contingency system on the time period $[0, 10\ \mathrm{s}]$, which starts from the fault clearing.


2. The initial estimated mean of the rotor speed is set to be $\omega_0$ and that for the other states is set to be twice the real initial states.
3. The initial estimation error covariance is set to be $0.1 I_n$.
4. The covariance of the process noise is set as a diagonal matrix, whose diagonal entries are the square of 5% of the largest state changes [50].
5. The covariance for the measurement noise is a diagonal matrix, whose diagonal entries are $0.01^2$, as in [50].
6. For both UKF and SR-UKF, $2n + 1$ sigma points are used in the unscented transformation.
7. For UKF and SR-UKF, a popular heuristic $n + \kappa = 3$ proposed in [62] is used to choose the parameter $\kappa$ in the unscented transformation in order to minimize the difference between the moments of the standard Gaussian and the sigma points up to the fourth order.
8. UKF is performed by using the EKF/UKF toolbox [63], in which the function "schol" is used to calculate the lower triangular Cholesky factor of a matrix and can return an output even when the matrix is not positive semidefinite [48].
9. For the observer in Sect. 5.4.2, the LMI (5.17) is solved via CVX on MATLAB [64]. The Lipschitz constants in Algorithm 5.2 are set as $\rho = 10$, $\mu = 1$, and $\varphi = 1$.
10. The mechanical torque and internal field voltage are considered as unavailable inputs and take steady-state values, because they are difficult to measure [25, 26].
11. On $[0, 1\ \mathrm{s}]$ the reduced admittance matrix is the one for the pre-contingency state.
12. Data integrity, DoS, and replay attacks, as discussed in Sect. 5.3.2, are added to the PMU measurements.
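The test-data settings above can be illustrated with a minimal Python sketch. It evaluates the unknown input vector $w(t)$ on a 60 frames/s grid over 10 s and adds Gaussian measurement noise with variance $0.01^2$. The function and variable names are hypothetical and chosen only for illustration; the experiments in this chapter were run in MATLAB, so this is not the implementation behind the reported results.

```python
import numpy as np

OMEGA_U = 100.0   # frequency of the unknown-input signals, as stated in the text
SAMPLE_RATE = 60  # PMU frames per second
NOISE_STD = 0.01  # measurement noise standard deviation (variance 0.01^2)


def unknown_input(t: float) -> np.ndarray:
    """Return the 8-element unknown input vector w(t) defined in the text."""
    c, s = np.cos(OMEGA_U * t), np.sin(OMEGA_U * t)
    return np.array([
        0.5 * c,
        0.5 * s,
        0.5 * c,
        0.5 * s,
        -np.exp(-5.0 * t),
        0.2 * np.exp(-t) * c,
        0.2 * c,
        0.1 * s,
    ])


def add_measurement_noise(y, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise with variance 0.01^2 to measurements y."""
    y = np.asarray(y, dtype=float)
    return y + rng.normal(0.0, NOISE_STD, size=y.shape)


# 10 s of samples at 60 frames/s (an integer grid avoids float step drift).
times = np.arange(10 * SAMPLE_RATE) / SAMPLE_RATE
W = np.stack([unknown_input(t) for t in times])  # shape (600, 8)
```

Building the time grid from integer sample indices, rather than a floating-point step, keeps the number of samples exact.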

5.5.1 Scenario 1: Data Integrity Attack

A data integrity attack is added to the first eight measurements, i.e., the real parts of the voltage phasors. The compromised measurements are obtained by scaling the real measurements by 0.6 and 1/0.6, respectively, for the first four and the last four measurements. The 2-norm of the relative error of the states, $\|(x(t) - \hat{x}(t))/x(t)\|_2$, for different estimation methods is shown in Fig. 5.2. The error norms of both CKF and the observer converge quickly; the observer converges faster, while the value that CKF converges to is slightly smaller in magnitude. By contrast, EKF, UKF, and SR-UKF do not perform as well. We also show the state estimates for Generator 1 in Fig. 5.3. It is seen that the observer and CKF converge rapidly while the EKF has not converged after 10 seconds. The estimation for UKF is separately shown in Fig. 5.4 because its estimated states are far away from the real states. Note that the real system dynamics are stable while the UKF estimation, misled by the data integrity attack, indicates that the system is unstable.
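The attack construction and the error metric described above can be sketched as follows. This is a minimal illustration that assumes the measurements are stored with the channel index on the last axis and that the first eight channels are the attacked ones, as in the text. The function names are hypothetical.

```python
import numpy as np


def data_integrity_attack(y, scale: float = 0.6) -> np.ndarray:
    """Scale channels 0-3 by `scale` and channels 4-7 by 1/`scale`.

    `y` may be a single measurement vector or an array of shape
    (time steps, channels); a modified copy is returned.
    """
    attacked = np.array(y, dtype=float)  # copy so the input is left untouched
    attacked[..., :4] *= scale
    attacked[..., 4:8] /= scale
    return attacked


def relative_error_norm(x_true, x_est) -> float:
    """2-norm of the element-wise relative state error (x - x_hat) / x.

    Assumes no entry of x_true is zero.
    """
    x_true = np.asarray(x_true, dtype=float)
    x_est = np.asarray(x_est, dtype=float)
    return float(np.linalg.norm((x_true - x_est) / x_true))
```

For a measurement vector of ones, the attacked vector has 0.6 in its first four entries, 1/0.6 in the next four, and 1 elsewhere.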


Fig. 5.2 Norm of relative error of the states in Scenario 1

Fig. 5.3 Estimated states by EKF, SR-UKF, CKF, and the observer in Scenario 1

Fig. 5.4 Estimated states by UKF in Scenario 1

The real, compromised, and estimated values for the first measurement are shown in Fig. 5.5. For the observer, CKF, and SR-UKF, the estimated measurements are very close to the actual ones. For EKF there are some differences between the estimates and the real values, while UKF’s generated estimates are close to the compromised measurements, indicating that it is completely misled by the cyber attack.


Fig. 5.5 Estimated values for the first measurement in Scenario 1

Fig. 5.6 Norm of relative error of the states: (a) Scenario 2; (b) Scenario 3

5.5.2 Scenario 2: DoS Attack and Scenario 3: Replay Attack

The first eight measurements are kept unchanged for $t \in [3\ \mathrm{s}, 6\ \mathrm{s}]$ to mimic the DoS attack, in which case the updated measurements cannot be sent to the control center due to, for example, jammed communication between a PMU and a PDC or between a PDC and the control center [30]. The replay attack is added on the first eight measurements, for which $y_i(t) = y_i(t - 3)$ for $t \in [3\ \mathrm{s}, 6\ \mathrm{s}]$. The 2-norm of the relative error of the states is shown in Fig. 5.6 and the results are very similar to those in Scenario 1.
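Both attacks can be sketched on a sampled measurement array of shape (time steps, channels). In this hedged illustration the DoS attack holds the last sample received before the attack window; the text only says the measurements are "kept unchanged", so that choice is an assumption. The replay attack copies the samples recorded 3 s earlier. Function names and default arguments are hypothetical.

```python
import numpy as np


def dos_attack(y, times, t_start=3.0, t_end=6.0, n_attacked=8):
    """Freeze the first n_attacked channels during [t_start, t_end].

    Assumption: the frozen value is the last sample before the window.
    """
    attacked = np.array(y, dtype=float)
    times = np.asarray(times, dtype=float)
    idx = np.flatnonzero((times >= t_start) & (times <= t_end))
    if idx.size:
        hold = max(idx[0] - 1, 0)  # index of the last pre-attack sample
        attacked[idx, :n_attacked] = attacked[hold, :n_attacked]
    return attacked


def replay_attack(y, times, sample_rate=60, delay=3.0,
                  t_start=3.0, t_end=6.0, n_attacked=8):
    """Set y_i(t) = y_i(t - delay) on the first n_attacked channels."""
    original = np.asarray(y, dtype=float)
    attacked = original.copy()
    times = np.asarray(times, dtype=float)
    shift = int(round(delay * sample_rate))  # delay expressed in samples
    idx = np.flatnonzero((times >= t_start) & (times <= t_end))
    idx = idx[idx >= shift]  # skip samples without enough recorded history
    attacked[idx, :n_attacked] = original[idx - shift, :n_attacked]
    return attacked
```

Channels beyond the first eight are left untouched in both functions, which matches the scenario description.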

5.5.3 Discussion on Model Uncertainty Estimation

Scenario 1 is used to show the performance of different methods in dealing with model uncertainty. The states of the system with and without model uncertainty, including unknown inputs, unavailable inputs, and parameter inaccuracy, are separately denoted by $x$ and $x_0$, which are shown in Fig. 5.7. The difference between $x$ and $x_0$, $x - x_0$, is shown in Fig. 5.8. The estimated model uncertainty for Generator


Fig. 5.7 System states with and without model uncertainty in Scenario 1

Fig. 5.8 The difference $x - x_0$ in Scenario 1

1 by EKF, SR-UKF, CKF, and the observer is shown in Fig. 5.9 and that for UKF is shown in Fig. 5.10. It is seen that SR-UKF, CKF, and the observer estimate the model uncertainty well, while EKF does not perform as well. UKF has the worst performance: its model uncertainty estimation is largely misled by the data integrity attack.

5.5.4 Discussion on Cyber Attack Detection

The normalized innovation ratio of the $j$th measurement at time step $k$ is defined as the ratio between the deviation of its actual measurement from the predicted measurement and the expected standard deviation [26, 42, 44]:


Fig. 5.9 Estimated model uncertainty for EKF, SR-UKF, CKF, and the observer in Scenario 1

Fig. 5.10 Estimated model uncertainty for UKF in Scenario 1

\[
\lambda_{k,j} = \frac{y_{k,j} - \hat{y}_{k|k-1,j}}{\sqrt{P_{yy,k|k-1,j}}}, \tag{5.18}
\]

where $P_{yy,k|k-1,j}$ is the $j$th diagonal element of the measurement covariance. The normalized innovation ratios for all of the measurements for EKF, UKF, SR-UKF, and CKF in Scenario 1 are shown in Fig. 5.11. It is seen that for EKF and UKF the normalized innovation ratios of a few uncompromised measurements are greater than those for the compromised measurements, which means that EKF and UKF cannot correctly detect the compromised measurements. For SR-UKF and CKF, after a few seconds (in the first second some uncompromised measurements can have bigger normalized innovation ratios, mainly because the parameters used for estimation in that time period are inaccurate), the normalized innovation ratios for compromised measurements are significantly greater than those for the uncompromised ones, and the compromised measurements can be


Fig. 5.11 Cyber attack detection in Scenario 1 for (a) EKF, (b) UKF, (c) SR-UKF, and (d) CKF

detected by a properly chosen threshold. Compared to SR-UKF, CKF has a better performance. For Scenarios 2–3 the results are similar and are not presented. For the observer, since there is no measurement covariance, we detect cyber attacks against the measurements directly using the measurement innovation $y_{k,j} - \hat{y}_{k|k-1,j}$, which is shown in Fig. 5.12a for Scenario 1. After the first second, in which the parameters are inaccurate, the measurement innovations of the compromised measurements are significantly greater than those of the uncompromised ones, and thus the compromised measurements can be easily detected. In Fig. 5.12b we also show the difference between the real and estimated measurements, $y_0 - \hat{y}$. For both the compromised and uncompromised measurements, the estimated measurements from the observer converge to the real measurements almost immediately after the first second. In Fig. 5.13 we show the measurement innovation of the observer for Scenario 2 (Fig. 5.13a) and Scenario 3 (Fig. 5.13b), which indicates that the compromised measurements can also be detected by the observer.
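The threshold-based detection described above can be sketched as follows, using the normalized innovation ratio (5.18) for the Kalman filters. The threshold value is not given in the text, so it is left as a user-supplied parameter. The function names are hypothetical, and comparing the magnitude of the ratio against the threshold is an assumption of this sketch.

```python
import numpy as np


def normalized_innovation_ratio(y, y_pred, P_yy) -> np.ndarray:
    """Innovation of each measurement divided by its expected std. deviation.

    y, y_pred : actual and predicted measurement vectors at one time step
    P_yy      : predicted measurement covariance matrix (only its diagonal
                is used)
    """
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    variances = np.diag(np.asarray(P_yy, dtype=float))
    return (y - y_pred) / np.sqrt(variances)


def flag_measurements(ratios, threshold: float) -> np.ndarray:
    """Flag measurements whose ratio magnitude exceeds the chosen threshold."""
    return np.abs(np.asarray(ratios, dtype=float)) > threshold
```

For the observer, which has no measurement covariance, the same flagging step could be applied directly to the raw innovation `y - y_pred`.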


Fig. 5.12 Cyber attack detection and real and estimated measurements for the observer in Scenario 1: (a) $y - \hat{y}$; (b) $y_0 - \hat{y}$

Fig. 5.13 Cyber attack detection for the observer in (a) Scenario 2 and (b) Scenario 3

5.5.5 Non-Gaussian Measurement Noise

We performed DSE under data integrity attack in Scenario 1 with non-Gaussian measurement noise, including the Laplace noise and Cauchy noise. Laplace noise with mean $m$ and scale $s$ is generated by

\[
r_{\mathrm{Laplace}} = m - s\, \mathrm{sgn}(U_1) \ln(1 - 2|U_1|), \tag{5.19}
\]

where $m$ is set to be zero, $s$ is chosen as 0.02, and $U_1$ is a random number sampled from a uniform distribution in the interval $(-0.5, 0.5]$. Cauchy noise is obtained by sampling the inverse cumulative distribution function of the distribution

\[
r_{\mathrm{Cauchy}} = a + b \tan\bigl(\pi (U_2 - 0.5)\bigr), \tag{5.20}
\]

where $a = 0$ and $b = 10^{-4}$ are the location and scale parameters, and $U_2$ is randomly sampled from the uniform distribution on the interval $(0, 1)$.
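Equations (5.19) and (5.20) are inverse-transform samplers and can be sketched directly with NumPy. This is a minimal illustration; the function names are hypothetical, and the default parameters are the values quoted in the text.

```python
import numpy as np


def laplace_noise(rng: np.random.Generator, size, m=0.0, s=0.02):
    """Sample Laplace noise via (5.19)."""
    # rng.random() lies in [0, 1), so 0.5 - rng.random() lies in (-0.5, 0.5].
    # The endpoint U1 = 0.5 would give ln(0); it has negligible probability.
    u1 = 0.5 - rng.random(size)
    return m - s * np.sign(u1) * np.log(1.0 - 2.0 * np.abs(u1))


def cauchy_noise(rng: np.random.Generator, size, a=0.0, b=1e-4):
    """Sample Cauchy noise via the inverse CDF in (5.20)."""
    u2 = rng.random(size)  # [0, 1); the endpoint 0 has negligible probability
    return a + b * np.tan(np.pi * (u2 - 0.5))
```

A quick sanity check is the median of the absolute samples, which should be close to $s \ln 2$ for the Laplace noise and close to $b$ for the Cauchy noise. The sample variance is not a useful check in the Cauchy case because the distribution has no defined moments.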


Fig. 5.14 Norm of relative error of the states under different measurement noises: (a) Laplace noise; (b) Cauchy noise

Table 5.1 Time for performing estimation for 10 seconds

EKF      UKF       SR-UKF    CKF      Observer
4.0 s    11.2 s    11.6 s    9.9 s    5.8 s

The norms of the relative error of the states under Laplace and Cauchy noises are shown in Fig. 5.14. Similar to the case with Gaussian noise, the observer and CKF outperform the other methods. Under Laplace noise, the performance of the different methods is similar to that under Gaussian noise. However, under Cauchy noise, which has a super-heavy-tailed distribution with no defined moments, the performance of all methods degrades, converging to a much bigger norm of relative error of the states.

5.5.6 Computational Efficiency

For the above three scenarios, the time for estimation by different methods is listed in Table 5.1. It is seen that EKF and the observer are more efficient than the other methods while CKF is the least efficient. Note that the time reported here is from MATLAB implementations. It can be greatly reduced by more efficient implementations, such as C-based ones.

5.6 Summary

Here, various functionalities of DSE methods and their strengths and weaknesses relative to each functionality are presented based on (a) the technical, theoretical capabilities and (b) experimental results in Sect. 5.5.


• Nonlinearities in dynamics: UKF, SR-UKF, CKF, and the observer in Sect. 5.4.2 all work on nonlinear systems while EKF assumes linearized system dynamics. In addition, the presented observer uses linearized measurement functions for design but directly uses nonlinear measurement functions for estimation.
• Solution feasibility: The main principle that governs the design of most observers is based on finding a matrix gain satisfying a certain condition, such as a solution to a matrix inequality. The state estimates are guaranteed to converge to the actual ones if a solution to the LMI exists. In contrast, KF methods do not require that.
• Unknown initial conditions: Observer designs are independent of the knowledge of the initial conditions of the system. However, if the estimator's initial condition is chosen to be reasonably different from the actual one, estimates from KF might not converge to the actual ones.
• Robustness to model uncertainty and cyber attacks: The observer in Sect. 5.4.2 and the CKF outperform UKF (SR-UKF) and EKF in the state estimation under model uncertainty and attack vectors. The observer is robust to model uncertainties because it only assumes that the nonlinearities in the power system dynamics (i.e., $\phi(x)$) satisfy the quadratic inner-boundedness and the one-sided Lipschitz condition. As discussed in Sect. 5.4.1.3, CKF is more robust mostly due to its more accurate cubature approach, which, however, requires more careful investigation. With that in mind, it is hard to generalize. Therefore, advanced theoretical understanding and more numerical experiments vis-à-vis robustness to model uncertainty and attacks are both needed.
• Tolerance to process and measurement noise: The observer in Sect. 5.4.2 is tolerant to measurement and process noise similar to those assumed for KFs. By design, the KF techniques are developed to deal with such noise assuming statistical distributions are provided. However, many observers do not assume any statistical information regarding unknown inputs.
• Convergence guarantees: Observers have theoretical guarantees for convergence while for the KF techniques there is no strict proof to guarantee that the estimation converges to actual states.
• Numerical stability: Observers do not have numerical stability problems while UKF can encounter numerical instability because the estimation error covariance matrix is not always guaranteed to be positive semidefinite [48].
• Tolerance to parametric inaccuracy: KF-based methods can tolerate inaccurate parameters to some extent. Dynamic observers deal with parametric uncertainty in the sense that all uncertainties can be augmented to the unknown input component in the state dynamics ($B_w w$).
• Computational complexity: CKF, UKF (SR-UKF), and EKF all have computational complexity of $O(n^3)$ [22, 46]. Since the observers' matrix gains are obtained offline by solving LMIs, observers are easier to implement as only the dynamics are needed in the estimation.


References

1. F. Schweppe, J. Wildes, Power system static-state estimation, part I: exact model. IEEE Trans. Power App. Syst. PAS-89(1), 120–125 (1970)
2. A. Abur, A. Expósito, Power System State Estimation: Theory and Implementation. Power Engineering (Willis) (CRC Press, Boca Raton, 2004)
3. A. Monticelli, Electric power system state estimation. Proc. IEEE 88(2), 262–282 (2000)
4. G. He, S. Dong, J. Qi, Y. Wang, Robust state estimator based on maximum normal measurement rate. IEEE Trans. Power Syst. 26(4), 2058–2065 (2011)
5. J. Qi, G. He, S. Mei, Z. Gu, A review of power system robust state estimation. Adv. Technol. Electr. Eng. Energy 30(3), 59–64 (2011)
6. J. Qi, G. He, S. Mei, F. Liu, Power system set membership state estimation, in IEEE Power and Energy Society General Meeting (2012), pp. 1–7
7. J. Zhao, A. Gómez-Expósito, M. Netto, L. Mili, A. Abur, V. Terzija, I. Kamwa, B. Pal, A.K. Singh, J. Qi et al., Power system dynamic state estimation: motivations, definitions, methodologies, and future work. IEEE Trans. Power Syst. 34(4), 3188–3198 (2019)
8. K. Zhou, J.C. Doyle, K. Glover, Robust and Optimal Control (Prentice Hall, New Jersey, 1996)
9. Z. Huang, P. Du, D. Kosterev, S. Yang, Generator dynamic model validation and parameter calibration using phasor measurements at the point of connection. IEEE Trans. Power Syst. 28(2), 1939–1949 (2013)
10. M. Ariff, B. Pal, A. Singh, Estimating dynamic model parameters for adaptive protection and control in power system. IEEE Trans. Power Syst. 30(2), 829–839 (2015)
11. S.R. Khazeiynasab, J. Qi, Generator parameter calibration by adaptive approximate Bayesian computation with sequential Monte Carlo sampler. IEEE Trans. Smart Grid 12(5), 4327–4338 (2021)
12. S.R. Khazeiynasab, J. Qi, I. Batarseh, Generator parameter estimation by Q-learning based on PMU measurements, in 2021 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT) (IEEE, Piscataway, 2021), pp. 01–05
13. S.R. Khazeiynasab, J. Qi, PMU measurement based generator parameter calibration by blackbox optimization with a stochastic radial basis function surrogate model, in 2020 52nd North American Power Symposium (NAPS) (IEEE, Piscataway, 2021), pp. 1–6
14. Y. Liu, P. Ning, M.K. Reiter, False data injection attacks against state estimation in electric power grids. ACM Trans. Inf. Syst. Secur. 14(1), 13:1–13:33 (2011)
15. O. Kosut, L. Jia, R.J. Thomas, L. Tong, Malicious data attacks on the smart grid. IEEE Trans. Smart Grid 2(4), 645–658 (2011)
16. O. Vukovic, K.C. Sou, G. Dán, H. Sandberg, Network-aware mitigation of data integrity attacks on power system state estimation. IEEE J. Sel. Area Comm. 30(6), 1108–1118 (2012)
17. Q. Yang, J. Yang, W. Yu, D. An, N. Zhang, W. Zhao, On false data-injection attacks against power system state estimation: modeling and countermeasures. IEEE Trans. Parallel Distrib. Syst. 25(3), 717–729 (2014)
18. D. Wang, X. Guan, T. Liu, Y. Gu, C. Shen, Z. Xu, Extended distributed state estimation: a detection method against tolerable false data injection attacks in smart grids. Energies 7(3), 1517–1538 (2014)
19. Q. Yang, D. An, R. Min, W. Yu, X. Yang, W. Zhao, On optimal PMU placement-based defense against data integrity attacks in smart grid. IEEE Trans. Inform. Forens. Secur. 12(7), 1735–1750 (2017)
20. Q. Yang, L. Chang, W. Yu, On false data injection attacks against Kalman filtering in power system dynamic state estimation. Secur. Commun. Netw. 9(9), 833–849 (2016)
21. A.F. Taha, J. Qi, J. Wang, J.H. Panchal, Risk mitigation for dynamic state estimation against cyber attacks and unknown inputs. IEEE Trans. Smart Grid 9(2), 886–899 (2018)
22. I. Arasaratnam, S. Haykin, Cubature Kalman filters. IEEE Trans. Autom. Control 54(6), 1254–1269 (2009)


23. J. Qi, A.F. Taha, J. Wang, Comparing Kalman filters and observers for power system dynamic state estimation with model uncertainty and malicious cyber attacks. IEEE Access 6, 77155–77168 (2018)
24. P. Sauer, M. Pai, Power System Dynamics and Stability (Prentice Hall, Hoboken, 1998)
25. E. Ghahremani, I. Kamwa, Dynamic state estimation in power system by applying the extended Kalman filter with unknown inputs to phasor measurements. IEEE Trans. Power Syst. 26(4), 2556–2566 (2011)
26. A. Singh, B. Pal, Decentralized dynamic state estimation in power systems using unscented transformation. IEEE Trans. Power Syst. 29(2), 794–804 (2014)
27. A. Hajnoroozi, F. Aminifar, H. Ayoubzadeh et al., Generating unit model validation and calibration through synchrophasor measurements. IEEE Trans. Smart Grid 6(1), 441–449 (2015)
28. J. Chen, R. Patton, Robust Model-Based Fault Diagnosis for Dynamic Systems (Springer Publishing Company, Cham, 2012)
29. A. Pertew, H. Marquez, Q. Zhao, Design of unknown input observers for Lipschitz nonlinear systems, in Proceedings of the 2005 American Control Conference (2005), pp. 4198–4203
30. Electric Sector Failure Scenarios and Impact Analyses. Electric Power Research Institute (EPRI), Technical Report (2014)
31. S. Sridhar, A. Hahn, M. Govindarasu, Cyber–physical system security for the electric power grid. Proc. IEEE 100(1), 210–224 (2012)
32. S. Wang, J. Zhao, Z. Huang, R. Diao, Assessing Gaussian assumption of PMU measurement error using field data. IEEE Trans. Power Del. PP(99), 1–1 (2017)
33. R.E. Kalman, A new approach to linear filtering and prediction problems. J. Fluids Eng. 82(1), 35–45 (1960)
34. A.H. Jazwinski, Stochastic Processes and Filtering Theory (Courier Corporation, North Chelmsford, 2007)
35. S.J. Julier, J.K. Uhlmann, New extension of the Kalman filter to nonlinear systems, in AeroSense'97. International Society for Optics and Photonics (1997), pp. 182–193
36. S. Julier, J. Uhlmann, Unscented filtering and nonlinear estimation. Proc. IEEE 92(3), 401–422 (2004)
37. W. Kang, A.J. Krener, M. Xiao, L. Xu, A survey of observers for nonlinear dynamical systems, in Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (Vol. II) (Springer, Berlin, 2013), pp. 1–25
38. A. Radke, Z. Gao, A survey of state and disturbance observers for practitioners, in American Control Conference (2006), pp. 5183–5188
39. Z. Hidayat, R. Babuska, B.D. Schutter, A. Núñez, Observers for linear distributed-parameter systems: a survey, in IEEE International Symposium on Robotic and Sensors Environments (ROSE) (2011), pp. 166–171
40. S.A. Nugroho, A.F. Taha, J. Qi, Robust dynamic state estimation of synchronous machines with asymptotic state estimation error performance guarantees. IEEE Trans. Power Syst. 35(3), 1923–1935 (2020)
41. Z. Huang, K. Schneider, J. Nieplocha, Feasibility studies of applying Kalman filter techniques to power system dynamic state estimation, in Proceedings of the 2007 Power Engineering Conference (2007), pp. 376–382
42. G. Valverde, V. Terzija, Unscented Kalman filter for power system dynamic state estimation. IET Gener. Transm. Distrib. 5(1), 29–37 (2011)
43. E. Ghahremani, I. Kamwa, Online state estimation of a synchronous generator using unscented Kalman filter from phasor measurements units. IEEE Trans. Energy Convers. 26(4), 1099–1108 (2011)
44. S. Wang, W. Gao, A. Meliopoulos, An alternative method for power system dynamic state estimation based on unscented transform. IEEE Trans. Power Syst. 27(2), 942–950 (2012)
45. K. Sun, J. Qi, W. Kang, Power system observability and dynamic state estimation for stability monitoring using synchrophasor measurements. Control Eng. Practice 53, 160–172 (2016)


46. R. Van Der Merwe, E.A. Wan, The square-root unscented Kalman filter for state and parameter-estimation, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, vol. 6 (2001), pp. 3461–3464
47. J. Qi, K. Sun, W. Kang, Optimal PMU placement for power system dynamic state estimation by using empirical observability Gramian. IEEE Trans. Power Syst. 30(4), 2041–2054 (2015)
48. J. Qi, K. Sun, J. Wang, H. Liu, Dynamic state estimation for multi-machine power system by unscented Kalman filter with enhanced numerical stability. IEEE Trans. Smart Grid 9(2), 1184–1196 (2018)
49. J. Qi, K. Sun, W. Kang, Adaptive optimal PMU placement based on empirical observability Gramian. IFAC-PapersOnLine 49(18), 482–487 (2016)
50. N. Zhou, D. Meng, S. Lu, Estimation of the dynamic states of synchronous machines using an extended particle filter. IEEE Trans. Power Syst. 28(4), 4152–4161 (2013)
51. Y. Cui, R. Kavasseri, A particle filter for dynamic state estimation in multi-machine systems with detailed models. IEEE Trans. Power Syst. 30(6), 3377–3385 (2015)
52. N. Zhou, D. Meng, Z. Huang, G. Welch, Dynamic state estimation of a synchronous machine using PMU data: a comparative study. IEEE Trans. Smart Grid 6(1), 450–460 (2015)
53. J.K. Uhlmann, Simultaneous map building and localization for real time applications. Transfer Thesis, University of Oxford, Oxford, UK, 1994
54. R. Bellman, Adaptive Control Processes: A Guided Tour. Rand Corporation. Research Studies (Princeton University Press, Princeton, 1961)
55. S.A. Gadsden, M. Al-Shabi, I. Arasaratnam, S.R. Habibi, Combined cubature Kalman and smooth variable structure filtering: a robust nonlinear estimation strategy. Signal Process. 96, 290–299 (2014)
56. W. Zhang, H. Su, H. Wang, Z. Han, Full-order and reduced-order observers for one-sided Lipschitz nonlinear systems using Riccati equations. Commun. Nonlinear Sci. Numer. Simul. 17(12), 4968–4977 (2012)
57. D. Siljak, D. Stipanovic, A. Zecevic, Robust decentralized turbine/governor control using linear matrix inequalities. IEEE Trans. Power Syst. 17(3), 715–722 (2002)
58. J. Qi, J. Wang, H. Liu, A.D. Dimitrovski, Nonlinear model reduction in power systems by balancing of empirical controllability and observability covariances. IEEE Trans. Power Syst. 32(1), 114–126 (2017)
59. M. Vidyasagar, Nonlinear Systems Analysis (SIAM, Philadelphia, 2002)
60. S. Boyd, L. El Ghaoui, E. Feron, V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, vol. 15 (SIAM, Philadelphia, 1994)
61. J.H. Chow, K.W. Cheung, A toolbox for power system dynamics and control engineering education and research. IEEE Trans. Power Syst. 7(4), 1559–1564 (1992)
62. S. Julier, J. Uhlmann, H.F. Durrant-Whyte, A new method for the nonlinear transformation of means and covariances in filters and estimators. IEEE Trans. Autom. Control 45(3), 477–482 (2000)
63. J. Hartikainen, A. Solin, S. Särkkä, Optimal filtering with Kalman filters and smoothers, in Department of Biomedical Engineering and Computational Sciences, Aalto University School of Science, 16th August (2011)
64. M. Grant, S. Boyd, CVX: Matlab software for disciplined convex programming. Technical Report (2013)

Chapter 6

Self-Healing PMU Network Against Cyber Attacks

6.1 Introduction

In today's power grid, phasor measurement units (PMUs) are being deployed to monitor the state of a power system in real time (e.g., static and dynamic state estimation, oscillation detection and control, power line outage detection) [1–4]. Based on the NASPInet architecture for a PMU network [5], the measurements collected by multiple PMUs are delivered and combined at a phasor data concentrator (PDC), which further sends the measurements to the next-level PDC or the control center. Off-the-shelf computing and communication technologies are integrated with the intelligent electronic devices (IEDs), including PMUs and PDCs, to boost monitoring and control efficiency. However, this integration opens up new attack vectors: a PMU or a PDC can become the target of cyber-attacks. Recent studies reveal that PMUs or PDCs can suffer different types of cyber-attacks, including denial-of-service or man-in-the-middle attacks [6, 7]. To make things worse, the network connections make the further propagation of attacks possible [8]. Consequently, upon detection of attacks, compromised PMUs or PDCs can be disconnected from the communication network. Although quarantine of the compromised devices can prevent further propagation of the attacks, it can significantly reduce the system's observability (i.e., the capability to estimate the state of each bus in a power system), and thus affect state estimation and other power system applications.

Recent work has focused on the impact of compromised PMUs on the observability of power systems; very little work has studied the impact of compromised PDCs. When a PDC is compromised and quarantined from communication networks, it can cause more severe consequences than a single compromised PMU can, as all measurements that the PDC originally collected are lost. However, PMUs that originally report the measurements to the PDC may not be compromised and can still collect trusted measurements. It is possible to reroute these measurements to other PDCs immediately, instead of waiting for the compromised PDC to be fixed.

In this chapter, a self-healing mechanism for the PMU network [9] is discussed. It exploits the feature of dynamic and programmable configuration enabled by software-defined networking (SDN) technology. When a group of PMUs or PDCs is disconnected, either by accident or because of a cyber-attack, logical connections between uncompromised PMUs and PDCs are rearranged in order to restore the observability of the power system. The logical reconnection is mapped into the configuration of network switches, which establish new communication paths to deliver PMU measurements. After the reconnection, state estimation and other power system applications can resume working.

© Springer Nature Switzerland AG 2023. J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_6

6.2 Motivation for Self-Healing PMU Network

Consider a power system that relies on a PMU network to perform state estimation. For each substation, assume a single, logical PMU is installed. This PMU can collect data on the state of the local substation (i.e., voltage magnitude and phase angle). When more PMUs are deployed, a PDC is used to collect measurements from several substations and forward them to the next-level PDC or the control center. For deployment of communication networks, consider the case in which PMUs and PDCs are connected via an Internet Protocol (IP) based network. Even though in many of today's utility substations PMUs and PDCs may still be connected through proprietary communications (e.g., serial links), the current trend suggests that the deployment of IP-based networks in power systems is growing; research experiments are already being performed under this assumption [7, 10].

As shown in Fig. 6.1, since PMUs are deployed at substations that are distributed over the whole power system, PMUs and PDCs can be connected by a wide-area network (WAN). In a WAN, network traffic is manipulated by routing and forwarding rules configured in each network switch. At the perimeter of a WAN, PMUs and PDCs are first connected to edge switches (as highlighted in Fig. 6.1). Assume that each PDC is connected with a single edge switch. The edge switches can further connect to the core switches, which are positioned within the backbone of the WAN.

Fig. 6.1 The integration of a communication network and PMU network. © [2018] IEEE. Reprinted, with permission, from [9]


Use of advanced communication network technology, e.g., SDN, can bring both benefits and risks for a power system environment [11]. On the one hand, the communication infrastructure allows attacks to easily propagate to other PMUs. As a result, attackers can gain access to more measurements simultaneously when performing cyber-attacks (e.g., false data injection attacks [12, 13]). On the other hand, programmability enabled by SDN can quickly isolate the compromised PMUs or PDCs and reroute the remaining devices to “self-heal” the PMU network and recover the system observability. Assume the cyber-attacks on the PMUs and PDCs have already been detected. System administrators can perform those detections by using security mechanisms, such as intrusion detection systems, designed for power grids. The detection of cyber-attacks in Supervisory Control and Data Acquisition (SCADA) systems, which include PMU networks, has been extensively studied in the literature, e.g., [14–18]. These methods utilize the information in communication networks or the power system’s physical models to detect anomalies and intrusions. In an IP-based network, malware at compromised PMUs or PDCs can infect other devices through network connections [19, 20]. As suggested by a report from the National Institute of Standards and Technology (NIST) [21], after system administrators detect compromised devices, they can place temporary restrictions on network connectivity of those devices to prevent further propagation of the attacks. In practice, system administrators can disconnect the compromised devices by removing routing rules in network switches connected to them; thus, network traffic initiated from the compromised devices can no longer reach any other devices. When PMUs or PDCs are compromised and disconnected, the consequences can vary. As shown in Fig. 6.2, when system administrators detect that a PMU has been compromised (denoted by “X”), they disconnect it. 
The measurements collected by the PMU are lost, which can impact the observability of the power system. Likewise, when a PDC is detected as compromised and disconnected, all measurements that it originally collected are lost. However, the PMUs that originally reported measurements to the PDC may not be compromised. In that case, we regard the measurements from the disconnected yet uncompromised PMUs, denoted by “?” in Fig. 6.2, as trusted, unless the PMUs themselves are directly compromised. These trusted measurements can be used in power system applications, e.g., state estimation, if the PMUs can be rerouted to other remaining uncompromised PDCs.

Fig. 6.2 Device condition after attacks. © [2018] IEEE. Reprinted, with permission, from [9]


6 Self-Healing PMU Network Against Cyber Attacks

The concept of self-healing has previously been proposed for virtual circuit switching networks, such as the asynchronous transfer mode (ATM) network [22, 23]. When a link or node failure happens, the self-healing algorithms try to recover as many lost services as possible under the resource constraints of network switches. In this network environment, the self-healing is performed on predetermined backup or protection paths [24].

To achieve self-healing in PMU networks, the programmability enabled by SDN is utilized to reroute the remaining uncompromised PMUs and PDCs, and thus recover the observability of a power system after a cyber-attack. The top priority of a PMU network is to collect measurements from substations under real-time communication requirements. For that purpose, the existing self-healing algorithms proposed for general-purpose networks (e.g., the Internet) are not suitable for a WAN deployed in a PMU network [22], as explained below:

• First, the optimization objectives of the existing self-healing algorithms are different from those of PMU networks. The existing self-healing algorithms focus on maximizing the connections of end hosts. In the PMU network, however, the top priority is to restore measurements of the voltage phasor at each substation. Because the voltage phasor at a substation can be measured by the PMU deployed at the substation and also by the PMUs deployed at its neighbor substations, restoring all lost measurements is not equivalent to restoring all disconnected PMUs. Instead, the self-healing algorithm needs to selectively reconnect PMUs, which can restore the observability of power systems more quickly than reconnecting all PMUs.

• Second, the performance requirements of general communication networks are different from those of PMU networks. The existing self-healing algorithms put more priority on maintaining the network performance, e.g., throughput or communication latency, than on the availability of transmitted data. Consequently, those algorithms always select the shortest path to reconnect nodes. In contrast, PMU networks put the availability of phasor measurements at higher priority than the network performance. An existing self-healing mechanism that reconnects PMUs with the shortest paths often ends up spending a long time to reconnect PMUs. Consequently, the self-healing algorithm in this chapter will minimize the time to restore the observability of power systems and maximize the redundancy of measurements to provide more accurate estimation of the system state.

• Third, the existing algorithms do not consider the constraints in the physical infrastructure of power systems. In PMU networks, the number of PMUs that can be connected to a PDC is limited by the computation capability and storage space of the PDC. These constraints can impact the paths selected to reconnect PMUs. The self-healing algorithm here will consider the constraints of both the cyber and physical infrastructure in power systems.

Therefore, a self-healing PMU network is needed, which should reduce the performance overhead of reconfiguring the communication network and increase the observability of power systems, taking into consideration resource constraints on PDCs and network switches. Specifically, the focus is on how to reroute the disconnected but uncompromised PMUs to uncompromised PDCs without changing the connections of the remaining PMUs.

6.3 System Model

The topology of a power transmission network is described by a graph $G_t(\mathcal{V}, \mathcal{L})$, where $\mathcal{V}$ denotes the set of buses and $\mathcal{L}$ denotes the set of transmission lines. Assume the system observability is achieved by the measurements from PMUs, and let $\mathcal{U}$ denote the set of buses that have PMUs installed, so $\mathcal{U} \subseteq \mathcal{V}$. The IP-based PMU communication network consists of PMUs (also denoted by the set $\mathcal{U}$), PDCs (denoted by the set $\mathcal{D}$), and network switches (denoted by the set $\mathcal{S}$). Thus, the topology of the communication network for delivering PMU measurements is represented by a graph $G_p(\mathcal{S} \cup \mathcal{U} \cup \mathcal{D}, \mathcal{E})$, where $\mathcal{E}$ denotes the set of network links connecting PMUs, PDCs, and network switches.

A PMU network is a cyber-physical system. From the cyber system's perspective, the communication network should make sure that measurements from PMUs can be delivered to PDCs and the control center; from the physical system's perspective, the PMUs should make sure that the whole power system is observable, so that state estimation and other advanced power system applications can be performed. These cyber-physical features are integrated into the design of the self-healing mechanism for PMU networks. They differentiate this algorithm from the existing self-healing schemes, which emphasize the maintenance of network performance and the reconnection of end hosts in general communication networks [22–24].

6.3.1 Power System Observability

When a PMU is installed at bus $i \in \mathcal{U}$, the voltage phasor at bus i and the current phasors of all branches connected to it can be measured. The observability function of bus i is defined as a function of the PMU locations:

$$O_i = \sum_{j \in \mathcal{U}} a_{i,j} x_j, \tag{6.1}$$

where $x_j$ is a binary variable that is equal to 1 if a PMU is installed at bus j and 0 otherwise. $a_{i,j}$ is the connectivity parameter, defined as:

$$a_{i,j} = \begin{cases} 1, & \text{if } i = j \text{ or } (i, j) \in \mathcal{L} \\ 0, & \text{otherwise.} \end{cases} \tag{6.2}$$


$O_i \geq 1$ implies that bus i is observable, as the voltage phasor at bus i can either be measured by the PMU at bus i or be calculated by PMUs at neighbors of bus i (i.e., the buses connected to it through transmission lines). The power system is observable if the observability function $O_i$ for each bus is greater than or equal to one, i.e.,

$$O_i \geq 1, \quad \forall i \in \mathcal{V}. \tag{6.3}$$

With the disconnection of some PMUs due to cyber-attacks, the observability function $O_i$ at some buses may become 0; thus, the system is no longer observable. However, by utilizing the reconfiguration features enabled by SDN, it is possible to reconnect some disconnected yet uncompromised PMUs to the communication network to restore the system observability.
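As a minimal sketch of (6.1)–(6.3), the observability function can be evaluated by counting, for each bus, the connected PMUs located at that bus or at one of its neighbors. The 5-bus chain, the PMU placement, and the function names below are illustrative assumptions and do not come from the chapter.

```python
def observability(buses, lines, connected_pmus):
    """Return {bus: O_i} following (6.1)-(6.2)."""
    # covers[i] holds every bus j with a_{i,j} = 1: bus i itself plus its neighbors.
    covers = {i: {i} for i in buses}
    for i, j in lines:
        covers[i].add(j)
        covers[j].add(i)
    # x_j = 1 exactly for the buses in connected_pmus, so O_i counts the overlap.
    return {i: len(covers[i] & connected_pmus) for i in buses}


def is_observable(o_values):
    """Condition (6.3): every bus needs O_i >= 1."""
    return all(value >= 1 for value in o_values.values())


# Hypothetical 5-bus chain 1-2-3-4-5 with PMUs at buses 2 and 4.
buses = [1, 2, 3, 4, 5]
lines = [(1, 2), (2, 3), (3, 4), (4, 5)]

before = observability(buses, lines, {2, 4})  # both PMUs connected
after = observability(buses, lines, {2})      # PMU at bus 4 disconnected
print(before, is_observable(before))
print(after, is_observable(after))
```

In this toy case, losing the PMU at bus 4 drives the observability function of buses 4 and 5 to zero, which is the situation the self-healing mechanism is meant to repair.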

6.3.2 Rules in Network Switches

The reconnection of disconnected yet uncompromised PMUs can be achieved by adding rules in network switches. In a communication network, a switch can include rules that specify both routing policies and endpoint policies [25]. Given a packet entering the network, the routing policy specifies the path that the packet should take to reach its destination. The path is often expressed as a chain of ordered network switches. To implement a routing policy, a forwarding rule is added in each switch on the path, to direct the packet to the appropriate next hop. A forwarding rule in a switch is usually uniquely decided by the destination address. In other words, a path always corresponds to a unique destination.

An endpoint policy often defines the access control between two hosts. In other words, the policy specifies whether or not host A can communicate with host B, regardless of what path the communication should follow (which is decided by the routing policies). As explained in [25], the endpoint policy often “views the network as one big switch that hides internal topology details” and “specifies which packets to drop, or to forward to specific egress ports, as well as any modifications of header fields.” In the PMU network, an endpoint policy specifies to which PDC a PMU measurement should be delivered. Unlike the routing policy, the endpoint policy is implemented only once along the path that the packet travels.

For example, in the network topology shown in Fig. 6.3, we want to deliver the measurements from PMU 1 to PDC 1 through path 1. The routing policy destined for PDC 1 is implemented by adding a forwarding rule in each switch of path 1 (i.e., switches 1, 2, and 5). However, not all packets destined for PDC 1 and traveling through this path are from PMU 1. To ensure that the measurements from PMU 1 are delivered to PDC 1 via path 1, we need to add a rule for this endpoint policy along the path. This rule needs to be implemented only once, in switch 1, 2, or 5, before packets reach PDC 1.
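The difference between the two rule types can be illustrated with a small bookkeeping sketch. It assumes a deliberately simplified switch model (a set of forwarding destinations plus a set of permitted source-destination pairs); the helper names and the second PMU, "PMU X", are hypothetical and only serve to show how rules are counted.

```python
def new_switch():
    # Forwarding rules are keyed by destination; endpoint rules by (source, destination).
    return {"forward": set(), "endpoint": set()}


def install_connection(switches, path, src, dst, endpoint_switch):
    """Install the rules for one PMU-PDC connection along `path`."""
    for s in path:
        switches[s]["forward"].add(dst)                     # routing policy: one rule per switch
    switches[endpoint_switch]["endpoint"].add((src, dst))   # endpoint policy: one rule in total


def delivered(switches, path, src, dst):
    """Simplified check: every hop forwards to dst and some hop admits (src, dst)."""
    routed = all(dst in switches[s]["forward"] for s in path)
    admitted = any((src, dst) in switches[s]["endpoint"] for s in path)
    return routed and admitted


def rule_count(switches):
    return sum(len(t["forward"]) + len(t["endpoint"]) for t in switches.values())


switches = {s: new_switch() for s in (1, 2, 3, 4, 5)}
path_1 = [1, 2, 5]  # path 1 from the text: switches 1, 2, and 5

install_connection(switches, path_1, "PMU 1", "PDC 1", endpoint_switch=1)
rules_first = rule_count(switches)   # 3 forwarding rules + 1 endpoint rule

# A hypothetical second PMU sharing the path reuses the forwarding rules.
install_connection(switches, path_1, "PMU X", "PDC 1", endpoint_switch=5)
rules_second = rule_count(switches)  # only one extra endpoint rule
```

Because forwarding rules are indexed by destination, a second connection to the same PDC over the same path costs only one additional endpoint rule in this model.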


Fig. 6.3 An example network connection. © [2018] IEEE. Reprinted, with permission, from [9]

6.4 Optimization Formulation

The self-healing mechanism for PMU networks is modeled as an integer linear programming problem. To better illustrate the optimization formulation, in addition to the parameters defined in Sect. 6.3.1, the key notations are listed in Table 6.1. Four groups of integer variables are defined in the optimization model:

• Variable $x_i$ indicates whether the PMU at bus $i \in \mathcal{U}$ is connected, regardless of which path in the communication network $G_p$ it has taken and to which PDC it is connected.

• Variable $y_p$ specifies whether path $p \in \mathcal{P}$ is used to achieve the reconnection, regardless of which PMUs and PDCs use this path for reconnection. Instead of considering only the shortest path, all paths that can be used to reconnect PMUs are considered, as long as the delivery of PMU measurements over that path satisfies the timing requirement. Finding all paths between two nodes is regarded as an “all simple paths” problem; a depth-first search is used to enumerate all paths that can be used for reconnections. Although there is no efficient algorithm to solve this problem, all paths can be found offline, before the optimization model is run.

• Variable $z_s^p$, as in [25], specifies the number of endpoint policies allocated for path p and assigned in switch s. Since a switch can be shared by different paths, a superscript p is used for each switch to distinguish the endpoint policies used for different “PMU-PDC” connections.

• Variable $w_s^d$ indicates whether switch s contains a forwarding rule destined for PDC d. Because a forwarding rule is indexed by the destination address, a switch can include forwarding rules destined for different PDCs. Thus a superscript d is used to distinguish the forwarding rules assigned for different destination PDCs.

To better understand these decision variables, an example is provided in Fig. 6.3. If we assume that three PMUs at three buses (i.e., buses 1, 2, and 3) are reconnected, while the PMU at bus 4 is not, we have $x_1 = x_2 = x_3 = 1$ while $x_4 = 0$. In addition, if we use two paths (i.e., paths 1 and 2) for reconnection, we have $y_1 = y_2 = 1$. Further assume that the PMU at bus 1 is reconnected to PDC 1, while the PMUs at buses 2 and 3 are reconnected to PDC 2, as illustrated by lines with different patterns in Fig. 6.3. To deliver measurements to the right destination, the forwarding rules destined for PDC 1 are added to the switches of path 1, which consists of network switches 1, 2, and 5. Similarly, a forwarding rule destined for PDC 2 is added to the switches of path 2, namely switches 1, 3, 4, and 5. Consequently, we have $w_1^1 = w_2^1 = w_5^1 = 1$ and $w_1^2 = w_3^2 = w_4^2 = w_5^2 = 1$.

An endpoint policy is decided by the connection between PMUs and PDCs. There are three rules for the endpoint policy in this example, which correspond to three specific “PMU-PDC” connections: (PMU 1, PDC 1), (PMU 2, PDC 2), and (PMU 3, PDC 2). For each connection, we need to add a rule once along the path that established the connection. For example, if we add an endpoint policy that specifies a connection between PMU 3 and PDC 2, we can add the endpoint policy at switch 5 of path 2, which is established by routing policies. In that case, we should increase the value of variable $z_5^2$ by one.

Table 6.1 Summary of notations

Sets and indices:
$\mathcal{U}$        Set of buses with PMUs installed.
$\mathcal{U}_b$      Set of buses with compromised PMUs.
$\mathcal{U}_g$      Set of buses with uncompromised PMUs, $\mathcal{U} = \mathcal{U}_b \cup \mathcal{U}_g$.
$\mathcal{U}_d$      Set of buses with uncompromised but disconnected PMUs, $\mathcal{U}_d \subset \mathcal{U}_g$.
$\mathcal{D}$        Set of PDCs used in the PMU network.
$\mathcal{D}_g$      Set of uncompromised PDCs.
$\mathcal{S}$        Set of network switches in a communication network.
$\mathcal{P}$        Set of paths in the communication network, denoted by graph $G_p$, that can be used to deliver measurements from a PMU to a PDC with the real-time requirements observed.
$\mathcal{U}^p$      Set of buses with disconnected yet uncompromised PMUs that can be reconnected by path p.
$\mathcal{D}^p$      Set of uncompromised PDCs that are connected by path p.
$\mathcal{P}^i$      Set of paths that can be used to connect a disconnected yet uncompromised PMU at bus $i \in \mathcal{U}_d$.
$\mathcal{P}^d$      Set of paths that are connected to a remaining PDC $d \in \mathcal{D}_g$.
$\mathcal{S}^p$      Set of network switches in path p.
$\mathcal{V}_z$      Set of zero injection buses in the power system.
$i, j, k$            Indices of buses.
$d$                  Index of PDCs.
$s$                  Index of network switches in set $\mathcal{S}$.
$p$                  Index of paths in set $\mathcal{P}$.

Parameters:
$C_d$                Number of additional PMUs to which a PDC $d \in \mathcal{D}_g$ can connect.
$R_s$                Number of additional rules that switch $s \in \mathcal{S}$ can accommodate.

Decision variables:
$x_i$                Binary variable that equals 1 when the PMU at bus i is connected, and 0 otherwise.
$y_p$                Binary variable that equals 1 when path $p \in \mathcal{P}$ is selected to reconnect PMUs, and 0 otherwise.
$z_s^p$              Integer variable to specify the number of endpoint rules that are allocated for path p and are implemented in switch s.
$w_s^d$              Binary variable that equals 1 when a forwarding rule destined for PDC d is added in network switch s, and 0 otherwise.

With those notations, the constraints and objectives of the integer linear programming (ILP) formulation for the self-healing scheme are described as follows.
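The offline construction of the path set $\mathcal{P}$ mentioned above can be sketched as a depth-first search over simple paths. In this sketch, the timing requirement is approximated by a hop limit, and the switch topology is a hypothetical one, loosely modeled on the two paths named in the example (switches 1, 2, 5 and switches 1, 3, 4, 5); both are assumptions made for illustration.

```python
def all_simple_paths(adj, src, dst, max_hops):
    """Enumerate every loop-free path from src to dst with at most max_hops links."""
    paths = []

    def dfs(node, path):
        if node == dst:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_hops:   # hop budget used up (stand-in for the timing limit)
            return
        for nxt in adj.get(node, ()):
            if nxt not in path:         # a simple path never revisits a node
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(src, [src])
    return paths


# Hypothetical switch topology: 1-2-5 and 1-3-4-5.
adj = {1: [2, 3], 2: [1, 5], 3: [1, 4], 4: [3, 5], 5: [2, 4]}

loose = all_simple_paths(adj, 1, 5, max_hops=3)
tight = all_simple_paths(adj, 1, 5, max_hops=2)
print(loose)
print(tight)
```

Tightening the hop limit removes the longer path from the candidate set, which mirrors how the real-time requirement prunes $\mathcal{P}$ before the optimization is run.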

6.4.1 PMU Connection Status Constraints (PCSCs)

After the disconnection of compromised PMUs and PDCs in response to cyber-attacks, the PMU connection status needs to be updated. Let $\mathcal{U}_b$ denote the set of buses with compromised PMUs. Because these PMUs will remain disconnected, decision variable $x_i$ for $i \in \mathcal{U}_b$ should be set to 0, i.e.,

$$x_i = 0, \quad \forall i \in \mathcal{U}_b. \tag{6.4}$$

In addition, let $\mathcal{U}_g$ denote the set of buses with uncompromised PMUs, and $\mathcal{U}_d$ denote the set of buses with uncompromised and disconnected PMUs. As the remaining connected and uncompromised PMUs (denoted by set $\mathcal{U}_g \setminus \mathcal{U}_d$) will remain unchanged, we have the following constraints:

$$x_i = 1, \quad \forall i \in \mathcal{U}_g \setminus \mathcal{U}_d. \tag{6.5}$$

The PMUs in set $\mathcal{U}_d$ can be reconnected by the self-healing scheme, so their connection statuses are the actual decision variables; thus, the following constraints apply:

$$x_i \in \{0, 1\}, \quad \forall i \in \mathcal{U}_d. \tag{6.6}$$

6.4.2 Power System Observability Constraints (PSOCs)

The constraints in (6.1)–(6.3) specify the observability function $O_i$ at bus i as a function of the topology of the power system's transmission network and the PMU locations. The considered power system is observable when $O_i \geq 1$ for all buses. If there are zero injection buses, i.e., buses with no generation or load units connected to them, the voltage phasor at certain buses (e.g., those zero injection buses or their neighbors in transmission networks) can be calculated indirectly by applying Kirchhoff's Current Law (KCL). Consequently, the number of PMUs needed to make sure that the system is observable can be reduced. The formulation in [26] that considers zero injection buses is applied. For each zero injection bus k, a linear relation between its voltage phasor and the phasors of its neighbors can be obtained based on KCL as:

$$\sum_{j \in \mathcal{V}} Y_{k,j} \bar{V}_j = 0, \tag{6.7}$$

where $\bar{V}_j$ is the voltage phasor at bus j, and $Y_{k,j}$ is the $(k, j)$-th entry of the admittance matrix of the power system. A group of equations handling zero injection buses forms a system of linear equations, from which some voltage phasors can be calculated to make the corresponding buses observable. As shown in [26], the observability function $O_i$ with zero injection buses considered can be modeled as:

$$O_i = \sum_{j \in \mathcal{U}} a_{i,j} x_j + \sum_{k \in \mathcal{V}_z} a_{i,k} v_{i,k}, \quad \forall i \in \mathcal{V}. \tag{6.8}$$

$$\sum_{i \in \mathcal{V}} a_{i,k} v_{i,k} = 1, \quad \forall k \in \mathcal{V}_z. \tag{6.9}$$

$$\sum_{k \in \mathcal{V}_z} a_{i,k} v_{i,k} \leq 1, \quad \forall i \in \mathcal{V}, \tag{6.10}$$

where $\mathcal{V}_z$ denotes the set of zero injection buses, and the auxiliary binary variable $v_{i,k}$ is defined so that $v_{i,k} = 1$ implies that calculation of the voltage phasor at bus i is assigned to the equation associated with zero injection bus k. The constraints (6.9) and (6.10) guarantee the solvability of the group of equations corresponding to zero injection buses. The detailed derivation of (6.8)–(6.10) can be found in [26]. The power system observability constraints can be modeled as (6.2)–(6.3) and (6.8)–(6.10).
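To make the role of (6.7) concrete, the following sketch solves the KCL relation at a single zero injection bus for its own voltage phasor. The 3-bus chain, the line admittances, and the measured voltages are made-up values, line shunt elements are ignored, and the helper name is hypothetical.

```python
def zero_injection_voltage(y_row, voltages, k):
    """Solve sum_j Y[k][j] * V_j = 0 for V_k, given the neighbors' voltages."""
    others = sum(y_row[j] * voltages[j] for j in y_row if j != k)
    return -others / y_row[k]


# Hypothetical chain 1-2-3; bus 2 is a zero injection bus. Line shunts are ignored.
y12 = 1 / (0.01 + 0.10j)   # series admittance of line 1-2 (made-up value)
y23 = 1 / (0.01 + 0.10j)   # series admittance of line 2-3 (made-up value)

# Row 2 of the admittance matrix: off-diagonals are -y, the diagonal is their sum.
y_row_2 = {1: -y12, 2: y12 + y23, 3: -y23}

# Voltage phasors assumed to be measured by PMUs at buses 1 and 3.
voltages = {1: 1.00 + 0.00j, 3: 0.98 - 0.02j}

v2 = zero_injection_voltage(y_row_2, voltages, k=2)
residual = y_row_2[1] * voltages[1] + y_row_2[2] * v2 + y_row_2[3] * voltages[3]
print(v2, abs(residual))
```

With equal line admittances the computed phasor at bus 2 is simply the average of its two neighbors, so bus 2 becomes observable without a PMU of its own.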

6.4.3 PDC Connection Space Constraints (PSCs)

The disconnected PMUs can be reconnected only to PDCs with sufficient connection space. The connection space of a PDC is defined as the maximum number of PMUs it can concentrate. This parameter can be found in the specification of the PDC, e.g., the SEL-3373 PDC can concentrate up to 40 PMUs [27]. This constraint can be modeled as follows:

$$\sum_{p \in \mathcal{P}^d} \sum_{i \in \mathcal{U}^p} (y_p x_i) \leq C_d, \quad \forall d \in \mathcal{D}_g, \tag{6.11}$$

where $C_d$ denotes the number of additional PMUs that PDC d can connect. In the constraint, set $\mathcal{U}^p$ includes all disconnected yet uncompromised PMUs that can be reconnected by path p; this set is constructed by including all PMUs that are connected by the edge switch at the ending node of path p. Furthermore, set $\mathcal{P}^d$ specifies all paths that are connected to the remaining PDC $d \in \mathcal{D}_g$.

If path p is selected for the reconnection, $y_p$ is set to 1. The innermost summation on the left side of constraint (6.11) is used to calculate the number of PMUs that path p can reconnect. Note that in the IP-based communication network, the routing path is uniquely decided by its destination address. A PDC can be used by different paths to connect different PMUs. When multiple paths connected to PDC d are selected, the outermost summation calculates all PMUs that can be reconnected by those paths. Constraint (6.11) ensures that the remaining connection space of PDC d can accommodate all those disconnected PMUs.

Because of the product $y_p x_i$, constraint (6.11) is nonlinear. To lower the computational complexity, constraint (6.11) is reformulated into the linear forms (6.12)–(6.14) using the big-M method with an auxiliary variable $t_p$, which represents the number of PMUs that path p reconnects if the path is selected:

$$-(1 - y_p) \cdot M + \sum_{i \in \mathcal{U}^p} x_i \leq t_p \leq \sum_{i \in \mathcal{U}^p} x_i. \tag{6.12}$$

$$t_p \geq 0, \quad \forall p \in \mathcal{P}. \tag{6.13}$$

$$\sum_{p \in \mathcal{P}^d} t_p \leq C_d, \quad \forall d \in \mathcal{D}_g. \tag{6.14}$$

When $y_p = 1$, constraints (6.12)–(6.13) become an equality constraint, i.e., $t_p = \sum_{i \in \mathcal{U}^p} x_i$. On the other hand, if $y_p = 0$, the big number M ensures that the left side of constraint (6.12) is a nonpositive number. Consequently, constraints (6.12)–(6.13) are equivalent to another inequality, i.e., $0 \leq t_p \leq \sum_{i \in \mathcal{U}^p} x_i$. The optimization then makes $t_p$ equal to 0, to let the PDC's connection space $C_d$ accommodate more PMUs via other paths. Accordingly, the value of M is set to $|\mathcal{U}_d|$, where $|\cdot|$ denotes the number of elements in a set. In other words, the value of M is set to the total number of disconnected PMUs. Because $\sum_{i \in \mathcal{U}^p} x_i \leq |\mathcal{U}_d|$, setting M to that value ensures that the left side of constraint (6.12) is a nonpositive number.
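The effect of the big-M reformulation can be checked by brute force on a toy instance. The sketch below enumerates all binary values of $y_p$ and $x_i$ for a hypothetical path with three reachable PMUs and reports the interval of $t_p$ values allowed by (6.12)–(6.13); the function names are illustrative and not part of the chapter.

```python
from itertools import product


def t_range(y, xs, big_m):
    """Feasible interval [lo, hi] for t_p under (6.12)-(6.13)."""
    total = sum(xs)
    lo = max(0, -(1 - y) * big_m + total)  # left side of (6.12) combined with (6.13)
    hi = total                             # right side of (6.12)
    return lo, hi


def check_linearization(n):
    """Enumerate all binary (y_p, x) with |U_d| = n and M = n."""
    for y in (0, 1):
        for xs in product((0, 1), repeat=n):
            lo, hi = t_range(y, xs, big_m=n)
            if y == 1 and not (lo == hi == sum(xs)):
                return False   # selected path: t_p must equal the product term
            if y == 0 and lo != 0:
                return False   # unselected path: t_p = 0 must stay feasible
    return True


ok = check_linearization(3)
too_small = t_range(0, (1, 1, 1), big_m=1)  # M below |U_d| forces t_p >= 2
print(ok, too_small)
```

The last call shows why M cannot be chosen smaller than the number of disconnected PMUs: with too small a value, an unselected path would still be charged against the PDC's connection space.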

6.4.4 PMU Reconnection Constraints (PRCs)

When a PMU is selected for reconnection, at least one path must be selected to reconnect it to a PDC, i.e.,

$$x_i \leq \sum_{p \in \mathcal{P}^i} y_p, \quad \forall i \in \mathcal{U}_d. \tag{6.15}$$

In constraint (6.15), set $\mathcal{P}^i$ includes any path that can be used to connect a disconnected PMU at bus i. When the PMU at bus i is reconnected, the constraint guarantees that at least one path can connect this PMU to a PDC.


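6.4.5 Switch Rule Space Capacity Constraints (SRSCCs)

There should be sufficient space in each network switch to add the rules of both endpoint and routing policies; this requirement is modeled as the following constraints:

$$\sum_{d \in \mathcal{D}_g} w_s^d + \sum_{p \in \mathcal{P}} z_s^p \leq R_s, \quad \forall s \in \mathcal{S}. \tag{6.16}$$

$$w_s^d \in \{0, 1\}, \quad \forall s \in \mathcal{S}, \forall d \in \mathcal{D}_g. \tag{6.17}$$

$$z_s^p \geq 0, \quad \forall s \in \mathcal{S}, \forall p \in \mathcal{P}, \tag{6.18}$$

where $\sum_{d \in \mathcal{D}_g} w_s^d + \sum_{p \in \mathcal{P}} z_s^p$ is the total number of rules added to switch s.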

6.4.6 Routing Policy Constraints (RPCs)

When a PDC is used to reconnect PMUs, a routing rule should be added to each network switch in the path that connects the PMUs to the PDC, i.e.,

$$w_s^d \geq y_p, \quad \forall p \in \mathcal{P}, \forall d \in \mathcal{D}^p, \forall s \in \mathcal{S}^p. \tag{6.19}$$

$$y_p \in \{0, 1\}, \quad \forall p \in \mathcal{P}. \tag{6.20}$$

Constraint (6.19) is specified for each network switch in path p, i.e., $s \in \mathcal{S}^p$, where $\mathcal{S}^p$ denotes the set of network switches in path p. The set $\mathcal{D}^p$ includes the PDCs that can be connected through path p. If path p is selected to connect PMUs, a forwarding rule will be added in each switch in p that is used to transmit network packets destined for the corresponding PDCs. If the path is not selected, constraint (6.19) reduces to $w_s^d \geq 0$, which is already implied by constraint (6.17).

6.4.7 Endpoint Policy Constraints (EPCs)

The total number of rules for the endpoint policy should be equal to the total number of PMUs that are reconnected, i.e.,

$$\sum_{i \in \mathcal{U}_d} x_i = \sum_{s \in \mathcal{S}} \sum_{p \in \mathcal{P}} z_s^p. \tag{6.21}$$

Constraint (6.21) specifies that if we choose to reconnect a PMU, we need to add one rule in the endpoint policy to specify this connection.


6.4.8 Optimization for Self-Healing Mechanism

After certain PMUs and PDCs are disconnected in response to cyber-attacks, the self-healing mechanism first checks the system observability by calculating the observability function $O_i$ at each bus, according to (6.1)–(6.2) for a system without zero injection buses, or according to (6.8)–(6.10) and (6.2) for a system with zero injection buses. Depending on system observability, the self-healing mechanism is divided into two stages. The first stage aims at recovering the observability of the power system by a PMU network as quickly as possible, while the second stage recovers the remaining disconnected PMUs to increase redundancies of PMU measurements and thus more accurately estimate system states.

6.4.8.1 Stage 1: Recover System Observability

If the remaining connected PMUs (denoted by set $\mathcal{U}_g \setminus \mathcal{U}_d$) cannot make the system observable, i.e., $O_i \geq 1, \forall i \in \mathcal{V}$ is not satisfied, but the disconnected yet uncompromised PMUs (denoted by set $\mathcal{U}_d$) would make the system observable, then the self-healing mechanism will enter Stage 1. At this stage, the time to recover system observability is minimized. The total number of rules that all network switches need to modify is used to estimate the time to reconfigure network switches in order to reconnect PMUs.

The number of rules in network switches plays an important role in the performance of both traditional and SDN-enabled networks. Ways to reduce or compress the size of network rules have been an active research area for almost a decade. The authors of [28, 29] propose algorithms to reduce and compress the number of rules stored in a single switch. In later work, [25, 30, 31] focus on how to distribute rules further in optimal locations in order to save storage space in an SDN-enabled network.

In the scenario that recovers the observability of power systems, the number of rules to add can impact the performance of PMU networks as well. In a traditional communication network or a more advanced network environment that uses SDN technology [11], modifying a rule in a network switch requires the system operator to interact with the switch and make the corresponding configuration. The round-trip time of this communication in a wide-area network can take up to hundreds of milliseconds [32]. In PMU networks that perform real-time monitoring of system states, optimizing the configuration time by reducing the number of switch rules can help improve performance. Consequently, the optimization at Stage 1 can be formulated as:

$$\begin{aligned}
\min_{x_i, y_p, z_s^p, w_s^d} \quad & \sum_{p \in \mathcal{P}} \sum_{s \in \mathcal{S}} z_s^p + \sum_{d \in \mathcal{D}_g} \sum_{s \in \mathcal{S}} w_s^d \\
\text{s.t.} \quad & \text{PCSC: (6.4)–(6.6)} \\
& \text{PSOC: (6.2)–(6.3), (6.8)–(6.10)} \\
& \text{PSC: (6.12)–(6.14)} \\
& \text{PRC: (6.15)} \\
& \text{SRSCC: (6.16)–(6.18)} \\
& \text{RPC: (6.19)–(6.20)} \\
& \text{EPC: (6.21),}
\end{aligned} \tag{6.22}$$

which is an ILP problem. The objective of the self-healing mechanism at this stage is to minimize the number of rules for both the routing and the endpoint policies, in order to restore the disconnected yet uncompromised PMUs to the uncompromised PDCs.
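The Stage 1 trade-off can be illustrated on a toy instance small enough to enumerate. The sketch below is not the ILP itself: it replaces the solver with exhaustive search, gives each disconnected PMU a list of candidate (PDC, path) options, and ignores PDC connection space, switch rule space, and zero injection buses. The 5-bus chain, the switch names, and the function names are all hypothetical.

```python
from itertools import product


def observability(buses, lines, connected_pmus):
    """O_i per (6.1)-(6.2): connected PMUs at bus i or at its neighbors."""
    covers = {i: {i} for i in buses}
    for i, j in lines:
        covers[i].add(j)
        covers[j].add(i)
    return {i: len(covers[i] & connected_pmus) for i in buses}


def stage1_bruteforce(buses, lines, connected, candidates):
    """Return (rule count, chosen options) with the fewest rules that restores observability."""
    pmus = sorted(candidates)
    best = None
    # Option index -1 means "leave this PMU disconnected".
    for choice in product(*[range(-1, len(candidates[i])) for i in pmus]):
        active = {i: candidates[i][c] for i, c in zip(pmus, choice) if c >= 0}
        o_values = observability(buses, lines, connected | set(active))
        if min(o_values.values()) < 1:
            continue  # violates (6.3)
        # One forwarding rule per distinct (switch, PDC) pair, one endpoint rule per PMU.
        forwarding = {(s, pdc) for pdc, path in active.values() for s in path}
        cost = len(forwarding) + len(active)
        if best is None or cost < best[0]:
            best = (cost, active)
    return best


buses = [1, 2, 3, 4, 5]
lines = [(1, 2), (2, 3), (3, 4), (4, 5)]
connected = {2}                       # PMU still reporting
candidates = {                        # disconnected yet uncompromised PMUs
    4: [("PDC 2", ["s1", "s3"])],     # needs two switches
    5: [("PDC 2", ["s1"])],           # needs one switch
}

best_cost, best_plan = stage1_bruteforce(buses, lines, connected, candidates)
print(best_cost, best_plan)
```

Either PMU alone restores observability here, so the search prefers the PMU at bus 5, whose reconnection needs only one forwarding rule and one endpoint rule.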

6.4.8.2 Maximize System Observability

If the power system becomes observable after Stage 1, the self-healing mechanism will enter Stage 2 to continue recovering the remaining disconnected PMUs. Note that at the beginning of the self-healing process, if the power system is already observable with the remaining connected PMUs (denoted by set $\mathcal{U}_g \setminus \mathcal{U}_d$), or if reconnection of all disconnected yet uncompromised PMUs fails to make the system observable, the self-healing mechanism will omit the optimization at Stage 1 and directly enter Stage 2. At this stage, the aim is to improve the observability of the power system. The minimum observability function over all buses, i.e., $\min_{i \in \mathcal{V}} O_i$, is used to quantify the observability of the entire system. The objective at this stage can be formulated as:

$$\max_{x_i, y_p, z_s^p, w_s^d} \; \min_{i \in \mathcal{V}} O_i, \tag{6.23}$$

where all constraints except (6.3) in model (6.22) are still applied; constraint (6.3) is removed because the system observability either cannot be satisfied or has already been satisfied. The sets defined in Table 6.1 need to be updated according to the decisions made at Stage 1. Through the introduction of an auxiliary variable O, the max-min optimization problem in (6.23) can be reformulated as:

$$\begin{aligned}
\max_{x_i, y_p, z_s^p, w_s^d} \quad & O \\
\text{s.t.} \quad & O \leq O_i, \quad \forall i \in \mathcal{V}, \\
& \text{all constraints in model (6.22) except (6.3),}
\end{aligned} \tag{6.24}$$

which is also an ILP problem.


6.5 Greedy Heuristic Algorithm

In the ILP model, the number of variables is on the scale of $O(|\mathcal{U}_d| + |\mathcal{D}_g| + |\mathcal{P}| \times |\mathcal{S}|)$. Consequently, ILP solvers suffer from the dimensionality of the problem, especially when a power system and its underlying communication network increase in size. In that case, solving the problem can introduce a long delay, and the slow response to attacks can result in damage to PMU networks.

Algorithm 6.1 is a greedy heuristic algorithm that finds reroutes for disconnected PMUs one by one instead of finding a globally optimal reroute for all PMUs. As shown in Step 3 of Algorithm 6.1, the heuristic always selects the PMU on the bus with the largest degree in the power system's transmission network (i.e., the bus that has the largest number of neighbors in the transmission network), to make the system observable by reconnecting a small number of PMUs. In Steps 4–28, among paths with sufficient hardware resources (e.g., rule space in network switches and connection space in PDCs), the heuristic selects the path with the smallest latency, as specified by latency(p) in Algorithm 6.1, to deliver measurements. In practice, the latency of delivering measurements can be measured by different metrics, such as the number of switches in a path or round-trip times. Unlike the ILP model, Algorithm 6.1 considers the different optimization objectives at Stages 1 and 2 together. In the heuristic, Steps 29–31 mark the end of Stage 1 (i.e., observability recovery), and the heuristic continues to select disconnected PMUs until all of them have been reconnected.

6.6 Case Study on IEEE 30-Bus System The IEEE 30-bus system is used to demonstrate how the self-healing mechanism reconnects compromised and disconnected PDCs and PMUs. Based on the topology of transmission networks, the topology of the communication network of a power system is constructed. First, the minimum set cover problem is employed to find a set of substations that cover all transmission lines [33, 34]. The consequence is that substations are classified into different sets. A network switch is assigned for each set as an edge switch, which connects PMUs deployed among all substations in each set. To each edge switch, enough PDCs are added to combine all measurements of the PMUs that are connected to the same switch. The communication network of the IEEE 30-bus system is shown in Fig. 6.4. White rectangles represent edge switches. In each edge switch, the index of the PMU and PDC that the switch connects in a format of (PDC#: PMU#, .. . . ) is specified. In this network, there are 16 edge switches that connect PMUs in substations and 4 core switches that form a mesh backbone network; the edge switches are evenly distributed among the core switches. Figure 6.4a demonstrates how the self-healing mechanism (implemented in the ILP model) and the greedy heuristic algorithm work. Consider PDCs 1, 6, and 13

140

6 Self-Healing PMU Network Against Cyber Attacks

Algorithm 6.1 Greedy Heuristic Algorithm 1: while Ud is not empty do 2: CandidatePDCs = {}, CandidatePaths={}, Routes={} 3: Select i ∈ Ud with the largest degree in transmission networks 4: for all d ∈ Dg do 5: if d has connection spaces then 6: Put d in CandidatePDCs 7: end if 8: end for 9: for all d ∈ CandidatePDCs do 10: Find the communication path p between d and i with the smallest latency, calculated by latency(p) 11: if latency(p) is shorter than the latency requirement of delivering PMU measurements then 12: Put p in CandidatePaths 13: else 14: Remove d from CandidatePDCs 15: end if 16: end for 17: while CandidatePaths is not empty do 18: Select p of smallest latency(p) from CandidatePaths 19: if p has no rule space then 20: Remove p from CandidatePaths 21: else 22: Put p in Routes; break; 23: end if 24: end while 25: Remove i from Ud 26: Connect i through p 27: Set routing policy in each switch in p 28: Set endpoint policy in the first switch in p that has available space 29: if power grid becomes observable then 30: Continue; // mark the end of Stage 1 31: end if 32: end while

are compromised by attackers. Consequently, measurements from PMUs 1, 2, 4, 10, 17, 20, 21, 25, and 26 cannot be delivered to the control center, even though none of them is compromised. The ILP problem is formulated in the OPTI toolbox, which gives a solution that reconnects PMUs 1 and 10 to PDC 8 (shown by dotted arrows). Such reconnections need two rules to specify the endpoint policy that grants their access to the PDC 8. Meanwhile, a forwarding rule destined for the PDC 8 are added to switches 1, 6, 7, 18, and 8. As a result, there are a total of seven rules to configure. Figure 6.4b shows that the greedy heuristic algorithm can introduce a different and nonoptimal result for the same case. Also, this case demonstrates the procedure of the greedy heuristic algorithm in Algorithm 6.1. • Step 3: All disconnected PMUs are ranked based on their degrees in transmission networks (i.e., the number of neighbor substations that each PMU has); the

6.6 Case Study on IEEE 30-Bus System

141

Fig. 6.4 Reconnection of PMUs in a communication network for IEEE 30-bus system with (a) ILP model and (b) greedy heuristic algorithms. © [2018] IEEE. Reprinted, with permission, from [9]

heuristic tries to reconnect PMUs one by one based on this order. Because a PMU can measure a voltage phasor not only at the substation where it is deployed but also at its neighbor substations, we select PMUs in that order will restore the observability of the power system as quickly as possible. Specifically in the attack case considered in Fig. 6.4, we select PMUs in the following order: PMUs 4, 10, 2, 5, 1, 17, 20, 21, and 26. When we select a PMU, we find it a PDC (Steps 4– 8) and a path (Steps 9–16). Without loss of generality, we describe the following steps to reconnect PMU 4, which is the first PMU to reconnect in this attack case. • Steps 4–8: When we select PMU 4 for reconnection, we find all remaining PDCs (i.e., all PDCs except for PDCs 1, 6, and 13) that have sufficient connection space for this PMU. We regard all those PDCs as candidate PDCs that can reconnect the PMU (stored in a set CandidatePDC in Algorithm 6.1). • Steps 9–16: For each PDC in CandidatePDC, we find the shortest path (i.e., with the shortest round-trip time) between the PDC and PMU 4. If the round-trip time of this path meets the requirement of being able to deliver PMU measurements, we regard this path as a candidate path. We put all candidate paths in a set, i.e., CandidatePaths in Algorithm 6.1. To simplify discussions, we assume that the


round-trip times on all network links are the same; the round-trip time of a path can be quantified by the number of links or the number of switches in a path.
• Steps 17–24: If set CandidatePaths is not empty, we select a path from the set that has the shortest round-trip time to reconnect PMU 4. For example, we can select PDC 2 to reconnect PMU 4; the path contains only three switches, i.e., Switch 1, Switch 17, and Switch 2.
• Steps 29–31: After reconnecting a PMU, we check whether the power system has become observable or not. If the system has become observable, we continue to Stage 2 of the optimization to achieve more redundancy of PMU measurements.
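The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Algorithm 6.1 itself: the five-bus grid, the node names (`pmu2`, `sw2`, `pdcA`, and so on), the function names, and the `max_hops` limit are all hypothetical. Hop count stands in for round-trip time, as in the simplification above, and the rule-space check on switches is omitted.

```python
from collections import deque


def build_links(edges):
    """Turn an undirected edge list into an adjacency dict."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def hop_path(links, src, dst):
    """Breadth-first search for a fewest-hop path (None if unreachable)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in links.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def is_observable(grid, connected):
    """Every bus must host a connected PMU or be adjacent to one."""
    seen = set()
    for bus in connected:
        seen.add(bus)
        seen.update(grid[bus])
    return seen == set(grid)


def greedy_reconnect(grid, links, connected, disconnected, pdc_space, max_hops):
    connected, pdc_space, plan = set(connected), dict(pdc_space), {}
    # Step 3: rank disconnected PMUs by degree in the transmission network.
    ranked = sorted(disconnected, key=lambda bus: len(grid[bus]), reverse=True)
    for bus in ranked:
        if is_observable(grid, connected):  # Steps 29-31: stop once observable
            break
        # Steps 4-8: candidate PDCs are those with connection space left.
        candidate_pdcs = [p for p, free in pdc_space.items() if free > 0]
        # Steps 9-16: keep shortest paths that meet the delay requirement.
        candidate_paths = []
        for pdc in candidate_pdcs:
            path = hop_path(links, f"pmu{bus}", pdc)
            if path is not None and len(path) - 1 <= max_hops:
                candidate_paths.append((len(path) - 1, pdc, path))
        # Steps 17-24: reconnect over the shortest candidate path, if any.
        if candidate_paths:
            _, pdc, path = min(candidate_paths)
            pdc_space[pdc] -= 1
            connected.add(bus)
            plan[bus] = (pdc, path)
    return plan, is_observable(grid, connected)


# Hypothetical toy case: PMUs at buses 2, 4, and 5 are disconnected.
grid = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4]}
links = build_links([("pmu2", "sw2"), ("pmu4", "sw4"), ("pmu5", "sw5"),
                     ("sw2", "sw4"), ("sw4", "sw5"),
                     ("sw2", "pdcA"), ("sw5", "pdcB")])
plan, observable = greedy_reconnect(
    grid, links, connected=[], disconnected=[5, 4, 2],
    pdc_space={"pdcA": 1, "pdcB": 1}, max_hops=4)
```

In this toy case the PMU at bus 2 (degree 3) is reconnected first, then the PMU at bus 4; the grid is then observable, so the PMU at bus 5 is left for Stage 2.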

6.7 Performance Evaluation of Stage 1

To show the effectiveness and optimality of the self-healing mechanism, including both the ILP model and the greedy heuristic algorithm, we compare them against a baseline method on both the IEEE 30-bus and IEEE 118-bus test systems. The baseline method is based on self-healing mechanisms for traditional communication networks, which aim to recover as many disconnected hosts or links as possible; this goal differs from recovering the observability of a power system. In this baseline method, we randomly reconnect PMUs with the shortest paths to any PDCs that have sufficient connection space. The self-healing mechanism is implemented in Mininet, a software platform that simulates SDN-enabled communication networks, to demonstrate the procedures for self-healing PMU networks against the attack case considered in Fig. 6.4. The solver in the OPTI Toolbox is used to solve the ILP problem in Sect. 6.4 [35]. The heuristic algorithm presented in Sect. 6.5 is implemented in MATLAB. All experiments are performed on a 64-bit desktop with two Intel Core i7 3.6 GHz processors and 16 GB of RAM.

6.7.1 Impact of Scale of Attacks

We assume that each PDC can combine measurements from up to 40 PMUs, which is the connection space of the SEL-3373 [27]. Each network switch can contain up to 1000 forwarding or routing rules, which is in line with the experiment settings in [25]. Figure 6.5 shows the impact of the scale of attacks on the number of rules added to recover the observability of power systems. The horizontal axis specifies the number of compromised PDCs. In each case, the compromised PDCs are randomly selected and the experiment is repeated 500 times. The vertical axis specifies the average number of network rules that are added to reconnect PMUs. Bars of different shades of gray are used to represent the results obtained by the ILP problem, the greedy heuristic algorithm, and the baseline method with


Fig. 6.5 Comparison of the number of network rules to add for (a) IEEE 30-bus system and (b) IEEE 118-bus system. © [2018] IEEE. Reprinted, with permission, from [9]


Fig. 6.6 Comparison of execution time for (a) IEEE 30-bus system and (b) IEEE 118-bus system. © [2018] IEEE. Reprinted, with permission, from [9]

a 95% confidence interval. If attackers compromise more PDCs, more rules must be added in network switches to reconnect PMUs. The greedy heuristic algorithm performs better when a small number of PDCs are compromised in both the IEEE 30-bus and IEEE 118-bus systems. The worst case happens when 8 PDCs are compromised in the 30-bus system; the greedy heuristic algorithm needs to add 5 to 6 more rules, on average, to recover the observability of the power system. The baseline method performs much worse than the greedy heuristic, as it can take a long time to reconnect enough PMUs to recover the observability. Figure 6.6 compares the execution time of the ILP model, the greedy heuristic algorithm, and the baseline method. The execution time of the ILP model is specified by the major vertical axis, while the execution times of the other two algorithms are specified by the secondary vertical axis. Because the execution times vary significantly depending on which PDCs are compromised, the 50 largest execution times are selected for each of the three methods and their averages are shown in Fig. 6.6. Because the number of compromised PDCs affects the size of the search space in the ILP model, the execution time to solve the ILP model (represented by the light gray line in both sub-figures) increases slightly with the number of compromised PDCs.


On average, the greedy heuristic algorithm can reduce the execution time by approximately two orders of magnitude. In the greedy heuristic algorithm, however, we reconnect PMUs one by one, starting from the one on the bus with the most neighbors in transmission networks. The number of reconnected PMUs is not directly related to the number of compromised PDCs, so the execution time does not change significantly. The baseline method randomly selects PMUs for reconnection and can take a long time to restore the observability of power systems. The execution times of these methods scale differently with the size of the power system. Because there are more paths to reconnect PMUs in the IEEE 118-bus system, the search space of the ILP model increases dramatically, which can increase its execution time by more than ten times. The execution times of the greedy heuristic algorithm and the baseline method do not change significantly, as these algorithms select PDCs that can connect PMUs over the shortest distance. For the communication networks of the IEEE 30-bus and IEEE 118-bus systems, the greedy heuristic algorithm and the baseline method often reconnect the disconnected PMUs to neighboring PDCs.
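The two summary statistics used in this subsection (an average with a 95% confidence interval over repeated trials, and the average of the 50 largest execution times) can be computed as in the following sketch. The source does not state how the interval was obtained; the normal approximation with the factor 1.96 is an assumption made here, and the function names are hypothetical.

```python
import math
import statistics


def mean_with_ci95(samples):
    """Sample mean and half-width of a normal-approximation 95% interval.

    Assumption: the interval is mean +/- 1.96 * s / sqrt(n); the source
    does not say which construction was used for the bars in Fig. 6.5.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


def mean_of_largest(samples, k=50):
    """Average of the k largest values, as done for the execution times."""
    return statistics.fmean(sorted(samples, reverse=True)[:k])


# Tiny illustrative input (not data from the experiments).
rules_added = [7, 8, 9, 10, 11]
mean, half_width = mean_with_ci95(rules_added)
slowest_two = mean_of_largest([1.0, 5.0, 3.0, 9.0], k=2)
```

Averaging only the largest execution times emphasizes the slow cases, which is the relevant view when the time to restore observability matters more than typical behavior.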

6.7.2 Impact of Hardware Resources

We evaluate the impact of hardware resources, i.e., rule spaces in communication networks and connection spaces (the number of PMUs to which a PDC can connect), on the performance of the self-healing mechanisms. In practice, these resources can be shared with other devices in addition to being assigned to PMUs and PDCs. For the simulated communication network, the size of the rule space of each switch is set between 5 and 10, and the number of PMUs to which each PDC can connect is set between 3 and 8. For each value of those two parameters, 8 compromised PDCs are randomly selected and the experiment is repeated 500 times. Figures 6.7 and 6.8 show how solutions obtained by the ILP model and the greedy heuristic are affected. In both cases, because there are insufficient resources, some


Fig. 6.7 Comparison of the numbers of added network rules with impact of limited network resources. (a) IEEE 30-bus system. (b) IEEE 118-bus system. © [2018] IEEE. Reprinted, with permission, from [9]


Fig. 6.8 Comparison of the numbers of added network rules with impact of limited PDC connection space. (a) IEEE 30-bus system. (b) IEEE 118-bus system. © [2018] IEEE. Reprinted, with permission, from [9]

disconnected PMUs need to reroute to PDCs distributed at different substations, which increases the number of forwarding rules added to switches. Compared to the results shown in Fig. 6.5 (when 8 PDCs are compromised), the number of added rules in the solutions obtained from the ILP model at Stage 1 can increase by more than three times due to the increased number of forwarding rules in switches. The heuristic algorithm and the baseline method perform worse and give solutions that increase by more than six times. As shown in Figs. 6.7 and 6.8, when hardware resources, i.e., rule spaces in switches or connection spaces in PDCs, become limited, the greedy heuristic algorithm performs better in the IEEE 118-bus system than in the IEEE 30-bus system. In the experiments, the rule space in each switch or the connection space in each PDC is limited. Because the IEEE 118-bus system includes more switches and PDCs, it can provide more hardware resources for the greedy heuristic algorithm to use. As shown in Figs. 6.9 and 6.10, the limited resources in network switches and PDCs have two impacts on the execution time. On one hand, they reduce the search space. In the ILP problem, the search space specified by constraints (6.12)–(6.14) and (6.16)–(6.18) is reduced. The greedy heuristic algorithm stops adding rules into the switches at Steps 17–24 if there is insufficient rule space in a path. On the other hand, because of limited resources, it can take more time to find a solution in both methods. When they are impacted by these two factors simultaneously, the execution times of the ILP model, the greedy heuristic algorithm, and the baseline method fluctuate without increasing or decreasing dramatically. An interesting phenomenon happens for the IEEE 118-bus system when PDCs are short of connection spaces (shown in Fig. 6.10b); the greedy heuristic algorithm spends more time than the baseline method to obtain the result. The reason is that even though the heuristic algorithm reconnects PMUs in the order of their degrees in the power system’s transmission networks, the limited connection space in PDCs means that the algorithm must spend a lot of time searching possible paths for reconnection before it tries the next disconnected PMU.
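The rule-space check that makes the greedy heuristic abandon a path can be sketched as follows. This is an illustration only: the function name `try_install`, the switch names, and the remaining capacities are hypothetical, and the sketch assumes destination-based forwarding, so that one rule per (switch, PDC) pair is shared by every PMU routed to that PDC.

```python
def try_install(path_switches, pdc, free_rule_space, installed):
    """Install forwarding rules toward `pdc` on every switch of the path.

    Returns False and changes nothing if any switch that still needs a rule
    has no free rule space; the caller would then try another path or PMU.
    """
    needed = [sw for sw in path_switches if (sw, pdc) not in installed]
    if any(free_rule_space[sw] < 1 for sw in needed):
        return False
    for sw in needed:
        free_rule_space[sw] -= 1
        installed.add((sw, pdc))
    return True


free_rule_space = {"s1": 1, "s2": 0, "s3": 2}  # hypothetical remaining slots
installed = set()                              # (switch, destination PDC) pairs

blocked = try_install(["s1", "s2"], "pdcA", free_rule_space, installed)
accepted = try_install(["s1", "s3"], "pdcA", free_rule_space, installed)
shared = try_install(["s3"], "pdcA", free_rule_space, installed)
```

The first path is rejected because switch `s2` is full, the second consumes one slot on each of `s1` and `s3`, and the third reuses the rule already present on `s3` without consuming more space.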


Fig. 6.9 Comparison of execution time with impact of limited network resources. (a) IEEE 30-bus system. (b) IEEE 118-bus system. © [2018] IEEE. Reprinted, with permission, from [9]

Fig. 6.10 Comparison of execution time with impact of limited PDC connection space. (a) IEEE 30-bus system. (b) IEEE 118-bus system. © [2018] IEEE. Reprinted, with permission, from [9]

6.8 Performance Evaluation of Stage 2

We further evaluate the second stage of the optimization, which reconnects PMUs to maximize the redundancy of measurements. In this evaluation, the power grid remains observable despite the disconnected PMUs. When there are sufficient hardware resources, the ILP model, the greedy heuristic algorithm, and the baseline algorithm always reach the same result, i.e., all remaining uncompromised PMUs are reconnected. Figures 6.11 and 6.12 show how the limited rule space of network switches and the connection spaces of PDCs can impact the result of the self-healing mechanism. As shown in Fig. 6.11, the greedy heuristic algorithm and the baseline method are affected if the rule space in switches is limited. Both the greedy heuristic and the baseline algorithm reconnect PMUs one by one. Either algorithm can use up the rule space of a small number of switches; those switches can become bottlenecks that prevent other PMUs from reaching unused PDCs. The ILP model optimizes the reconnection of PMUs globally among all paths; it can avoid selecting paths that tend to use up the rule space of a few switches. As seen in Fig. 6.12, the ILP model, the greedy heuristic, and the baseline method usually give


Fig. 6.11 Comparison of power system observability with the impact of limited network resources. (a) IEEE 30-bus system. (b) IEEE 118-bus system. © [2018] IEEE. Reprinted, with permission, from [9]


Fig. 6.12 Comparison of power system observability with the impact of limited PDC connection space. (a) IEEE 30-bus system. (b) IEEE 118-bus system. © [2018] IEEE. Reprinted, with permission, from [9]

the same results, i.e., they reconnect PMUs based on the existing connection spaces of all PDCs. Figures 6.13 and 6.14 show the execution times of the ILP model, the greedy heuristic algorithm, and the baseline method. Similar to Figs. 6.6, 6.9, and 6.10, Figs. 6.13 and 6.14 show the average of the 50 largest execution times for each of the three methods. By comparing Figs. 6.9 and 6.10 with Figs. 6.13 and 6.14, we can see that the execution time of the ILP model at Stage 2 is one order of magnitude smaller than at Stage 1. In other words, the ILP model spends a large amount of computation on satisfying constraints (6.3). However, the execution times of the greedy and baseline algorithms at Stage 2 are one order of magnitude larger than at Stage 1, as they often end up attempting to reconnect many PMUs. The execution time of the ILP model does not change dramatically when the rule space of network switches or the connection space of PDCs becomes limited. However, the greedy algorithm and the baseline method select PMUs in sequence. They can reconnect more PMUs if more hardware resources are available, and thus need more execution time to finish. The greedy algorithm and the baseline method also show more variation in execution time because, in both approaches, the search space for a PMU varies with the location of the disconnected PMUs when hardware resources become limited.
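The bottleneck effect described in this section can be reproduced on a toy case. The sketch below is not the ILP model of Sect. 6.4: a brute-force search over all path combinations stands in for a global optimizer, each reconnection is assumed to consume one rule on every switch of its path, and the PMU names, switch names, and capacities are hypothetical.

```python
from itertools import product


def fits(paths, capacity):
    """Check that the chosen paths respect every switch's rule space."""
    used = {}
    for path in paths:
        for sw in path:
            used[sw] = used.get(sw, 0) + 1
    return all(count <= capacity[sw] for sw, count in used.items())


def sequential(options, capacity):
    """One PMU at a time: take the shortest path that still fits."""
    chosen = []
    for paths in options.values():
        for path in sorted(paths, key=len):
            if fits(chosen + [path], capacity):
                chosen.append(path)
                break
    return len(chosen)


def exhaustive(options, capacity):
    """Global search; None means the PMU is left disconnected."""
    best = 0
    for combo in product(*[paths + [None] for paths in options.values()]):
        picked = [p for p in combo if p is not None]
        if fits(picked, capacity):
            best = max(best, len(picked))
    return best


# pmu1 has a short path over sA and a longer detour; pmu2 can only use sA.
options = {
    "pmu1": [["sA"], ["sB", "sC"]],
    "pmu2": [["sA"]],
}
capacity = {"sA": 1, "sB": 1, "sC": 1}

reconnected_sequential = sequential(options, capacity)
reconnected_global = exhaustive(options, capacity)
```

The sequential strategy spends the only rule slot of `sA` on `pmu1` and strands `pmu2`, whereas the global search routes `pmu1` over the detour and reconnects both.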


Fig. 6.13 Comparison of execution times with the impact of limited network resources. (a) IEEE 30-bus system. (b) IEEE 118-bus system. © [2018] IEEE. Reprinted, with permission, from [9]


Fig. 6.14 Comparison of execution times with the impact of limited network resources. (a) IEEE 30-bus system. (b) IEEE 118-bus system. © [2018] IEEE. Reprinted, with permission, from [9]

References

1. P. Zhang, Phasor Measurement Unit (PMU) Implementation and Applications (EPRI, Palo Alto, CA, 2007), Tech. Rep.
2. J. Qi, K. Sun, J. Wang, H. Liu, Dynamic state estimation for multi-machine power system by unscented Kalman filter with enhanced numerical stability. IEEE Trans. Smart Grid 9(2), 1184–1196 (2018)
3. J. Qi, K. Sun, W. Kang, Optimal PMU placement for power system dynamic state estimation by using empirical observability Gramian. IEEE Trans. Power Syst. 30(4), 2041–2054 (2015)
4. K. Sun, J. Qi, W. Kang, Power system observability and dynamic state estimation for stability monitoring using synchrophasor measurements. Control. Eng. Pract. 53, 160–172 (2016). http://www.sciencedirect.com/science/article/pii/S0967066116300132
5. NASPI, Data bus technical specifications for North American Synchro-Phasor Initiative network (NASPInet), Quanta Technology, Tech. Rep., 2009
6. T. Morris, S. Pan, J. Lewis, J. Moorhead, N. Younan, R. King, M. Freund, V. Madani, Cybersecurity risk testing of substation phasor measurement units and phasor data concentrators, in Proceedings of the Seventh Annual Workshop on Cyber Security and Information Intelligence Research (CSIIRW) (2011), pp. 1–4
7. C. Beasley, G. Venayagamoorthy, R. Brooks, Cyber security evaluation of synchrophasors in a power system, in Power System Conference (PSC) (2014), pp. 1–5
8. S. Mousavian, J. Valenzuela, J. Wang, A probabilistic risk mitigation model for cyber-attacks to PMU networks. IEEE Trans. Power Syst. 30(1), 156–165 (2015)


9. H. Lin, C. Chen, J. Wang, J. Qi, D. Jin, Z.T. Kalbarczyk, R.K. Iyer, Self-healing attack-resilient PMU network for power system operation. IEEE Trans. Smart Grid 9(3), 1551–1565 (2018)
10. J. Stewart, T. Maufer, R. Smith, C. Anderson, E. Eren, Synchrophasor Security Practices, Schweitzer Engineering Laboratories, Tech. Rep., 2011
11. X. Dong, H. Lin, R. Tan, R.K. Iyer, Z. Kalbarczyk, Software-defined networking for smart grid resilience: opportunities and challenges, in Proceedings of the 1st ACM Workshop on Cyber-Physical System Security (CPSS) (ACM, New York, 2015), pp. 61–68
12. Y. Liu, P. Ning, M.K. Reiter, False data injection attacks against state estimation in electric power grids, in Proceedings of the 16th ACM Conference on Computer and Communications Security, ser. CCS ’09 (ACM, New York, 2009), pp. 21–32
13. O. Kosut, L. Jia, R.J. Thomas, L. Tong, Malicious data attacks on the smart grid. IEEE Trans. Smart Grid 2(4), 645–658 (2011)
14. S. Ponomarev, T. Atkison, Industrial control system network intrusion detection by telemetry analysis. IEEE Trans. Dependable Secure Comput. 13(2), 252–260 (2016)
15. C.W. Ten, G. Manimaran, C.C. Liu, Cybersecurity for critical infrastructures: attack and defense modeling. IEEE Trans. Syst. Man Cybern. 40(4), 853–865 (2010)
16. Y. Yang, K. McLaughlin, T. Littler, E.G. Im, B. Pranggono, H.F. Wang, Multiattribute SCADA-specific intrusion detection system for power networks. IEEE Trans. Power Delivery 29(3), 1092–1102 (2014)
17. S. Cui, Z. Han, S. Kar, T.T. Kim, H.V. Poor, A. Tajer, Coordinated data-injection attack and detection in the smart grid: a detailed look at enriching detection solutions. IEEE Signal Process. Mag. 29(5), 106–115 (2012)
18. A.F. Taha, J. Qi, J. Wang, J.H. Panchal, Risk mitigation for dynamic state estimation against cyber attacks and unknown inputs. IEEE Trans. Smart Grid 9(2), 886–899 (2016)
19. B. Stephenson, B. Sikdar, A quasi-species model for the propagation and containment of polymorphic worms. IEEE Trans. Comput. 58(9), 1289–1296 (2012)
20. B. Sun, G. Yan, Y. Xiao, T.A. Yang, Self-propagating mal-packets in wireless sensor networks: dynamics and defense implications. Ad Hoc Netw. 7(8), 1489–1500 (2009)
21. P. Mell, K. Kent, J. Nusbaum, Guide to malware incident prevention and handling, National Institute of Standards and Technology, Tech. Rep., 2005. http://csrc.nist.gov/publications/nistpubs/800-83/SP800-83.pdf
22. K. Murakami, H. Kim, Comparative study on restoration schemes of survivable ATM networks, in Proceedings of the IEEE International Conference on Computer Communications (INFOCOM) (1997), pp. 345–352
23. R. Kawamura, K. Sato, I. Tokizawa, Self-healing ATM networks based on virtual path concept. IEEE J. Sel. Areas Commun. 12(1), 120–127 (1994)
24. T. Frisanco, Optimal spare capacity design for various protection switching methods in ATM networks, in 1997 IEEE International Conference on Communications (ICC) (1997), pp. 293–298
25. W. Kang, A.J. Krener, M. Xiao, L. Xu, A survey of observers for nonlinear dynamical systems, in Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (Vol. II) (Springer, Berlin, 2013), pp. 1–25
26. S. Azizi, A.S. Dobakhshari, S.A. Nezam Sarmadi, A.M. Ranjbar, Optimal PMU placement by an equivalent linear formulation for exhaustive search. IEEE Trans. Smart Grid 3(1), 174–182 (2012)
27. Schweitzer Engineering Laboratories, SEL-3373 station phasor data concentrator, instruction manual, Tech. Rep., 2014. https://selinc.com/products/3373/
28. D.L. Applegate, G. Calinescu, D.S. Johnson, H. Karloff, K. Ligett, J. Wang, Compressing rectilinear pictures and minimizing access control lists, in Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA) (2007), pp. 1066–1075
29. C.R. Meiners, A.X. Liu, E. Torng, TCAM razor: a systematic approach towards minimizing packet classifiers in TCAMs, in Proceedings of the 2007 IEEE International Conference on Network Protocols (2007), pp. 266–275


30. M. Moshref, M. Yu, A. Sharma, R. Govindan, vCRIB: virtualized rule management in the cloud, in Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud) (USENIX, Berkeley, 2012). https://www.usenix.org/conference/hotcloud12/vcrib-virtualized-rule-management-cloud
31. Y. Kanizo, D. Hay, I. Keslassy, Palette: distributing tables in software-defined networks, in Proceedings of IEEE International Conference on Computer Communication (INFOCOM) (2013), pp. 545–549
32. C.Y. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri, R. Wattenhofer, Achieving high utilization with software-driven WAN, in Proceedings of the ACM SIGCOMM (ACM, New York, 2013), pp. 15–26. http://doi.acm.org/10.1145/2486001.2486012
33. F. Gori, G. Folino, M.S.M. Jetten, E. Marchiori, MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks. Bioinformatics 27(2), 196–203 (2011)
34. V. Chvatal, A greedy heuristic for the set-covering problem. Math. Oper. Res. 4(3), 233–235 (1979)
35. J. Currie, D.I. Wilson, B.T. Young, OPTI toolbox: a free Matlab toolbox for optimization. http://www.i2c2.aut.ac.nz/Wiki/OPTI/index.php/Main/HomePage

Part III

Cyber-Physical Security for Distributed Energy Resources

Chapter 7

Cyber-Physical Security Research Framework for Distributed Energy Resources

© Springer Nature Switzerland AG 2023
J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_7

7.1 Introduction

Recent findings indicate that the threat of cyber-based attacks targeting the Nation’s energy sector, and in particular the electric grid, is growing in number and sophistication [1]. A major cyber incident in the power system could have serious consequences on grid operation in terms of socioeconomic impacts, market impacts, equipment damage, and/or large-scale blackouts [2]. Several efforts, such as the DOE Cyber Security Roadmap for Energy Delivery Systems (EDS) [3], North American Electric Reliability Corporation (NERC) Critical Infrastructure Protection (CIP) Standards [4], National Institute of Standards and Technology Interagency Report (NISTIR) 7628 [5], and NESCOR report [6], have explored the power grid’s security and resilience against cyber threats.

Meanwhile, the traditional power grid is undergoing a massive change through renewable integration, microgrids, demand response, advanced metering infrastructure (AMI), and distributed energy resources (DERs). Accordingly, the power grid architecture is fast evolving from a utility-centric structure to a distributed smart grid that heavily integrates DER [7]. Currently, Hawaii depends on renewables for over 23% of its energy, while California utilizes over 26% renewables [8, 9]. California has a goal of integrating 15 GW of DER, including 12 GW of renewable energy, into distribution systems by 2020 and achieving 50% renewable energy by 2030. DER will likely decrease the control that utilities have over the energy resources in the power grid. To enable high levels of renewable penetration, utilities must implement wide-area communication to remotely control these devices. While smart meters and AMI already significantly expand the utility’s attack surface, DER deployments present additional risks due to the tremendous number of devices and access points that operate outside the typical utility’s administrative domain.

To promote the deployment of DER, the New York State Public Service Commission made an effort to address DER cyber vulnerabilities in its recent Reforming the Energy Vision (REV) [10] initiative. California’s Rule 21 smart inverter working group has also provided recommendations for technical requirements for smart inverters, including cybersecurity requirements [11]. The National Electric Sector Cybersecurity Organization Resource (NESCOR) has discussed the DER system architectures and cybersecurity requirements of DER systems [12] and has identified many cybersecurity failure scenarios that DER could introduce to the grid [13]. In this chapter, we discuss a holistic attack-resilient framework and layered cyber-physical solution portfolio to protect the critical power grid infrastructure and the integrated DER from malicious cyber attacks, helping ensure the large-scale and secure integration of DER without degrading the grid reliability and stability [14].

7.2 Cyber-Physical Power System with Large-Scale DER Deployments

7.2.1 Generic Architecture of Power Systems with DERs

In [13], a DER system architecture is proposed, which has five levels: (i) autonomous DER generation and storage, (ii) facilities DER energy management, (iii) utility and retail energy provider (REP) operational communications, (iv) distribution utility operational analysis, and (v) transmission and market operations. In IEC 62351-12, a similar DER system architecture was mapped to the European M/490 Smart Grid Architecture Model (SGAM), and the interfaces enabling information exchanges between different levels of the system were also discussed [15]. Here the DER system architecture is divided into four domains as shown in Fig. 7.1.
• Domain 1: DER Devices and Controllers
– Actors: In this domain, the DER is likely owned and controlled by consumers who gain profit by generating power for personal use and may sell excess power to the utility. Facilities DER energy management systems (FDEMSs) are the entities that act upon the DER and their controllers for operations (using smart inverters). The owners have complete authority over the devices and controllers, and the FDEMS may have access limited to management of the devices, modifying certain DER operations, and reading real-time data allowed by the DER owner. The AMI system is the third actor; it can collect data from the devices and send it to the utilities.
– Interaction: The DER owners get information about the DER by communicating with smart inverters over wireless technology, such as ZigBee. They can also access the smart inverters through the human machine interface (HMI). The FDEMS communicates with DER over the wide area network (WAN)/local area network (LAN) at the facility.


Fig. 7.1 DER architecture

– Vulnerability points: The vulnerabilities include (i) unauthorized access to DER controllers and smart inverters, (iii) penetration through the facility network, (iv) unauthorized access to smart meters, (v) an unauthorized change in the settings in the FDEMS, and (vi) novice owners who fail to adequately secure their devices.
• Domain 2: Distribution Utility Communications and Control
– Actors: The utility works as an actor in this domain and can send control commands to the smart inverters such as connecting/disconnecting the DER, regulating the voltage, and managing the amount of penetration allowed. Utilities may also use an FDEMS to handle DER systems located at utility sites such as substations or physical plant sites. The distribution management system ensures the stability of the grid after the addition of the DER. It is also responsible for shutting down the DER in case of an emergency.
– Interaction: The utility interacts with the smart inverters and controllers using communication protocols such as Smart Energy Profile (SEP) 2.0. The distribution system uses the WAN/LAN of the utility.
– Vulnerability points: The protocols in use need to be checked for vulnerabilities. An attacker could penetrate through the utility network. Malicious commands sent to the DER controllers and/or smart meters can cause issues.
• Domain 3: Third Parties
– Actors: Key actors within this domain include (i) aggregators, (ii) companies providing power purchasing agreements (PPAs) or energy leases, and (iii)


DER manufacturers. Aggregators must interact with DER in order to participate in an energy market on behalf of the DER owners. Companies supporting PPAs and energy leases invest in the initial capital expenses of DER and then charge the consumer a monthly rate based on the energy produced by the DER. Manufacturers may also have systems that interconnect with the DER and may perform remote maintenance on the systems.
– Interaction: Most of the third-party entities have the ability to monitor the status of DER, and some may also have the ability to directly control their operation. Furthermore, these entities are unique in that they may have connectivity to a very large number of DERs. Aggregators must connect to the DER in order to determine their available energy. Many DER manufacturers provide additional online services that come with their device, such as automatic cloud storage of device data. Many devices are configured to immediately connect back to a manufacturer-controlled cloud environment in order to provide consumers with easy access to data and to support maintenance operations. Companies that provide PPAs and energy leases also often remotely monitor the energy produced by the DER and may be responsible for performing maintenance on the devices remotely.
– Vulnerability points: These interconnections introduce centralized points that could potentially be leveraged by attackers to manipulate DER instances. The systems that are used for third-party access may directly interconnect with many more DER instances than the other, more well-defined, DER interconnections. Attacks against these systems have the ability to influence a large number of DERs across multiple distribution grids. Although it is unclear how much control these entities have over the DER, the security of these connections is often outside the control of the utility and the DER owner.
• Domain 4: Transmission Operations
– Actors: These actors include the independent system operators (ISOs) and regional transmission organizations (RTOs) that maintain a stable frequency by balancing the system based on the operating reliability regulations. Their energy management systems (EMSs) include many advanced applications, such as state estimation (SE) and automatic generation control (AGC).
– Interaction: The ISOs/RTOs will probably not directly communicate with smart inverters or DER devices. However, ISOs or RTOs and market operations can affect what the DER systems are requested or required to do, based on tariffs and other agreements [13]. DER operations need to be integrated with the large power grid operations. Distribution utilities may interact with their ISO/RTO as a wholesale market participant. The DER aggregators may also bid into the electricity market for both energy and ancillary services. The operation of the large grid at the ISO/RTO level can also impact the operation of DER. Communication protocols on this end commonly include DNP3 and IEC 61850.
– Vulnerability points: Many advanced applications in the EMS are based on the measurements from sensors, such as remote terminal units (RTUs) or phasor measurement units (PMUs). Compromised measurements can negatively


influence the functionalities of advanced applications and further influence the power grid operation, which can lead to serious voltage or frequency violations.
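The four domains above can be captured in a small lookup structure, which is convenient when mapping an asset or an incident to the domain it belongs to. The sketch below is only one possible encoding; the class name `Domain`, the field names, and the short labels are hypothetical, and the entries list only actors and interfaces named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    number: int
    name: str
    actors: tuple
    interfaces: tuple  # protocols and access paths named in the text


DOMAINS = (
    Domain(1, "DER Devices and Controllers",
           ("DER owner", "FDEMS", "AMI"),
           ("ZigBee", "HMI", "facility WAN/LAN")),
    Domain(2, "Distribution Utility Communications and Control",
           ("utility", "FDEMS", "distribution management system"),
           ("SEP 2.0", "utility WAN/LAN")),
    Domain(3, "Third Parties",
           ("aggregator", "PPA or energy lease company", "DER manufacturer"),
           ("manufacturer-controlled cloud",)),
    Domain(4, "Transmission Operations",
           ("ISO", "RTO"),
           ("DNP3", "IEC 61850")),
)


def domains_with_actor(actor):
    """Return the numbers of all domains in which an actor appears."""
    return [d.number for d in DOMAINS if actor in d.actors]


fdems_domains = domains_with_actor("FDEMS")
```

The FDEMS appears in both Domain 1 and Domain 2, which reflects the text: it manages consumer-side DER and may also be used by utilities for DER at their own sites.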

7.2.2 Challenges of Maintaining DER Cybersecurity

The emerging DER architecture introduces a variety of potential vulnerabilities to various cyber threats. First, the high penetration of DER introduces a huge number of energy devices (e.g., smart inverters and battery controllers) owned and operated at many consumer and utility locations. The number of consumer-owned DER devices incorporated into the grid could vastly outnumber the utility-owned and utility-controlled resources. Second, DER span multiple security administrative domains, meaning that the utility may only be able to monitor the security posture of devices up to the smart meter, as the DER owners will likely manage their own devices. Third, the various networks used to control the DER may be interconnected with building automation networks and other information technology (IT) networks, thereby increasing their attack surface. These three key features introduce many new threats to both DER and the broader grid.

As identified in Fig. 7.1, a wide variety of devices and networks are required to support DER; however, current research has only addressed a subset of this underlying infrastructure and its interactions. Numerous key research efforts have demonstrated smart meter advances, including (i) security attack analysis for smart meters [16], (ii) intrusion detection approaches for smart meters [17], and (iii) the design of new security mechanisms for smart meters. While secure smart meters play a critical role in DER integration, most DER innovation is occurring “behind the meter” through the integration of new energy sources and cyber-control mechanisms. In addition, the required control techniques must operate across administrative domains (i.e., between utilities and consumers). This creates many new cybersecurity challenges beyond those faced with smart meter deployments. Table 7.1 identifies key cybersecurity challenges introduced by DER and compares these emerging DER challenges against the current smart meter/AMI systems to demonstrate why current research does not meet these needs.

Table 7.1 Emerging key DER security challenges

Security challenge: Divided administration
Smart meter/AMI: The utility owns the entire AMI infrastructure or utilizes a managed service. This ensures that security mechanisms and patches are installed and correctly configured. Utilities also prioritize cybersecurity during system acquisitions.
DER: Smart inverters will likely be owned by the DER operator instead of the utility [18], and the operator may not have the technical expertise or incentives to prioritize or maintain the security of their infrastructure.

Security challenge: Increased cyber-physical interdependencies
Smart meter/AMI: Smart meter attacks have limited cyber-physical interdependencies. Generally, only an attack that disconnects a meter can be detected by monitoring the physics of the grid.
DER: Preventing, detecting, and mitigating malicious DER operations will heavily depend on analyzing both the cyber and physical properties of the grid.

Security challenge: Greater impact to grid
Smart meter/AMI: While the disconnection of meters will leave consumers without power, it is unlikely to significantly impact the reliability and stability of the distribution grid.
DER: If there is a high penetration of DER in the grid, the malicious operation of smart inverters may seriously impact the distribution grid by injecting excessive power or intentionally manipulating voltage, which could present a greater risk to the bulk power system stability.

Security challenge: Cryptography & key exchange
Smart meter/AMI: The utility either owns the entire AMI network or utilizes a managed service, which simplifies the implementation of the cryptographic protocols and key exchanges necessary to protect communications.
DER: The networks must cross multiple administrative boundaries so that commands from the utility can control the consumer-owned DER. Therefore, key exchange and revocation must occur between multiple parties.

Security challenge: Privacy
Smart meter/AMI: Utilities commonly obtain meter readings on 15- or 60-min intervals, which only provide information on changes with major loads [19].
DER: Utilities may be able to measure the status of DER resources in seconds or minutes. This information could be used to infer increasingly accurate profiles of consumer behavior.

Security challenge: More control functions
Smart meter/AMI: Smart meters have limited control functions, which typically include demand response and load disconnects.
DER: Smart inverters have advanced control functions that can greatly influence the utility’s and customer’s ability to control smart inverters.

7.3 Overview of DER Cyber-Physical Security Research Framework

Figure 7.2 presents an overarching architecture for attack-resilient DER integration that takes into account both the cyber and physical characteristics of the power grid by combining efforts to prevent, detect, and respond to cyber attacks at the cyber, physical device, and utility layers. This overall framework covers the following key topics within the area of DER cybersecurity by addressing the identified key challenges of DER security.

Fig. 7.2 Attack-resilient framework for DER cybersecurity

• Resilience metrics and design principles for DER: Common cyber vulnerabilities within DER and smart inverters, along with the risk they present to the grid through complex interactions with other devices and applications, must be identified. Resilience metrics and cyber-physical security principles should also be developed to provide increased confidence in DER implementations. Attack-resilient security metrics, vulnerability indices, and design principles should be developed to help guide utilities and consumers as they increasingly adopt DER. The metrics will identify how cyber attacks against DER could impact the grid, especially with increasing amounts of DER integration, to inform utilities about DER-related decisions. These metrics and design principles will inform utilities about (i) the percentage of allowable DER penetration to maintain grid reliability while some DER instances are malicious, (ii) how observable the malicious DER actions are within various distribution feeder models, and (iii) what DER functions or commands have the greatest ability to influence grid reliability. In addition, they will provide a foundation for understanding the security mechanisms necessary to protect the grid as the DER integration increases. These security design principles will identify critical DER security properties (e.g., confidentiality, integrity, availability) for various messages and functions. They will then leverage the cyber attack threat models and DER system architectures to provide a ranking of threat impacts and identify trade-offs between the amount of integrated DER, granularity of control capabilities, and cybersecurity of the infrastructure.
• Attack prevention for DER: Current gaps in the existing technologies to prevent attacks against DER need to be explored, and previous work on the design of security architectures for smart meters should be extended while incorporating many of the unique properties of DER. Security mechanisms for DER at cyber, physical device, and utility layers of the power system will inform DER owners and utilities about what security protections are important to maintain a secure system.
To enhance the cybersecurity of the power system with a huge number of DERs, necessary cybersecurity architectures and mechanisms need to be identified and designed for DER integration. Specifically, cryptographic operations (including key exchanges and management), trusted computing operations, and access control models for DER should be carefully studied. Both cyber and physical techniques to protect DER from attack need to be identified.
• Attack detection for DER: It is imperative that malicious activities within DER be quickly detected. Effective methods should be developed to detect DER anomalies and misuse patterns across both the cyber and physical components. Techniques need to be devised to monitor these patterns to provide higher-confidence attack detection. These techniques will then correlate both physical and cyber events to produce high-confidence indicators of attack and will provide actionable data to enable real-time utility response. The detection tool will operate within the control center to collect data across the various utility infrastructures and DER domains. The attack detection techniques must reveal sufficient information regarding the attack to provide utilities with the ability to appropriately respond. Therefore, information that should be provided along with detection alerts includes (i) the set of affected DER resources, (ii) estimated malicious action (e.g., voltage/frequency violations), and (iii) estimated severity.
• Attack response for DER: Proper and prompt response actions should be provided to disconnect offending systems or counteract them through the control of the other DER. The goal of cyber-attack response is to prevent cyber attacks from further impacting the system while ensuring the continuous operation of the systems to the largest extent possible. Once the intrusion detection system (IDS) identifies the likely cause of the anomaly, it will provide fail-safe responses that protect the grid by minimizing the impact of the DER. The response to the attack can be based on the malicious actions performed by the DER, the scale of the attack, and the detection confidence. If the attack is identified at a high confidence level, those DERs should be immediately disconnected from the grid. If lower-confidence events are detected, then alternative methods, such as controlling neighboring DER, should be explored to compensate for the malicious DER activities. Various response activities should be studied to determine the optimal approach for different attack scenarios.
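
The alert contents and the confidence-based response described in the detection and response items above can be sketched in a few lines of Python. This is only an illustration: the class and function names, the 0 to 1 scales, and the 0.9 confidence threshold are assumptions made for the example and are not specified in the chapter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Response(Enum):
    DISCONNECT = "disconnect the offending DER"
    COMPENSATE = "compensate by controlling neighboring DER"


@dataclass
class DetectionAlert:
    affected_der: List[str]   # (i) set of affected DER resources
    estimated_action: str     # (ii) estimated malicious action
    severity: float           # (iii) estimated severity, assumed 0..1 scale
    confidence: float         # detection confidence, assumed 0..1 scale


# Hypothetical threshold; the text only distinguishes "high" from "lower" confidence.
HIGH_CONFIDENCE = 0.9


def select_response(alert: DetectionAlert) -> Response:
    """Disconnect on high-confidence detections, otherwise compensate."""
    if alert.confidence >= HIGH_CONFIDENCE:
        return Response.DISCONNECT
    return Response.COMPENSATE


alert = DetectionAlert(["inverter-17"], "voltage violation", 0.7, 0.95)
print(select_response(alert).value)
```

A fuller version would also weigh the scale of the attack and the estimated severity, which the text lists as inputs to the response decision.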

7.4 Potential Cyber Attacks on Cyber-Physical Power System with DERs

Different types of cyber attacks are possible against a cyber-physical power system with large-scale DER deployments.

7.4.1 Cyber-Physical-Threat Modeling

Research is required to explore the threat models and the associated risks that DER and smart inverters introduce to the reliability of the distribution or even bulk transmission grids. The key issues that should be modeled and simulated are identified below:
• Cyber-threat model: NESCOR has identified many cybersecurity threats to DER [12]. Key targets of cyber threats include DER controllers, smart inverters, and the interactions between wide-area monitoring, protection, and control (WAMPAC) of the power system and DER.
• DER control and communication: The control architectures and communication networks of the DER implementation directly determine the risk exposure from cyber attacks. Multiple devices are involved in controlling DER, especially smart inverters, DER controllers, and battery controllers. Models can be developed using cyber-architectural languages, such as data flow diagrams (DFDs) or the Architecture Analysis and Design Language (AADL). Specific properties of DER control and communication that need to be modeled include (i) communication protocols (e.g., IEEE 1815 (DNP3), IEC 61850-7-420, SEP 2.0, Modbus) tailored for the control of DER devices, (ii) unicast, multicast, and broadcast communication topologies for DER messages, and (iii) smart inverter control functions including volt–var management, frequency–watt management, status reporting, and time synchronization.
• Distribution grid and DER: The physical properties of the distribution grid, feeders, and integrated DER also significantly influence the degree to which attackers can impact the stability and reliability of the grid. The components include photovoltaic (PV) systems, energy storage, smart inverters, voltage regulators, capacitor banks, transformers, and protection devices such as relays, reclosers, and fuses. A primary factor is the percentage of DER that can be integrated into the grid while the grid still remains reliable during cyber attacks. The role of local aggregated controllers, such as microgrid controllers, should also be considered in the evaluation.
• Coupled transmission and distribution with DER: Increased integration of DER will not only influence the distribution grid but can also potentially influence the transmission grid. In addition, the disturbances in the transmission grid can influence a large number of DERs in the distribution grid. Therefore, there is a need to perform coupled transmission and distribution modeling and analysis, by extending the power flow analysis for integrated transmission and distribution systems [20, 21] and incorporating DER.

7.4.2 Threat Scenarios Targeting DER

An attack against DER could target a number of devices and communication networks owned by either the utility or the DER owner. Furthermore, there may also be a variety of third-party services and entities that are interdependent with the operation of DER. The severity of attacks on the various system components and entities can be determined by the size of the DER and the number of available DER instances they are connected to.

Fig. 7.3 High-level schematic of cyber attacks on DER

Figure 7.3 is a high-level schematic of potential cyber attacks targeting DER. These are described in the following paragraphs, which are numbered to correspond to the red triangles in Fig. 7.3:
• 1) Malicious DER commands sent through utility WAN
Utilities may need to remotely communicate with DER in order to control the operating points and monitor the status of the devices. These communications will be critical to maintaining the reliability of the distribution grid, but an attack that can deny, disrupt, or tamper with these messages could prevent the utility from performing necessary control actions. A number of vulnerabilities could enable these attacks, including insecure network protocols, misuse of cryptographic operations, or unauthorized intrusions into the utility DER systems. If these attacks occurred, they could provide the attacker with the ability to control a large number of DER systems, which could produce a serious impact on the distribution grid. Similar attack scenarios identified by NESCOR include [12]:
– Compromised DER Sequence of Commands Causes Power Outage (DER.6)
– DER SCADA System Issues Invalid Commands (DER.14)
– Loss of DER Control Occurs due to Invalid or Missing Messages (DER.9)
• 2)–3) Malware or unauthorized control of smart inverters and DER controllers
DER require a wide variety of digital devices to control their operation and provide consumers and utilities with information about their operation. Most DERs will likely include smart inverters and DER controllers; others may also include battery controllers and even EV controllers. If the attackers can directly access these systems, they will be able to manipulate any of their control functions or spoof status information to the utilities or owners. Attacks that have direct control over the smart inverters could be particularly dangerous because the attacker could intelligently manipulate the device’s operation based on the state of the grid. This could help the attacker amplify undesirable grid states. Similar attack scenarios identified by NESCOR include [12]:
– Malware Introduced in DER System During Deployment (DER.3)
– Threat Agent Modifies Field DER Energy Management System (FDEMS) Efficiency Settings (DER.10)
• 4)–6) Attacks from connected building control systems, IT networks, and vehicle systems
DER devices will likely be interconnected with a variety of other systems and networks, including various Internet of Things (IoT) devices and other third-party cloud systems and services. Many of these devices and networks will not have a strong security posture and may provide avenues for remote access to the DER components. These interconnections could be used by attackers to access the DER and spoof various commands and messages to change operational settings. These vulnerabilities could be caused by weak authentication mechanisms or software vulnerabilities within the DER components. Similar attack scenarios identified by NESCOR include [12]:
– Threat Agent Spoofs DER Data Monitored by DER SCADA Systems (DER.15)
– DER’s Rogue Wireless Connection Exposes the DER System to Threat Agents via the Internet (DER.2)
• 7) Poor system administration from novice system owners
Many DER systems are likely to be operated by individuals who do not have expertise in cybersecurity. In these scenarios, the devices are unlikely to get critical system updates and may miss key security configurations.
Furthermore, these systems may be the object of social engineering attempts directed at the unsuspecting administrators. Similar attack scenarios identified by NESCOR include [12]:
– Custom Malware Gives Threat Agent Control of FDEMS (DER.13)
• 8) Attacks on WAMPAC applications influencing DER
Malicious attacks on WAMPAC applications, such as automatic generation control (AGC) and remedial action schemes (RAS), can produce severe system-level frequency or voltage problems, further influencing the operation of a huge number of DERs, which can in turn cause serious problems in distribution and transmission grids. For example, AGC uses the area control error (ACE), which reflects the supply–demand mismatch, to dispatch the generators and balance generation and demand. When tie-line bias control is the operation mode of interconnected power grids, ACE is calculated based on the frequency and tie-line power deviations. If these measurements are compromised, ACE will be miscalculated, and in extreme cases the system frequency could go beyond the acceptable range [22]. When the system frequency is too low due to cyber attacks against AGC, smart inverters have to be disconnected, which will lead to a further reduction of generation and may make the system frequency even lower.
• 9) Attacks from third parties with DER interconnectivity
Third-party aggregators, manufacturers, and energy leasing entities all have connectivity to a potentially large number of DERs, likely across multiple distribution grids. An attack against any of the systems supporting this connectivity could potentially provide an attacker with access to a large number of DER instances. These impacts may be minor if the third-party access is limited to only monitoring the state of the DER. However, if the third party has the ability to change operational set points or software configurations, then attacks against these systems could have serious impacts that may cascade beyond any single distribution grid. Furthermore, the access maintained by these third parties is often not directly controlled by the utility or DER owner, which further complicates the risk management functions. In addition, coordinated attacks against DER can negatively influence the operation of the bulk power system or impact the system stability.
– With high penetration of DER, the widespread fault propagation under coordinated and targeted attacks on carefully selected DER can negatively influence WAMPAC applications; for example, they could lead to the misoperation of the RAS system and produce severe unexpected consequences for DER and the grid or even cause cascading blackouts. These complex interdependencies will dramatically increase the risk of power grid operation [23, 24] and should be carefully analyzed.
– Coordinated attacks can target system instability by manipulating the operation of a large number of DERs. There has been research on the impact of PV integration on system stability, including small-signal stability [25], voltage stability [26], and transient stability [27]. Manipulating the output power of many DERs, such as PV and battery storage, can change the net load from substations and further impact the system stability. It is possible for an attacker to launch an attack on the DER connected to the distribution system at the most vulnerable load buses, in which case manipulating the smallest number of DERs can cause serious stability problems, such as undamped oscillation or voltage collapse. Therefore, it is critical for the system operator to identify those most vulnerable load buses and implement targeted protection of them in order to eliminate the possibility of the abovementioned low-cost high-impact cyber attacks to the greatest extent.
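
The AGC example in item 8 above can be made concrete with a short calculation. The sketch below uses the commonly cited tie-line bias form of ACE, the tie-line deviation minus ten times the frequency bias times the frequency deviation, with a negative bias in MW per 0.1 Hz. Sign conventions differ between references, and the bias value, tie-line flows, and frequencies are hypothetical numbers chosen only for illustration.

```python
def area_control_error(tie_actual_mw, tie_sched_mw,
                       freq_actual_hz, freq_sched_hz,
                       bias_mw_per_tenth_hz):
    """ACE in MW under tie-line bias control (meter error term omitted)."""
    tie_deviation = tie_actual_mw - tie_sched_mw
    freq_deviation = freq_actual_hz - freq_sched_hz
    return tie_deviation - 10.0 * bias_mw_per_tenth_hz * freq_deviation


BIAS = -50.0  # hypothetical frequency bias, MW per 0.1 Hz (negative by convention)

# True condition: tie lines on schedule, frequency 0.05 Hz low.
true_ace = area_control_error(200.0, 200.0, 59.95, 60.0, BIAS)

# Compromised frequency measurement reports 0.05 Hz high instead.
spoofed_ace = area_control_error(200.0, 200.0, 60.05, 60.0, BIAS)

print(round(true_ace, 3), round(spoofed_ace, 3))
```

With these numbers the true ACE is about -25 MW (the area should raise generation), while the spoofed measurement yields about +25 MW, so AGC would lower generation and deepen the real frequency decline.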

Table 7.2 Impact levels of DER attacks

Impact: High
Attack target: Large-scale coordinated attack on DER
Rationale: DER devices may have remotely accessible functions, which can provide an attacker with large-scale access to many DER. For example, many current manufacturers or third-party DER operators could have access to large numbers of DER. If these systems have the capability to control the DER, attacks could then have broader impact across many different distribution grids.

Impact: High
Attack target: Utility DMS/DER SCADA server
Rationale: The utility’s DMS/DER SCADA system could have some control over all residential, commercial, and utility-scale instances. Attacks against these instances could potentially influence multiple DER across the distribution grid.

Impact: Moderate
Attack target: Utility-scale DER
Rationale: Utility-scale DER could be in the 100-kW to 10-MW range, which could cause many grid misoperations if manipulated by an attacker.

Impact: Moderate
Attack target: Group of residential DER
Rationale: On a power grid with high PV penetration, there could be hundreds of residential DER. A coordinated attack against a large number of residential DERs could have a significant impact on grid stability or available load.

Impact: Moderate
Attack target: Commercial DER
Rationale: Larger commercial DER (10–100 kW) could contribute to protection system misoperations and other system stability problems.

Impact: Low
Attack target: Single residential DER
Rationale: Single residential DER (1–10 kW) has little impact on the grid but could negatively impact a resident or residents on the same transformer [28].

7.4.3 Attack Threat Ranking

Various attacks can be ranked based on their ability to negatively impact the grid (Table 7.2). An attack on a single DER instance is considered critical only if it is associated with a large utility-scale facility (e.g., 1 MW). However, as demonstrated in Fig. 7.3, many systems and networks interconnect with large numbers of smaller DERs. Attacks against these assets have the potential to access many system components and therefore could also introduce serious risks to the grid.
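
As a rough illustration, the ranking in Table 7.2 could be encoded as a lookup that sorts candidate attack targets by impact level. The dictionary keys and function names below are hypothetical, the impact levels follow the table as read here, and the treatment of the 10 kW and 100 kW boundaries is an assumption because the table's size ranges share endpoints.

```python
# Impact level per attack target, following Table 7.2 (key names are hypothetical).
IMPACT_BY_TARGET = {
    "large_scale_coordinated": "High",
    "utility_dms_der_scada": "High",
    "utility_scale_der": "Moderate",
    "residential_der_group": "Moderate",
    "commercial_der": "Moderate",
    "single_residential_der": "Low",
}

_ORDER = {"Low": 0, "Moderate": 1, "High": 2}


def classify_single_der(capacity_kw):
    """Map one DER's capacity to a target class using the table's size ranges.

    Half-open intervals at 10 kW and 100 kW are an assumption.
    """
    if capacity_kw < 10:
        return "single_residential_der"
    if capacity_kw < 100:
        return "commercial_der"
    return "utility_scale_der"


def rank_targets(targets):
    """Sort target keys from highest to lowest impact level."""
    return sorted(targets, key=lambda t: _ORDER[IMPACT_BY_TARGET[t]], reverse=True)


print(rank_targets(["single_residential_der", "utility_dms_der_scada", "commercial_der"]))
```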

7.5 Attack Impact Analysis and Metrics

Attack impact metrics provide qualitative or quantitative mechanisms to evaluate the severity of the cyber attacks, that is, how significantly an attack can manipulate the DER control functions. Figure 7.4 illustrates the methodology for the evaluation of DER attacks.

Fig. 7.4 DER attack impact evaluation

Attack impact evaluation can utilize various threat models to identify potential attacks against DER. These threats to cybersecurity will then be mapped to a physical system model that includes various properties of the distribution feeder and DER. Furthermore, additional properties of the communication architecture will also be explored to evaluate what impacts various attacks have on the system. By manipulating the DER instances, the attacker can influence a number of system actions as identified below:
• Disconnect: The DER can be tripped off from the grid. This can prevent the consumer from selling back energy to the grid, and it may also negatively influence grid operation by creating frequency or voltage violations or influencing the system stability.
• False trip: If the PV manipulation can masquerade as a fault, the attacker could potentially trigger an incorrect tripping of a protection relay. This attack could cut off power to consumers on a distribution feeder.
• Overloads: Under light load conditions or when the load is disconnected either by the operator or by an attacker, the power from PV or other DER devices can flow back to the substation and may cause overloading of the feeder between the DER system and the substation. If PV generation masks the actual load, the unexpected disconnection of PV may cause overloading of the feeder; more generally, manipulating the active power of many DER devices can change the power flow of the distribution system, which can cause further power flow violations [29].

• Voltage/Frequency violations: Malicious control of a smart inverter could cause a violation of acceptable grid voltage and frequency ranges, resulting in grid instabilities and potential outages. High penetration of DER can influence voltage profile and system frequency, depending on the location and capacity of the DER and their loading conditions.
• Failed protection: The DER operation can mask a real system fault such that a protection device does not operate correctly, causing a fault to propagate. Reverse power flow caused by DER can lead to exceeding the interruption ratings of circuit protections and sympathetic tripping of adjacent circuits [29]. High PV penetration can change the fault current levels and the protection zone of the protective relays and may influence the coordination of overcurrent relays, fuses, sectionalizers, and auto-reclosers [30]. The misoperations of protective relays may even lead to a cascading event in the distribution grid.
• System instability: Manipulating the active and reactive power of a large number of DER can influence the small-signal stability and voltage stability of the power system, which may cause undamped oscillations or voltage collapse.

Once the threat and system model have been defined, system simulations can be performed with various simulators to evaluate their impacts on the grid. A variety of quantitative attack impact metrics can be explored based on (i) the amount of load lost, (ii) the number of feeders tripped, (iii) fraction of components not surviving a given attack, (iv) voltage or frequency violations, (v) decreased system stability margins, (vi) time to recover a given fraction of network functionality, (vii) average propagation of cascading failures, and (viii) safety violations.
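
Two of the quantitative metrics listed above, the amount of load lost and voltage violations, could be computed from simulation output along the following lines. This is a minimal sketch: the function name and input layout are hypothetical, and the 0.95 to 1.05 p.u. voltage band is an assumed acceptance range rather than a value given in the chapter.

```python
def impact_metrics(load_before_mw, load_after_mw, bus_voltages_pu,
                   v_min=0.95, v_max=1.05):
    """Summarize load lost and out-of-range bus voltages after a simulated attack.

    bus_voltages_pu maps a bus name to its post-attack voltage in per unit.
    The default voltage band is an assumption for illustration.
    """
    load_lost_mw = load_before_mw - load_after_mw
    violations = [bus for bus, v in bus_voltages_pu.items()
                  if not (v_min <= v <= v_max)]
    return {
        "load_lost_mw": load_lost_mw,
        "load_lost_fraction": load_lost_mw / load_before_mw,
        "voltage_violations": violations,
    }


result = impact_metrics(100.0, 80.0, {"bus1": 1.00, "bus2": 0.93, "bus3": 1.06})
print(result)
```

The remaining metrics, such as feeders tripped or time to recover, would need event logs or time-series output from the simulator and are not covered by this sketch.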

7.6 DER Cyber-Physical Security Design Principles

Designing a secure cyber-physical DER requires a strong understanding of the impacts of various attacks. While foundational computer security design principles have been identified in [31], these principles must be explored within the context of a modern DER environment to identify what security mechanisms must be integrated to ensure that the systems achieve these principles. Key design principles that must be further explored within the context of DER include:
• Should we implement rules requiring diversity in DER devices (e.g., smart inverter manufacturers) to minimize the severity of vulnerabilities discovered in a certain manufacturer's products?
• Does the grid depend too heavily on security mechanisms that the utility does not control?
• Do any third parties have control of, or access to, an excessive number of DER?
• Can the DER network provide sufficient availability to forward time-sensitive messages?
• Are there DER settings/set points that must be verified by the utility to ensure devices are operating as expected?
• How accurate do these checks need to be, and when do we need to add high-assurance device architectures to ensure adequate security?
• Should smart inverters and other DER devices have default states that they should fail over to in the face of a security event?

7.7 Attack Resilience at Cyber, Physical Device, and Utility Layers

The research framework for DER cyber-physical security includes cyber, physical device, and utility layer security measures at multiple levels for different attack classes. These measures enable resiliency by providing techniques to prevent, detect, and respond to an attack throughout the cyber, physical device, and utility layers so that the grid can remain operational during an attack.

7.7.1 Cyber Layer Attack Resilience

An overview of the cyber layer attack resilience is presented in Fig. 7.5. More details are provided below.

7.7.1.1 Cyber Layer Attack Prevention

A broad array of cybersecurity mechanisms is needed to prevent attackers from gaining control over DER. We need to explore gaps and challenges in the currently available security mechanisms and introduce novel techniques tailored to this domain.

Fig. 7.5 Overview of cyber layer attack resilience

Trusted Computing Bases: The utility’s increased dependency on DER for grid control operations presents a strong need for trustworthy DER devices. However, because the utility has little administrative control over the various DER devices, it is difficult to establish the appropriate level of trust in critical DER operations. In order to protect the critical cryptographic operations and DER control functions, DER devices should implement a trusted computing base. Research is needed to explore how DER devices implement trusted platform modules (TPMs) and trusted execution environments (TEEs) to support the protection of critical DER operations. These techniques provide additional assurance that software-based vulnerabilities will not provide attackers with access to critical system data. Modern hardware-based security mechanisms (e.g., ARM TrustZone [32], GlobalPlatform [33]) can be used to protect critical DER devices and functions. While hardware-based techniques can provide improved security, it is important to prioritize the criticality of various system devices, so they can be appropriately protected. Many DER instances have long lifespans due to their large initial capital costs and may have to operate for over 20 years. Therefore, the software must be updatable to address new software requirements and evolving security technologies. This constraint may make it difficult to depend on hardware security modules and secure co-processors, which have limited flexibility. Instead, techniques that separate processes based on their criticality and place them within TEEs can provide improved security while still providing support to add new functionality. Examples of functions that may need to be isolated include (i) cryptographic functions and key storage, (ii) event auditing/reporting, and (iii) setpoint management.

Access Control Mechanisms: As identified in Sect. 7.2, DER devices will likely be accessible by a number of different entities (e.g., manufacturer, utility, aggregator, consumer, PPA). Therefore, granular access control mechanisms are required to prevent these actors from gaining unauthorized access. Currently, there is limited research exploring access control models for DER. The following list explores potential actors and functions that could be used to build such a model.
• Manufacturer: The manufacturer may request read-only access to operational data on smart inverter performance to determine inefficiencies or defects in the devices. Furthermore, the manufacturer may need to provide firmware updates to the devices to solve some of these issues.
• Consumer: The consumer will likely need read-only access to system parameters, status information, and load data. They may also want to specify operational parameters for the smart inverter because they are responsible for the DER’s initial configuration. Furthermore, the consumer may want to limit other parties' access to usage data.
• Utility: The utility may need to dynamically change the smart inverter parameters and set points in order to control smart inverter responses to many grid events.

170

7 Cyber-Physical Security Research Framework for Distributed Energy Resources

Utilities will also need access to view device parameters to verify that the various DERs are operating as expected and to support various grid analysis functions. • PPA: The third-party provider may own the PV array and may also be responsible for maintaining it; therefore, they will likely possess access to the system. This party will likely need access to PV production data, which they may use to (i) charge the consumer, (ii) monitor the efficiency of the PV array, (iii) debug current settings and parameters, and (iv) collect analytics for determining cost/energy savings and techniques. Role-based access control (RBAC) will be an important technique to help simplify the access control decisions in this space. However, there may also be a need to change access decisions over time depending on the state of the grid and DER. Therefore, attribute based access control (ABAC) mechanisms and models may be necessary to provide more granular access control changes at different times to account for scheduling and forecasting capabilities. These models will include the roles, objects, and temporal properties that should dictate access control decisions for DER. Secure Communications: Secure communications based on strong cryptographic operations and protocols are important to protect DER messages. While the primary communication protocol being proposed for DER, SEP 2.0, utilizes TLS to provide message encryption and authentication, it does not address many challenges faced with large-scale DER deployment. Additional research is needed to address these key challenges enumerated below: • Device discovery and key management: The large number of DER devices requires that utilities implement auto-discovery techniques to ensure they can easily connect with a large number of devices. However, this requires the pre-establishing trust between device manufacturers, utilities, and consumers. 
Techniques are needed to securely distribute cryptographic keys and manage them through the system’s lifecycle. • Long protocol lifespans: While DER devices will have long lifespans, often cryptographic protocols require periodic updates to address new attacks. For example, TLS has a long history of critical vulnerabilities due to the complexity of its protocols. The current version, 1.2, has numerous concerns with weak ciphers and modes of operations. Fortunately, version 1.3 addresses these issues, by encryption and authentication in sequence, rather than combined authenticated encryption with associated data support. • Network availability: Current AMI networks have strong integrity and confidentiality requirements but do not have strong availability because smart meter operations typically do not have timely operations. However, utilitycontrolled smart inverter functions have greater availability needs because they will probably be used to maintain grid stability. Therefore, techniques to ensure high network availability may be required to support key DER functions.
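The access control model discussed earlier in this section combines roles, objects, and temporal properties. The following Python sketch illustrates one way such a decision could be layered: a static RBAC table first, then attribute checks on grid state and time of day. The object names, the two attribute rules, and the time window are assumptions made for illustration; they are not taken from SEP 2.0 or any DER standard.

```python
from dataclasses import dataclass
from datetime import time

# Hypothetical role -> {(object, action)} permissions, loosely following the
# actors listed in the text; object names are illustrative only.
ROLE_PERMISSIONS = {
    "manufacturer": {("operational_data", "read"), ("firmware", "write")},
    "consumer": {("status", "read"), ("load_data", "read"), ("settings", "write")},
    "utility": {("settings", "read"), ("settings", "write"), ("status", "read")},
    "ppa": {("production_data", "read"), ("settings", "read")},
}


@dataclass(frozen=True)
class Request:
    role: str
    obj: str
    action: str
    at: time              # time of day of the request
    grid_emergency: bool  # attribute describing the grid state


def rbac_allows(req: Request) -> bool:
    """Static role check: is (object, action) granted to the role?"""
    return (req.obj, req.action) in ROLE_PERMISSIONS.get(req.role, set())


def abac_allows(req: Request) -> bool:
    """Attribute checks layered on top of RBAC (assumed example policy)."""
    if not rbac_allows(req):
        return False
    if req.obj == "settings" and req.action == "write":
        # Assumed rule 1: during a grid emergency only the utility may write.
        if req.grid_emergency:
            return req.role == "utility"
        # Assumed rule 2: consumers may write only in a daytime window.
        if req.role == "consumer":
            return time(8, 0) <= req.at <= time(20, 0)
    return True
```

For example, `abac_allows(Request("consumer", "settings", "write", time(23, 0), False))` is denied by the assumed time window even though the RBAC table grants the permission.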


Fig. 7.6 Cyber layer attack detection

7.7.1.2 Cyber Layer Attack Detection

Techniques to detect DER cyber attacks should expand upon previously researched attack detection methods demonstrated for the smart grid while tailoring them to the salient DER communication and control properties. Techniques should be developed to monitor both the network communications and the various control devices within the utility and deployed at the DER locations; however, the utility will likely have limited control over many of these devices. Figure 7.6 demonstrates how data from various sources can be collected, sent to the control center, and then analyzed for potential attacks. Key data sources that could be used to detect potential attacks include the WAN network, smart inverters, smart meters, and SCADA measurements. Specification-based techniques can be used to model expected system behavior (e.g., Petri net, hybrid automata) and compare observed events against these models. Specification-based techniques have already demonstrated their effectiveness for smart meters and AMI [17]; however, new models and analysis techniques are necessary to support DER. In particular, specification-based models should be developed for various DER communication protocols (e.g., SEP 2.0, IEC 61850-90-7, DNP3), communication patterns (e.g., unicast, multicast), and coupling points (e.g., ECP, PCC). Anomaly-based techniques can help detect DER attacks that may not be easily inferred from specification-based approaches, but they must be tailored to address the false positives and false negatives that commonly undermine many modern IDS techniques [34]. Statistical models and machine learning techniques can also help identify anomalies and malicious events within the various collected data sources.
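As a minimal illustration of the specification-based idea described above, the Python sketch below models the expected behavior of one DER control session as a small finite-state machine and reports observed messages that the model does not allow. The states and message names are hypothetical and are not drawn from SEP 2.0, IEC 61850-90-7, or DNP3; a practical specification would be far richer.

```python
# Hypothetical specification of one DER control session as a finite-state
# machine: state -> {allowed message: next state}.
SPEC = {
    "idle": {"connect": "connected"},
    "connected": {"authenticate": "ready", "disconnect": "idle"},
    "ready": {"read_status": "ready", "set_point": "ready", "disconnect": "idle"},
}


def check_trace(messages, start="idle"):
    """Return (index, message) pairs that violate the specification."""
    state = start
    violations = []
    for index, message in enumerate(messages):
        next_state = SPEC[state].get(message)
        if next_state is None:
            violations.append((index, message))  # not allowed in this state
        else:
            state = next_state
    return violations
```

In this sketch a `set_point` message sent before `authenticate` is flagged, while the same message after authentication is accepted.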

7.7.1.3 Cyber Layer Attack Response

Cyber response strategies can be deployed to send different control messages to various devices or modify the network based on the type, scope, and confidence level of an attack. Many techniques can be used to modify the network topology in response to potentially malicious nodes. Because utility WANs often use wireless mesh networks that depend on distributed routing algorithms, disconnecting malicious devices presents challenges. Therefore, algorithms are necessary to rebuild mesh routes around malicious devices. In addition, other responses can be applied under a cyber attack, such as shutting down the network, turning off computers, isolating the network, switching to manual activities in place of automated ones, and ensuring that systems providing essential services remain operational so long as they are not directly affected by the failure or attack [15].
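The idea of rebuilding mesh routes around malicious devices can be sketched with a plain breadth-first search that ignores quarantined nodes. This is a centralized simplification under stated assumptions: real utility mesh networks use distributed routing protocols, and the topology and node names below are hypothetical.

```python
from collections import deque


def reroute(links, source, target, quarantined):
    """Breadth-first search for a shortest-hop path avoiding quarantined nodes.

    links: dict mapping each node to an iterable of neighbor nodes.
    Returns the path as a list of nodes, or None if no clean path exists.
    """
    if source in quarantined or target in quarantined:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in links.get(node, ()):
            if neighbor not in parent and neighbor not in quarantined:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical five-node mesh between an inverter and the utility head end.
MESH = {
    "inverter": ["m1", "m2"],
    "m1": ["inverter", "headend"],
    "m2": ["inverter", "m3"],
    "m3": ["m2", "headend"],
    "headend": ["m1", "m3"],
}
```

Quarantining relay `m1` moves the route onto `m2` and `m3`; quarantining both `m1` and `m3` leaves no clean path, which is the case where the other responses listed above would be needed.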

7.7.2 Physical Device Layer Attack Resilience

7.7.2.1 Physical Device Layer Attack Prevention

Smart inverters are usually designed to fulfill multiple control objectives. For example, PV inverters can achieve anti-islanding detection and low-/high-voltage ride through. One major problem is that these multiple functionalities may conflict with each other under certain grid conditions. An example is that the volt/var function, frequency/watt function, or low-voltage ride through may make anti-islanding detection less effective [35]. Unintentional islanding of distributed generation, which is not permitted by the existing IEEE standards such as IEEE 1547, can result in personnel safety hazards, equipment damage, and interference with grid protection devices. Thus, this is an important vulnerability that an attacker can use to produce a high impact. In order to solve this problem, the degree of freedom needs to be increased and functions need to be realized more independently. As shown in Fig. 7.7a, an additional power electronic interface inverter called an energy buffer can be developed to provide functionalities including (i) low-/high-frequency ride through, (ii) low/high-voltage ride through, (iii) harmonic distortion, (iv) unbalance distortion, and (v) anti-islanding. The energy buffer can be powered by energy storage and flexibly connected at different locations in distribution grids, such as the connecting points of sensitive devices, the critical PCC, and the coupling point of the distribution system, as shown in Fig. 7.7b. By moving some functionalities of smart inverters to an energy buffer and separating the functionalities in different devices, we expect that the functionality conflicts can be eliminated.

7.7.2.2 Physical Device Layer Attack Detection

Fig. 7.7 Energy buffer

Various DER control devices, such as smart inverters, must be monitored for malicious activity (e.g., malware) that could manipulate the operation of DER without the knowledge of the system owner or utility. In order to detect attacks at the physical device layer, smart inverter design must be enhanced to monitor the local system status so that cyber threats can be detected at an early stage. In particular, smart inverters can measure the local voltage and current to detect system anomalies. In this anomaly-based approach, indices of power quality, voltage/current unbalance, and other events can be designed to identify cyber attacks. Furthermore, the energy buffers, as shown in Fig. 7.7, can also be used to enhance the detection of cyber attacks at critical buses or the point of common coupling.
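One possible unbalance index of the kind mentioned above is the voltage unbalance factor, the ratio of negative-sequence to positive-sequence voltage magnitude obtained from the standard symmetrical-component transform. The Python sketch below computes it from three phase-voltage phasors. The 2% alarm limit is an assumed value for illustration, and an unbalance alarm alone does not prove a cyber attack.

```python
import cmath
import math

A = cmath.exp(2j * math.pi / 3)  # 120-degree rotation operator


def voltage_unbalance_factor(va, vb, vc):
    """Ratio of negative- to positive-sequence voltage magnitude."""
    v_pos = (va + A * vb + A * A * vc) / 3
    v_neg = (va + A * A * vb + A * vc) / 3
    return abs(v_neg) / abs(v_pos)


def is_anomalous(va, vb, vc, limit=0.02):
    """Flag the measurement when unbalance exceeds an assumed 2% limit."""
    return voltage_unbalance_factor(va, vb, vc) > limit


def phasor(magnitude, degrees):
    """Build a complex phasor from magnitude (per unit) and angle (degrees)."""
    return cmath.rect(magnitude, math.radians(degrees))
```

A balanced set such as `phasor(1, 0)`, `phasor(1, -120)`, `phasor(1, 120)` yields a factor of essentially zero, while a sag of phase a to 0.9 per unit raises it to about 3.4% and trips the assumed limit.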

7.7.2.3 Physical Device Layer Attack Response

Apart from eliminating the conflicts between different functionalities of the smart inverters, the energy buffer can also be used to improve the fault ride-through capability and further strengthen the system's ability to survive cyber attacks. The fault can be voltage/frequency sag/swell, harmonic distortion, unbalance distortion, or unintentional islanding. A detailed implementation is shown in Fig. 7.8. Based on device-level detection, different faults can be identified, and corresponding device-level control algorithms can be designed to enable fault response. An index matrix can be developed to summarize the criteria of the physical device layer and should be customized for cybersecurity study by featuring a wider scope compared to the criteria required by the existing IEEE standards. Meanwhile, coordination between fault response using energy buffers and conventional protective devices should also be considered. The protective functions of the energy buffer should be focused on localized fault response, while the conventional protective devices are used to prevent further fault propagation in a larger area of the system.

Fig. 7.8 Energy buffer for attack response

7.7.3 Utility Layer Attack Resilience

7.7.3.1 Utility Layer Attack Prevention

With a large number of DER devices integrated into the distribution grid, there will be tighter interdependence between cyber and physical systems, and thus a deep understanding of the physics of the grid is indispensable in preventing high-impact attacks on DER. If an attacker has information about which DER systems are more vulnerable, they will be able to launch an attack on one or several of the most vulnerable DER systems to cause widespread outage propagation. Fortunately, system operators should understand their own system better than an attacker does and can perform a thorough analysis to identify the most vulnerable DER systems.

• Identify the DER systems that play important roles in fault propagation: Because the distribution grid with DER integration operates under physical laws, the outage propagation triggered by cyber attacks on DER systems should follow patterns. These patterns can be extracted by applying and extending previous work on the transmission system interaction network and interaction model [2, 36, 37], based on the developed cyber-physical models. Using samples that describe the fault propagation, the interaction network can be obtained by advanced statistical algorithms. The DER systems can then be ranked and the key DER systems identified. Enhancing the cybersecurity of the most vulnerable DER systems identified in this way will greatly enhance the overall system security.
• Identify the load buses that can influence the system stability most: Based on the linearized transmission system dynamic model, the sensitivity of the largest real part of the eigenvalues to the load change at various buses can be analytically calculated, which can be further used to identify the load buses that can negatively impact the small-signal stability most. Similarly, based on the transmission system steady-state model, the load buses that have the lowest voltage stability margins can also be identified. Targeted protection should then be implemented for the DER devices connected to these load buses, where an attacker could otherwise produce a significant impact at the lowest cost.
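The eigenvalue sensitivity mentioned in the second item can be written with the standard first-order perturbation result. The notation here is an assumption for illustration: $A(p)$ is the state matrix of the linearized dynamic model, $p_k$ is the load at bus $k$, and $v_i$, $w_i$ are right and left eigenvectors of a simple eigenvalue $\lambda_i$.

```latex
% Linearized model: \Delta\dot{x} = A(p)\,\Delta x, with
% A v_i = \lambda_i v_i  and  w_i^{\top} A = \lambda_i w_i^{\top}.
\frac{\partial \lambda_i}{\partial p_k}
  = \frac{w_i^{\top}\,\dfrac{\partial A}{\partial p_k}\,v_i}{w_i^{\top} v_i},
\qquad
\frac{\partial\,\mathrm{Re}(\lambda_i)}{\partial p_k}
  = \mathrm{Re}\!\left(\frac{\partial \lambda_i}{\partial p_k}\right).
```

Evaluating the second expression for the eigenvalue with the largest real part, bus by bus, gives a ranking: buses with the largest positive values are those where a load increase pushes the critical mode toward instability fastest.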

7.7.3.2 Utility Layer Attack Detection

Anomalies within the physical distribution grid can also be used to help identify cyber attacks. If attacks are beginning to manipulate the operation of DER, a variety of traditional power meters on the distribution grid, along with a large number of smart meters and micro-PMUs, can be used to infer which DER devices and consumers are misoperating and what malicious functions they are performing. Historical data describing the operation of DER can also help detect anomalies and potential attacks. Due to inaccurate or even unavailable distribution grid and DER system models, an insufficient number of sensors, and DER operation that closely depends on weather conditions, it is difficult to accurately estimate and predict dynamic states of the distribution grid using dynamic state estimation [38–40] for early detection of anomalies. Therefore, it is better to predict the DER system states and anomalies by data-driven approaches and advanced data analytics to detect potential cyber attacks on DER.

• Real-time intrusion detection based on forecasting model: Accurate PV power generation forecasting can help detect anomalies in PV operation at the aggregated level, which can be performed by statistical [41–43], artificial intelligence (AI) [44, 45], physical [46, 47], and hybrid approaches [48–51].
• Real-time intrusion detection by machine learning algorithms: Supervised learning, unsupervised learning, or statistics-based learning approaches can be developed by using the collected smart meter and micro-PMU data. Well-known machine learning techniques, such as SVM, self-organizing maps, decision trees, naïve Bayes networks, and ensemble classifiers, need to be evaluated to guide the selection of techniques that are most suitable for DER intrusion detection. Techniques should be developed to increase the reliability and detection accuracy of the intrusion detection system and to reduce the false-alarm rate.
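The forecasting-based detection idea can be sketched as a residual test: compare measured aggregate PV output against the forecast and raise an alarm when the error is far outside what the forecast normally produces. The sketch below is an assumed, deliberately simple rule (a fixed k-sigma band on historical residuals) rather than any of the cited forecasting methods, and all numbers are made up for illustration.

```python
from statistics import mean, stdev


def residual_alarms(forecast, measured, history_residuals, k=3.0):
    """Return indices where |measured - forecast - bias| exceeds k sigma.

    history_residuals: past (measured - forecast) errors recorded under
    normal operation, used to estimate the usual forecast bias and spread.
    """
    bias = mean(history_residuals)
    sigma = stdev(history_residuals)
    alarms = []
    for index, (f, m) in enumerate(zip(forecast, measured)):
        if abs((m - f) - bias) > k * sigma:
            alarms.append(index)
    return alarms
```

With historical residuals of roughly ±0.2 MW, a sudden 1.2 MW shortfall against the forecast is flagged, while ordinary forecast errors are not. Weather-driven forecast misses would also trigger this rule, which is one reason the text calls for reducing the false-alarm rate.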

7.7.3.3 Utility Layer Attack Response

When there are major outages caused by targeted DER cyber attacks or attacks on WAMPAC applications, relying only on cyber- or device-level response will be insufficient; a coordinated control at the microgrid level or utility level will be necessary, depending on how great and widespread the impact of a cyber attack is.

Fig. 7.9 Coordinated hierarchical control for DER

• Coordinated hierarchical emergency control for DER: Although hierarchical control has been widely employed in AC and DC microgrids [52], it mainly focuses on steady-state operation, such as active and reactive power sharing or voltage deviation elimination, but may not be resilient against cyber attacks. To address this problem, an enhanced hierarchical control architecture, such as the one shown in Fig. 7.9, can be developed to achieve flexible operation of DER when a cyber attack has produced a large impact or is continuing to cause outages. In this control hierarchy, an enhanced secondary control diagram can be developed to improve the transient performance of the system, especially when the DER affected by a cyber attack is disconnected. By monitoring the status of each DER unit, three compensating terms (i.e., a power mismatch correction term, a harmonic correction term, and an unbalance correction term), which are mainly designed and generated to mitigate the transient issue, can be added to the reference values in the primary control level. The secondary control will run in real time to deal with the sequential disconnection of several DERs if continuous cyber attacks exist in the system.
• Corrective control to improve system stability: After the anomalies of DER operation have been detected, corrective control can be performed for the centralized generators in the transmission system or for the trusted DER to counteract the malicious attacks. The optimal dispatch strategies can be obtained by solving an optimal power flow (OPF) with stability constraints [53]. Efficient algorithms can be developed to solve the corresponding optimization problem, which is nonlinear and non-smooth. Alternatively, sensitivities between the stability margin and the control strategies obtained offline can also be used to implement quick and effective re-dispatch strategies to prevent outage propagation or other severe consequences as much as possible.
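The sensitivity-based alternative in the second item can be illustrated with a greedy rule: given offline sensitivities of the stability margin to each trusted unit's output, raise the most effective units first until the required margin gain is reached. This is a sketch under a linear margin model and ignores power balance, network limits, and cost, all of which a stability-constrained OPF would handle. The unit names and numbers are hypothetical.

```python
def redispatch(sensitivity, headroom, required_gain, tol=1e-9):
    """Greedy re-dispatch using an assumed linear margin model.

    sensitivity:   unit -> margin gain per MW of additional output.
    headroom:      unit -> MW the trusted unit can still add.
    required_gain: stability margin improvement to recover.
    Returns (dict of MW added per unit, margin gain achieved).
    """
    dispatch = {}
    achieved = 0.0
    # Use the most effective units first.
    for unit in sorted(sensitivity, key=sensitivity.get, reverse=True):
        remaining = required_gain - achieved
        if remaining <= tol or sensitivity[unit] <= 0:
            break
        step = min(headroom.get(unit, 0.0), remaining / sensitivity[unit])
        if step > 0:
            dispatch[unit] = step
            achieved += step * sensitivity[unit]
    return dispatch, achieved
```

If the achieved gain falls short of `required_gain`, the trusted units alone cannot restore the margin under this model and a wider response would be needed.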

References

1. S.A. Baker, S. Waterman, G. Ivanov, In the Crossfire: Critical Infrastructure in the Age of Cyber War (McAfee, San Jose, 2009)
2. Defense Use Case, Analysis of the cyber attack on the Ukrainian power grid, Electricity Information Sharing and Analysis Center (E-ISAC) (2016), pp. 1–29
3. U.S. Department of Energy (DOE)—Energy Sector Control Systems Working Group, Roadmap to Achieve Energy Delivery Systems Cybersecurity (2011). http://energy.gov/sites/prod/files/Energy
4. North American Electric Reliability Corporation (NERC), Critical Infrastructure Protection (CIP) Standards (2015). http://www.nerc.com/pa/Stand/Pages/CIPStandards.aspx
5. National Institute of Standards and Technology (NIST), NISTIR 7628 Revision 1: Guidelines for Smart Grid Cyber Security (2014). https://nvlpubs.nist.gov/nistpubs/ir/2014/NIST.IR.7628r1.pdf
6. National Electric Sector Cybersecurity Organization Resource (NESCOR), Wide Area Monitoring, Protection, and Control Systems (WAMPAC)—Standards for Cyber Security Requirements (2012). http://smartgrid.epri.com/doc/ESRFSD.pdf
7. J.D. Taft, A. Becker-Dippmann, Grid Architecture, Release 3.0. Pacific Northwest National Laboratory (PNNL) Report 24044 (2015). http://gridarchitecture.pnnl.gov/media/whitepapers/Grid
8. Hawaii State Energy Office, Hawaii Energy Facts & Figures (2015). http://energy.hawaii.gov/wp-content/uploads/2011/10/FF_May2016_FINAL_5.13.16.pdf
9. California Energy Commission, Renewable Energy—Overview (2015). https://ww2.energy.ca.gov/renewables/tracking_progress/documents/
10. New York State (NYS) Department of Public Service, Reforming the Energy Vision. Report 14-M-0101. http://www3.dps.ny.gov/W/PSCWeb.nsf/96f0fec0b45a3c6485257688006a701a/26be8a93967e604785257cc40066b91a/$FILE/ATTK0J3L.pdf/Reforming
11. California Public Utilities Commission (CPUC), Recommendations for Updating the Technical Requirements for Inverters in Distributed Energy Resources: Smart Inverter Working Group Recommendations (2014). https://www.cpuc.ca.gov/WorkArea/DownloadAsset.aspx?id=3189
12. National Electric Sector Cybersecurity Organization Resource (NESCOR), Electric Sector Failure Scenarios and Impact Analyses. Version 1.0, Electric Power Research Institute (EPRI). https://smartgrid.epri.com/doc/NESCOR
13. National Electric Sector Cybersecurity Organization Resource (NESCOR), Cyber Security for DER Systems. Version 1.0, Electric Power Research Institute (EPRI). http://smartgrid.epri.com/doc/der
14. J. Qi, A. Hahn, X. Lu, J. Wang, C.-C. Liu, Cybersecurity for distributed energy resources and smart inverters. IET Cyber-Phys. Syst. Theory Appl. 1(1), 28–39 (2016)
15. International Electrotechnical Commission (IEC), IEC TR 62351-12:2016: Resilience and Security Recommendations for Power Systems with Distributed Energy Resources (DER) Cyber-Physical Systems, 1 edn. http://smartgrid.epri.com/doc/der
16. D. Grochocki, J.H. Huh, R. Berthier, R. Bobba, W.H. Sanders, A.A. Cárdenas, J.G. Jetcheva, AMI threats, intrusion detection requirements and deployment recommendations, in 2012 IEEE Third International Conference on Smart Grid Communications (SmartGridComm) (2012), pp. 395–400
17. R. Berthier, W.H. Sanders, H. Khurana, Intrusion detection for advanced metering infrastructures: requirements and architectural directions, in 2010 First IEEE International Conference on Smart Grid Communications (IEEE, 2010), pp. 350–355


18. B. Seal, F. Cleveland, A. Hefner, Distributed energy management (DER): advanced power system management functions and information exchanges for inverter-based DER devices, modelled in IEC 61850-90-7, Tech. Rep., 2012. http://xanthus-consulting.com/Publications/documents/Advanced_Functions_for_DER_Inverters_Modeled_in_IEC_61850-90-7.pdf
19. G. Eibl, D. Engel, Influence of data granularity on smart meter privacy. IEEE Trans. Smart Grid 6(2), 930–939 (2014)
20. H. Sun, Q. Guo, B. Zhang, Y. Guo, Z. Li, J. Wang, Master–slave-splitting based distributed global power flow method for integrated transmission and distribution analysis. IEEE Trans. Smart Grid 6(3), 1484–1492 (2014)
21. Z. Li, J. Wang, H. Sun, Q. Guo, Transmission contingency analysis based on integrated transmission and distribution power flow in smart grid. IEEE Trans. Power Syst. 30(6), 3356–3367 (2015)
22. S. Sridhar, M. Govindarasu, Model-based attack detection and mitigation for automatic generation control. IEEE Trans. Smart Grid 5(2), 580–591 (2014)
23. S.V. Buldyrev, R. Parshani, G. Paul, H.E. Stanley, S. Havlin, Catastrophic cascade of failures in interdependent networks. Nature 464(7291), 1025 (2010)
24. J. Qi, W. Ju, K. Sun, Estimating the propagation of interdependent cascading outages with multi-type branching processes. IEEE Trans. Power Syst. 32(2), 1212–1223 (2017)
25. S. Eftekharnejad, V. Vittal, G.T. Heydt, B. Keel, J. Loehr, Small signal stability assessment of power systems with increased penetration of photovoltaic generation: a case study. IEEE Trans. Sustainable Energy 4(4), 960–967 (2013)
26. K. Kawabe, K. Tanaka, Impact of dynamic behavior of photovoltaic power generation systems on short-term voltage stability. IEEE Trans. Power Syst. 30(6), 3416–3424 (2015)
27. B. Tamimi, C. Cañizares, K. Bhattacharya, System stability impact of large-scale and distributed solar photovoltaic generation: the case of Ontario, Canada. IEEE Trans. Sustainable Energy 4(3), 680–688 (2013)
28. P.P. Barker, R.W. De Mello, Determining the impact of distributed generation on power systems. I. Radial distribution systems, in 2000 Power Engineering Society Summer Meeting (Cat. No. 00CH37134), vol. 3 (IEEE, 2000), pp. 1645–1656
29. R. Seguin, J. Woyak, D. Costyk, J. Hambrick, B. Mather, High-penetration PV Integration Handbook for Distribution Engineers (National Renewable Energy Lab. (NREL), Golden), Tech. Rep. (2016)
30. M.A. Haj-ahmed, M.S. Illindala, The influence of inverter-based DGS and their controllers on distribution network protection. IEEE Trans. Ind. Appl. 50(4), 2928–2937 (2014)
31. J.H. Saltzer, M.D. Schroeder, The protection of information in computer systems. Proc. IEEE 63(9), 1278–1308 (1975)
32. ARM Limited, Building a Secure System using TrustZone Technology. Report PRD29-GENC-009492C. http://infocenter.arm.com/help/topic/com.arm.doc.prd29-genc-009492c/PRD29-GENC-009492C_trustzone_security_whitepaper.pdf
33. GlobalPlatform Technology, TEE Internal Core API Specification Version 1.1.2.50 (2018). https://globalplatform.org/wp-content/uploads/2018/06/GPD_TEE_Internal_Core_API_Specification_v1.1.2.50_PublicReview.pdf
34. S. Axelsson, The base-rate fallacy and the difficulty of intrusion detection. ACM Trans. Inf. Syst. Secur. 3(3), 186–205 (2000)
35. Sandia National Laboratories, Accelerating Development of Advanced Inverters: Evaluation of Anti-Islanding Schemes with Grid Support Functions and Preliminary Laboratory Demonstration (2013). http://prod.sandia.gov/techlib/access-control.cgi/2013/1310231.pdf
36. J. Qi, K. Sun, S. Mei, An interaction model for simulation and mitigation of cascading failures. IEEE Trans. Power Syst. 30(2), 804–819 (2015)
37. W. Ju, J. Qi, K. Sun, Simulation and analysis of cascading failures on an NPCC power system test bed, in 2015 IEEE Power Energy Society General Meeting (2015), pp. 1–5
38. E. Ghahremani, I. Kamwa, Local and wide-area PMU-based decentralized dynamic state estimation in multi-machine power systems. IEEE Trans. Power Syst. 31(1), 547–562 (2015)


39. K. Sun, J. Qi, W. Kang, Power system observability and dynamic state estimation for stability monitoring using synchrophasor measurements. Control. Eng. Pract. 53, 160–172 (2016)
40. J. Qi, K. Sun, J. Wang, H. Liu, Dynamic state estimation for multi-machine power system by unscented Kalman filter with enhanced numerical stability. IEEE Trans. Smart Grid 9(2), 1184–1196 (2018)
41. S.D. Campbell, F.X. Diebold, Weather forecasting for weather derivatives. J. Am. Stat. Assoc. 100(469), 6–16 (2005)
42. R. Huang, T. Huang, R. Gadh, N. Li, Solar generation prediction using the ARMA model in a laboratory-level micro-grid, in 2012 IEEE Third International Conference on Smart Grid Communications (SmartGridComm) (IEEE, 2012), pp. 528–533
43. G.E. Box, G.M. Jenkins, G.C. Reinsel, G.M. Ljung, Time Series Analysis: Forecasting and Control (Wiley, Hoboken, 2015)
44. K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
45. M. Benghanem, A. Mellit, Radial basis function network-based prediction of global solar radiation data: application for sizing of a stand-alone photovoltaic system at Al-Madinah, Saudi Arabia. Energy 35(9), 3751–3762 (2010)
46. E. Lorenz, J. Hurka, D. Heinemann, H.G. Beyer, Irradiance forecasting for the power prediction of grid-connected photovoltaic systems. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2(1), 2–10 (2009)
47. A.S.B.M. Shah, H. Yokoyama, N. Kakimoto, High-precision forecasting model of solar irradiance based on grid point value data analysis for an efficient photovoltaic system. IEEE Trans. Sustainable Energy 6(2), 474–481 (2015)
48. K. Benmouiza, A. Cheknane, Small-scale solar radiation forecasting using ARMA and nonlinear autoregressive neural network models. Theor. Appl. Climatol. 124(3–4), 945–958 (2016)
49. W. Ji, K.C. Chee, Prediction of hourly solar radiation using a novel hybrid model of ARMA and TDNN. Sol. Energy 85(5), 808–817 (2011)
50. M. Bouzerdoum, A. Mellit, A.M. Pavan, A hybrid model (SARIMA–SVM) for short-term power forecasting of a small-scale grid-connected photovoltaic plant. Sol. Energy 98, 226–235 (2013)
51. P. Bacher, H. Madsen, H.A. Nielsen, Online short-term solar power forecasting. Sol. Energy 83(10), 1772–1783 (2009)
52. J.M. Guerrero, J.C. Vasquez, J. Matas, L.G. De Vicuña, M. Castilla, Hierarchical control of droop-controlled AC and DC microgrids—a general approach toward standardization. IEEE Trans. Ind. Electron. 58(1), 158–172 (2010)
53. P. Li, J. Qi, J. Wang, H. Wei, X. Bai, F. Qiu, An SQP method combined with gradient sampling for small-signal stability constrained OPF. IEEE Trans. Power Syst. 32(3), 2372–2381 (2016)

Chapter 8

Distributed Load Sharing Under Cyber Attacks

8.1 Introduction

Microgrids are small distributed power systems that integrate distributed generators (DGs), energy storage devices, energy converters, load monitors, etc. [1]. They can either be connected to the main grid or operate autonomously. When connected to the main grid, they can not only consume power from the main grid but also feed their surplus energy back to it. In the islanded mode, they have to balance their own supply and demand by load sharing control. The microgrid structure helps take full advantage of distributed energy and maintain the synchronization of various forms of distributed power [2–4]. Security of power systems is a great concern for system design and management [5, 6]. Recently, it has become even more crucial from both technological and economic perspectives, especially for system operators, due to the recent introduction of performance-based rules [7, 8]. With the increasing number of cyber attacks, power systems, and microgrids in particular, are becoming more and more vulnerable. The performance of microgrids may seriously deteriorate in the presence of attacks. Typical cyber attacks in microgrids include False Data Injection (FDI) attacks and Denial-of-Service (DoS) attacks [9–11]. FDI attacks can maliciously degrade system performance by injecting false information into the original data, while DoS attacks may damage system operations by breaking communications between the agents. Chlela et al. provided an example to show the effect of these attacks on the active power of distributed energy resources (DERs), the network frequency, and the load active power [10]. An implementation example of FDI attacks in the smart grid can be found in [9]. FDI attacks in microgrids have attracted considerable attention in recent years [12–19]. The existing literature has mainly focused on the evaluation of FDI attack effects [12, 13], intrusion detection technologies [14–16], and defense strategies [17–19].
Zhang et al. [12] study the effect of FDI attacks on the dynamic microgrid partitioning process. Chlela et al. [13] develop a hardware platform to examine the impact of an FDI attack on the microgrid performance indices, including the total load lost, the frequency nadir, and the latency time to achieve frequency stability. Li et al. [14] provide a conjunctive policy-based majority voting approach to detect smart FDI attack actions in microgrids. Yang et al. [15] propose a Gaussian-mixture model-based detection method to discover FDI attacks, a major advantage of which is that there is no need to predefine a detection threshold. By recognizing changes in inferred candidate invariants, Beg et al. [16] design an intrusion detection method to judge the presence of FDI attacks. Hao et al. [17] consider the scenario in which an FDI attacker injects false data into the intelligent voltage controller in a substation, which can negatively influence the performance of the microgrid. They provide an adaptive Markov strategy to defend against FDI attacks with unpredictable and dynamic behaviors. In order to eliminate an FDI attack that injects false data into the measurements of a microgrid, Rana et al. [18] present a recursive systematic convolutional code to add redundancy to the states of the microgrid and a semidefinite programming-based optimal control policy to defend against FDI attacks. Wang et al. [19] design a topology switch scheme to reduce the effect of FDI attacks on the measurements of the microgrid. However, the intrusion detection of cyber attacks and the implementation of defense strategies may result in economic losses and sacrifice system performance. Therefore, it is important to be able to evaluate the effect of FDI attacks and decide whether it is necessary to implement defense measures. An important problem that needs to be carefully studied is the theoretical analysis of the effect of FDI attacks on the stability of microgrids. In this chapter, the system stability of an inverter-based microgrid under FDI attacks [20] is discussed. The structure of the inverter-based microgrid is introduced, and the model of FDI attacks that can inject false data into the bus agents is presented. A utilization level is then adopted to define a stable region, and the stability of microgrids with respect to the utilization level is theoretically investigated.

© Springer Nature Switzerland AG 2023 J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_8

8.2 Inverter-Based Microgrid Structure

As shown in Fig. 8.1, a microgrid has two layers, i.e., the physical layer and the cyber layer. The physical layer is an interconnected power grid which delivers power from the sources to consumers. The cyber layer consists of a sparse communication network, which is the medium for exchanging data between physical elements in the physical layer.

8.2.1 Physical Layer

There are a number of DGs and local loads in the microgrid. Here inverter-based DGs are considered because their operation and control are more flexible than those of conventional rotating machine-based generators. The inverter is an interface between the system and the DG, which can be photovoltaic (PV) panels, fuel cells, or micro-turbines [21]. As shown in Fig. 8.1, a circuit breaker is usually utilized to connect the main grid and the microgrid. The microgrid can operate in either grid-connected or islanded mode. In the islanded mode, the microgrid has to maintain the power balance for safe operation. Assume the microgrid has n buses. If a bus is not equipped with a DG, it can be viewed as one having a DG with zero available power generation. Similarly, if a bus is not equipped with a load, it can be viewed as one having a load with zero demand. Therefore, the microgrid has n DGs and n loads.

Fig. 8.1 The structure of microgrid. © [2019] IEEE. Reprinted, with permission, from [20]

8.2.2 Cyber Layer

A bus agent (BA), which is installed at each bus, exchanges local information with its neighboring agents and runs a distributed load sharing algorithm to discover the global microgrid information. The computation of the distributed algorithm requires only a sparse communication network and very limited data. A clear advantage of such a solution is its flexibility and adaptability to different operating conditions [22]. The communication network of the agents can be formulated as an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{v_1, v_2, \ldots, v_n\}$ is the set of nodes, which are the agents in the microgrid, and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges [23]. The edge $e_{ij} = (i, j) \in \mathcal{E}$ means that there is a communication link between nodes $i$ and $j$. For an undirected graph $\mathcal{G}$, $e_{ij} \in \mathcal{E} \iff e_{ji} \in \mathcal{E}$. Nodes $i$ and $j$ are adjacent if $e_{ij} \in \mathcal{E}$. Let $A = (a_{ij})_{n \times n}$ be the adjacency matrix, in which $a_{ij} = 1$ if $e_{ij} \in \mathcal{E}$ and $a_{ij} = 0$ otherwise. Define $\mathcal{N}_i = \{j \in \mathcal{V} \mid e_{ij} \in \mathcal{E}\}$ as the neighbor set of node $i$. The Laplacian matrix of a graph $\mathcal{G}$ is defined as the positive semidefinite matrix $L = D - A$, where $D = \mathrm{diag}\{d_1, d_2, \ldots, d_n\}$ with $d_i = \sum_{j=1}^{n} a_{ij}$. It can be easily seen that $L \mathbf{1}_n = 0$, where $\mathbf{1}_n = [1, 1, \cdots, 1]^{\top}$.
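The graph definitions above can be checked on a small example. The Python sketch below builds the Laplacian from an edge list and confirms that every row sums to zero, which is the property that the Laplacian annihilates the all-ones vector. The four-agent line topology is a hypothetical example, and nodes are indexed from 0 for convenience.

```python
def laplacian(n, edges):
    """Build L = D - A for an undirected graph with nodes 0..n-1."""
    adjacency = [[0] * n for _ in range(n)]
    for i, j in edges:
        adjacency[i][j] = 1
        adjacency[j][i] = 1  # undirected: e_ij in E iff e_ji in E
    degree = [sum(row) for row in adjacency]
    return [
        [(degree[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 4-agent line network: 0 - 1 - 2 - 3
L = laplacian(4, [(0, 1), (1, 2), (2, 3)])
row_sums = [sum(row) for row in L]  # each entry of L times the ones vector
```

Each row sum equals the node degree minus the number of neighbors, so it vanishes for any undirected graph, not only for this example.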


8.3 System Dynamic Model
The notations used in this section are listed in Table 8.1.

8.3.1 Small-Signal Model
For a microgrid in the islanded mode, the small-signal model has three parts: the DG block, the network block, and the interface block [24]. The inverter-based DG block includes a local primary control loop and a secondary frequency control loop [24]. The local primary control loop has a power controller and an inner current loop (see Fig. 8.2). It can manage the output power in terms of the preset power points. The controllers in the local primary loop follow the proportional–integral (PI) control law. The controller for DG m is given by

$$i_{q,m}^{ref} = \left(\lambda_m^{pp} + \frac{\lambda_m^{ip}}{s}\right)\left(Q_m^{ref} - Q_m\right). \tag{8.1}$$

Table 8.1 Summary of notations

Symbol                    Description
$m$                       DG number
$i_{d,m}^{ref}$           Set point of the d-axis (direct) component of the current for the mth DG
$i_{q,m}^{ref}$           Set point of the q-axis (quadrature) component of the current for the mth DG
$\lambda_m^{pp}$          Proportional power control gain of the mth DG
$\lambda_m^{ip}$          Integral power control gain of the mth DG
$\lambda_m^{pi}$          Proportional current control gain of the mth DG
$\lambda_m^{ii}$          Integral current control gain of the mth DG
$\lambda_m^{p\omega}$     Proportional control gain of the mth DG for the secondary frequency control
$\lambda_m^{i\omega}$     Integral control gain of the mth DG for the secondary frequency control
$\lambda_m^{\omega}$      Droop control gain of the mth DG
$\lambda_{\omega,m}$      Frequency droop control gain of the mth DG
$P_{SF,m}^{ref}$          Supplementary real-power set point of the mth DG assigned by the secondary frequency controller
$P_{DC,m}^{ref}$          Corrective real-power set point generated by the power control of the mth DG
$P_m$                     Instantaneous real power of the mth DG
$Q_m$                     Instantaneous reactive power of the mth DG
$Q_m^{ref}$               Reactive power set point of the mth DG
$\omega_0$                Nominal frequency
$\omega_m$                Instantaneous frequency obtained from a phase-locked loop (PLL)
$v_{d,m}$                 Component of the voltage set points on the d-axis of the mth DG
$v_{q,m}$                 Component of the voltage set points on the q-axis of the mth DG
$i_{d,m}$                 Instantaneous current on the d-axis of the mth DG
$i_{q,m}$                 Instantaneous current on the q-axis of the mth DG


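Fig. 8.2 Local primary DG control loops. © [2019] IEEE. Reprinted, with permission, from [20]

$$i_{d,m}^{ref} = \left(\lambda_m^{pp} + \frac{\lambda_m^{ip}}{s}\right)\left(P_{SF,m}^{ref} + P_{DC,m}^{ref} - P_m\right), \tag{8.2}$$

$$v_{d,m} = \left(\lambda_m^{pi} + \frac{\lambda_m^{ii}}{s}\right)\left(i_{d,m}^{ref} - i_{d,m}\right), \tag{8.3}$$

$$v_{q,m} = \left(\lambda_m^{pi} + \frac{\lambda_m^{ii}}{s}\right)\left(i_{q,m}^{ref} - i_{q,m}\right), \tag{8.4}$$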

where $P_{DC,m}^{ref}$ refers to the $\omega$–$P$ characteristic of the frequency droop control and can be calculated by $P_{DC,m}^{ref} = \lambda_{\omega,m}(\omega_0 - \omega_m)$, and $P_{SF,m}^{ref}$ is the supplementary power set point of the m-th DG assigned by the secondary frequency controller and can be obtained by $P_{SF,m}^{ref} = \left(\lambda_m^{p\omega} + \frac{\lambda_m^{i\omega}}{s}\right)(\omega_0 - \omega_m)$.

The network block, the second part of the microgrid model, can be represented in a common reference frame x–y as follows:

$$\begin{bmatrix} \Delta i_x \\ \Delta i_y \end{bmatrix} = \begin{bmatrix} G & -B \\ B & G \end{bmatrix} \begin{bmatrix} \Delta V_x \\ \Delta V_y \end{bmatrix},$$

where $i_x = [i_{x1}, i_{x2}, \ldots, i_{xn}]^\top$ and $i_y = [i_{y1}, i_{y2}, \ldots, i_{yn}]^\top$, with $i_{xm}, i_{ym}$, $m = 1, \ldots, n$, being the terminal current of the m-th DG in the common x-axis and y-axis, respectively; $V_x = [V_{x1}, V_{x2}, \ldots, V_{xn}]^\top$ and $V_y = [V_{y1}, V_{y2}, \ldots, V_{yn}]^\top$, with $V_{xm}, V_{ym}$, $m = 1, \ldots, n$, being the terminal voltage of the m-th DG in the common x-axis and y-axis, respectively; and the matrices $G$ and $B$ are obtained from the network admittance matrix [24].


The third part, the interface block, can be modeled as

$$\begin{aligned}
\Delta V_d &= C_0 \Delta V_x - V_{x0} S_0 \Delta\delta + S_0 \Delta V_y + V_{y0} C_0 \Delta\delta, \\
\Delta V_q &= S_0 \Delta V_x - V_{x0} C_0 \Delta\delta + C_0 \Delta V_y + V_{y0} S_0 \Delta\delta, \\
\Delta i_x &= C_0 \Delta i_d - i_{d0} S_0 \Delta\delta - S_0 \Delta i_q - i_{q0} C_0 \Delta\delta, \\
\Delta i_y &= S_0 \Delta i_d + i_{d0} C_0 \Delta\delta + C_0 \Delta i_q - i_{q0} S_0 \Delta\delta,
\end{aligned}$$

where $\delta_i$ is the individual inverter terminal voltage phase angle in the x–y reference frame, and the matrices $C_0 = \mathrm{diag}\{\cos(\delta_{i0})\}$ and $S_0 = \mathrm{diag}\{\sin(\delta_{i0})\}$. Then, according to [24], the whole system model can be expressed as follows:

$$E \Delta\dot{x} = A \Delta x + F r_0, \tag{8.5}$$

where $x = [\delta, \omega, i_d, i_q, i_d^{ref}, i_q^{ref}, u_d, u_q, P, Q, P_{ref}, V_d, V_q, i_x, i_y, V_x, V_y]^\top$, $r_0 = [\omega_0]^\top$, and the system matrix $E$ is singular. Due to space limitations, the expressions of the matrices $E$ and $A$ are omitted here; they can be found in Appendix A of [24].

8.3.2 Active Power Reference
The active power setting depends on the total power demand and power generation. The total active power demand can be written as $P_d = \sum_{m=1}^{n} P_{mL} + P_{Loss}$, where $P_{mL}$ is the active power demand of the load at bus m, and $P_{Loss}$ is the total loss of active power, which is only a small proportion of the total active power demand. Denote by $P_{mG}^{max}$ the maximum power generation of the m-th DG. Then the total available power generation is $P_G^{max} = \sum_{m=1}^{n} P_{mG}^{max}$. Let

$$U = \min\left\{\frac{P_d}{P_G^{max}},\ 1\right\} \tag{8.6}$$

be a common utilization level for all DGs [22, 25, 26]. Assume the load is less than the maximum available power generation. It can be seen that supply and demand are balanced if the active power generation reference of the m-th DG, i.e., $P_{mG}^{ref}$, satisfies $P_{mG}^{ref} = U P_{mG}^{max}$. In fact, the balance of supply and demand can be achieved when the load demand $P_d$ is less than the maximum available power generation $P_G^{max}$. We have $U = P_d / P_G^{max} \le 1$. Furthermore, we can see that

$$\sum_{m=1}^{n} P_{mG}^{ref} = \sum_{m=1}^{n} U P_{mG}^{max} = \frac{P_d}{P_G^{max}} \sum_{m=1}^{n} P_{mG}^{max} = P_d.$$


However, if the load demand is more than the maximum available power generation, i.e., $P_d > P_G^{max}$, then the DGs should operate in maximum power point tracking (MPPT) mode, and the energy storage needs to compensate for the power shortage.
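The active power reference rule can be sketched in a few lines. The function name `power_references`, the returned `shortage` value, and all numbers below are hypothetical and serve only to illustrate (8.6) and the two operating cases described above.

```python
def power_references(loads, loss, gen_max):
    """Return (U, per-DG references, shortage) following U = min(P_d / P_G^max, 1)."""
    p_d = sum(loads) + loss            # total demand P_d, including losses
    p_g_max = sum(gen_max)             # total available generation P_G^max
    u = min(p_d / p_g_max, 1.0)        # common utilization level (8.6)
    refs = [u * g for g in gen_max]    # P_mG^ref = U * P_mG^max
    shortage = max(p_d - p_g_max, 0.0) # left to energy storage when P_d > P_G^max
    return u, refs, shortage


# Hypothetical 4-bus example (per unit); bus 2 has no DG, bus 4 has no load.
u, refs, shortage = power_references(
    loads=[0.2, 0.1, 0.3, 0.0], loss=0.02, gen_max=[0.5, 0.0, 0.4, 0.3])

# Overloaded case: every DG is pushed to its maximum and storage covers the rest.
u_over, refs_over, shortage_over = power_references(
    loads=[1.0, 1.0], loss=0.0, gen_max=[0.5, 0.5])
```

In the first case the references sum to the demand, which mirrors the balance identity derived above; in the second case the utilization level saturates at 1.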

8.4 System Performance Under Attack

8.4.1 FDI Attack Against Distributed Load Sharing Control
The global microgrid information includes the average power demands and available power generations of all BAs. It is the basis of designing the active power references of DGs. However, each BA only has its local information and the information from its neighbors. Hence, a distributed information processing law must be properly designed for the agents in order to obtain the global information. In the cyber layer, the information discovery at agent m can be represented by a linear time-invariant model

$$\begin{bmatrix} P_{mL}(k+1) \\ P_{mG}(k+1) \end{bmatrix} = \begin{bmatrix} P_{mL}(k) \\ P_{mG}(k) \end{bmatrix} + \begin{bmatrix} u_{mL}(k) \\ u_{mG}(k) \end{bmatrix},$$

where $u_{mL}(k)$ and $u_{mG}(k)$ are the control inputs of load and generation, respectively. The objective of information discovery is to find a distributed control law such that all the states converge to the average value of the initial states, i.e.,

$$\lim_{k\to\infty} P_{mL}(k) = \bar{P}_L, \quad \lim_{k\to\infty} P_{mG}(k) = \bar{P}_G, \quad m = 1, 2, \ldots, n, \tag{8.7}$$

where $\bar{P}_L = \frac{1}{n}\sum_{m=1}^{n} P_{mL}(0)$ and $\bar{P}_G = \frac{1}{n}\sum_{m=1}^{n} P_{mG}(0)$.

Smart grids often suffer from FDI attacks [27, 28]. To analyze the impact of FDI attacks, assume the attacker has full knowledge of the power system [9]. The FDI attack on the information discovery processes can be modeled as follows:

$$\begin{bmatrix} P_{mL}^{a}(k) \\ P_{mG}^{a}(k) \end{bmatrix} = \begin{bmatrix} P_{mL}(k) \\ P_{mG}(k) \end{bmatrix} + \begin{bmatrix} a_{mL}(k) \\ a_{mG}(k) \end{bmatrix}, \tag{8.8}$$

where $a_{mL}(k)$ and $a_{mG}(k)$ are the FDI data injected into the state of agent m at time k, and $P_{mL}^{a}(k)$ and $P_{mG}^{a}(k)$ denote the state of agent m at time k when the FDI attack is present. We focus on the discrete average consensus algorithm [29] under FDI attack. It can be seen that

$$\begin{aligned}
P_{mL}^{a}(k+1) &= P_{mL}^{a}(k) + \sum_{j \in N_m(k)} w_{mjL}\left[P_{jL}^{a}(k) - P_{mL}^{a}(k)\right] \\
&= P_{mL}(k) + a_{mL}(k) + \sum_{j \in N_m(k)} w_{mjL}\left[P_{jL}(k) + a_{jL}(k) - P_{mL}(k) - a_{mL}(k)\right] \\
&= \sum_{j=1}^{n} w_{mjL} P_{jL}(k) + \sum_{j=1}^{n} w_{mjL}\, a_{jL}(k),
\end{aligned}$$

where $N_m(k)$ is the set of agent m's neighbors at time k, $w_{mjL}$ is a positive weight with respect to load for $j \in N_m(k)$, which represents the degree of importance of agent j's information from the viewpoint of agent m, and $w_{mmL}(k) = 1 - \sum_{j \in N_m(k)} w_{mjL}$. Its equivalent matrix form is given by

$$P_L^{a}(k+1) = W_L(k)\left[P_L(k) + A_L(k)\right]. \tag{8.9}$$

This means that the information discovery for the load under an FDI attack at time k + 1 is a linear combination of the information discovery without attack and the attack vector at time k. Similarly, we have

$$P_{mG}^{a}(k+1) = \sum_{j=1}^{n} w_{mjG} P_{jG}(k) + \sum_{j=1}^{n} w_{mjG}\, a_{jG}(k)$$

and the equivalent matrix form

$$P_G^{a}(k+1) = W_G(k)\left[P_G(k) + A_G(k)\right] \tag{8.10}$$

for the power generation. Notice that $W_L(k)$ and $W_G(k)$ are predefined Perron matrices which depend on the structure of the graph $\mathcal{G}$, i.e.,

$$W_L(k) = E - \epsilon_L(k) L, \quad W_G(k) = E - \epsilon_G(k) L,$$

where $E$ here denotes the identity matrix, and $\epsilon_L(k)$ and $\epsilon_G(k)$ are given parameters that satisfy $\epsilon_L(k) \in (0, 1/\rho)$ and $\epsilon_G(k) \in (0, 1/\rho)$ with $\rho = \max_i \{\sum_{j \ne i} a_{ij}\}$ (more details can be found in [30, Section II.C]). Before investigating the impact of an FDI attack on the microgrid, two basic assumptions are presented.

Assumption 1 (Non-degeneracy [29]) There exists $w > 0$ such that $w_{mm}(k) \ge w$ for all m, and either $w \le w_{mj}(k) \le 1$ or $w_{mj}(k) = 0$ for all $m \ne j$ at any time k.

Assumption 2 (Balanced communication [31]) For any time k, $1^\top W(k) = 1^\top$ and $W(k) 1 = 1$.

Assumption 1 guarantees that each agent updates its state with its neighbors' information, and Assumption 2 ensures that all agents converge to the average of the initial states [32].
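A minimal simulation sketch of the consensus update under attack is given below. It assumes a hypothetical four-agent ring network, a fixed step size of 0.3, hypothetical initial loads, and a deterministic worst-case injection $0.05e^{-k-1}$ at every agent; it also assumes that the corrupted value persists in the agent state, so that the next state is $W[P(k) + A(k)]$. None of these choices come from the chapter's case study.

```python
import math


def perron_matrix(adjacency, eps):
    """W = I - eps * L for a symmetric 0/1 adjacency matrix."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    return [[(1.0 - eps * degrees[i]) if i == j else eps * adjacency[i][j]
             for j in range(n)] for i in range(n)]


def consensus(weights, initial, attack, steps):
    """Iterate state <- W (state + a(k)) for k = 1..steps (assumed recursion)."""
    n = len(initial)
    state = list(initial)
    for k in range(1, steps + 1):
        corrupted = [state[m] + attack(m, k) for m in range(n)]
        state = [sum(weights[m][j] * corrupted[j] for j in range(n))
                 for m in range(n)]
    return state


# Hypothetical ring of four agents: 0-1-2-3-0, so rho = 2 and eps must be < 0.5.
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
W = perron_matrix(ring, eps=0.3)
loads0 = [0.10, 0.20, 0.12, 0.15]          # hypothetical initial load states
average0 = sum(loads0) / len(loads0)

clean = consensus(W, loads0, lambda m, k: 0.0, steps=100)
attacked = consensus(W, loads0, lambda m, k: 0.05 * math.exp(-k - 1), steps=100)

bound = 0.05 * math.exp(-2) / (1 - math.exp(-1))   # sum over k >= 1 of 0.05 e^{-k-1}
shift = sum(attacked) / len(attacked) - average0   # change of the consensus value
```

Because $W$ is doubly stochastic, the update preserves the average of whatever it is applied to, so every injected value shifts the final consensus value; in this sketch the shift equals the per-agent injection total `bound`, which is within the $n\bar{B}_L$ margin used in Lemma 1 below.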


Lemma 1 If Assumptions 1 and 2 hold for $W_L$ and $W_G$, $\sum_{k=1}^{\infty} |a_{mL}(k)| \le \bar{B}_L$, and $\sum_{k=1}^{\infty} |a_{mG}(k)| \le \bar{B}_G$, where $\bar{B}_L$ and $\bar{B}_G$ are constant FDI bounds on the load and generation of an arbitrary agent m, respectively, then

$$\left| \lim_{k\to\infty} \frac{1}{n} \sum_{m=1}^{n} P_{mL}^{a}(k) - \bar{P}_L \right| \le n \bar{B}_L, \tag{8.11}$$

$$\left| \lim_{k\to\infty} \frac{1}{n} \sum_{m=1}^{n} P_{mG}^{a}(k) - \bar{P}_G \right| \le n \bar{B}_G. \tag{8.12}$$

Proof It is a direct result of Theorems 2 and 3 in [33].

Lemma 1 bounds the difference between the convergence value under FDI attack and that without attack. The bound for the load (generation) depends on the number of agents and the bound of the false data injected into the bus agent. When an FDI attack is present, the utilization level is

$$U^{a} = \frac{\lim_{k\to\infty} \frac{1}{n}\sum_{m=1}^{n} P_{mL}^{a}(k)}{\lim_{k\to\infty} \frac{1}{n}\sum_{m=1}^{n} P_{mG}^{a}(k)} = \frac{\bar{P}_L^{a}}{\bar{P}_G^{a}}. \tag{8.13}$$

Theorem 1 If Assumptions 1 and 2 hold for $W_L$ and $W_G$, $\sum_{k=1}^{\infty} |a_{mL}(k)| \le \bar{B}_L$, and $\sum_{k=1}^{\infty} |a_{mG}(k)| \le \bar{B}_G$, then

$$\frac{\bar{P}_L - n\bar{B}_L}{\bar{P}_G + n\bar{B}_G} \le U^{a} \le \frac{\bar{P}_L + n\bar{B}_L}{\bar{P}_G - n\bar{B}_G}. \tag{8.14}$$

Proof According to Lemma 1, we have

$$|\bar{P}_L - \bar{P}_L^{a}| \le n\bar{B}_L, \quad |\bar{P}_G - \bar{P}_G^{a}| \le n\bar{B}_G,$$

which is equivalent to

$$\bar{P}_L - n\bar{B}_L \le \bar{P}_L^{a} \le \bar{P}_L + n\bar{B}_L, \tag{8.15}$$

$$\bar{P}_G - n\bar{B}_G \le \bar{P}_G^{a} \le \bar{P}_G + n\bar{B}_G. \tag{8.16}$$

Then (8.14) can be obtained from (8.15)–(8.16).
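The final step of the proof can be spelled out as follows. This is a sketch under two sign conditions that the text does not state explicitly and that are assumed here: $\bar{P}_L - n\bar{B}_L \ge 0$ and $\bar{P}_G - n\bar{B}_G > 0$. Under them, all numerators are nonnegative and all denominators positive, so the ratio grows with its numerator and shrinks with its denominator.

```latex
\begin{aligned}
U^{a} = \frac{\bar{P}_L^{a}}{\bar{P}_G^{a}}
  &\le \frac{\bar{P}_L + n\bar{B}_L}{\bar{P}_G^{a}}
   \le \frac{\bar{P}_L + n\bar{B}_L}{\bar{P}_G - n\bar{B}_G}, \\
U^{a} = \frac{\bar{P}_L^{a}}{\bar{P}_G^{a}}
  &\ge \frac{\bar{P}_L - n\bar{B}_L}{\bar{P}_G^{a}}
   \ge \frac{\bar{P}_L - n\bar{B}_L}{\bar{P}_G + n\bar{B}_G}.
\end{aligned}
```

In each line the first inequality uses (8.15) and the second uses (8.16).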

8.4.2 Effects of FDI Attack on Microgrid Performance
The characteristic equation of system (8.5) is $\det(\lambda E - A) = 0$, where $\lambda$ is an eigenvalue that indicates the stability of system (8.5), i.e., the system is stable if the real parts of all eigenvalues $\lambda$ are less than 0, and it is unstable otherwise.


The characteristic equation is often used to investigate the performance of microgrid systems [24]. In this chapter, we study the impact of FDI attacks on the microgrid performance with respect to the utilization level.

Definition 1 A critical utilization interval denoted by $\mathcal{U} = (\underline{U}, \overline{U})$ is called a stable region if system (8.5) is stable for $U \in \mathcal{U}$, and it is unstable for $U \notin \mathcal{U}$.

We are now ready to show the stability of a microgrid under FDI attack.

Theorem 2 If Assumptions 1 and 2 hold for $W_L$ and $W_G$, $\sum_{k=1}^{\infty} |a_{mL}(k)| \le \bar{B}_L$, and $\sum_{k=1}^{\infty} |a_{mG}(k)| \le \bar{B}_G$, then

1. System (8.5) is stable, if

$$\left( \frac{\bar{P}_L - n\bar{B}_L}{\bar{P}_G + n\bar{B}_G},\ \frac{\bar{P}_L + n\bar{B}_L}{\bar{P}_G - n\bar{B}_G} \right) \subset \mathcal{U}. \tag{8.17}$$

2. System (8.5) is unstable, if

$$\frac{\bar{P}_L - n\bar{B}_L}{\bar{P}_G + n\bar{B}_G} \ge \overline{U} \quad \text{or} \quad \frac{\bar{P}_L + n\bar{B}_L}{\bar{P}_G - n\bar{B}_G} \le \underline{U}. \tag{8.18}$$

Proof It can be directly obtained from Theorem 1 and Definition 1.

Theorem 2 provides an important theoretical result for both the attacker and the defender. From the viewpoint of the FDI attackers, if they know the initial load information and generation information, they can design proper injection data to achieve their objective. If they want to make the system unstable, they can adopt the second statement of Theorem 2 to design an attack strategy. If they only aim at changing the average consensus values of load and power generation, the FDI attack strategy should satisfy the first statement of Theorem 2. From the viewpoint of the defenders, if they have learned the quantitative attack characteristics satisfying the boundedness assumption in this theorem, then they can design a new control policy to relieve the impact of the FDI attack.
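The two statements of Theorem 2 can be turned into a simple check. The sketch below uses hypothetical function names and hypothetical input values; the label "inconclusive" is introduced here only for the case in which neither (8.17) nor (8.18) applies, which the theorem does not address.

```python
def utilization_bounds(p_l, p_g, n, b_l, b_g):
    """Bounds on U^a from (8.14); requires a positive lower bound on generation."""
    if p_g - n * b_g <= 0:
        raise ValueError("bound (8.14) needs P_G - n * B_G > 0")
    return ((p_l - n * b_l) / (p_g + n * b_g),
            (p_l + n * b_l) / (p_g - n * b_g))


def classify(bounds, stable_region):
    """Apply the two statements of Theorem 2 to an interval of possible U^a."""
    lo, hi = bounds
    u_min, u_max = stable_region
    if u_min <= lo and hi <= u_max:      # (8.17): interval inside the stable region
        return "stable"
    if lo >= u_max or hi <= u_min:       # (8.18): interval entirely outside it
        return "unstable"
    return "inconclusive"                # not covered by Theorem 2


region = (0.0, 0.38)  # stable region reported for the case study in Sect. 8.5.1

# Hypothetical averages and attack bounds for n = 4 agents.
mild = classify(utilization_bounds(0.20, 1.0, 4, 0.01, 0.02), region)
severe = classify(utilization_bounds(0.60, 1.0, 4, 0.01, 0.02), region)
straddling = classify(utilization_bounds(0.35, 1.0, 4, 0.01, 0.02), region)
```

A defender who can estimate $\bar{B}_L$ and $\bar{B}_G$ could use such a check to see how much margin remains before the stability guarantee of the first statement is lost.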

8.5 Case Studies
In order to show the performance of distributed load sharing under FDI attacks, an illustrative example is provided based on the Canadian urban distribution system in Fig. 8.3. The main parameters of this microgrid are given in Table 8.2. The microgrid is disconnected from the main grid at $t = 0.2$ s.


Fig. 8.3 Canadian urban benchmark distribution system. © [2019] IEEE. Reprinted, with permission, from [20]

Table 8.2 System parameters

Parameter        Value
$S_{base}$       10 (MVA)
$V_{base,1}$     $120\sqrt{2}/\sqrt{3}$ (kV)
$V_{base,2}$     $12.5\sqrt{2}/\sqrt{3}$ (kV)
$V_{base,3}$     $208\sqrt{2}/\sqrt{3}$ (kV)
$R_s$            $1.73 \times 10^{-6}$ (p.u.)
$X_s$            $3.47 \times 10^{-5}$ (p.u.)
$R_f$            0.0029 (p.u.)
$X_f$            0.0041 (p.u.)

Fig. 8.4 Variation of maximal real value of eigenvalues with respect to utilization level © [2019] IEEE. Reprinted, with permission, from [20]

8.5.1 Stable Region
An illustrative example of a stable region is shown in Fig. 8.4. In this example, the utilization levels of all agents in Fig. 8.3 vary over the interval $[0.1, 0.7]$. It can be observed that the stable region is $\mathcal{U} = (0, 0.38)$. In other words, when $U \in \mathcal{U}$, the maximal real part of the eigenvalues is less than 0, so the system is stable. When $U \notin \mathcal{U}$, the system becomes unstable. In general, the stable region is determined by the system parameters, and it is challenging to find an analytical expression for it. At present, the stable region for the given system can only be obtained by numerical computation.


8.5.2 System Performance Under Attack Strategy 1
Here the system performance is studied when there is an FDI attack with strategy 1:

$$a_{mL}(k) = 0.05 e^{-k-1} \left|\sin[2\pi \xi_m(k)]\right|, \quad a_{mG}(k) = 0.1 e^{-k-1} \left|\cos[2\pi \eta_m(k)]\right|,$$

where $\xi_m(k)$ and $\eta_m(k)$ are independent and identically distributed with the uniform distribution $U(0, 1)$. It is clear that

$$|a_{mL}(k)| \le 0.05 e^{-k-1}, \quad |a_{mG}(k)| \le 0.1 e^{-k-1}, \quad k = 1, 2, \ldots.$$

Thus, it can be verified that

$$\sum_{k=1}^{\infty} |a_{mL}(k)| \le \bar{B}_L = \sum_{k=1}^{\infty} 0.05 e^{-k-1} = \frac{0.05 e^{-2}}{1 - e^{-1}},$$

$$\sum_{k=1}^{\infty} |a_{mG}(k)| \le \bar{B}_G = \sum_{k=1}^{\infty} 0.1 e^{-k-1} = \frac{0.1 e^{-2}}{1 - e^{-1}}.$$

It means that the conditions in Theorems 1 and 2 hold for this strategy. For the microgrid system under FDI attack with strategy 1, according to the simulation results, we have $\bar{P}_L = 0.1418$ and $\bar{P}_G = 0.9137$, and then it can be easily verified that

$$\left( \frac{\bar{P}_L - n\bar{B}_L}{\bar{P}_G + n\bar{B}_G},\ \frac{\bar{P}_L + n\bar{B}_L}{\bar{P}_G - n\bar{B}_G} \right) = (0.0991, 0.2230) \subset \mathcal{U}.$$

Thus, condition (8.17) holds when attack strategy 1 is implemented. Figures 8.5, 8.6, 8.7, and 8.8 present the voltage magnitudes, frequencies, loads, and generations under attack strategy 1, respectively. It is seen that a steady state can still be reached in a short time even when the agents are under FDI attacks. This confirms the first statement of Theorem 2. Compared with the microgrid performance in the absence of attack (see Figs. 8.9, 8.10, 8.11, and 8.12), although the voltage magnitudes, frequencies, loads, and powers are still convergent under attack strategy 1, the convergence values are different from those in the absence of attack.
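The interval reported for strategy 1 can be checked by hand. The arithmetic below assumes $n = 4$, which is an inference from the four agents plotted in the figures rather than a value stated in this section, and uses rounded intermediate values.

```latex
\begin{aligned}
\bar{B}_L &= \frac{0.05\,e^{-2}}{1 - e^{-1}} \approx \frac{0.00677}{0.6321} \approx 0.0107,
\qquad
\bar{B}_G = \frac{0.1\,e^{-2}}{1 - e^{-1}} \approx 0.0214, \\
\frac{\bar{P}_L - n\bar{B}_L}{\bar{P}_G + n\bar{B}_G}
  &\approx \frac{0.1418 - 4(0.0107)}{0.9137 + 4(0.0214)}
   = \frac{0.0990}{0.9993} \approx 0.099, \\
\frac{\bar{P}_L + n\bar{B}_L}{\bar{P}_G - n\bar{B}_G}
  &\approx \frac{0.1418 + 4(0.0107)}{0.9137 - 4(0.0214)}
   = \frac{0.1846}{0.8281} \approx 0.223.
\end{aligned}
```

Both endpoints lie inside the stable region $(0, 0.38)$ found in Sect. 8.5.1, in agreement with condition (8.17).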

Fig. 8.5 Voltage magnitudes under attack strategy 1 (four panels, V1 to V4; V (p.u.) versus Time (s)). © [2019] IEEE. Reprinted, with permission, from [20]

Fig. 8.6 Frequency under attack strategy 1 (four panels, DGs 1 to 4; frequency (rad/s) versus Time (s)). © [2019] IEEE. Reprinted, with permission, from [20]

8.5.3 System Performance Under Attack Strategy 2
Now consider an FDI attack with strategy 2:

$$a_{mL}(k) = 4.5 e^{-k-1} \left|\sin[2\pi \xi_m(k)]\right|, \quad a_{mG}(k) = 0.1 e^{-k-1} \left|\cos[2\pi \eta_m(k)]\right|.$$

Similar to attack strategy 1, we have

Fig. 8.7 Average load information discovery under attack strategy 1 (four panels, Load 1 to Load 4; Load (p.u.) versus Iterations). © [2019] IEEE. Reprinted, with permission, from [20]

Fig. 8.8 Average power generation information discovery under attack strategy 1 (four panels, Power 1 to Power 4; Power (p.u.) versus Iterations). © [2019] IEEE. Reprinted, with permission, from [20]

$$\sum_{k=1}^{\infty} |a_{mL}(k)| \le \bar{B}_L = \sum_{k=1}^{\infty} 4.5 e^{-k-1} = \frac{4.5 e^{-2}}{1 - e^{-1}},$$

$$\sum_{k=1}^{\infty} |a_{mG}(k)| \le \bar{B}_G = \sum_{k=1}^{\infty} 0.1 e^{-k-1} = \frac{0.1 e^{-2}}{1 - e^{-1}}.$$

For this attack strategy, there is $\frac{\bar{P}_L - n\bar{B}_L}{\bar{P}_G + n\bar{B}_G} = 1 > \overline{U} = 0.38$. Thus condition (8.18) holds. Figures 8.13, 8.14, 8.15, and 8.16 show the voltage magnitudes, frequencies, loads, and generations under attack strategy 2, respectively. According to Figs. 8.15 and 8.16, the loads and generations can still reach steady states when all the agents are under FDI attacks with strategy 2. However, the voltage magnitudes and frequencies are significantly affected. Figures 8.13 and 8.14 indicate that the voltage magnitudes and frequencies swing drastically up and down, and the microgrid cannot achieve a steady state when attack strategy 2 is launched. In practice, the per unit voltage is around one and will usually not reach a value greater than 1.2. Since the frequency is usually maintained within a tight bound, the DG may already have been tripped once its frequency is out of bound, either instantaneously or after a certain time delay depending on the actual frequency [34]. Thus, a limit can be set for voltage and frequency to stop the simulation when that limit is hit.

Fig. 8.9 Voltage magnitudes in the absence of attack (four panels, V1 to V4; V (p.u.) versus Time (s)). © [2019] IEEE. Reprinted, with permission, from [20]

Fig. 8.10 Frequency in the absence of attack (four panels, DGs 1 to 4; frequency (rad/s) versus Time (s)). © [2019] IEEE. Reprinted, with permission, from [20]

Fig. 8.11 Average load information discovery in the absence of attack (four panels, Load 1 to Load 4; Load (p.u.) versus Iterations). © [2019] IEEE. Reprinted, with permission, from [20]

Fig. 8.12 Average power generation information discovery in the absence of attack (four panels, Power 1 to Power 4; Power (p.u.) versus Iterations). © [2019] IEEE. Reprinted, with permission, from [20]

Fig. 8.13 Voltage magnitudes under attack strategy 2 (four panels, V1 to V4; V (p.u.) versus Time (s)). © [2019] IEEE. Reprinted, with permission, from [20]

Fig. 8.14 Frequency under attack strategy 2 (four panels, DGs 1 to 4; frequency (rad/s) versus Time (s)). © [2019] IEEE. Reprinted, with permission, from [20]

Fig. 8.15 Average load information discovery under attack strategy 2 (four panels, Load 1 to Load 4; Load (p.u.) versus Iterations). © [2019] IEEE. Reprinted, with permission, from [20]

Fig. 8.16 Average power generation information discovery under attack strategy 2 (four panels, Power 1 to Power 4; Power (p.u.) versus Iterations). © [2019] IEEE. Reprinted, with permission, from [20]

References

1. D. Chen, Y. Xu, A.Q. Huang, Integration of DC microgrids as virtual synchronous machines into the AC grid. IEEE Trans. Ind. Electron. 64(9), 7455–7466 (2017)
2. J.M. Guerrero, M. Chandorkar, T.-L. Lee, P.C. Loh, Advanced control architectures for intelligent microgrids, Part I: decentralized and hierarchical control. IEEE Trans. Ind. Electron. 60(4), 1254–1262 (2013)
3. A. Ovalle, G. Ramos, S. Bacha, A. Hably, A. Rumeau, Decentralized control of voltage source converters in microgrids based on the application of instantaneous power theory. IEEE Trans. Ind. Electron. 62(2), 1152–1162 (2015)
4. P. Sreekumar, V. Khadkikar, Direct control of the inverter impedance to achieve controllable harmonic sharing in the islanded microgrid. IEEE Trans. Ind. Electron. 64(1), 827–837 (2017)
5. N. Liu, J. Chen, L. Zhu, J. Zhang, Y. He, A key management scheme for secure communications of advanced metering infrastructure in smart grid. IEEE Trans. Ind. Electron. 60(10), 4746–4756 (2013)
6. J. Qi, A. Hahn, X. Lu, J. Wang, C.-C. Liu, Cybersecurity for distributed energy resources and smart inverters. IET Cyber-Phys. Syst. Theory Appl. 1(1), 28–39 (2016)
7. C.K. Veitch, J.M. Henry, B.T. Richardson, D.H. Hart, Microgrid Cyber Security Reference Architecture (Sandia National Laboratories, Albuquerque, 2013)
8. C. Zhao, J. He, P. Cheng, J. Chen, Analysis of consensus-based distributed economic dispatch under stealthy attacks. IEEE Trans. Ind. Electron. 64(6), 5107–5117 (2017)
9. Y. Liu, P. Ning, M.K. Reiter, False data injection attacks against state estimation in electric power grids. ACM Trans. Inf. Syst. Secur. 14(1), 13:1–13:33 (2011)
10. M. Chlela, G. Joos, M. Kassouf, Impact of cyber-attacks on islanded microgrid operation, in Proceedings of the Workshop on Communications, Computation and Control for Resilient Smart Energy Systems (ACM, 2016), p. 1
11. G. Liang, J. Zhao, F. Luo, S.R. Weller, Z.Y. Dong, A review of false data injection attacks against modern power systems. IEEE Trans. Smart Grid 8(4), 1630–1638 (2017)
12. X. Zhang, X. Yang, J. Lin, W. Yu, On false data injection attacks against the dynamic microgrid partition in the smart grid, in 2015 IEEE International Conference on Communications (ICC) (IEEE, 2015), pp. 7222–7227
13. M. Chlela, G. Joos, M. Kassouf, Y. Brissette, Real-time testing platform for microgrid controllers against false data injection cybersecurity attacks, in 2016 IEEE Power and Energy Society General Meeting (PESGM) (IEEE, 2016), pp. 1–5
14. B. Li, R. Lu, W. Wang, K.-K.R. Choo, Distributed host-based collaborative detection for false data injection attacks in smart grid cyber-physical system. J. Parallel Distrib. Comput. 103, 32–41 (2017)
15. X. Yang, P. Zhao, X. Zhang, J. Lin, W. Yu, Toward a Gaussian-mixture model-based detection scheme against data integrity attacks in the smart grid. IEEE Internet Things J. 4(1), 147–161 (2017)
16. O.A. Beg, T.T. Johnson, A. Davoudi, Detection of false-data injection attacks in cyber-physical DC microgrids. IEEE Trans. Industr. Inform. 13(5), 2693–2703 (2017)
17. J. Hao, E. Kang, J. Sun, Z. Wang, Z. Meng, X. Li, Z. Ming, An adaptive Markov strategy for defending smart grid false data injection from malicious attackers. IEEE Trans. Smart Grid 9(4), 2398–2408 (2016)
18. M.M. Rana, L. Li, S.W. Su, Cyber attack protection and control in microgrids using channel code and semidefinite programming, in 2016 IEEE Power and Energy Society General Meeting (PESGM) (IEEE, 2016), pp. 1–5
19. S. Wang, W. Ren, Stealthy false data injection attacks against state estimation in power systems: switching network topologies, in 2014 American Control Conference (IEEE, 2014), pp. 1572–1577
20. H. Zhang, W. Meng, J. Qi, X. Wang, W.X. Zheng, Distributed load sharing under false data injection attack in an inverter-based microgrid. IEEE Trans. Ind. Electron. 66(2), 1543–1551 (2019)
21. A. Pilloni, A. Pisano, E. Usai, Robust finite-time frequency and voltage restoration of inverter-based microgrids via sliding-mode cooperative control. IEEE Trans. Ind. Electron. 65(1), 907–917 (2017)
22. W. Meng, X. Wang, S. Liu, Distributed load sharing of an inverter-based microgrid with reduced communication. IEEE Trans. Smart Grid 9(2), 1354–1364 (2016)
23. Q. Li, C. Peng, M. Chen, F. Chen, W. Kang, J.M. Guerrero, D. Abbott, Networked and distributed control method with optimal power dispatch for islanded microgrids. IEEE Trans. Ind. Electron. 64(1), 493–504 (2017)
24. S. Liu, X. Wang, P.X. Liu, Impact of communication delays on secondary frequency control in an islanded microgrid. IEEE Trans. Ind. Electron. 62(4), 2021–2031 (2015)
25. W. Zhang, Y. Xu, W. Liu, F. Ferrese, L. Liu, Fully distributed coordination of multiple DFIGs in a microgrid for load sharing. IEEE Trans. Smart Grid 4(2), 806–815 (2013)
26. Y. Xu, W. Zhang, W. Liu, X. Wang, F. Ferrese, C. Zang, H. Yu, Distributed subgradient-based coordination of multiple renewable generators in a microgrid. IEEE Trans. Power Syst. 29(1), 23–33 (2014)
27. R. Deng, G. Xiao, R. Lu, Defending against false data injection attacks on power system state estimation. IEEE Trans. Industr. Inform. 13(1), 198–207 (2017)
28. M. Esmalifalak, G. Shi, Z. Han, L. Song, Bad data injection attack and defense in electricity market using game theory study. IEEE Trans. Smart Grid 4(1), 160–169 (2013)
29. V.D. Blondel, J.M. Hendrickx, A. Olshevsky, J.N. Tsitsiklis, Convergence in multiagent coordination, consensus, and flocking, in Proceedings of the 44th IEEE Conference on Decision and Control (CDC) (2005), pp. 2996–3000
30. R. Olfati-Saber, J.A. Fax, R.M. Murray, Consensus and cooperation in networked multi-agent systems. Proc. IEEE 95(1), 215–233 (2007)
31. R. Olfati-Saber, R.M. Murray, Consensus problems in networks of agents with switching topology and time-delays. IEEE Trans. Autom. Control 49(9), 1520–1533 (2004)
32. A. Olshevsky, J.N. Tsitsiklis, Convergence speed in distributed consensus and averaging. SIAM J. Control. Optim. 48(1), 33–55 (2009)
33. J. He, M. Zhou, P. Cheng, L. Shi, J. Chen, Consensus under bounded noise in discrete network systems: an algorithm with fast convergence and high accuracy. IEEE Trans. Cybern. 46(12), 2874–2884 (2016)
34. I.S. Association et al., IEEE 1547 Standard for Interconnecting Distributed Resources with Electric Power Systems (IEEE Standards Association, Piscataway, 2003)

Chapter 9

Deep Learning Based Attack Detection for Microgrid Control

9.1 Introduction
Microgrids are formed when distributed generators (DGs), energy storage systems, and loads are clustered as a single controllable entity to operate either independently or in conjunction with the main grid [1–3]. Microgrids can provide strong support to the grid by alleviating stresses, reducing feeder losses, and improving reliability, efficiency, and scalability [1].

For controlling microgrids, either a centralized or a distributed approach can be adopted [4]. In centralized control, high-bandwidth, point-to-point communication is required between the central controller and local DG control units [1], which increases the communication and computational costs [1, 5]. The central controller is also a single point of failure [5]. By contrast, distributed control that utilizes a sparse communication network provides a promising solution [1, 2], in which each DG only has access to the information of itself and its neighboring DGs [6], reducing computational complexity and the requirements on the communication network and improving scalability, reliability, and resilience [1].

For the distributed control of AC microgrids, droop-free distributed control has recently been proposed [1, 2], which successfully achieves the objectives of average voltage regulation and active–reactive power sharing among the DGs. However, the control formulation in [1, 2] relies on extensive use of PI controllers and may not always theoretically guarantee convergence. Thus a generalized distributed control framework based on a formally formulated optimization problem is required for optimally coordinating the voltage regulation and power sharing objectives in AC microgrid control.

Despite the advantages of distributed control, cyber-physical security has become a major concern [7, 8]. Due to a lack of central authority and relatively low security levels, distributed controllers are more susceptible to cyber attacks than their centralized counterparts. A malicious entity may inject false measurements into the exchanged data by attacking the nodes or the communication links [7]. Because of the collaborative nature of the state update in distributed control, a simple cyber attack such as a false data injection (FDI) attack on an agent may make the controller deviate from the optimal solution or even make the system unstable [9]. In [10], FDI attacks on both nodes and communication links are presented in a droop-controlled microgrid, and a mitigation approach based on the convergence of dual variables is developed. To mitigate the impact of FDI attacks on frequency synchronization of AC microgrids, a distributed observer based attack detection strategy has been studied in [5]. The authors in [11] consider the application of a time-varying communication graph for FDI attack detection and mitigation. In [12], the Kullback–Leibler (KL) divergence criterion is proposed for detecting FDI attacks. In [8], a distributed robust state estimation approach is integrated with the distributed control to enhance the attack resilience.

Experiments have demonstrated that data-driven deep learning algorithms can identify abnormal activities that cannot be detected by conventional bad data detection [13]. Various deep learning algorithms have been proposed to identify FDI attacks [14–16]. Detecting these attacks in real time usually involves preprocessing historical sensor data for offline training and then applying the trained model to new data online using multivariate data streams [13, 17]. This multivariate time series data scheme presents a challenge for classification [18]. Although different attack detection strategies have been proposed for microgrids, the application of multi-label deep learning based detection is relatively new [19].

In this chapter, the secondary control of AC microgrids is formulated as a constrained optimization problem with voltage and frequency as control variables [20]. A primal–dual gradient based distributed solving algorithm is developed. Then FDI attacks against the distributed control are designed for the DG output voltage, active power, and reactive power measurements. FDI attack detection is further formulated as a multi-label classification problem by considering the inconsistency and co-occurrence dependencies. The multivariate time series of power flow measurements are preprocessed and fed into deep learning models for feature extraction and attack detection.

© Springer Nature Switzerland AG 2023 J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_9

9.2 Distributed Control and FDI Attack

9.2.1 Cyber-Physical Representation of AC Microgrids

Let the buses be those at the output of the LC filter of each DG. The remaining buses in the network are eliminated by Kron reduction. Denote the bus admittance matrix of the reduced network by $Y$. The linearized approximation of the active and reactive power utilization ratios of DG $i$ is [21]:

$$\lambda_{P_i} = \sum_{j=1}^{N} \left( G_{ij} v_j - B_{ij} \theta_j \right) / \overline{P}_i, \tag{9.1}$$

$$\lambda_{Q_i} = -\sum_{j=1}^{N} \left( B_{ij} v_j + G_{ij} \theta_j \right) / \overline{Q}_i, \tag{9.2}$$

where $N$ is the number of DGs, $v_j$ and $\theta_j$ are the voltage magnitude and phase angle of bus $j$, $G_{ij}$ and $B_{ij}$ are the real and imaginary parts of the admittance matrix $Y$, and $\overline{P}_i$ and $\overline{Q}_i$ are the active and reactive power limits of DG $i$.

For distributed control implementation, we consider a sparse communication network that is modeled as a directed graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ in which nodes are agents and edges are the communication links connecting nodes. The communication network can be represented by an adjacency matrix $A = [a_{ij}] \in \mathbb{R}^{N \times N}$, where $a_{ij} > 0$ if there is a connection between nodes $i$ and $j$ and $a_{ij} = 0$ otherwise. It is assumed that $\mathcal{G}$ has a spanning tree and a balanced Laplacian matrix [1, 2].
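The utilization ratios in (9.1)–(9.2) are two matrix-vector products followed by a division by the DG limits. The following is a minimal NumPy sketch; the function name `utilization_ratios` and the two-DG network values are hypothetical and chosen only for illustration.

```python
import numpy as np

def utilization_ratios(Y, v, theta, p_max, q_max):
    """Evaluate the linearized utilization ratios (9.1)-(9.2) for all DGs."""
    G, B = Y.real, Y.imag                      # real and imaginary parts of Y
    lam_p = (G @ v - B @ theta) / p_max        # (9.1)
    lam_q = -(B @ v + G @ theta) / q_max       # (9.2)
    return lam_p, lam_q

# Hypothetical two-DG reduced network (illustrative numbers, not from the chapter).
Y = np.array([[2.0 - 6.0j, -2.0 + 6.0j],
              [-2.0 + 6.0j, 2.0 - 6.0j]])
v = np.array([1.02, 0.98])                     # voltage magnitudes (p.u.)
theta = np.array([0.01, -0.01])                # phase angles (rad)
lam_p, lam_q = utilization_ratios(
    Y, v, theta, p_max=np.array([1.0, 2.0]), q_max=np.array([0.5, 1.0]))
print(lam_p, lam_q)
```

Proportional power sharing corresponds to all entries of `lam_p` being equal and all entries of `lam_q` being equal, which is what the secondary control objective in the next section penalizes.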

9.2.2 Secondary Control Problem Formulation

The design objectives of the secondary control of AC microgrids include (1) regulating frequency back to nominal frequency, (2) regulating the average voltage of the output buses of all inverters to the rated voltage, and (3) achieving proportional active and reactive power sharing among all inverters. Thus the following optimization problem is defined for DG $i$:

$$\begin{aligned}
\min_{v,\,\omega}\quad & f_i = \frac{1}{2}\left[ \sum_{j \in \mathcal{N}_i} a_{ij} \left( \lambda_{P_i} - \lambda_{P_j} \right)^2 + \sum_{j \in \mathcal{N}_i} a_{ij} \left( \lambda_{Q_i} - \lambda_{Q_j} \right)^2 + \frac{1}{\varepsilon} \left( \omega_i - \omega_0 \right)^2 \right] \\
\text{s.t.}\quad & \mathbf{1}^\top v / N - v^r = 0,
\end{aligned} \tag{9.3}$$

where $v^r$ is the rated voltage in per unit and $\omega_0$ is the nominal frequency, $\varepsilon > 0$ is a constant, $v = [v_1, v_2, \cdots, v_N]^\top$ is the DG output voltage vector, $\omega = [\omega_1, \omega_2, \cdots, \omega_N]^\top$ is the DG frequency vector, and $\mathcal{N}_i$ is the set of neighbors of DG $i$ in $\mathcal{G}$. Note that the linearized approximation in (9.1)–(9.2) is adopted in order to get a convex optimization problem.

Let $h(v) = \mathbf{1}^\top v / N - v^r$. The Lagrangian function for the optimization problem (9.3) can be defined as

$$L_i(v, \omega) = f_i + \mu_i h, \tag{9.4}$$

where $\mu_i$ is the Lagrange multiplier for the equality constraint. In this Lagrangian function, the output voltage from all DGs is needed to compute the global average, which may not be locally available due to the distributed implementation of the control. To address this problem, a distributed average voltage estimator is implemented. Specifically, the average voltage of all inverter output buses, $\mathbf{1}^\top v[n] / N$, can be estimated by DG $i = 1, \ldots, N$ as $v_i^{av}[n]$ using the following distributed observer

based on dynamic consensus [1]:

$$v_i^{av}[n] = v_i[n] + \sum_{t=0}^{n} \sum_{j \in \mathcal{N}_i} a_{ij} \left( v_j^{av}[t] - v_i^{av}[t] \right) \Delta t, \tag{9.5}$$

where $\Delta t$ is the step size. It is proven in [1] that for all $i = 1, 2, \cdots, N$, $v_i^{av}$ converges to a consensus value that is the true global average voltage when the communication network has a spanning tree and a balanced Laplacian matrix.
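The convergence of the observer (9.5) can be checked numerically. The sketch below is an explicit (forward-Euler) variant in which the running sum uses the estimates from the previous step, and the local voltages are held constant; both are simplifying assumptions, and the four-node ring graph and the name `average_voltage_observer` are hypothetical.

```python
import numpy as np

def average_voltage_observer(v, A, dt, steps):
    """Explicit version of the dynamic-consensus observer (9.5), constant v."""
    deg = A.sum(axis=1)                  # row sums: sum_j a_ij
    v_av = v.astype(float).copy()        # v_i^av[0] = v_i[0]
    acc = np.zeros_like(v_av)            # running double sum in (9.5)
    for _ in range(steps):
        # sum_j a_ij (v_j^av - v_i^av) for every agent i, scaled by the step
        acc += (A @ v_av - deg * v_av) * dt
        v_av = v + acc
    return v_av

# Hypothetical undirected ring of four agents with unit weights (balanced).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
v = np.array([1.00, 1.02, 0.98, 1.04])
estimate = average_voltage_observer(v, A, dt=0.1, steps=300)
print(estimate, v.mean())
```

Because the graph is balanced, the sum of the estimates is preserved at every step, so the common value they settle on is the true average of `v`.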

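9.2.3 Distributed Control Algorithm

To implement the distributed control in Sect. 9.2.2, the gradients of the Lagrangian function $L_i$ with respect to $v_i$ and $\omega_i$ are evaluated as follows:

$$\begin{aligned}
\frac{\partial L_i}{\partial v_i} = \frac{\partial (f_i + \mu_i h)}{\partial v_i}
&= \sum_{j \in \mathcal{N}_i} a_{ij} \left( \lambda_{P_i} - \lambda_{P_j} \right) \left( \frac{G_{ii}}{\overline{P}_i} - \frac{G_{ij}}{\overline{P}_j} \right) \\
&\quad + \sum_{j \in \mathcal{N}_i} a_{ij} \left( \lambda_{Q_i} - \lambda_{Q_j} \right) \left( \frac{-B_{ii}}{\overline{Q}_i} + \frac{B_{ij}}{\overline{Q}_j} \right)
+ \mu_i D_{v_i} \left| v_i^{av} - v^r \right|
\end{aligned}$$

$$\begin{aligned}
\frac{\partial L_i}{\partial \omega_i} = \frac{\partial (f_i + \mu_i h)}{\partial \omega_i}
&= \left[ \sum_{j \in \mathcal{N}_i} a_{ij} \left( \lambda_{P_i} - \lambda_{P_j} \right) \left( \frac{-B_{ii}}{\overline{P}_i} + \frac{B_{ij}}{\overline{P}_j} \right) \right. \\
&\quad \left. + \sum_{j \in \mathcal{N}_i} a_{ij} \left( \lambda_{Q_i} - \lambda_{Q_j} \right) \left( \frac{-G_{ii}}{\overline{Q}_i} + \frac{G_{ij}}{\overline{Q}_j} \right) \right] \frac{\partial \theta_i}{\partial \omega_i}
+ \frac{1}{\varepsilon} \left( \omega_i - \omega_0 \right),
\end{aligned}$$

where $D_{v_i}$ is the operator for the subgradient with respect to $v_i$. Using Leibniz's rule for differentiation under the integral sign [22], we have

$$\frac{\partial \theta_i}{\partial \omega_i} = \frac{\partial}{\partial \omega_i} \int_{t - \Delta t}^{t} \left( \omega_i(\tau) - \omega_0 \right) d\tau = \Delta t. \tag{9.6}$$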

A primal–dual gradient [23] based distributed algorithm is implemented to solve the optimization problem (9.3). For agent $i$, the variable update equations are

$$v_i[n+1] = v_i[n] - \alpha\, \partial L_i / \partial v_i. \tag{9.7}$$

$$\theta_i[n+1] = \theta_i[n] + \left( \omega_i[n] - \omega_0 \right) \Delta t. \tag{9.8}$$

$$\begin{aligned}
\omega_i[n+1] &= \omega_i[n] - \varepsilon\, \partial L_i / \partial \omega_i \\
&= \omega_0 - \varepsilon \left[ \sum_{j \in \mathcal{N}_i} a_{ij} \left( \lambda_{P_i}[n] - \lambda_{P_j}[n] \right) \left( \frac{-B_{ii}}{\overline{P}_i} + \frac{B_{ij}}{\overline{P}_j} \right) \right. \\
&\quad \left. + \sum_{j \in \mathcal{N}_i} a_{ij} \left( \lambda_{Q_i}[n] - \lambda_{Q_j}[n] \right) \left( \frac{-G_{ii}}{\overline{Q}_i} + \frac{G_{ij}}{\overline{Q}_j} \right) \right] \Delta t.
\end{aligned} \tag{9.9}$$

$$\mu_i[n+1] = \mu_i[n] + \gamma \left| v_i^{av}[n+1] - v^r \right|. \tag{9.10}$$

When the DGs achieve active and reactive power sharing in steady state, the second term on the right-hand side of (9.9) becomes zero, which implies that the microgrid will achieve frequency synchronization. The algorithm is presented as Algorithm 9.1.

Algorithm 9.1 Primal–Dual Gradient Based Distributed Control Algorithm for DG $i$
Initialization: Set $v_i[0] = v_i^{meas}$, $\omega_i[0] = \omega_i^{meas}$, $\theta_i[0] = \theta_i^{meas}$, $\mu_i[0] = 0$, and $v_i^{av}[0] = v_i[0]$. Set $n = 0$.
(S.1) update $v_i[n+1]$ based on (9.7)
(S.2) update $\theta_i[n+1]$ based on (9.8)
(S.3) update $\omega_i[n+1]$ based on (9.9)
(S.4) update $v_i^{av}[n+1]$ based on (9.5)
(S.5) update $\mu_i[n+1]$ based on (9.10)
(S.6) increase $n$ by 1 and go to (S.1)
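Step (S.3) is the easiest update to check in isolation, because (9.9) reduces to the nominal frequency once the utilization ratios of neighboring DGs agree. The sketch below implements only that step; the function name `frequency_update`, the argument layout, and the two-DG numbers are hypothetical.

```python
import numpy as np

def frequency_update(i, lam_p, lam_q, A, G, B, p_max, q_max, omega0, eps, dt):
    """One evaluation of the frequency update (9.9) for agent i."""
    total = 0.0
    for j in np.flatnonzero(A[i]):             # neighbors of agent i
        total += A[i, j] * (lam_p[i] - lam_p[j]) * (
            -B[i, i] / p_max[i] + B[i, j] / p_max[j])
        total += A[i, j] * (lam_q[i] - lam_q[j]) * (
            -G[i, i] / q_max[i] + G[i, j] / q_max[j])
    return omega0 - eps * total * dt

# Hypothetical two-DG example (illustrative values only).
A = np.array([[0.0, 1.0], [1.0, 0.0]])
G = np.array([[2.0, -2.0], [-2.0, 2.0]])
B = np.array([[-6.0, 6.0], [6.0, -6.0]])
p_max, q_max = np.array([1.0, 2.0]), np.array([0.5, 1.0])

# Unequal ratios move the frequency away from nominal ...
w_unequal = frequency_update(0, np.array([0.2, -0.1]), np.array([0.4, -0.2]),
                             A, G, B, p_max, q_max, omega0=1.0, eps=0.1, dt=0.1)
# ... while equal ratios (perfect sharing) return exactly the nominal value.
w_shared = frequency_update(0, np.array([0.3, 0.3]), np.array([0.1, 0.1]),
                            A, G, B, p_max, q_max, omega0=1.0, eps=0.1, dt=0.1)
print(w_unequal, w_shared)
```

The values of `eps` and `dt` here are chosen for readability and are much larger than the gains reported later in the chapter.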

9.2.4 FDI Attack Against Distributed Control

The following three types of FDI attack models are considered.

1. Attacks on voltage measurements: The DG output voltage measurements are randomly changed by injecting an attack vector $u_v^a$. The attack vector $u_v^a$ is a normally distributed random vector with zero mean and standard deviation as 20% of the initial voltage. Due to the presence of attack, the output voltage of the DGs can be written as $v_a = v + u_v^a$, where $v_a$ and $v$, respectively, represent the corrupted and actual measurements.
2. Attacks on active and reactive power measurements: For active power $P$ and reactive power $Q$ measurements, the corrupted measurements $P_a$ and $Q_a$ can be written as $P_a = P + u_P^a$ and $Q_a = Q + u_Q^a$, respectively, where $u_P^a$ and $u_Q^a$ are normally distributed random attack vectors with zero mean and standard deviations as 20% of the initial values.
3. Attacks on voltage, active power, and reactive power measurements: An extreme scenario is considered in which attack vectors $u_v^a$, $u_P^a$, and $u_Q^a$ are injected to the DG output voltage, active power, and reactive power measurements.
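The three attack models share one mechanism: a zero-mean Gaussian vector, with standard deviation equal to 20% of the initial value, is added to the true measurement. A minimal sketch follows, assuming the attack is drawn independently per DG; the helper name `fdi_attack` and the sample measurements are hypothetical.

```python
import numpy as np

def fdi_attack(measurement, initial, rng, ratio=0.2):
    """Return (corrupted measurement, attack vector) for one measurement type."""
    # Zero mean, standard deviation = ratio * |initial value| for each DG.
    u = rng.normal(loc=0.0, scale=ratio * np.abs(initial))
    return measurement + u, u

rng = np.random.default_rng(seed=0)            # seed fixed only for repeatability
v0 = np.array([1.00, 1.01, 0.99, 1.02, 0.98, 1.00])   # hypothetical initial voltages
p0 = np.array([0.40, 0.35, 0.50, 0.45, 0.30, 0.55])   # hypothetical initial active power

v_a, u_v = fdi_attack(v0, v0, rng)             # attack type 1
p_a, u_p = fdi_attack(p0, p0, rng)             # part of attack type 2
print(v_a, p_a)
```

Attack type 3 corresponds to applying the same helper to voltage, active power, and reactive power in the same time step.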

9.3 Deep Learning Based Multi-label Attack Detection

9.3.1 Multi-label Classification Problem Formation

Using one classifier to simultaneously evaluate multiple classes creates a substantial computational advantage over using multiple classifiers. For example, a single model can simultaneously detect anomaly events (i.e., change detection) and identify the anomaly types (i.e., anomaly diagnosis) after certain time steps using multivariate time series. Formally, FDI attack detection can be defined as a problem of multivariate time series classification with multi-label classes. Figure 9.1 illustrates the problem setting and the notation. Given the historical data of $n$ time series with length $T$, i.e., $X = (x_1, \cdots, x_n) \in \mathbb{R}^{n \times T}$, assume there is no anomaly before $T$. For the time segment, given a set of labels $Y$, the task is to classify the data as that of regular behavior or anomaly. The input multivariate time series $X$ are voltage, active power, reactive power, and frequency, and the output labels $Y \in \mathbb{R}^{1 \times 4}$ are normal, load change, voltage attack, and power attack.
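To make the multi-label target concrete, the sketch below builds a $1 \times 4$ indicator vector in which several labels can be active at once. The label order and the mapping from scenario to active labels are assumptions made for illustration, not the encoding documented for the original experiments.

```python
import numpy as np

# Assumed label order; the chapter lists these four labels but not their positions.
LABELS = ("normal", "load change", "voltage attack", "power attack")

def encode_labels(events):
    """Return a 1x4 multi-hot vector with a 1 for every event that is present."""
    y = np.zeros((1, len(LABELS)), dtype=int)
    for event in events:
        y[0, LABELS.index(event)] = 1
    return y

# A voltage attack that coincides with a load change activates two labels.
y_example = encode_labels(["load change", "voltage attack"])
print(y_example)
```

A single-label scheme would need one class per combination of events, whereas the multi-hot vector lets co-occurring events share output nodes.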

Fig. 9.1 Overall architecture of the deep learning based FDI attack detection as a multi-label classification problem. © [2021] IEEE. Reprinted, with permission, from [20]

9.3.2 Data Preparation and Preprocessing

Features are extracted from the recorded multivariate time series, which are first preprocessed through a series of nonlinear transformations.

• Concatenation: There are four time series matrices that represent the four features for classification and six time series in each matrix that represent each of the six DGs. All of these time series are concatenated to create a multivariate time series feature matrix with 24 rows.
• Downsampling: Each 22-s simulation contains data that is sampled twice every 0.0001 s. This data is downsampled to 100 samples per second.
• Window Slicing: The data matrix is segmented into windows of 24 by 500, which is (6 DGs $\times$ 4 features) by 500 time points. This is done using the window slicing technique in [24], where a window of the specified size slides across the time dimension of the data matrix with no overlap, effectively segmenting the time series.
• Labeling: The four labels are normal, load change, attack on voltage, and attack on power. Samples from the 30 s before the event are labeled as normal. Samples from the 15 s after the event is introduced are labeled according to the events that occur.
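The first three preprocessing steps can be sketched in a few lines of NumPy. This version downsamples by plain decimation (keeping every `factor`-th sample) without an anti-aliasing filter, which is an assumption; the function name `preprocess` and the synthetic input sizes are hypothetical and much smaller than the recorded data.

```python
import numpy as np

def preprocess(features, factor, width):
    """Concatenate feature matrices, decimate in time, and cut non-overlapping windows."""
    X = np.concatenate(features, axis=0)       # 4 features x 6 DGs -> 24 rows
    X = X[:, ::factor]                         # downsample: keep every factor-th sample
    n_windows = X.shape[1] // width
    X = X[:, :n_windows * width]               # drop an incomplete trailing window
    # Result has shape (windows, rows, time points per window).
    return X.reshape(X.shape[0], n_windows, width).transpose(1, 0, 2)

# Synthetic stand-in: 4 features, 6 DGs each, 3000 raw samples per series.
features = [np.zeros((6, 3000)) for _ in range(4)]
features[0][0] = np.arange(3000)               # a ramp, so indices are easy to check
windows = preprocess(features, factor=2, width=500)
print(windows.shape)
```

With the sampling rates stated in the text (two samples every 0.0001 s reduced to 100 samples per second), the decimation factor would be 200 rather than the 2 used in this toy input.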

9.3.3 Deep Learning Models

Time series classification, especially of multivariate data, is an ongoing topic of research in machine learning. Here InceptionTime [25], which is considered the state of the art for time series classification, is compared with the baseline ResNet model [26]. The basic structure of these models is shown in Fig. 9.1. Both models feature residual connections and are composed of 1D convolutions and batch normalization. Inception has six residual blocks with max pooling and features a bottleneck layer, while ResNet has three residual blocks with no pooling or bottleneck. The residual block layer connects to a global average pooling layer, which then feeds a dense fully connected network. The sigmoid function is employed at the output layer, and the nonbinary outputs are converted to binary labels using Matthews correlation coefficient threshold calibration. The dataset consists of 1200 samples, of which 33% are reserved for testing. Glorot uniform initialization is employed for all models. Training uses a mini-batch size of 15. The multi-label classification scheme minimizes the total loss, which is the sum of the binary cross-entropy loss over each of the four labels in all training samples:

$$L = -\sum_{i=1}^{C} \left[ T_i \cdot \log_2 O_i + (1 - T_i) \cdot \log_2 (1 - O_i) \right], \tag{9.11}$$

where $C = 4$ is the number of classes/labels, $O_i$ is the predicted output of node $i$, and $T_i$ is the target label. The optimization uses Adam, a variant of stochastic gradient descent (SGD), and the learning rate (with a minimum of 0.0001) is reduced by a factor of 0.5 each time the model's training loss has not improved for 10 consecutive epochs.

Traditionally, both the InceptionTime and ResNet models use the ReLU activation function between convolutional layers. This function was created in an attempt to rectify the vanishing gradient problem exhibited by saturating functions such as sigmoid and hyperbolic tangent. While rectifying the vanishing gradient, ReLU suffers from the "dying ReLU" problem due to the lack of gradient in the negative region, where the function is equal to 0. The Swish activation function has been proposed as an alternative to ReLU. Swish creates a sigmoidal shape gradient in the negative region, where the sigmoidal shape can be tuned or trained using the parameter $\beta$ in the Swish function $f(x) = x \cdot \mathrm{sigmoid}(\beta x)$; $\beta = 1$ is used in these experiments. This function has been shown to outperform the ReLU function in many situations [27]. The InceptionTime and ResNet models are modified to use the Swish function and the results are compared to the ReLU variants.
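The two formulas in this section are short enough to write out directly. The NumPy sketch below evaluates the Swish and ReLU activations and a summed binary cross-entropy with base-2 logarithms, written with the conventional leading minus sign so that smaller values are better; the clipping constant `eps` is an added numerical safeguard, not part of the chapter.

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation f(x) = x * sigmoid(beta * x)."""
    return x / (1.0 + np.exp(-beta * x))

def relu(x):
    """ReLU activation, identically zero (and flat) for negative inputs."""
    return np.maximum(x, 0.0)

def multilabel_bce(T, O, eps=1e-12):
    """Sum of binary cross-entropy terms over all labels, base-2 logarithm."""
    O = np.clip(O, eps, 1.0 - eps)             # avoid log2(0)
    return -np.sum(T * np.log2(O) + (1 - T) * np.log2(1 - O))

x = np.array([-2.0, -1.0, 0.0, 1.0])
print(swish(x))                                # small negative values for x < 0
print(relu(x))                                 # exactly zero for x < 0

targets = np.array([1, 0, 0, 1])
outputs = np.array([0.5, 0.5, 0.5, 0.5])       # maximally uncertain predictions
print(multilabel_bce(targets, outputs))
```

The nonzero output of `swish` for negative inputs is what keeps a gradient flowing where ReLU would contribute none.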

9.4 Performance Evaluation

9.4.1 Test System and Control Performance

The distributed control based on Algorithm 9.1 is tested on the modified IEEE 34-bus distribution test system shown in Fig. 9.2. This system has 6 DGs and 9 loads. The line parameters are adopted from [28]. The simulations are performed in Matlab without including the detailed zero-level control of the inverters. The communication network used for the distributed control is shown on the left-hand side of Fig. 9.2. The $\alpha$, $\varepsilon$, and $\gamma$ in (9.7), (9.9), and (9.10) are, respectively, selected as 0.000075, 0.000075, and 0.0001.

Fig. 9.2 Modified IEEE 34-bus distribution test system (blue circles indicate DGs and red rectangles indicate loads; the communication network is shown on the left). © [2021] IEEE. Reprinted, with permission, from [20]

For data preparation, 110 test cases are generated for each of the scenarios: load change, voltage attack under load change, active/reactive power attack under load change, and voltage and active–reactive power attack under load change. For applying load change, we consider a normally distributed random change with zero mean and standard deviation as 20% of the initial loads. In Fig. 9.3, a load change is applied at 30 s without any FDI attack. In Fig. 9.4, both load change and FDI attack on voltage and active/reactive power measurements are applied at 30 s. The DG output voltages, active/reactive power sharing, and output frequencies under the distributed control are shown in these two figures. Although the same load change is considered, different steady states are obtained due to the presence of FDI attack. The microgrid operator can mistake the FDI attack for a normal load change event, which makes the attack challenging to detect. This motivates the deep learning based multi-label attack detection approaches.

Fig. 9.3 DG output voltages, active/reactive power, and frequency when only load change is applied at 30 s. © [2021] IEEE. Reprinted, with permission, from [20]

Fig. 9.4 DG output voltages, active/reactive power, and frequency when both FDI attack on voltage, active power, and reactive power measurements and load change are applied at 30 s. © [2021] IEEE. Reprinted, with permission, from [20]

9.4.2 Deep Learning Performance Metrics

The results are evaluated with precision, recall, and F1 score, which are typical performance metrics for multi-label classification with deep learning based models. The precision is the ratio $tp/(tp + fp)$, where $tp$ is the number of true positives and $fp$ the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative. The recall is the ratio $tp/(tp + fn)$, where $tp$ is the number of true positives and $fn$ the number of false negatives. The recall reflects the ability of the classifier to find all the positive samples. The F1 score is the harmonic mean of the precision and recall:

$$F_1 = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \tag{9.12}$$

where F1 micro is calculated globally by counting the total true positives, false negatives, and false positives, while F1 macro is calculated for each label and then averaged without weighting. The relative trade-off between $tp$ rate and $fp$ rate is depicted using AUC, the area under the ROC curve (receiver operating characteristic curve). This measure summarizes how well the model separates positive from negative samples. An AUC closer to 1 signifies excellent performance. The micro-average and macro-average ROC curves over all labels are plotted to assess the multi-label performance. Further analysis is done on the performance of single labels.
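The difference between the micro and macro averages is easiest to see on a small example. The sketch below computes both from binary indicator matrices using $F_1 = 2tp/(2tp + fp + fn)$, which is algebraically the same as (9.12); the function name `f1_micro_macro` and the three-sample data are hypothetical.

```python
import numpy as np

def f1_micro_macro(y_true, y_pred):
    """Micro- and macro-averaged F1 for binary indicator matrices (samples x labels)."""
    tp = np.sum((y_true == 1) & (y_pred == 1), axis=0).astype(float)
    fp = np.sum((y_true == 0) & (y_pred == 1), axis=0).astype(float)
    fn = np.sum((y_true == 1) & (y_pred == 0), axis=0).astype(float)

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        # Define F1 as 0 when a label has no positives and no predictions.
        return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)

    micro = float(f1(tp.sum(), fp.sum(), fn.sum()))   # pool counts, then score
    macro = float(np.mean(f1(tp, fp, fn)))            # score per label, then average
    return micro, macro

y_true = np.array([[1, 0], [1, 1], [0, 1]])
y_pred = np.array([[1, 1], [0, 1], [0, 0]])
micro, macro = f1_micro_macro(y_true, y_pred)
print(micro, macro)
```

Here the per-label scores are 2/3 and 1/2, so the macro average is 7/12, while pooling the counts first gives a micro average of 4/7.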

9.4.3 FDI Attack Detection Results

The ResNet and Inception models are both tested for their abilities to distinguish between load change only and attack only scenarios. Both models are above 99% accurate in this case. The following results show how well the models can identify attacks when they are coordinated with load changes.

The three performance metrics are compared among the models with the results shown in Table 9.1. The implementation of the Swish activation function significantly improves the overall accuracy of the models. This shows that more samples have all labels correctly classified when using Swish.

Table 9.1 Model performance results

Model      Activation  Precision (%)  Recall (%)  F1 (%)
Inception  ReLU        91.03          95.31       87.13
ResNet     ReLU        95.97          95.12       96.84
Inception  Swish       97.06          96.27       97.87
ResNet     Swish       97.13          97.84       96.45

The ROC curves are shown in Fig. 9.5. All models have macro- and micro-average AUC greater than or equal to 0.92, showing very good performance over multiple labels. The individual labels show that the power attack detection was slightly worse than the rest of the labels. The performance of Inception and ResNet is comparable, but ResNet is shown to have better performance in both the ReLU and Swish models. ResNet with Swish is the best performing model with the highest overall accuracy at 91%. The ResNet model also trains twice as fast as the Inception model because it has fewer residual connections, making it a more desirable model.¹

Fig. 9.5 ROC curves. Top: Inception and ResNet with ReLU. Bottom: Inception and ResNet with Swish. © [2021] IEEE. Reprinted, with permission, from [20]

¹ Source code and testing data for reproducing the results are available at https://github.com/IRES-FAU/Multi-Label-Attack-Detection-for-Microgrids.

References

1. V. Nasirian, Q. Shafiee, J.M. Guerrero, F.L. Lewis, A. Davoudi, Droop-free distributed control for AC microgrids. IEEE Trans. Power Electron. 31(2), 1600–1617 (2016)
2. S.M. Mohiuddin, J. Qi, Droop-free distributed control for AC microgrids with precisely regulated voltage variance and admissible voltage profile guarantees. IEEE Trans. Smart Grid 11(3), 1956–1967 (2020)
3. D.E. Olivares, A. Mehrizi-Sani, A.H. Etemadi, C.A. Cañizares, R. Iravani, M. Kazerani, A.H. Hajimiragha, O. Gomis-Bellmunt, M. Saeedifard, R. Palma-Behnke et al., Trends in microgrid control. IEEE Trans. Smart Grid 5(4), 1905–1919 (2014)
4. H. Sun, Q. Guo, J. Qi, V. Ajjarapu, R. Bravo, J. Chow, Z. Li, R. Moghe, E. Nasr-Azadani, U. Tamrakar, G.N. Taranto, R. Tonkoski, G. Valverde, Q. Wu, G. Yang, Review of challenges and research opportunities for voltage control in smart grids. IEEE Trans. Power Syst. 34(4), 2790–2801 (2019)
5. S. Abhinav, H. Modares, F.L. Lewis, F. Ferrese, A. Davoudi, Synchrony in networked microgrids under attacks. IEEE Trans. Smart Grid 9(6), 6731–6741 (2018)
6. S.M. Mohiuddin, J. Qi, A unified droop-free distributed secondary control for grid-following and grid-forming inverters in AC microgrids, in IEEE Power and Energy Society General Meeting (2020), pp. 1–5
7. J. Qi, A. Hahn, X. Lu, J. Wang, C.-C. Liu, Cybersecurity for distributed energy resources and smart inverters. IET Cyber-Phys. Syst. Theory Appl. 1(1), 28–39 (2016)
8. S.M. Mohiuddin, J. Qi, Attack resilient distributed control for AC microgrids with distributed robust state estimation, in 2021 IEEE Texas Power and Energy Conference (TPEC) (2021), pp. 1–6
9. H. Zhang, W. Meng, J. Qi, X. Wang, W.X. Zheng, Distributed load sharing under false data injection attack in an inverter-based microgrid. IEEE Trans. Ind. Electron. 66(2), 1543–1551 (2019)
10. L. Lu, H.J. Liu, H. Zhu, C. Chu, Intrusion detection in distributed frequency control of isolated microgrids. IEEE Trans. Smart Grid 10(6), 6502–6515 (2019)
11. A. Bidram, B. Poudel, L. Damodaran, R. Fierro, J.M. Guerrero, Resilient and cybersecure distributed control of inverter-based islanded microgrids. IEEE Trans. Ind. Informat. 16(6), 1–1 (2019)
12. A. Mustafa, B. Poudel, A. Bidram, H. Modares, Detection and mitigation of data manipulation attacks in AC microgrids. IEEE Trans. Smart Grid 11(3), 2588–2603 (2020)
13. X. Niu, J. Li, J. Sun, K. Tomsovic, Dynamic detection of false data injection attack in smart grid using deep learning, in 2019 IEEE Power Energy Society Innovative Smart Grid Technologies Conference (ISGT) (2019), pp. 1–6
14. J. Wei, G.J. Mendis, A deep learning-based cyber-physical strategy to mitigate false data injection attack in smart grids, in 2016 Joint Workshop on Cyber-Physical Security and Resilience in Smart Grids (CPSR-SG) (2016), pp. 1–6
15. Y. He, G.J. Mendis, J. Wei, Real-time detection of false data injection attacks in smart grid: a deep learning-based intelligent mechanism. IEEE Trans. Smart Grid 8(5), 2505–2516 (2017)
16. S. Basodi, S. Tan, W. Song, Y. Pan, Data integrity attack detection in smart grid: a deep learning approach. Int. J. Secur. Netw. 15, 15 (2020)
17. Y. Zhang, J. Wang, B. Chen, Detecting false data injection attacks in smart grids: a semisupervised deep learning approach. IEEE Trans. Smart Grid 12, 1–1 (2020)
18. H.I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, P.-A. Muller, Deep learning for time series classification: a review. Data Min. Knowl. Disc. 33(4), 917–963 (2019)
19. S. Wang, S. Bi, Y.-J.A. Zhang, Locational detection of the false data injection attack in a smart grid: a multilabel classification approach. IEEE Internet Things J. 7(9), 8218–8227 (2020)
20. S.M. Mohiuddin, J. Qi, S. Fung, Y. Huang, Y. Tang, Deep learning based multi-label attack detection for distributed control of AC microgrids, in 2021 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm) (IEEE, 2021), pp. 233–238
21. J. Yang, N. Zhang, C. Kang, Q. Xia, A state-independent linear power flow model with accurate estimation of voltage magnitude. IEEE Trans. Power Syst. 32(5), 3607–3617 (2017)
22. Y. Xu, Z. Qu, J. Qi, State-constrained grid-forming inverter control for robust operation of AC microgrids, in 2020 European Control Conference (ECC) (2020), pp. 471–474
23. Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 (Springer Science & Business Media, Berlin, 2013)
24. A. Le Guennec, S. Malinowski, R. Tavenard, Data augmentation for time series classification using convolutional neural networks, in ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data (2016)
25. H. Ismail Fawaz, B. Lucas, G. Forestier, C. Pelletier, D.F. Schmidt, J. Weber, G.I. Webb, L. Idoumghar, P.-A. Muller, F. Petitjean, InceptionTime: finding AlexNet for time series classification. Data Min. Knowl. Disc. 34, 1936–1962 (2020)
26. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778
27. P. Ramachandran, B. Zoph, Q.V. Le, Swish: a self-gated activation function. Neural. Comput. Appl. 16, 5 (2017)
28. N. Mwakabuta, A. Sekar, Comparative study of the IEEE 34 node test feeder under practical simplifications, in Proceedings of the 39th North American Power Symposium (2007), pp. 484–491

Part IV

Smart Grid Resilience Under System Interdependency

Chapter 10

Interdependency Between Power System Outages by Branching Process

10.1 Introduction

As a high-level probabilistic model, the Galton–Watson branching process [1, 2] can statistically describe how the number of outages propagates in a cascading blackout and the statistics of the total number of outages [3–7]. Its simplicity allows a high-level understanding of the cascading process without getting entangled in the complicated mechanisms of cascading. Although the branching process does not directly represent any of the physics or mechanisms of the outage propagation, once it is validated it can be used to predict the total number of outages. The parameters of the branching process can be estimated from a much smaller dataset than is needed to estimate the distribution of the total number of outages empirically, and then predictions of the total number of outages can be made based on the estimated parameters. The ability to do this with much less data is a significant advantage that enables practical applications.

In a one-type branching process, the initial outages propagate randomly to produce subsequent outages in generations. Each outage (a "parent" outage) independently produces a random nonnegative integer number of outages ("children" outages) in the next generation. The children outages then become parents to produce another generation until the number of outages in a generation becomes zero. The distribution of the number of children from one parent is called the offspring distribution. The mean of this distribution is the parameter $\lambda$, which is the average number of children outages for each parent outage and quantifies the tendency for the cascade to propagate, in the sense that larger $\lambda$ corresponds to faster propagation. For cascading blackouts, $\lambda < 1$ and the outages will always die out.

In real cascading blackouts, different types of outages such as line outages, load shedding, and isolated buses can exist simultaneously. More importantly, these outages are usually interdependent, and thus their propagation can be better understood only when they can be described jointly. Also, if we want to evaluate the time that is needed to restore the system after a cascading outage event, we need to know how many buses and lines are still in service, as well as the amount of the load shed. But we may not have all these data and thus need to predict some of them by only using the available data. This requires the analysis of the interdependency of different types of outages.

© Springer Nature Switzerland AG 2023 J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_10

The multi-type branching process is a generalization of the one-type branching process. Each type $i$ outage in one generation (a type $i$ "parent" outage) independently produces a random nonnegative integer number of outages of the same type (type $i$ "children" outages) and of any other type (type $t$ "children" outages where $t \neq i$). All generated outages of the different types comprise the next generation. The process ends when the number of outages of all types becomes zero. For an $n$-type branching process, there are $n^2$ offspring distributions and thus $n^2$ offspring means, which can be arranged into a matrix called the offspring mean matrix $\Lambda$. The criticality of the multi-type branching process is determined by the largest eigenvalue of $\Lambda$. The process will always go extinct if the largest eigenvalue of $\Lambda$ is less than or equal to one [1, 2].

In this chapter, the interdependency between different types of outages in power systems, such as line outages, the load shed, and isolated buses, is described by the Galton–Watson multi-type branching process [7]. The parameters of branching processes are estimated by the Expectation Maximization (EM) algorithm [8]. The joint distributions of total outages are efficiently estimated by multi-type branching processes via the Lagrange–Good inversion [9]. The multi-type branching process discussed in this chapter can not only quantify the interdependency between different types of outages in power systems but can also be used to study the interactions between different infrastructure systems, such as between electric power systems and communication networks [10], natural gas networks [11], water systems [12], and transportation networks [13, 14].
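The role of $\lambda$ in the one-type process can be illustrated with a short simulation. The sketch below assumes a Poisson offspring distribution (the assumption adopted later in the chapter) and uses the fact that the children of $k$ independent parents are Poisson with mean $k\lambda$; the function name and the choice $\lambda = 0.5$ are hypothetical. For a subcritical process started from one outage, the expected total number of outages is $1/(1-\lambda)$.

```python
import numpy as np

def simulate_total_outages(lam, initial, rng, max_generations=10_000):
    """Total number of outages in one cascade of a one-type Poisson branching process."""
    total = current = initial
    for _ in range(max_generations):
        if current == 0:                       # the cascade has died out
            break
        # Children of `current` independent parents: Poisson with mean lam * current.
        current = int(rng.poisson(lam * current))
        total += current
    return total

rng = np.random.default_rng(seed=0)
totals = [simulate_total_outages(0.5, 1, rng) for _ in range(20_000)]
mean_total = float(np.mean(totals))
print(mean_total)                              # close to 1 / (1 - 0.5) = 2
```

Raising `lam` toward 1 makes the cascades longer and the mean total grow without bound, which is the sense in which larger $\lambda$ means stronger propagation.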

10.2 Estimating Multi-type Branching Process Parameters

A total of $M$ cascades for $n$ types of outages are arranged as

$$\begin{array}{cccc}
 & \text{generation 0} & \text{generation 1} & \cdots \\
\text{cascade 1} & (Z_0^{1,1}, \cdots, Z_0^{1,n}) & (Z_1^{1,1}, \cdots, Z_1^{1,n}) & \cdots \\
\text{cascade 2} & (Z_0^{2,1}, \cdots, Z_0^{2,n}) & (Z_1^{2,1}, \cdots, Z_1^{2,n}) & \cdots \\
\vdots & \vdots & \vdots & \vdots \\
\text{cascade } M & (Z_0^{M,1}, \cdots, Z_0^{M,n}) & (Z_1^{M,1}, \cdots, Z_1^{M,n}) & \cdots
\end{array}$$

Here $Z_g^{m,t}$ is the number of type $t$ outages in generation $g$ of cascade number $m$ and $n$ is the number of types of outages. Each cascade has a nonzero number of outages in generation zero for at least one type of outage, and each type of outage should have a nonzero number of outages in at least one generation. The shortest cascades

stop in generation one, but some cascades will continue for several generations before terminating. Note that continuous data such as the load shed need to be first discretized by the method in [6]. When estimating the offspring mean matrix and the empirical joint distribution of the initial outages, we do not need all $M$ cascades but only $M_u \le M$ cascades.

For $n$-type branching processes where $n \ge 2$, the offspring mean $\lambda$ is generalized to the offspring mean matrix $\Lambda$. Different from the branching processes with only one type, for which the criticality is directly determined by the offspring mean $\lambda$, the criticality of multi-type branching processes is determined by the largest eigenvalue of $\Lambda$, which is denoted by $\rho$. If $\rho \le 1$, the multi-type branching process will always go extinct. If $\rho > 1$, the multi-type branching process will go extinct with a probability $0 \le q < 1$ [2].

The largest eigenvalue $\rho$ of the mean matrix can be estimated as the total number of all types of children divided by the total number of all types of parents by directly using the simulated cascades and ignoring the types [15]:

$$\hat{\rho} = \frac{\displaystyle\sum_{m=1}^{M_u} \sum_{g=1}^{\infty} \sum_{t=1}^{n} Z_g^{m,t}}{\displaystyle\sum_{m=1}^{M_u} \sum_{g=0}^{\infty} \sum_{t=1}^{n} Z_g^{m,t}}. \tag{10.1}$$

When the number of type $j$ children to type $i$ parents $S^{(i,j)}$ and the total number of type $i$ parents $S^{(i)}$ are observed, $\lambda_{ij}$ (the expected number of type $j$ children generated by one type $i$ parent) can be estimated by a maximum likelihood estimator that is the total number of type $j$ children produced by type $i$ parents divided by the total number of type $i$ parents [16]:

$$\hat{\lambda}_{ij} = \frac{S^{(i,j)}}{S^{(i)}}, \tag{10.2}$$

where $S^{(i,j)}$ and $S^{(i)}$ can be described by using the simulated cascades as

$$S^{(i,j)} = \sum_{m=1}^{M_u} \sum_{g=1}^{\infty} Z_g^{m,i \to j}. \tag{10.3}$$

$$S^{(i)} = \sum_{m=1}^{M_u} \sum_{g=0}^{\infty} Z_g^{m,i}, \tag{10.4}$$

where $Z_g^{m,i \to j}$ is the number of type $j$ offspring generated by type $i$ parents in generation $g$ of cascade $m$.

However, it is usually impossible to have such detailed information. For cascading blackouts, it is difficult to determine the exact number of type $j$ outages that are produced by type $i$ outages, due to too many mechanisms in cascading. In other

220

10 Interdependency Between Power System Outages by Branching Process m,i→j

words, .Zg in (10.3) cannot be determined, and thus .Y (i,j ) cannot be decided and the mean matrix cannot be estimated. To solve this problem, EM algorithm [8] is used. It is a method for finding maximum likelihood estimates of parameters in statistical models where the model depends on unobserved latent variables. Besides, the offspring distributions of branching processes are assumed to be Poisson. There are general arguments suggesting that the choice of a Poisson offspring distribution is appropriate [5, 6] since offspring outages being selected from a large number of possible outages have very small probability and are approximately independent. The EM algorithm mainly contains two steps, which are E-step and M-step. For the estimation of the offspring mean matrix of an n-type branching process, the EM algorithm can be formulated as follows: (0)

ˆ . 1. Initialization: Set initial guess of the mean matrix as .Λ Since for cascading blackouts the outages will always die out, we have .0 ≤ λij ≤ 1. Based on this, all elements of the initial mean matrix are set to be 0.5, which is the midpoint of the possible range. ˆ (k) . 2. E-step: Estimate .S (i,j )(k+1) based on .Λ Under the assumption that the offspring distributions are all Poisson, for generation .g ≥ 1 of cascade m, the number of type j offspring produced by type .t = 1, . . . , n parents follows Poisson distribution m,t→j

Zg

.

  m,t ˆ (k) λtj . ∼ Pois Zg−1

(10.5)

Thus the number of type j offspring in generation .g ≥ 1 of cascade m produced by type i parents in generation .g − 1 of the same cascade is: m,i→j .Zg

=

m,j Zg

m,i ˆ (k) λij Zg−1 . n  m,t ˆ (k) Zg−1 λtj

(10.6)

t=1 m,i→j

After obtaining .Zg for all generations .g ≥ 1 of cascades .m = 1, . . . , Mu , we are finally able to calculate .S (i,j ) by using (10.3). ˆ (k+1) based on .S (i,j )(k+1) . 3. M-step: Estimate .Λ ˆ (k+1) can be estimated After obtaining .S (i,j )(k+1) , the updated mean matrix .Λ with the estimator given in (10.2). 4. End: Iterate the E-step and M-step until .

max

i,j ∈{1,··· ,n}

 (k+1) (k)  ˆλ − λˆ ij  < , ij

(10.7)

where . is the tolerance that is used to control the accuracy and .λˆ (k+1) is the final ij estimate of .λij .
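The four steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name, the data layout (one integer array per cascade, one row per generation) and the M-step denominator (the total number of type $i$ parents over all generations, standing in for the estimator (10.2)) are assumptions made for this example.

```python
import numpy as np

def estimate_offspring_mean_matrix(cascades, tol=0.01, max_iter=1000):
    """EM estimate of the offspring mean matrix of an n-type branching process.

    cascades: list of integer arrays of shape (generations, n); row g holds the
              number of outages of each type in generation g (row 0 = initial).
    Returns (estimated mean matrix, number of iterations used).
    """
    n = cascades[0].shape[1]
    lam = np.full((n, n), 0.5)          # step 1: every entry starts at 0.5
    # Assumed M-step denominator: number of type-i parents summed over all
    # generations of all cascades (every outage is a potential parent).
    parents = np.sum([c.sum(axis=0) for c in cascades], axis=0).astype(float)

    for k in range(1, max_iter + 1):
        s = np.zeros((n, n))            # expected offspring counts S^(i,j)
        for c in cascades:
            for g in range(1, c.shape[0]):
                # weight[i, j] = Z_{g-1}^{m,i} * lambda_ij, the numerator of (10.6)
                weight = c[g - 1][:, None] * lam
                denom = weight.sum(axis=0)
                share = np.divide(weight, denom, out=np.zeros_like(weight),
                                  where=denom > 0)
                s += share * c[g]       # split Z_g^{m,j} among the parent types
        new_lam = np.divide(s, parents[:, None], out=np.zeros_like(s),
                            where=parents[:, None] > 0)
        converged = np.max(np.abs(new_lam - lam)) < tol   # stopping rule (10.7)
        lam = new_lam
        if converged:
            break
    return lam, k
```

Because the E-step only redistributes the observed offspring of each type among the candidate parent types, every M-step preserves the total number of offspring of each type: the parent counts multiplied by the estimated matrix reproduce the observed offspring totals.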

10.3 Estimating Joint Probability Distribution of Total Outages

10.3.1 n-Type Branching Process

The probability generating function for a type $i$ individual of an $n$-type branching process is

$$f_i(s_1, \cdots, s_n) = \sum_{u_1,\cdots,u_n=0}^{\infty} p_i(u_1, \cdots, u_n)\, s_1^{u_1} \cdots s_n^{u_n}, \tag{10.8}$$

where $p_i(u_1, \cdots, u_n)$ is the probability that a type $i$ individual generates $u_1$ type 1, $\cdots$, $u_n$ type $n$ individuals. If we assume that the offspring distributions for the various types of outages are all Poisson, (10.8) can be written down directly once the offspring mean matrix $\hat{\Lambda}$ has been estimated by the method in Sect. 10.2.

According to [2] and [9], the probability generating function $w_i(s_1, \cdots, s_n)$ of the total number of various types of individuals in all generations, starting with one individual of type $i$, is given by

$$w_i = s_i f_i(w_1, \cdots, w_n), \qquad i = 1, \ldots, n. \tag{10.9}$$

When the branching process starts with more than one type of individual, the total number of the various types can be determined by using the Lagrange–Good inversion in [9], in which the following theorem is given.

Theorem 1 [9] If the $n$-type branching process starts with $r_1$ individuals of type 1, $r_2$ of type 2, etc., then the probability that the whole process will have precisely $m_1$ of type 1, $m_2$ of type 2, etc. is equal to the coefficient of

$$s_1^{m_1-r_1} \cdots s_n^{m_n-r_n}$$

in

$$f_1^{m_1} \cdots f_n^{m_n} \left\| \delta_{\mu\nu} - \frac{s_\mu}{f_\mu}\frac{\partial f_\mu}{\partial s_\nu} \right\|, \tag{10.10}$$

where $\|a_{\mu\nu}\|$ denotes the determinant of the $n \times n$ matrix whose entry is $a_{\mu\nu}$ ($\mu, \nu = 1, \ldots, n$) and $\delta_{\mu\nu}$ is Kronecker's delta ($= 1$ if $\mu = \nu$, otherwise $= 0$).

We denote the coefficient of $s_1^{m_1-r_1} \cdots s_n^{m_n-r_n}$ as $c(r_1, \cdots, r_n; m_1, \cdots, m_n)$. For an $n$-type branching process, the empirical joint probability distribution of the number of initial outages $(Z_0^1, \cdots, Z_0^n)$ can be obtained as

$$d_{Z_0}^{emp}(z_{01}, \cdots, z_{0n}) = P(Z_0^1 = z_{01}, \cdots, Z_0^n = z_{0n}) = \frac{1}{M_u} \sum_{m=1}^{M_u} \mathbb{1}\left[Z_0^{m,1} = z_{01}, \cdots, Z_0^{m,n} = z_{0n}\right], \tag{10.11}$$

where $\mathbb{1}[\text{event}]$ is the indicator function that evaluates to one when the event happens and to zero when it does not. Given the joint probability distribution of initial sizes $P(Z_0^1, \cdots, Z_0^n)$ and the generating functions in (10.8), the joint probability distribution $d_{Y_\infty}^{est}(y_1, \cdots, y_n)$ of the total number of various types $(Y_\infty^1, \cdots, Y_\infty^n)$ can then be calculated as

$$d_{Y_\infty}^{est}(y_1, \cdots, y_n) = P(Y_\infty^1 = y_1, \cdots, Y_\infty^n = y_n) = \sum_{\substack{z_{01},\cdots,z_{0n}=0 \\ z_{01}+\cdots+z_{0n} \neq 0}}^{z_{01}=y_1,\cdots,z_{0n}=y_n} \Big[ P(Z_0^1 = z_{01}, \cdots, Z_0^n = z_{0n}) \cdot c(z_{01}, \cdots, z_{0n}; y_1, \cdots, y_n) \Big]. \tag{10.12}$$
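As an illustration of (10.11), the empirical joint distribution of the initial outages is simply a table of relative frequencies. The sketch below assumes that the initial outage counts of each cascade are available as tuples; the function name and the sample data are hypothetical.

```python
from collections import Counter

def empirical_joint_distribution(samples):
    """Map each observed tuple (z_01, ..., z_0n) to its relative frequency."""
    counts = Counter(tuple(int(v) for v in s) for s in samples)
    total = sum(counts.values())
    return {outcome: c / total for outcome, c in counts.items()}

# Hypothetical initial outages (line outages, load shed units) of four cascades.
initial_outages = [(2, 0), (1, 0), (2, 0), (3, 1)]
d_emp = empirical_joint_distribution(initial_outages)
```

The same function can be applied to the total outages of each cascade to obtain an empirical joint distribution of total sizes.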

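10.3.2 Two-Type Branching Process

The empirical joint probability distribution of the number of initial outages $(Z_0^1, Z_0^2)$ can be obtained by (10.11). Assume that the offspring distributions for the various types of outages are all Poisson. Then the probability generating functions for a two-type branching process can be written as

$$f_1(s_1, s_2) = \sum_{u_1,u_2=0}^{\infty} \frac{\lambda_{11}^{u_1} \lambda_{12}^{u_2} e^{-\lambda_{11}-\lambda_{12}}}{u_1!\, u_2!}\, s_1^{u_1} s_2^{u_2} \tag{10.13}$$

$$f_2(s_1, s_2) = \sum_{u_1,u_2=0}^{\infty} \frac{\lambda_{21}^{u_1} \lambda_{22}^{u_2} e^{-\lambda_{21}-\lambda_{22}}}{u_1!\, u_2!}\, s_1^{u_1} s_2^{u_2}, \tag{10.14}$$

where the parameters $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$, and $\lambda_{22}$ can be estimated by the method in Sect. 10.2. In (10.10), the $n \times n$ matrix whose determinant needs to be evaluated is

$$\begin{bmatrix} 1 - \dfrac{s_1}{f_1}\dfrac{\partial f_1}{\partial s_1} & -\dfrac{s_1}{f_1}\dfrac{\partial f_1}{\partial s_2} \\[2ex] -\dfrac{s_2}{f_2}\dfrac{\partial f_2}{\partial s_1} & 1 - \dfrac{s_2}{f_2}\dfrac{\partial f_2}{\partial s_2} \end{bmatrix}.$$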
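As a short worked step (added here for illustration, using only the standard generating function of independent Poisson variables), the series in (10.13) and (10.14) sum to a closed form, which makes the entries of this matrix explicit:

```latex
\begin{align*}
f_i(s_1, s_2) &= \exp\!\big[\lambda_{i1}(s_1 - 1) + \lambda_{i2}(s_2 - 1)\big], \qquad i = 1, 2, \\
\frac{s_\mu}{f_\mu}\frac{\partial f_\mu}{\partial s_\nu} &= \lambda_{\mu\nu}\, s_\mu, \\
\left\| \delta_{\mu\nu} - \frac{s_\mu}{f_\mu}\frac{\partial f_\mu}{\partial s_\nu} \right\|
  &= (1 - \lambda_{11} s_1)(1 - \lambda_{22} s_2) - \lambda_{12}\lambda_{21}\, s_1 s_2 .
\end{align*}
```

Under the exact Poisson assumption the determinant is therefore a polynomial of degree one in each of $s_1$ and $s_2$.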

The joint probability distribution of the two-type branching process can be obtained by evaluating (10.12) with elementary algebra. Since the coefficients in (10.13)–(10.14) decrease very quickly as the orders of $s_1$ and $s_2$ increase, we can use a few terms to approximate the generating functions, which reduces the calculation burden while keeping the results sufficiently accurate. Also, the probability obtained by (10.12) decreases as $y_1$ and $y_2$ increase. Therefore, we do not need to calculate the negligible probability of very large blackout sizes. Specifically, we only need to calculate the joint probability for

$$y_1 = z_{01}, \ldots, z_{01} + \tau_1 \tag{10.15}$$

and

$$y_2 = z_{02}, \ldots, z_{02} + \tau_2, \tag{10.16}$$

where $\tau_1$ and $\tau_2$ are integers chosen as a trade-off between calculation burden and accuracy. If $\tau_1$ or $\tau_2$ is too large, unnecessary calculation is spent on blackout sizes with negligible probability. If $\tau_1$ or $\tau_2$ is too small, accuracy is lost by neglecting blackout sizes whose probability is not negligible.
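The evaluation of Theorem 1 and (10.12) for two types can be sketched as below. Note that this is not the truncated-series procedure described above: it assumes exactly Poisson offspring, for which $f_1^{m_1} f_2^{m_2}$ is a single exponential and the determinant in (10.10) equals $(1-\lambda_{11}s_1)(1-\lambda_{22}s_2)-\lambda_{12}\lambda_{21}s_1s_2$, so the required coefficient is available in closed form. All function names are hypothetical.

```python
import math
import numpy as np

def _term(a1, a2, p, q):
    """Coefficient of s1^p s2^q in exp(a1*s1 + a2*s2); zero for negative powers."""
    if p < 0 or q < 0:
        return 0.0
    return a1 ** p / math.factorial(p) * a2 ** q / math.factorial(q)

def total_outage_probability(lam, r, m):
    """c(r; m): probability of totals m = (m1, m2) given initial outages r = (r1, r2)."""
    (l11, l12), (l21, l22) = lam
    k1, k2 = m[0] - r[0], m[1] - r[1]
    if k1 < 0 or k2 < 0:
        return 0.0
    a1 = m[0] * l11 + m[1] * l21      # coefficient of s1 in the exponent of f1^m1 * f2^m2
    a2 = m[0] * l12 + m[1] * l22      # coefficient of s2 in the same exponent
    prefactor = math.exp(-(m[0] * (l11 + l12) + m[1] * (l21 + l22)))
    det = l11 * l22 - l12 * l21
    # Multiply the exponential series by the determinant polynomial term by term.
    coeff = (_term(a1, a2, k1, k2)
             - l11 * _term(a1, a2, k1 - 1, k2)
             - l22 * _term(a1, a2, k1, k2 - 1)
             + det * _term(a1, a2, k1 - 1, k2 - 1))
    return prefactor * coeff

def joint_total_distribution(p0, lam, y_max):
    """Evaluate (10.12) on the grid 0..y_max[0] by 0..y_max[1].

    p0 maps initial outages (z01, z02) to their probability.
    """
    d = np.zeros((y_max[0] + 1, y_max[1] + 1))
    for r, p in p0.items():
        for y1 in range(r[0], y_max[0] + 1):
            for y2 in range(r[1], y_max[1] + 1):
                d[y1, y2] += p * total_outage_probability(lam, r, (y1, y2))
    return d
```

With only one active type the function reduces to the Borel–Tanner distribution of a one-type Poisson branching process, which gives a convenient sanity check. The grid limits `y_max` play the role of $z_{01}+\tau_1$ and $z_{02}+\tau_2$ in (10.15) and (10.16).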

10.3.3 Validation of Estimated Joint Distribution

The empirical joint distribution $d_{Y_\infty}^{emp}(y_1, \cdots, y_n)$ can be calculated by

$$d_{Y_\infty}^{emp}(y_1, \cdots, y_n) = P(Y_\infty^1 = y_1, \cdots, Y_\infty^n = y_n) = \frac{N(Y_\infty^1 = y_1, \cdots, Y_\infty^n = y_n)}{M}, \tag{10.17}$$

where $N(Y_\infty^1 = y_1, \cdots, Y_\infty^n = y_n)$ is the number of cascades for which there are $y_1$ type 1 outages, $\cdots$, and $y_n$ type $n$ outages.

The joint distribution of $n$ types of blackout size $(Y_\infty^1, \cdots, Y_\infty^n)$ estimated in Sect. 10.3 can be validated by comparing it with the empirically obtained joint distribution in (10.17).

1. Joint entropy is defined for $n$ random variables $(Y_\infty^1, \cdots, Y_\infty^n)$ as

$$H(Y_\infty^1, \cdots, Y_\infty^n) = -\sum_{y_1} \cdots \sum_{y_n} P(y_1, \cdots, y_n) \log_2 [P(y_1, \cdots, y_n)], \tag{10.18}$$

where $P(y_1, \cdots, y_n) \log_2 [P(y_1, \cdots, y_n)]$ is defined to be 0 if $P(y_1, \cdots, y_n) = 0$.


The joint entropies of the estimated and the empirical joint distributions are denoted by $H^{est}$ and $H^{emp}$, respectively. The estimated joint distribution can then be validated by checking whether $H^{est}/H^{emp}$ is close to 1.0.

2. The marginal distribution for each type of outage is calculated from the estimated joint distribution of the total outages and is compared with the empirical marginal distribution calculated directly from the simulated cascades.

3. The conditional largest possible total outages (CLOs) of one type of blackout size are calculated when the total outage of the other types of blackout size is known. For example, for a two-type branching process, for $i, j \in \{1, 2\}$ and $i \neq j$, given the total outage $y_i$ of one type of blackout size, we can find the total outage $y_j$ of the other type that satisfies

$$P(Y_\infty^j \le y_j \mid Y_\infty^i = y_i) = p_{conf}, \tag{10.19}$$

where

$$P(Y_\infty^j \le y_j \mid Y_\infty^i = y_i) = \frac{\sum_{k=0}^{y_j} P(Y_\infty^i = y_i, Y_\infty^j = k)}{\sum_{l=0}^{\infty} P(Y_\infty^i = y_i, Y_\infty^j = l)}, \tag{10.20}$$

$p_{conf}$ is the confidence level close to 1.0, and $P(A \mid B)$ is the conditional probability of event $A$ given $B$. If the total outage of type $i$ is $y_i$, the joint distribution tells us that the total outage of type $j$ will not exceed $y_j$ with a high probability $p_{conf}$. We can calculate $y_j$ from either the empirical joint distribution or the joint distribution estimated by the branching process, and compare the two values to check whether they are close.
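The joint entropy (10.18) and the CLO defined by (10.19) and (10.20) can be sketched as follows for a joint distribution stored as an array. Since the distribution is discrete, (10.19) generally cannot be met with equality; the sketch assumes that the CLO is taken as the smallest $y_j$ whose conditional cumulative probability reaches $p_{conf}$. The function names are hypothetical.

```python
import numpy as np

def joint_entropy(p):
    """Joint entropy (10.18) in bits; zero-probability cells contribute 0."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

def conditional_largest_outage(joint, y_i, p_conf=0.99):
    """Smallest y_j with P(Y_j <= y_j | Y_i = y_i) >= p_conf.

    joint[y_i, y_j] is a two-dimensional joint distribution. Returns None when
    the conditioning event Y_i = y_i has zero probability.
    """
    row = np.asarray(joint, dtype=float)[y_i]
    if row.sum() == 0:
        return None
    cdf = np.cumsum(row) / row.sum()       # conditional distribution (10.20)
    # argmax on a boolean array returns the index of the first True entry;
    # the small slack guards against rounding in the cumulative sum.
    return int(np.argmax(cdf >= p_conf - 1e-12))
```

The ratio `joint_entropy(d_est) / joint_entropy(d_emp)` then gives the first validation measure, where `d_est` and `d_emp` stand for the estimated and empirical joint distributions.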

10.4 Number of Cascades Needed

How many cascades are needed to empirically obtain a reliable joint distribution of total outages? How many cascades are needed to get a reliable estimate of the offspring mean matrix and of the joint distribution of the initial outages, so that the estimated joint distribution of total outages is close enough to the reliable empirical joint distribution? Here, we answer these questions and determine the lower bounds for $M$ and $M_u$.

10.4.1 Determining Lower Bound for M

More cascades tend to contain more information about the cascading failure properties of a system. The information added by additional cascades increases the joint entropy of the joint distribution empirically obtained from the cascades. However, the amount of information does not keep growing with the number of cascades; it saturates once the number of cascades exceeds some number $M^{\min}$. This number can be determined by gradually increasing the number of cascades, recording the corresponding joint entropy of the empirical joint distribution, and finding the smallest number of cascades that leads to a saturated joint entropy (amount of information).

Assume there are a total of $N_M$ different values of $M$, ranging from a very small number to a very large number, which are denoted by $M_i$, $i = 1, 2, \ldots, N_M$. The joint entropy of the joint distribution of total outages obtained from $M_i$ cascades is denoted by $H^{emp}(M_i)$. For $i = 1, \ldots, N_M - 2$, we define

$$\sigma_i = \sigma(H_i^{emp}), \tag{10.21}$$

where $H_i^{emp} = [H^{emp}(M_i) \cdots H^{emp}(M_{N_M})]$ and $\sigma(\cdot)$ is the standard deviation of a vector. The $\sigma_i$ for $i = N_M - 1$ and $i = N_M$ are not calculated since we want to calculate the standard deviation of at least three data points. A very small and only slightly fluctuating $\sigma_i$ indicates that the joint entropy begins to saturate after $M_i$. The $M_i$ corresponding to $\sigma_i \le \epsilon_\sigma$ is identified as $M^{\min}$, where $\epsilon_\sigma$ is a small real number. Using $M^{\min}$ original cascades guarantees that the accuracy of the statistical values of interest is good, and these cascades can thus provide a reference joint distribution of the total outages.
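The saturation rule in (10.21) can be sketched as follows. The sketch assumes that the empirical joint entropies have already been computed for an increasing list of cascade counts, that the first $M_i$ meeting the threshold is the one selected, and that $\sigma(\cdot)$ is the population standard deviation; the function name and arguments are hypothetical.

```python
import numpy as np

def find_m_min(m_values, h_emp, eps_sigma=0.002):
    """Return the first M_i whose tail of joint entropies has std <= eps_sigma.

    m_values: increasing cascade counts M_1, ..., M_NM
    h_emp:    joint entropies H^emp(M_1), ..., H^emp(M_NM)
    """
    h = np.asarray(h_emp, dtype=float)
    for i in range(len(h) - 2):           # keep at least three points in each tail
        if np.std(h[i:]) <= eps_sigma:    # sigma_i in (10.21)
            return m_values[i]
    return None                           # entropy has not saturated yet
```

A return value of `None` signals that more cascades are needed before a reference distribution can be trusted.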

10.4.2 Determining Lower Bound for Mu

When we only want to obtain a good enough estimate of the joint distribution of total sizes, we do not need as many as $M^{\min}$ cascades but only $M_u^{\min}$ cascades, chosen so that the information extracted from $M_u^{\min}$ cascades by the branching process captures the general properties of the cascading failures.

Since both $H^{emp}$ and $H^{est}$ vary with $M_u$, we denote them by $H^{emp}(M_u)$ and $H^{est}(M_u)$. $H^{emp}(M_u)$ can be obtained directly from the cascades by (10.17) and (10.18), and $H^{est}(M_u)$ can be calculated by (10.12) and (10.18). When $M_u$ is not large enough, a big mismatch between $H^{emp}(M_u)$ and $H^{est}(M_u)$ is expected, indicating that the joint distribution estimated by the branching process cannot capture the properties of the joint distribution of the cascades well. As $M_u$ increases, more information is obtained, and the mismatch gradually decreases and finally stabilizes. In order to indicate the stabilization, we define

$$R(M_u) = \frac{|H^{est}(M_u) - H^{emp}(M_u)|}{H^{emp}(M_u)}. \tag{10.22}$$

We start from a small integer $M_u^0$, increase it gradually by $\Delta M$ each time, and calculate the standard deviation of $R(M_u)$ for the latest three data points by

$$\tilde{\sigma}_i = \sigma(R_i), \quad i \ge 2, \tag{10.23}$$

where $i$ denotes the latest data point and

$$R_i = [R(M_u^{i-2}) \;\; R(M_u^{i-1}) \;\; R(M_u^{i})]. \tag{10.24}$$

Then $M_u^{\min}$ is determined as the smallest value that satisfies $\tilde{\sigma}_i \le \epsilon_H$, where $\epsilon_H$ is the tolerance for stabilization. By decreasing $\Delta M$, we can increase the accuracy of the obtained $M_u^{\min}$, but a smaller $\Delta M$ increases the number of times the joint distribution has to be calculated by branching processes. When more types of outages are considered, a greater $M_u^{\min}$ is needed, in which case a larger $\Delta M$ can be chosen to avoid calculating the joint distribution too many times.

A cascading outage dataset is produced by the open-loop AC OPA simulation [17, 18] on the IEEE 118-bus test system, which is standard except that the line flow limits are determined with the same method as in [6]. The probability for the initial line outage is $p_0 = 0.0001$ and the load variability is $\gamma = 1.67$, which are the same as in [6]. To test the multi-type branching process model, the simulation is run so as to produce $M = 50{,}000$ cascades with a nonzero number of line outages at the base case load level. In each generation, the number of line outages and the number of isolated buses are counted, and the continuously varying amounts of the load shed are discretized as described in [6] to produce integer multiples of the chosen discretization unit.

For determining $M^{\min}$, we choose $N_M = 50$ and the data points are linearly scaled. The $\epsilon_\sigma$ is chosen as 0.002. In order to determine $M_u^{\min}$, we choose $M_u^0$, $\epsilon_H$, and $\Delta M$ as 100, 0.002, and 100 for one type of outages and as 1000, 0.002, and 500 for multiple types of outages, since the $M_u^{\min}$ for the multiple-type case is expected to be greater and we need to limit the calculation burden. The determined $M^{\min}$ and $M_u^{\min}$ for different types of outages are listed in Table 10.1. The $M_u^{\min}$ used for estimation is significantly smaller than $M^{\min}$, which greatly improves the efficiency.
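The stopping rule for $M_u^{\min}$ defined by (10.22)–(10.24) can be sketched as follows. The sketch assumes that $H^{est}$ and $H^{emp}$ have already been evaluated at $M_u^0, M_u^0 + \Delta M, \ldots$, and that $\sigma(\cdot)$ is the population standard deviation; the function name and arguments are hypothetical.

```python
import numpy as np

def find_mu_min(mu_values, h_est, h_emp, eps_h=0.002):
    """Return the first M_u at which the last three mismatches R have std <= eps_h.

    mu_values: cascade counts M_u^0, M_u^0 + dM, M_u^0 + 2*dM, ...
    h_est, h_emp: joint entropies of the estimated and empirical distributions
                  evaluated at each entry of mu_values
    """
    h_est = np.asarray(h_est, dtype=float)
    h_emp = np.asarray(h_emp, dtype=float)
    r = np.abs(h_est - h_emp) / h_emp          # relative mismatch (10.22)
    for i in range(2, len(r)):
        if np.std(r[i - 2:i + 1]) <= eps_h:    # sigma-tilde_i in (10.23)-(10.24)
            return mu_values[i]
    return None                                # mismatch has not stabilized yet
```

In practice the entropies would be computed incrementally, so that the comparatively expensive branching process estimate is evaluated only until the rule is first satisfied.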

Table 10.1 Number of cascades needed

| Number of types | Type | $M^{\min}$ | $M_u^{\min}$ |
|---|---|---|---|
| 1 | Line outage | 18,000 | 1400 |
| 1 | Load shed | 36,000 | 1900 |
| 1 | Isolated bus | 33,000 | 900 |
| 2 | Line outage and load shed | 39,000 | 6500 |
| 2 | Line outage and isolated bus | 37,000 | 5500 |

10.5 Estimated Parameters of Branching Processes

The $\epsilon$ in (10.7) is chosen as 0.01. The EM algorithm that is used to estimate the offspring mean matrix of the multi-type branching processes converges quickly. The number of iterations $N^{ite}$ is listed in Table 10.2.

Table 10.2 Number of iterations of EM algorithm

| Type | $M_u$ | $N^{ite}$ |
|---|---|---|
| Line outage and load shed | 39,000 | 7 |
| | 6500 | 7 |
| Line outage and isolated bus | 37,000 | 4 |
| | 5500 | 4 |

Table 10.3 Estimated parameters of branching processes by (10.1) and the EM algorithm using $M^{\min}$ cascades

| Type | $\hat{\lambda}$ | $\hat{\rho}$ | $\hat{\Lambda}$ |
|---|---|---|---|
| Line outage | 0.45 | – | – |
| Load shed | 0.48 | – | – |
| Isolated bus | 0.14 | – | – |
| Line outage and load shed | – | 0.55 | $\begin{bmatrix} 0.45 & 0.42 \\ 0.0018 & 0.029 \end{bmatrix}$ |
| Line outage and isolated bus | – | 0.60 | $\begin{bmatrix} 0.45 & 0.40 \\ 6.0 \times 10^{-5} & 0.0049 \end{bmatrix}$ |

The estimated branching process parameters are listed in Table 10.3, where $\hat{\lambda}$ is the offspring mean for one type of outages estimated by the method in [6]. The estimated largest eigenvalue of the offspring mean matrix $\hat{\rho}$ is greater than the offspring means estimated when only one type of outages is considered, indicating that the system is closer to criticality when we simultaneously consider two types of outages. This is because different types of outages, such as line outages and the load shed, can mutually influence each other, thus aggravating the propagation of cascading. In this case, considering only one type of outages will underestimate the extent of outage propagation.

The $\hat{\lambda}_{12}$ in $\hat{\Lambda}$ is the estimated expected discretized number of the load shed when one line is tripped and $\hat{\lambda}_{21}$ is the estimated expected number of line outages when one discretization unit of load is shed. From $\hat{\Lambda}$, we can see that line outages tend to have a greater influence on the load shed and isolated buses, but the influence of the load shed or isolated buses on line outages is relatively weak. This is reasonable since in real blackouts line tripping is more likely to cause load shed or isolated buses than the reverse. Sometimes line outages directly cause load shed or isolated buses; the simplest case occurs when a load is fed from a radial line.

Table 10.4 Estimated parameters of branching processes by (10.1) and the EM algorithm using $M_u^{\min}$ cascades

| Type | $\hat{\lambda}$ | $\hat{\rho}$ | $\hat{\Lambda}$ |
|---|---|---|---|
| Line outage | 0.45 | – | – |
| Load shed | 0.49 | – | – |
| Isolated bus | 0.15 | – | – |
| Line outage and load shed | – | 0.56 | $\begin{bmatrix} 0.45 & 0.43 \\ 0.0020 & 0.027 \end{bmatrix}$ |
| Line outage and isolated bus | – | 0.61 | $\begin{bmatrix} 0.45 & 0.39 \\ 5.5 \times 10^{-5} & 0.0040 \end{bmatrix}$ |

Note that there is some mismatch between the largest eigenvalue of the offspring mean matrix $\hat{\rho}$ estimated from (10.1) and the largest eigenvalue calculated from the offspring mean matrix $\hat{\Lambda}$ estimated by the EM algorithm. The estimator in (10.1) is the maximum likelihood estimator of the largest eigenvalue of the offspring mean matrix [15], and it does not require any assumption about the offspring distribution. By contrast, in order to estimate the offspring mean matrix, a specific offspring distribution, such as the Poisson distribution, is assumed. As mentioned in Sect. 10.2, there are general arguments suggesting that the choice of a Poisson offspring distribution is appropriate. However, the offspring distribution is only approximately Poisson and not necessarily exactly Poisson. Numerical simulation of multi-type branching processes with Poisson offspring distributions shows that the estimated $\hat{\rho}$ and the largest eigenvalue of the estimated $\hat{\Lambda}$ do agree with each other. Therefore, the largest eigenvalue estimated from (10.1) without any assumption about the offspring distribution is expected to be more reliable, and the closeness of the system to criticality should thus be determined based on the $\hat{\rho}$ estimated from (10.1).

The parameters estimated by using only $M_u^{\min}$ cascades are listed in Table 10.4. They are very close to those estimated by using $M^{\min}$ cascades, indicating that $M_u^{\min}$ cascades are enough to get a good estimate.
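The mismatch discussed above can be checked numerically from the tabulated values. The short sketch below takes the $\hat{\Lambda}$ reported in Table 10.3 for line outages and the load shed and computes its largest eigenvalue with NumPy; the variable names are chosen for this example only.

```python
import numpy as np

# Offspring mean matrix for line outages and load shed, as listed in Table 10.3.
lam_hat = np.array([[0.45, 0.42],
                    [0.0018, 0.029]])

# Largest eigenvalue in absolute value (spectral radius) of the estimated matrix.
rho_from_lam = float(np.max(np.abs(np.linalg.eigvals(lam_hat))))
print(f"largest eigenvalue of the estimated matrix: {rho_from_lam:.3f}")
```

The result is roughly 0.45, which is close to the one-type offspring mean for line outages but below the $\hat{\rho} = 0.55$ obtained from (10.1), consistent with the mismatch described in the text.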

10.6 Estimated Joint Distribution of Total Outages

In (10.13) and (10.14), the highest orders for both $s_1$ and $s_2$ are chosen as 4. In (10.15) and (10.16), $\tau_1$ and $\tau_2$ are chosen based on the number of initial outages from the samples of cascades and the trade-off between calculation burden and accuracy. For line outages and the load shed, $\tau_1$ and $\tau_2$ are chosen as 12 and 9, respectively. For line outages and isolated buses, $\tau_1$ and $\tau_2$ are chosen as 12 and 18, respectively.

It has been shown for the one-type branching process that it is much more time efficient to estimate the parameters of a branching process from a shorter simulation run and then predict the distribution of total outages by the branching process than to run a much longer simulation in order to accumulate enough cascades to empirically estimate the distribution [5, 6]. Here, we estimate the joint distribution of total outages with the multi-type branching process by using $M_u^{\min} \ll M^{\min}$ cascades and compare it with the empirical joint distribution obtained from $M^{\min}$ cascades.

To quantitatively compare the empirical and estimated joint distributions, the joint entropy is calculated and listed in Table 10.5. The joint entropy of the estimated joint distributions is reasonably close to that of the empirical joint distributions. Also, the joint entropy of the distributions for two types of outages is significantly greater than that for one type of outages, meaning that new information can be obtained by jointly analyzing two types of outages.

Table 10.5 Joint entropy of distributions

| Type | $M_u$ | $H^{emp}$ | $H^{est}$ |
|---|---|---|---|
| Line outage | 18,000 | 3.50 | 3.91 |
| | 1400 | 3.48 | 3.92 |
| Load shed | 36,000 | 3.52 | 3.56 |
| | 1900 | 3.53 | 3.57 |
| Isolated bus | 33,000 | 2.63 | 2.64 |
| | 900 | 2.59 | 2.61 |
| Line outage and load shed | 39,000 | 6.99 | 7.08 |
| | 6500 | 6.94 | 7.06 |
| Line outage and isolated bus | 37,000 | 5.33 | 6.45 |
| | 5500 | 5.30 | 6.44 |

After estimating the joint distributions, the marginal distributions for each type of outage can also be calculated. In Figs. 10.1 and 10.2, we show the marginal distributions of line outages and the load shed for the two-type branching process of line outages and the load shed. The empirical marginal distributions of total outages (dots) and initial outages (squares) calculated from $M^{\min} = 39{,}000$ cascades are shown, as well as a solid line indicating the total outages predicted by the multi-type branching process from $M_u^{\min} = 6500$ cascades. The branching process data are also discrete but are shown as a line for ease of comparison. The branching process prediction with $M_u^{\min} = 6500$ cascades matches the marginal distribution empirically obtained by using $M^{\min} = 39{,}000$ cascades very well. Similar results for the marginal distributions of line outages and isolated buses for the two-type branching process of line outages and isolated buses are shown in Figs. 10.3 and 10.4, for which $M^{\min} = 37{,}000$ and $M_u^{\min} = 5500$.

The CLOs defined in Sect. 10.3.3 when the total number of line outages is known can also be calculated from either the empirical joint distribution using $M^{\min}$ cascades or the joint distribution estimated by the branching process using $M_u^{\min} \ll M^{\min}$ cascades. The $p_{conf}$ in (10.19) is chosen as 0.99. The CLOs for the load shed and the isolated buses are shown in Figs. 10.5 and 10.6, respectively, which indicate that the CLOs estimated by the multi-type branching process using a much smaller number of cascades match the empirically obtained CLOs very well.
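Obtaining the marginal distributions from a two-type joint distribution amounts to summing over the other variable. A minimal sketch, assuming the joint distribution is stored as an array indexed by (number of type 1 outages, number of type 2 outages); the function name and the toy distribution are hypothetical:

```python
import numpy as np

def marginal_distributions(joint):
    """Return (marginal of type 1, marginal of type 2) from joint[y1, y2]."""
    joint = np.asarray(joint, dtype=float)
    return joint.sum(axis=1), joint.sum(axis=0)

# Toy joint distribution over y1 in {0, 1} and y2 in {0, 1, 2}.
toy_joint = [[0.1, 0.2, 0.1],
             [0.3, 0.2, 0.1]]
marginal_1, marginal_2 = marginal_distributions(toy_joint)
```

Applying the same function to the estimated and to the empirical joint distribution gives the two sets of curves that are compared in the marginal distribution figures.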

Fig. 10.1 Estimated marginal probability distribution of the number of line outages by using $M_u^{\min} = 6500$ cascades when line outages and the load shed are considered. Axes: probability versus number of line outages. Legend: initial outage, total outage, branching process. © [2017] IEEE. Reprinted, with permission, from [7]

Fig. 10.2 Estimated marginal probability distribution of the load shed by using $M_u^{\min} = 6500$ cascades when line outages and the load shed are considered. Axes: probability versus total load shed (MW). Legend: initial outage, total outage, branching process. © [2017] IEEE. Reprinted, with permission, from [7]

Fig. 10.3 Estimated marginal probability distribution of the number of line outages by using $M_u^{\min} = 5500$ cascades when line outages and isolated buses are considered. Axes: probability versus number of line outages. Legend: initial outage, total outage, branching process. © [2017] IEEE. Reprinted, with permission, from [7]

Fig. 10.4 Estimated marginal probability distribution of the number of isolated buses by using $M_u^{\min} = 5500$ cascades when line outages and isolated buses are considered. Axes: probability versus number of isolated buses. Legend: initial outage, total outage, branching process. © [2017] IEEE. Reprinted, with permission, from [7]

Fig. 10.5 Estimated CLOs for the load shed when the total number of line outages is known. Axes: CLO of the load shed versus number of line outages. Legend: empirical, branching process. © [2017] IEEE. Reprinted, with permission, from [7]

Fig. 10.6 Estimated CLOs for the isolated buses when the total number of line outages is known. Axes: CLO of isolated buses versus number of line outages. Legend: empirical, branching process. © [2017] IEEE. Reprinted, with permission, from [7]

10.7 Predicted Joint Distribution from One Type of Outage

To further demonstrate and validate the proposed multi-type branching process model, we estimate the joint distribution of the total sizes of two types of outages by using only the predetermined offspring mean matrix and the distribution of initial line outages, as follows:

1. The offspring mean matrix $\hat{\Lambda}$ is calculated offline from $M_u^{\min}$ cascades, as shown in Table 10.4.
2. To mimic online application, $M_u^{\min}$ cascades are randomly chosen from the $M^{\min} - M_u^{\min}$ cascades for test. The empirical joint distribution of line outages and the load shed (isolated buses) is calculated as a reference.
3. We estimate the joint distribution of line outages and the load shed (isolated buses) by using the $\hat{\Lambda}$ in step 1 and the distribution of initial line outages, assuming there are no data about the load shed (isolated buses), for which the initial outage is set to zero with probability one.
4. We compare the marginal distributions and the CLOs calculated from the estimated and empirical joint distributions.

The predicted marginal probability distributions of the load shed and isolated buses are shown in Figs. 10.7 and 10.8. The prediction is reasonably good even though we do not have the distribution of the initial load shed or isolated buses. Also, Fig. 10.7 shows that the prediction of the load shed is very good when the blackout size is small but not as good when the blackout size is large. By contrast, as shown in Fig. 10.8, the prediction of the number of isolated buses is good for both small and large blackout sizes. This is mainly because the initial outage of the load shed can be greater than zero with a nonnegligible probability, so assuming that the initial outage is zero with probability one can influence the accuracy of the prediction. However, the initial number of isolated buses is zero or one with a high probability (86.21% in this case), since in the initial stage the possibility that some buses are isolated from the major part of the system is very low, and thus assuming that the initial number of isolated buses is zero with probability one does not noticeably influence the prediction.

Fig. 10.7 Estimated marginal probability distribution of the load shed assuming there are no load shed data. Axes: probability versus total load shed (MW). Legend: initial outage, total outage, branching process. © [2017] IEEE. Reprinted, with permission, from [7]

Fig. 10.8 Estimated marginal probability distribution of number of isolated buses assuming there are no isolated bus data. Axes: probability versus number of isolated buses. Legend: initial outage, total outage, branching process. © [2017] IEEE. Reprinted, with permission, from [7]

The empirically obtained and estimated CLOs of the load shed and the isolated buses when the number of line outages is known are shown in Figs. 10.9 and 10.10, respectively. In both figures, $M_u^{\min}$ cascades are used to get the empirical and estimated CLOs. The prediction of the CLOs when there are no data for the load shed or isolated buses (especially the former) is not as good as in the case with those data. The prediction of the CLOs for the isolated buses is better than that for the load shed, for the same reason as discussed above for the marginal distributions. However, the multi-type branching processes can generate useful and sometimes very accurate predictions for those outages whose data are unavailable, which can further provide important information for the operators when the system is under a cascading outage event or is in restoration.

It is also seen that the CLOs estimated by the branching process seem to be more statistically reliable than the CLOs empirically obtained from the same number of cascades, which can oscillate as the number of line outages increases. Comparing the empirically obtained CLOs in Figs. 10.9 and 10.5, we can see that the oscillation in Fig. 10.5 is less obvious, mainly because many more simulated cascades are used there to obtain the empirical CLOs.

Fig. 10.9 Estimated CLOs for the load shed when the number of line outages is known assuming there are no load shed data. Axes: CLO of the load shed versus number of line outages. Legend: empirical, branching process. © [2017] IEEE. Reprinted, with permission, from [7]

Fig. 10.10 Estimated CLOs for isolated buses when the number of line outages is known assuming there are no isolated bus data. Axes: CLO of isolated buses versus number of line outages. Legend: empirical, branching process. © [2017] IEEE. Reprinted, with permission, from [7]
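The assumption used in the prediction procedure of Sect. 10.7, namely that the unobserved outage type starts at zero with probability one, can be sketched as the construction of a joint initial distribution from line outage data alone. The function name and the sample data are hypothetical.

```python
from collections import Counter

def initial_distribution_from_line_outages(initial_line_outages):
    """Joint initial distribution with the second outage type fixed at zero.

    initial_line_outages: observed numbers of initial line outages, one per cascade.
    Returns a dict mapping (z01, 0) to its empirical probability.
    """
    counts = Counter(int(z) for z in initial_line_outages)
    total = sum(counts.values())
    return {(z, 0): c / total for z, c in counts.items()}

# Hypothetical initial line outages of four cascades.
p0 = initial_distribution_from_line_outages([1, 2, 2, 3])
```

A dictionary of this form is what enters (10.12) in place of the full empirical joint distribution of initial outages when no data on the second outage type are available.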

10.8 Estimated Propagation of Three Types of Outages

The parameters of the three-type branching process are estimated. By using the method in Sect. 10.4.1, we determine $M^{\min} = 46{,}000$ when considering line outages, the load shed, and isolated buses simultaneously. The EM algorithm for estimating the offspring mean matrix of the multi-type branching processes converges in six steps. The estimated largest eigenvalue of the offspring mean matrix, the offspring mean matrix, and the joint entropy of the empirical joint distribution are listed in Table 10.6.

Table 10.6 Estimated parameters for three types of outages

| Type | $\hat{\rho}$ | $\hat{\Lambda}$ | $H^{emp}$ |
|---|---|---|---|
| Line outage, load shed, and isolated bus | 0.64 | $\begin{bmatrix} 0.44 & 0.39 & 0.39 \\ 0.0035 & 0.024 & 0.018 \\ 7.4 \times 10^{-7} & 0.079 & 4.2 \times 10^{-4} \end{bmatrix}$ | 8.54 |

It is seen that line outages tend to have a greater influence on the load shed and isolated buses, but the influence of the load shed or isolated buses on line outages is relatively weak. The largest eigenvalue of the offspring mean matrix is greater than that for the two-type branching processes, indicating that the system is even closer to criticality when the mutual influence of three types of outages is considered. In addition, the joint entropy is greater than for the two-type branching process, although the increase of joint entropy from two types to three types is not as large as that from one type to two types.

References 1. K.B. Athreya, P.E. Ney, Branching Processes, vol. 196 (Springer Science & Business Media, Springer, 2012) 2. T.E. Harris, The Theory of Branching Processes (Courier Corporation, North Chelmsford, 2002) 3. K. Sun, Y. Hou, W. Sun, J. Qi, Power System Control Under Cascading Failures: Understanding, Mitigation, and System Restoration (Wiley-IEEE Press, New York, 2019) 4. I. Dobson, J. Kim, K.R. Wierzbicki, Testing branching process estimators of cascading failure with data from a simulation of transmission line outages. Risk Anal. 30(4), 650–662 (2010) 5. J. Kim, K.R. Wierzbicki, I. Dobson, R.C. Hardiman, Estimating propagation and distribution of load shed in simulations of cascading blackouts. IEEE Syst. J. 6(3), 548–557 (2012) 6. J. Qi, I. Dobson, S. Mei, Towards estimating the statistics of simulated cascades of outages with branching processes. IEEE Trans. Power Syst. 28(3), 3410–3419 (2013) 7. J. Qi, W. Ju, K. Sun, Estimating the propagation of interdependent cascading outages with multi-type branching processes. IEEE Trans. Power Syst. 32(2), 1212–1223 (2017) 8. A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B Stat. Methodol. 39(1), 1–22 (1977) 9. I.J. Good, Generalizations to several variables of Lagrange’s expansion, with applications to stochastic processes, in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 56(04) (Cambridge University Press, Cambridge, 1960), pp. 367–380 10. M. Parandehgheibi, E. Modiano, D. Hay, Mitigating cascading failures in interdependent power grids and communication networks, in 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm) (IEEE, 2014), pp. 242–247 11. T. Li, M. Eremia, M. Shahidehpour, Interdependency of natural gas network and power system security. IEEE Trans. Power Syst. 23(4), 1817–1824 (2008) 12. T. Adachi, B.R. 
Ellingwood, Serviceability of earthquake-damaged water systems: effects of electrical power availability and power backup systems on system vulnerability. Reliab. Eng. Syst. Saf. 93(1), 78–88 (2008) 13. S.M. Rinaldi, J.P. Peerenboom, T.K. Kelly, Identifying, understanding, and analyzing critical infrastructure interdependencies. IEEE Control. Syst. Mag. 21(6), 11–25 (2001) 14. Z. Guo, F. Afifah, J. Qi, S. Baghali, A stochastic multiagent optimization framework for interdependent transportation and power system analyses. IEEE Trans. Transp. Electrification 7(3), 1088–1098 (2021) 15. P. Guttorp, Statistical Inference for Branching Processes, vol. 122 (Wiley-Interscience, Hoboken, 1991)


16. F. Maaouia, A. Touati et al., Identification of multitype branching processes. Ann. Stat. 33(6), 2655–2694 (2005) 17. S. Mei, X. Weng, A. Xue et al., Blackout model based on OPF and its self-organized criticality, in 2006 Chinese Control Conference (IEEE, 2006), pp. 1673–1678 18. S. Mei, Y. Ni, G. Wang, S. Wu, A study of self-organized criticality of power system under cascading failures based on AC-OPF with voltage stability margin. IEEE Trans. Power Syst. 23(4), 1719–1726 (2008)

Chapter 11

Interdependency Between Power System Outages by Coupled Interaction Model

11.1 Introduction

Due to the limitations of physical simulation models and complex-network-based approaches, high-level statistical models such as branching processes [1–4] have been proposed to extract useful information from either simulated cascades or utility outage data. In [1, 5], a branching process is used to model the propagation of line outages and to efficiently estimate the distribution of the number of line outages. The propagation of load shed has also been analyzed with branching processes [2, 3]. In Chap. 10, the interdependency between different types of outages is analyzed with a multi-type branching process [4]. Although a branching process can capture the distributions of line outages and load shed, it does not distinguish which components fail during cascading failure propagation and thus cannot identify critical components for cascading mitigation.

Recent studies on the interaction network [6–9] and the influence graph [10–12] provide another useful way to extract propagation patterns from the original cascades [13]. In Chap. 2, the estimation of the interaction network for line outages is formulated as a parameter estimation problem with incomplete data and is efficiently solved with the Expectation Maximization (EM) algorithm [7]. Influence graphs are constructed based on a Markovian process in [10–12], in which the transition probabilities are estimated by Bayesian inference [12]. In addition to line outages, the load shed and electrical distances are analyzed with a multi-layer interaction graph in [8]. Although the load shed is treated as one layer, the cause of the load shed at buses is attributed equally to the failed lines, which may not always be reasonable because the contributions of individual line outages to the load shed can differ significantly.

Although the existing high-level statistical models provide useful tools to capture the pattern of outage propagation, it is still very challenging to analyze the detailed interdependency between line outages and the load shed, especially with limited samples. In this chapter, a coupled interaction matrix is introduced to describe the interdependency between line outages and the load shed at buses, and it is efficiently estimated by the EM algorithm from a small number of original cascades [14]. A coupled interaction model that uses the coupled interaction matrix and the distribution of initial outages is presented to efficiently simulate cascading failures with both line outages and the load shed at buses. Critical links are identified from the interaction matrix with a comprehensive severity index that considers the consequences of both line outages and the load shed. A mitigation strategy based on these critical links is implemented and is shown to significantly reduce both line outages and the load shed.

© Springer Nature Switzerland AG 2023 J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_11

11.2 Coupled Interaction Matrix

Cascading failures in power systems involve successive line outages and load shed at buses. According to the outage sequence from either the utility outage record [1] or cascading failure simulations [6], a cascade can be divided into different generations. Assume there are $N_l$ lines and $N_b$ buses with load in the system. The index numbers of lines and load buses are converted so that the set of lines is $\{1, 2, \cdots, N_l\}$ and the set of buses with load is $\{1, 2, \cdots, N_b\}$. Assume there are $M$ cascades in the dataset:

$$
\begin{array}{ccccc}
 & \text{generation 0} & \text{generation 1} & \text{generation 2} & \cdots \\
\text{cascade } 1 & F^{1,0} & F^{1,1} & F^{1,2} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \\
\text{cascade } m & F^{m,0} & F^{m,1} & F^{m,2} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \\
\text{cascade } M & F^{M,0} & F^{M,1} & F^{M,2} & \cdots
\end{array}
$$

Here $F^{m,g} = \{S_L^{m,g}, S_B^{m,g}, \mathbf{Z}^{m,g}\}$, where $S_L^{m,g}$ and $S_B^{m,g}$ are, respectively, the set of lines that are outaged and the set of buses with load shed in generation $g$ of cascade $m$, and $\mathbf{Z}^{m,g} \in \mathbb{Z}^{N_b}$ is the vector of the amount of discretized load shed at each bus in generation $g$ of cascade $m$.

In both utility outage data and cascading failure simulations, the load shed is usually recorded in MW. Let $X_v^{m,g}$ be the load shed at bus $v$ in MW in generation $g$ of cascade $m$. The load shed data should first be discretized by the technique in [3, 4]. If $\Delta_v$ MW is the chosen unit of discretization for bus $v$, the load shed at bus $v$ can be expressed as an integer multiple of $\Delta_v$ MW:

$$
Z_v^{m,g} = \operatorname{int}\left[\frac{X_v^{m,g}}{\Delta_v} + 0.5\right], \tag{11.1}
$$

where $\operatorname{int}[x]$ is the integer part of $x$. A systematic approach for choosing $\Delta_v$ will be discussed in Sect. 11.3.
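The discretization in (11.1) can be illustrated with a minimal Python sketch. The function name `discretize_load_shed` and the sample values are hypothetical and are not taken from the chapter; the 50 MW unit is the initial value used later in Sect. 11.3.

```python
import math


def discretize_load_shed(x_mw: float, delta_mw: float) -> int:
    """Eq. (11.1): integer part of x/delta + 0.5, for non-negative load shed."""
    if x_mw < 0 or delta_mw <= 0:
        raise ValueError("load shed must be >= 0 and the unit must be > 0")
    return int(math.floor(x_mw / delta_mw + 0.5))


# Illustrative values only (MW), with a 50 MW discretization unit.
for x in (10.0, 24.9, 25.0, 130.0):
    print(x, "MW ->", discretize_load_shed(x, 50.0), "unit(s)")
```

With a 50 MW unit, any load shed below 25 MW maps to zero units, which is why the choice of the unit affects how much of the recorded load shed survives discretization.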


The interaction model in Chap. 2 focuses only on the propagation of line outages. In this chapter, the interaction analysis covers not only line outages but also the interdependency between line outages and the load shed at different buses. This is useful because the load shed corresponds more directly to economic losses and is of greater interest to utilities. Analyzing the interdependency between line outages and the load shed at buses captures the cascading failure propagation patterns better and can help develop more effective mitigation strategies.

11.2.1 Definition of Coupled Interaction Matrix

In order to capture the interdependency between line outages and the load shed, a coupled interaction matrix $\mathbf{B} \in \mathbb{R}^{(N_l+N_b)\times(N_l+N_b)}$ is defined as

$$
\mathbf{B} = \begin{bmatrix} \mathbf{B}^{LL} & \mathbf{B}^{LB} \\ \mathbf{B}^{BL} & \mathbf{B}^{BB} \end{bmatrix},
$$

where $\mathbf{B}^{LL}$, $\mathbf{B}^{LB}$, $\mathbf{B}^{BL}$, and $\mathbf{B}^{BB}$ are defined as follows. Assume $M_u \le M$ cascades are used for estimating the coupled interaction matrix.

• $\mathbf{B}^{LL} \in \mathbb{R}^{N_l \times N_l}$ captures the interactions between line outages, which is the same as the interaction matrix in Chap. 2. Its entry in the $i$th row and $j$th column, $b_{ij}^{LL}$, is the empirical probability that line $j$ fails following the outage of line $i$.

• $\mathbf{B}^{LB} \in \mathbb{R}^{N_l \times N_b}$ captures the interactions from line outages to the load shed at buses. As has been adopted and verified in [2–4], the Poisson distribution can capture the statistical properties of offspring outages that are selected from a large number of possible outages, each with small probability and approximately independent of the others. Therefore, we use the Poisson distribution to approximate the distribution of the discretized load shed at buses following line outages. For example, to capture the discretized load shed at bus $v$ following the outage of line $i$, the mean of the Poisson distribution is recorded as $b_{iv}^{LB}$, which is the entry in the $i$th row and $v$th column of $\mathbf{B}^{LB}$. If line $i$ fails in generation $g$, $k_v$ units of discretized load shed at bus $v$ will occur in generation $g+1$ with probability

$$
p_{iv}^{LB} = \frac{\left(b_{iv}^{LB}\right)^{k_v} e^{-b_{iv}^{LB}}}{k_v!}. \tag{11.2}
$$

• $\mathbf{B}^{BL} \in \mathbb{R}^{N_b \times N_l}$ captures the interactions from the load shed to line outages. Note that the interactions between the load shed at buses and the further line outages are not necessarily causal; they capture what might happen in the next generation following a load shedding event in the current generation. When load shedding cannot eliminate all line overloading or other technical constraint violations [15, 16], line outages can still be observed following the load shedding event. As has been verified in [4], line outages following load shedding are rare. Therefore, we consider a simplified case in which line outages are not sensitive to the amount of load shed at buses. When load shedding occurs at bus $u$, line $j$ will fail in the next generation with a constant probability $b_{uj}^{BL}$, which is the entry in the $u$th row and $j$th column of $\mathbf{B}^{BL}$.

• $\mathbf{B}^{BB} \in \mathbb{R}^{N_b \times N_b}$ captures the interactions between the load shed at buses. Similar to $\mathbf{B}^{BL}$, $\mathbf{B}^{BB}$ captures successive load shedding in consecutive generations that is also not necessarily causal. As the system is significantly weakened while the cascading outage evolves, successive load shedding may occur at different buses or at the same bus more than once. We assume that each unit of the discretized load shed at bus $u$ independently generates discretized load shed at bus $v$ by a Poisson distribution with mean $b_{uv}^{BB}$. Therefore, if $k_u$ units of load are shed at bus $u$ in generation $g$, $k_v$ units of discretized load shed at bus $v$ will occur in generation $g+1$ with the following probability:

$$
p_{uv}^{BB} = \frac{\left(k_u b_{uv}^{BB}\right)^{k_v} e^{-k_u b_{uv}^{BB}}}{k_v!}. \tag{11.3}
$$

The matrix $\mathbf{B}$ determines how components, either lines or buses, interact with each other. The nonzero entries of $\mathbf{B}$ are called links. Link $l: i \to j$ corresponds to the nonzero entry of $\mathbf{B}$ in the $i$th row and $j$th column. By putting all links together, a directed network $\mathcal{G}(\mathcal{C}, \mathcal{L})$ called the coupled interaction network can be obtained: the vertices $\mathcal{C}$ are components, including both lines and buses, and a directed link $l \in \mathcal{L}$ indicates that the destination vertex component fails following the outage of the source vertex component with a probability greater than 0. Different from Chap. 2, there are four types of links: $L \to L$ links, $L \to B$ links, $B \to L$ links, and $B \to B$ links.
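As a small numerical illustration of (11.2) and (11.3), the following Python sketch evaluates the two Poisson probabilities. The function names and the numerical values are assumptions made for illustration; only the formulas come from the text.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing k events for a Poisson distribution."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def p_line_to_bus(k_v: int, b_iv_lb: float) -> float:
    """Eq. (11.2): k_v units shed at bus v after the outage of line i."""
    return poisson_pmf(k_v, b_iv_lb)


def p_bus_to_bus(k_v: int, k_u: int, b_uv_bb: float) -> float:
    """Eq. (11.3): k_v units shed at bus v after k_u units shed at bus u.

    Each of the k_u units acts independently, so the mean is k_u * b_uv_bb.
    """
    return poisson_pmf(k_v, k_u * b_uv_bb)


# Hypothetical entries: b_iv^LB = 0.5 and b_uv^BB = 0.3, with 2 units shed at u.
print(p_line_to_bus(0, 0.5))    # probability of no load shed at bus v
print(p_bus_to_bus(1, 2, 0.3))  # probability of exactly one unit at bus v
```

The second call shows why larger load shed at the source bus makes follow-on load shed more likely: the Poisson mean scales linearly with $k_u$.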

11.2.2 Definition of an Auxiliary Matrix

To estimate the coupled interaction matrix $\mathbf{B}$, the following auxiliary matrix $\mathbf{A} \in \mathbb{R}^{(N_l+N_b)\times(N_l+N_b)}$ is needed:

$$
\mathbf{A} = \begin{bmatrix} \mathbf{A}^{LL} & \mathbf{A}^{LB} \\ \mathbf{A}^{BL} & \mathbf{A}^{BB} \end{bmatrix},
$$

whose four sub-matrices are defined below.

• $\mathbf{A}^{LL} \in \mathbb{R}^{N_l \times N_l}$ has entry $a_{ij}^{LL}$ as the total number of times that line $j$ fails following the outage of line $i$ in the dataset. Based on $a_{ij}^{LL}$, $b_{ij}^{LL}$ can be estimated as

$$
b_{ij}^{LL} = \frac{a_{ij}^{LL}}{N_i^L}, \tag{11.4}
$$

where $N_i^L$ is the total number of outages of line $i$ in the $M_u$ cascades.

• $\mathbf{A}^{LB} \in \mathbb{R}^{N_l \times N_b}$ has entry $a_{iv}^{LB}$ as the total discretized amount of load shed at bus $v$ following the outage of line $i$ in the $M_u$ cascades. Then $b_{iv}^{LB}$ can be estimated by

$$
b_{iv}^{LB} = \frac{a_{iv}^{LB}}{N_i^L}. \tag{11.5}
$$

• $\mathbf{A}^{BL} \in \mathbb{R}^{N_b \times N_l}$ has entry $a_{uj}^{BL}$ as the total number of outages of line $j$ following the load shed at bus $u$, from which $b_{uj}^{BL}$ is estimated as

$$
b_{uj}^{BL} = \frac{a_{uj}^{BL}}{N_u^B}, \tag{11.6}
$$

where $N_u^B$ is the total number of times that bus $u$ has nonzero discretized load shed in the $M_u$ cascades.

• $\mathbf{A}^{BB} \in \mathbb{R}^{N_b \times N_b}$ has entry $a_{uv}^{BB}$ as the total discretized amount of load shed at bus $v$ following the load shed at bus $u$. As the load shed at bus $v$ after each unit of load shed at bus $u$ independently follows a Poisson distribution, its mean $b_{uv}^{BB}$ can be estimated by

$$
b_{uv}^{BB} = \frac{a_{uv}^{BB}}{\sum_{m=1}^{M_u} \sum_{g=0}^{G^m-1} Z_u^{m,g}}, \tag{11.7}
$$

where $G^m$ is the number of generations in cascade $m$ and the denominator is the total discretized amount of load shed at bus $u$ in the $M_u$ cascades.
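The four estimates (11.4)–(11.7) amount to row-wise normalization of the auxiliary sub-matrices. A minimal sketch follows, using plain nested lists; the function and argument names (`estimate_b`, `n_line`, `n_bus`, `z_total`) and the toy counts are hypothetical. Rows whose normalizer is zero are set to zero here, which is an assumption of this sketch rather than something stated in the text.

```python
def _normalize_rows(a, denominators):
    """Divide every row of a by its denominator; rows with zero count map to 0."""
    return [
        [entry / den if den > 0 else 0.0 for entry in row]
        for row, den in zip(a, denominators)
    ]


def estimate_b(a_ll, a_lb, a_bl, a_bb, n_line, n_bus, z_total):
    """Apply (11.4)-(11.7).

    n_line[i]  : number of outages of line i            (N_i^L)
    n_bus[u]   : number of nonzero load sheds at bus u   (N_u^B)
    z_total[u] : total discretized load shed at bus u    (denominator of 11.7)
    """
    b_ll = _normalize_rows(a_ll, n_line)   # (11.4)
    b_lb = _normalize_rows(a_lb, n_line)   # (11.5)
    b_bl = _normalize_rows(a_bl, n_bus)    # (11.6)
    b_bb = _normalize_rows(a_bb, z_total)  # (11.7)
    return b_ll, b_lb, b_bl, b_bb


# Toy system with 2 lines and 1 load bus (illustrative counts only).
b_ll, b_lb, b_bl, b_bb = estimate_b(
    a_ll=[[0, 3], [1, 0]], a_lb=[[5], [0]], a_bl=[[1, 0]], a_bb=[[6]],
    n_line=[10, 4], n_bus=[8], z_total=[12],
)
print(b_ll, b_lb, b_bl, b_bb)
```

Note that $\mathbf{B}^{BL}$ is normalized by the number of load shedding events while $\mathbf{B}^{BB}$ is normalized by the number of shed units, which reflects the different modeling assumptions for the two sub-matrices.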

11.3 Estimating Coupled Interaction Matrix by EM Algorithm

When there are multiple line outages and load shed at multiple buses in two successive generations, it is challenging to decide the actual interactions between outages, either line outages or the load shed, and further determine $\mathbf{A}$ and $\mathbf{B}$. To address this issue, the EM algorithm is adopted to infer the interactions between outages [4, 7, 17]. The EM algorithm alternates between an E-step and an M-step until convergence to obtain the maximum likelihood estimate of the coupled interaction matrix $\mathbf{B}$. As the capability of the Poisson distribution to capture the propagation of load shed can be improved by the choice of the discretization unit $\Delta_v$ [3], the discretization units for load buses are adaptively updated within the EM algorithm. The detailed procedure is described as follows:

1. Initialization of discretization units: An initial discretization unit $\Delta_v^{(0)} = 50$ MW is chosen for each load bus $v$. The load shed at buses in the original $M_u$ cascades is processed by (11.1) with the initial discretization units.

2. Initialization of $\mathbf{A}$ and $\mathbf{B}$: Each component outage in generation $g+1$ is initially assumed to follow each outage in generation $g$. An initial matrix $\mathbf{A}^{(0)}$ can thus be constructed by processing the original $M_u$ cascades, based on which the initial $\mathbf{B}^{(0)}$ can be calculated by (11.4)–(11.7).

3. E-step: Update $\mathbf{A}^{(k+1)}$ based on $\mathbf{B}^{(k)}$

In the $(k+1)$th iteration, $p_{ij}^{m,g(k+1)}$ and $p_{uj}^{m,g(k+1)}$, the probability of line $j$ outage in generation $g+1$ following line $i$ outage and that following the load shed at bus $u$ in generation $g$ of cascade $m$, are estimated as

$$
p_{ij}^{m,g(k+1)} = \frac{b_{ij}^{LL(k)}}{1 - \prod_{c \in S_L^{m,g}} \left(1 - b_{cj}^{LL(k)}\right) \prod_{c \in S_B^{m,g}} \left(1 - b_{cj}^{BL(k)}\right)}
$$

$$
p_{uj}^{m,g(k+1)} = \frac{b_{uj}^{BL(k)}}{1 - \prod_{c \in S_L^{m,g}} \left(1 - b_{cj}^{LL(k)}\right) \prod_{c \in S_B^{m,g}} \left(1 - b_{cj}^{BL(k)}\right)}.
$$

Similarly, $p_{iv}^{m,g(k+1)}$ and $p_{uv}^{m,g(k+1)}$, the probability of having $Z_v^{m,g+1}$ units of discretized load shed at bus $v$ in generation $g+1$ following the outage of line $i$ and that following the load shed at bus $u$ in generation $g$ of cascade $m$, are estimated as

$$
p_{iv}^{m,g(k+1)} = \frac{p_{iv}^{LB(k)}}{1 - \prod_{c \in S_L^{m,g}} \left(1 - p_{cv}^{LB(k)}\right) \prod_{c \in S_B^{m,g}} \left(1 - p_{cv}^{BB(k)}\right)}
$$

$$
p_{uv}^{m,g(k+1)} = \frac{p_{uv}^{BB(k)}}{1 - \prod_{c \in S_L^{m,g}} \left(1 - p_{cv}^{LB(k)}\right) \prod_{c \in S_B^{m,g}} \left(1 - p_{cv}^{BB(k)}\right)},
$$

where $p_{iv}^{LB(k)}$ and $p_{uv}^{BB(k)}$ can be calculated by (11.2)–(11.3) based on $\mathbf{B}^{(k)}$. The entries of $\mathbf{A}^{LL}$ and $\mathbf{A}^{BL}$ are updated as

$$
a_{ij}^{LL(k+1)} = \sum_{m=1}^{M_u} \sum_{g=0}^{G^m-2} p_{ij}^{m,g(k+1)} \tag{11.8}
$$

$$
a_{uj}^{BL(k+1)} = \sum_{m=1}^{M_u} \sum_{g=0}^{G^m-2} p_{uj}^{m,g(k+1)}, \tag{11.9}
$$

where $p_{ij}^{m,g(k+1)}$ is zero if line $i \notin S_L^{m,g}$ or line $j \notin S_L^{m,g+1}$, and $p_{uj}^{m,g(k+1)}$ is zero if bus $u \notin S_B^{m,g}$ or line $j \notin S_L^{m,g+1}$. The entries of $\mathbf{A}^{LB}$ and $\mathbf{A}^{BB}$ are updated as

$$
a_{iv}^{LB(k+1)} = \sum_{m=1}^{M_u} \sum_{g=0}^{G^m-2} Z_v^{m,g+1} p_{iv}^{m,g(k+1)} \tag{11.10}
$$

$$
a_{uv}^{BB(k+1)} = \sum_{m=1}^{M_u} \sum_{g=0}^{G^m-2} Z_v^{m,g+1} p_{uv}^{m,g(k+1)}, \tag{11.11}
$$

where $p_{iv}^{m,g(k+1)}$ is zero if line $i \notin S_L^{m,g}$ or bus $v \notin S_B^{m,g+1}$, and $p_{uv}^{m,g(k+1)}$ is zero if bus $u \notin S_B^{m,g}$ or bus $v \notin S_B^{m,g+1}$.

The interactions of the load shed following line outages are also recorded to update the discretization units. Take the load shed at bus $v$ following the outages of line $i$ as an example. First let $U_v$ denote the maximum amount of discretized load shed at bus $v$. Let the vector $\mathbf{C}_{iv}^{LB(k+1)} \in \mathbb{R}^{U_v+1}$ record the cumulative number of times of the load shed at bus $v$ following the $N_i^L$ outages of line $i$ in the $M_u$ cascades. For the $n$th outage of line $i$, if there is no discretized load shed at bus $v$, $C_{iv}^{LB(k+1)}(0)$ increases by 1. Otherwise, if $Z_v^{m,g+1}$ units of discretized load shed at bus $v$ follow the $n$th outage of line $i$ with a probability of $p_{iv}^{m,g(k+1)}$, $C_{iv}^{LB(k+1)}(0)$ and $C_{iv}^{LB(k+1)}(Z_v^{m,g+1})$ increase by $1 - p_{iv}^{m,g(k+1)}$ and $p_{iv}^{m,g(k+1)}$, respectively.

4. M-step: Update $\mathbf{B}^{(k+1)}$ based on $\mathbf{A}^{(k+1)}$

4.1) The estimate of the interaction matrix $\tilde{\mathbf{B}}^{(k+1)} \in \mathbb{R}^{(N_l+N_b)\times(N_l+N_b)}$ is updated with $\mathbf{A}^{(k+1)}$ according to (11.4)–(11.7).

4.2) The sample variance of the discretized load shed at bus $v$ following the outage of line $i$ is calculated as

$$
S_{iv}^{2(k+1)} = \frac{\sum_{l=0}^{U_v} C_{iv}^{LB(k+1)}(l) \left(l - \tilde{b}_{iv}^{LB(k+1)}\right)^2}{N_i^L - 1}. \tag{11.12}
$$


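4.3) For each load bus $v$, $\Delta_v$ is updated as

$$
\Delta_v^{(k+1)} = \Delta_v^{(k)} \sqrt{\frac{\sum_{i \in S_v^{LB}} N_i^L \dfrac{S_{iv}^{2(k+1)}}{\tilde{b}_{iv}^{LB(k+1)}}}{\sum_{i \in S_v^{LB}} N_i^L \dfrac{\tilde{b}_{iv}^{LB(k+1)}}{S_{iv}^{2(k+1)}}}}, \tag{11.13}
$$

where $S_v^{LB}$ is the set of row indices at which the $v$th column of $\tilde{\mathbf{B}}^{LB(k+1)}$ has nonzero entries. A more detailed explanation of (11.13) can be found in the Appendix.

4.4) Using the updated discretization units, the load shed at buses in the original $M_u$ cascades is reprocessed by (11.1). The entries in $\mathbf{B}^{LB(k+1)}$ and $\mathbf{B}^{BB(k+1)}$ are then updated from $\tilde{\mathbf{B}}^{LB(k+1)}$ and $\tilde{\mathbf{B}}^{BB(k+1)}$ as

$$
b_{iv}^{LB(k+1)} = \frac{\Delta_v^{(k)}}{\Delta_v^{(k+1)}} \tilde{b}_{iv}^{LB(k+1)} \tag{11.14}
$$

$$
b_{uv}^{BB(k+1)} = \frac{\Delta_v^{(k)} \Delta_u^{(k+1)}}{\Delta_u^{(k)} \Delta_v^{(k+1)}} \tilde{b}_{uv}^{BB(k+1)}. \tag{11.15}
$$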

5. End: Steps 3–5 are repeated until the following condition is satisfied:

$$
\sigma_B = \sqrt{\frac{\sum_{i=1}^{N_l+N_b} \sum_{j=1}^{N_l+N_b} \left(b_{ij}^{(k+1)} - b_{ij}^{(k)}\right)^2}{N}} \le \varepsilon, \tag{11.16}
$$

where $N = N_{\neq 0}$ if $N_{\neq 0}$, the number of nonzero entries in $\mathbf{B}^{(k+1)} - \mathbf{B}^{(k)}$, is greater than 0, otherwise $N = 1$, and $\varepsilon$ is a preset threshold.

After the convergence of the EM algorithm, the estimated matrix $\mathbf{B}$ and the discretization units in the last iteration are used for cascading outage simulation and mitigation.
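To make the E-step concrete, the sketch below evaluates the two line-outage probabilities of step 3 for a single line $j$ in one generation transition. It is only an illustration under stated assumptions: the function name `e_step_line_outage`, the list-of-lists matrix layout, and the toy entries are hypothetical, and returning zeros when the denominator vanishes is a choice of this sketch.

```python
from math import prod


def e_step_line_outage(j, failed_lines, shed_buses, b_ll, b_bl):
    """Attribute the outage of line j in generation g+1 to generation-g outages.

    failed_lines : indices of lines in S_L^{m,g}
    shed_buses   : indices of buses in S_B^{m,g}
    Returns two dicts: p_ij for each line i and p_uj for each bus u.
    """
    # Probability that none of the generation-g outages is followed by line j.
    none_follows = prod(1.0 - b_ll[c][j] for c in failed_lines) * prod(
        1.0 - b_bl[c][j] for c in shed_buses
    )
    denom = 1.0 - none_follows
    if denom <= 0.0:  # no current link into j: nothing to attribute
        return {i: 0.0 for i in failed_lines}, {u: 0.0 for u in shed_buses}
    p_line = {i: b_ll[i][j] / denom for i in failed_lines}
    p_bus = {u: b_bl[u][j] / denom for u in shed_buses}
    return p_line, p_bus


# Toy data: lines 0 and 1 failed and bus 0 shed load; line 2 fails next.
b_ll = [[0.0, 0.0, 0.2], [0.0, 0.0, 0.1], [0.0, 0.0, 0.0]]
b_bl = [[0.0, 0.0, 0.05]]
p_line, p_bus = e_step_line_outage(2, [0, 1], [0], b_ll, b_bl)
print(p_line, p_bus)
```

Summing these values over all cascades and generations gives the updated entries in (11.8) and (11.9). The values for different sources need not sum to one, since several generation-$g$ outages may jointly precede the same line outage.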

11.4 Coupled Interaction Model for Cascading Failure Simulation

As large blackouts are usually rare, simulating large blackouts with physical cascading failure models is rather inefficient, while the number of available utility outage data samples is limited. By contrast, highly probabilistic interaction models [6, 9, 12] can efficiently generate a large number of cascades to better estimate the blackout size [18]. Moreover, the effect of mitigation schemes can also be efficiently tested by modifying the parameters of highly probabilistic models. A coupled interaction model can simulate the propagation of line outages and the load shed using the probability distribution of initial outages and the coupled interaction matrix $\mathbf{B}$. The coupled interaction model has the following four steps:

• Step 1: Generate initial outages

Set $g = 0$. The line outages in generation 0 are independently generated according to their occurrence frequency in generation 0 of the original cascades. Specifically, line $i$ fails in generation 0 with probability

$$
\pi_i^L = \frac{\sum_{m=1}^{M_u} \mathbb{1}\left[i \in S_L^{m,0}\right]}{M_u}, \tag{11.17}
$$

where $\mathbb{1}[\text{event}]$ is an indicator function that evaluates to one if the event happens and to zero if it does not. Similarly, bus $u$ has $k$ units of discretized load shed in generation 0 with probability

$$
\pi_{u,k}^B = \frac{\sum_{m=1}^{M_u} \mathbb{1}\left[Z_u^{m,0} = k\right]}{M_u}. \tag{11.18}
$$

All generated outages comprise $S_L^{m,0}$, $S_B^{m,0}$, and $\mathbf{Z}^{m,0}$. Considering the time scale of cascading failures, most cascading failure simulation models do not include any repair process for the failed lines. As each line fails at most once in a simulation, once line $i$ fails the entries in the $i$th column of $\mathbf{B}^{LL}$ and $\mathbf{B}^{BL}$ are set to zero.

• Step 2: Generate further line outages

Each outage in $S_L^{m,g}$ and $S_B^{m,g}$ independently generates line outages in generation $g+1$. Line $j$ fails in generation $g+1$ following line $i$ outage in $S_L^{m,g}$ and the load shed at bus $u$ in $S_B^{m,g}$ with probability $b_{ij}^{LL}$ and $b_{uj}^{BL}$, respectively. All sampled line outages comprise $S_L^{m,g+1}$. The columns of $\mathbf{B}^{LL}$ and $\mathbf{B}^{BL}$ that correspond to the failed lines are set to zero.

• Step 3: Generate further load shed

For line $i \in S_L^{m,g}$, the discretized load shed at bus $v$ follows a Poisson distribution with mean $b_{iv}^{LB}$. For bus $u \in S_B^{m,g}$, the discretized load shed at bus $v$ follows a Poisson distribution with mean $Z_u^{m,g} b_{uv}^{BB}$. The load shed at bus $v$ is independently sampled for each outage in $S_L^{m,g}$ and $S_B^{m,g}$. Since the total load shed at bus $v$ from generation 0 to generation $g+1$ cannot exceed its total discretized load $Z_v^t$, the load shed at bus $v$ in generation $g+1$ is recorded as

$$
Z_v^{m,g+1} = \min\left\{ Z_v^t - \sum_{l=0}^{g} Z_v^{m,l},\ \sum_{i \in S_L^{m,g}} Z_{v \leftarrow i}^{m,g+1} + \sum_{u \in S_B^{m,g}} Z_{v \leftarrow u}^{m,g+1} \right\},
$$

where $Z_{v \leftarrow i}^{m,g+1}$ and $Z_{v \leftarrow u}^{m,g+1}$ are the load shed at bus $v$ in generation $g+1$ generated from line outage $i$ and from the load shed at bus $u$, respectively. All buses with load shed comprise $S_B^{m,g+1}$ and the corresponding discretized load shed is recorded in $\mathbf{Z}^{m,g+1}$.

• Step 4: End

The simulation ends if both $S_L^{m,g+1}$ and $S_B^{m,g+1}$ are empty. Otherwise, increase $g$ by one and go back to Step 2.

Steps 1–4 can be repeated to simulate many cascades for better understanding and mitigating cascading failures. Compared with detailed cascading failure models, the coupled interaction model is highly probabilistic and is thus much more time efficient.
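Steps 1–4 above can be sketched as a short Python routine. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: all names are hypothetical, matrices are nested lists, `pi_bus[u]` is assumed to be a dict mapping a number of units $k$ to $\pi_{u,k}^B$, a set of already failed lines replaces the zeroing of matrix columns, and Poisson samples are drawn with Knuth's multiplication method.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    if mean <= 0.0:
        return 0
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_cascade(pi_line, pi_bus, b_ll, b_lb, b_bl, b_bb, z_total, rng):
    """Return a list of generations, each a (failed line set, {bus: units}) pair."""
    n_l, n_b = len(pi_line), len(pi_bus)
    # Step 1: initial line outages and initial load shed.
    lines = {i for i in range(n_l) if rng.random() < pi_line[i]}
    shed = {}
    for u in range(n_b):
        r, acc = rng.random(), 0.0
        for k, prob in sorted(pi_bus[u].items()):
            acc += prob
            if r < acc:
                if k > 0:
                    shed[u] = min(k, z_total[u])
                break
    failed = set(lines)  # a line fails at most once
    shed_total = [shed.get(v, 0) for v in range(n_b)]
    generations = [(lines, shed)]
    while lines or shed:
        # Step 2: further line outages among lines that have not failed yet.
        new_lines = set()
        for j in range(n_l):
            if j in failed:
                continue
            if any(rng.random() < b_ll[i][j] for i in lines) or any(
                rng.random() < b_bl[u][j] for u in shed
            ):
                new_lines.add(j)
        # Step 3: further load shed, capped by the remaining load at each bus.
        new_shed = {}
        for v in range(n_b):
            drawn = sum(sample_poisson(b_lb[i][v], rng) for i in lines)
            drawn += sum(sample_poisson(z * b_bb[u][v], rng) for u, z in shed.items())
            z_v = min(z_total[v] - shed_total[v], drawn)
            if z_v > 0:
                new_shed[v] = z_v
                shed_total[v] += z_v
        # Step 4: stop when the next generation is empty.
        if not new_lines and not new_shed:
            break
        failed |= new_lines
        lines, shed = new_lines, new_shed
        generations.append((lines, shed))
    return generations
```

A usage example with a toy two-line, one-bus system would be `simulate_cascade([1.0, 0.0], [{}], [[0.0, 1.0], [0.0, 0.0]], [[0.0], [0.0]], [[0.0, 0.0]], [[0.0]], [3], random.Random(0))`, in which line 0 always fails initially and is always followed by line 1. The loop is guaranteed to terminate because each line fails at most once and the load shed at each bus is bounded by its total discretized load.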

11.5 Critical Link Identification

As both the number of line outages and the amount of load shed have been used as measures of blackout size [18], the mitigation effect on cascading outages can also be measured by the comprehensive cost of line outages and the load shed. The severity of a link $l: S \to T$ is measured by the line outages and load shed propagated through $l$, where $S$ and $T$ are, respectively, the source outage and the target outage, each of which can be a line outage or a bus with load shedding.

For each link $l: S \to T$, an acyclic interaction subgraph $\mathcal{G}_l(\mathcal{C}_l, \mathcal{L}_l)$, in which there is a path from vertex $T$ to any other vertex, can be obtained from the coupled interaction network. As a line cannot be outaged twice in one propagation path while a bus may shed load multiple times in one path, different vertices in the subgraph $\mathcal{G}_l(\mathcal{C}_l, \mathcal{L}_l)$ cannot represent the same line outage event but can represent load shedding events at the same bus. To extract the acyclic subgraph, the links to a line outage vertex from other vertices at the same or higher levels are removed as in [6]. For a link to a load shedding vertex from another vertex at the same or a higher level, we remove the link, add a new vertex for the load shedding event at the next level after the source vertex of the removed link, and connect the source vertex of the removed link to the newly added vertex. This eliminates the loops in the original subgraph by adding vertices for the load shedding events at the same buses at multiple levels.

Figure 11.1 shows an example of such an acyclic interaction subgraph. In the coupled interaction network, the load shedding at bus $x$ follows both the target outage $T$ at level 0 and the line $e$ outage at level 1. To capture the load shedding at bus $x$ in the propagation path, a new vertex is added at level 2 for the load shedding event at bus $x$.

Fig. 11.1 Illustration of an interaction subgraph starting with link $l: S \to T$. © [2021] IEEE. Reprinted, with permission, from [14]

With the interaction subgraph, the severity index for link $l$ can be calculated as follows:

• Step 1: Estimate the expected number of outages of $T$

– When $S$ denotes line $i$ outage and $T$ denotes line $j$ outage, given the total number of line $i$ outages in the $M_u$ cascades as $N_i^L$, the expected number of line $j$ outages following line $i$ outage is estimated as

$$
E_j^L = N_i^L b_{ij}^{LL}. \tag{11.19}
$$

– When $S$ denotes line $i$ outage and $T$ denotes the load shed at bus $v$, the expected amount of load shed at bus $v$ at level 0 is estimated as

$$
E_v^{B,0} = N_i^L b_{iv}^{LB} \Delta_v. \tag{11.20}
$$

The number of times of load shedding at bus $v$ at level 0 following line $i$ outage is estimated as

$$
\hat{N}_v^{B,0} = N_i^L \left(1 - e^{-b_{iv}^{LB}}\right), \tag{11.21}
$$

where $\left(1 - e^{-b_{iv}^{LB}}\right)$ is the probability of having nonzero events for a Poisson distribution with mean $b_{iv}^{LB}$.

– When $S$ denotes the load shed at bus $u$ and $T$ denotes line $j$ outage, given the total number of times that bus $u$ has load shedding in the $M_u$ cascades as $N_u^B$, the expected number of line $j$ outages is estimated as

$$
E_j^L = N_u^B b_{uj}^{BL}. \tag{11.22}
$$

– When $S$ denotes the load shed at bus $u$ and $T$ denotes the load shed at bus $v$, the expected amount of load shed at bus $v$ at level 0 is estimated as

$$
E_v^{B,0} = \sum_{m=1}^{M_u} \sum_{g=0}^{G^m-1} Z_u^{m,g} b_{uv}^{BB} \Delta_v. \tag{11.23}
$$

The expected number of times of load shedding at bus $v$ at level 0 following the load shed at bus $u$ is estimated as

$$
\hat{N}_v^{B,0} = \sum_{m=1}^{M_u} \sum_{g=0}^{G^m-1} \left(1 - e^{-Z_u^{m,g} b_{uv}^{BB}}\right), \tag{11.24}
$$

where $\left(1 - e^{-Z_u^{m,g} b_{uv}^{BB}}\right)$ is the probability of observing nonzero load shed at bus $v$ following the load shed of $Z_u^{m,g}$ at bus $u$.

• Step 2: Estimate the expected value of the outages at level 1 based on the expected value of outage $T$ and $\mathbf{B}$

– When $T$ denotes line $j$ outage, the expected number of outages of line $e$ at level 1 can be estimated by

$$
E_e^L = E_j^L b_{je}^{LL}. \tag{11.25}
$$

The expected amount of load shed at bus $x$ at level 1 can be estimated by

$$
E_x^{B,1} = E_j^L b_{jx}^{LB} \Delta_x, \tag{11.26}
$$

and the expected number of times of load shedding at bus $x$ at level 1 is

$$
\hat{N}_x^{B,1} = E_j^L \left(1 - e^{-b_{jx}^{LB}}\right). \tag{11.27}
$$

– When $T$ denotes the load shed at bus $v$, the expected number of outages of line $e$ at level 1 can be estimated as

$$
E_e^L = \hat{N}_v^{B,0} b_{ve}^{BL}. \tag{11.28}
$$

The load shed at bus $x$ at level 1 following outage $T$ as the load shed at bus $v$ can be estimated as

$$
E_x^{B,1} = \frac{E_v^{B,0}}{\Delta_v} b_{vx}^{BB} \Delta_x, \tag{11.29}
$$

and the expected number of times of load shedding at bus $x$ at level 1 is

$$
\hat{N}_x^{B,1} = \hat{N}_v^{B,0} \left(1 - e^{-\frac{E_v^{B,0} b_{vx}^{BB}}{\hat{N}_v^{B,0} \Delta_v}}\right), \tag{11.30}
$$

where $\left(1 - e^{-E_v^{B,0} b_{vx}^{BB} / (\hat{N}_v^{B,0} \Delta_v)}\right)$ is the probability of having positive load shed at bus $x$, in which the discretized amount of load shed at bus $v$ at level 0 in one of the $\hat{N}_v^{B,0}$ load shedding events is approximated by its average value $E_v^{B,0} / (\hat{N}_v^{B,0} \Delta_v)$.

• Step 3: Estimate the expected value of the outages at higher levels

The expected value of the outages at level $k \ge 2$ can be calculated from the estimated outages at level $k-1$, similar to the calculations in Step 2.

• Step 4: Calculate the comprehensive severity index $I_l$

To measure the total number of line outages propagated through link $l$, the severity index $I_l^L$ is calculated as

$$
I_l^L = \sum_{c \in \mathcal{C}_l^L} E_c^L, \tag{11.31}
$$

where $\mathcal{C}_l^L$ is the set of lines at all levels of the interaction subgraph starting with link $l$. To measure the total amount of load shed propagated through link $l$, the severity index $I_l^B$ is calculated as

$$
I_l^B = \sum_{k=0}^{K-1} \sum_{c \in \mathcal{C}_l^{B,k}} E_c^{B,k}, \tag{11.32}
$$

where $K$ is the number of levels in the subgraph and $\mathcal{C}_l^{B,k}$ is the set of buses at level $k$ of the interaction subgraph starting with link $l$. Then a comprehensive severity index $I_l$ is calculated as

$$
I_l = c^L \frac{I_l^L}{N_l} + c^B \frac{I_l^B}{X^t}, \tag{11.33}
$$

where $c^L$ and $c^B$ are, respectively, the cost coefficients for line outages and load shed set by the operators of the system, and $X^t$ is the total load of the system in MW.

Based on the ranking of $I_l$, critical links can be identified from the perspective of consequences in both line outages and the load shed. Note that the severity index $I_l$ considers only the cost of the load shed (line outages) when $c^L$ ($c^B$) is set to zero. In practical applications, the ratio between $c^L$ and $c^B$ can be adjusted by the operators according to their preference or the actual costs of line outages and the load shed. Mitigation strategies can further be implemented based on the identified critical links to reduce the risk of blackouts.
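The level-by-level estimates and the final index can be illustrated for an $L \to B$ link with a short sketch. The function names and all numerical inputs below are hypothetical; the line count (411) and total load (23,848 MW) are those of the IEEE 300-bus system used later in the chapter, and the default cost coefficients are one possible operator choice.

```python
import math


def level0_line_to_bus(n_i, b_iv_lb, delta_v):
    """(11.20)-(11.21): expected MW shed and number of shedding events at bus v."""
    e_v = n_i * b_iv_lb * delta_v
    n_hat_v = n_i * (1.0 - math.exp(-b_iv_lb))
    return e_v, n_hat_v


def level1_from_bus(e_v, n_hat_v, delta_v, b_ve_bl, b_vx_bb, delta_x):
    """(11.28)-(11.30): propagate from the load shed at bus v to line e and bus x."""
    e_line = n_hat_v * b_ve_bl
    e_x = e_v / delta_v * b_vx_bb * delta_x
    if n_hat_v > 0:
        avg_units = e_v / (n_hat_v * delta_v)  # average units per shedding event
        n_hat_x = n_hat_v * (1.0 - math.exp(-avg_units * b_vx_bb))
    else:
        n_hat_x = 0.0
    return e_line, e_x, n_hat_x


def severity_index(i_line, i_load_mw, n_lines, total_load_mw, c_line=1.0, c_load=1.5):
    """(11.33): weighted sum of normalized line outages and normalized load shed."""
    return c_line * i_line / n_lines + c_load * i_load_mw / total_load_mw


# Hypothetical link: line i failed 100 times, b_iv^LB = 0.2, 50 MW units.
e_v, n_hat_v = level0_line_to_bus(100, 0.2, 50.0)
e_line, e_x, n_hat_x = level1_from_bus(e_v, n_hat_v, 50.0, 0.01, 0.1, 50.0)
print(e_v, n_hat_v, e_line, e_x, n_hat_x)
print(severity_index(e_line, e_v + e_x, 411, 23848.0))
```

Dividing by $N_l$ and $X^t$ puts the two consequences on comparable per-unit scales before the cost coefficients are applied.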

11.6 Coupled Interaction Network of IEEE 300-Bus System

The IEEE 300-bus system has 191 buses with load and 411 lines [19]. The total load and total generation capacity are, respectively, 23,848 MW and 32,678 MW. The open-loop OPA [20] is adopted to generate 30,000 original cascades for this system. The parameters of the OPA model in [20] are set as $\gamma = 2$, $\alpha = 0.95$, $\beta = 0.30$, and $p_0 = 0.001$.

The initialization of the discretization units in the EM algorithm should consider the amount of load at different buses. An overly large discretization unit can round much of the load shed at some buses down to zero. The initial discretization units for all load buses are chosen as 50 MW. In the original cascades for the IEEE 300-bus system, the load shed at 88% of the buses is not less than 25 MW and thus is discretized as nonzero integers according to (11.1). The load shed that is greater than 25 MW accounts for 99.21% of the total amount of load shed.

In Sect. 11.2, the lines and the buses with load are numbered from 1 to $N_l$ and from 1 to $N_b$, respectively, for convenience. Here, the bus and line index numbers in both the coupled interaction network and the following analysis are those in the original IEEE 300-bus system to avoid confusion.

Ten percent of the original cascades ($M_u = 3000$) are randomly selected to estimate the coupled interaction matrix $\mathbf{B}$. The $\varepsilon$ in (11.16) is set as 0.01. Table 11.1 lists the number of nonzero entries in the four sub-matrices of $\mathbf{B}$ and the proportion of nonzero entries. The total number of nonzero entries in $\mathbf{B}$ is 2116, which accounts for only 0.58% of the $602^2$ entries. Also, $\mathbf{B}^{BL}$ is much sparser than the other sub-matrices, indicating that the chance for load shed to be followed by line outages is very low.

Table 11.1 Number of nonzero entries in the coupled interaction matrix $\mathbf{B}$

Matrix        $\mathbf{B}^{LL}$   $\mathbf{B}^{BL}$   $\mathbf{B}^{LB}$   $\mathbf{B}^{BB}$   $\mathbf{B}$
Number        1704    36      246     130     2116
Proportion    1.01%   0.05%   0.31%   0.36%   0.58%

Figure 11.2 shows the coupled interaction network of the IEEE 300-bus system. The red and green vertices, respectively, denote lines and buses with load. The red, blue, yellow, and green arrows are the links in $\mathbf{B}^{LL}$, $\mathbf{B}^{LB}$, $\mathbf{B}^{BL}$, and $\mathbf{B}^{BB}$, respectively. Most links start from red vertices, indicating that line outages are the dominant factors in cascading failure propagation. This is because a more dramatic redistribution of power flow and more heavily loaded lines tend to be observed after the outages of some critical lines than after load shedding at some buses. This is consistent with the conclusion in [4].

The entry $b_{iv}^{LB}$ of the sub-matrix $\mathbf{B}^{LB}$ captures the interaction in which the load shed at bus $v$ follows the outage of line $i$. The interactions can be analyzed by the connection of line $i$ and bus $v$ in the transmission system. It is found that bus $v$ is connected with line $i$ for only 9.76% of the 246 interactions in $\mathbf{B}^{LB}$. In 16.67% of the interactions, line $i$ is connected with generation buses rather than load buses. This indicates that line outages do not only affect the load supply at the buses that are connected to the outaged line. This can be explained by the various complicated causes of the load shed in cascading outages, including but not limited to the isolation of load buses, disconnection of generation buses, and transmission capacity limits.
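The proportions in Table 11.1 follow directly from the sub-matrix dimensions ($N_l = 411$, $N_b = 191$). The short check below reproduces them; the variable names are chosen here for illustration only.

```python
n_l, n_b = 411, 191  # lines and load buses of the IEEE 300-bus system

# Nonzero counts reported in Table 11.1 and the size of each sub-matrix.
counts = {"LL": 1704, "BL": 36, "LB": 246, "BB": 130}
sizes = {"LL": n_l * n_l, "BL": n_b * n_l, "LB": n_l * n_b, "BB": n_b * n_b}

for name, count in counts.items():
    print(f"B^{name}: {count} nonzero, {100 * count / sizes[name]:.2f}%")

total = sum(counts.values())
total_size = (n_l + n_b) ** 2  # 602^2 entries in the full matrix B
print(f"B: {total} nonzero, {100 * total / total_size:.2f}%")
```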


Fig. 11.2 Coupled interaction network for IEEE 300-bus system. © [2021] IEEE. Reprinted, with permission, from [14]

11.7 Validation of Coupled Interaction Model

To test the goodness of fit of the Poisson distribution with mean $b^{LB}_{iv}$ for the interactions between the outage of line $i$ and the load shed at bus $v$ recorded in $C^{LB}_{iv}$, the Kolmogorov–Smirnov (K–S) test [21] is performed at the significance level of 0.05. The results show that the Poisson assumption is supported by the K–S test for 98.78% of the interactions in $\mathbf{B}^{LB}$ in Table 11.1. Similarly, the K–S test is also performed for the load shed at bus $v$ following different amounts of discretized load shed at bus $u$. The results show that the Poisson assumption is supported by the K–S test for all of the interactions in $\mathbf{B}^{BB}$ in Table 11.1. This illustrates that the Poisson distribution assumption is valid for the propagation of load shed, which is consistent with Chap. 10.

A total of 30,000 cascades are simulated using the coupled interaction model in Sect. 11.4. The distributions of the total number of line outages from 30,000 original cascades and 30,000 simulated cascades are compared in Fig. 11.3. The distributions of the amount of load shed are compared in Fig. 11.4. The simulated cascades show distributions of line outages and load shed consistent with those of the original cascades. Moreover, as seen in Figs. 11.3 and 11.4, 3000 original cascades can only estimate blackout size down to the probability level of $10^{-3}$, while 30,000 simulated cascades from the coupled interaction model can extend the probability level beyond $10^{-4}$.

To further validate the impact of the interactions captured by $\mathbf{B}^{BL}$ and $\mathbf{B}^{BB}$, the EM algorithm and the coupled interaction model are modified to only include $\mathbf{B}^{LL}$ and $\mathbf{B}^{LB}$. The modified interaction model is referred to as the reduced interaction model. Figures 11.3 and 11.4 compare the probability distributions of the outages from the coupled interaction model and the reduced interaction model.

Figure 11.3 shows that $\mathbf{B}^{BL}$ has a marginal impact on the propagation of line outages. This is because the number of nonzero elements in $\mathbf{B}^{BL}$ is much smaller than that in $\mathbf{B}^{LL}$. However, it is found from Fig. 11.4 that $\mathbf{B}^{BB}$ has a noticeable impact on the propagation of


Fig. 11.3 Probability distribution of the total number of line outages. © [2021] IEEE. Reprinted, with permission, from [14]

Fig. 11.4 Probability distribution of the total amount of load shed. © [2021] IEEE. Reprinted, with permission, from [14]

load shed, especially on large load shed. This is because large blackouts often involve successive load shedding, which is effectively captured by $\mathbf{B}^{BB}$.

As only 10% of the original cascades are used to estimate the interaction matrix $\mathbf{B}$, the highly probabilistic interaction model can significantly improve the simulation efficiency. Simulating 30,000 cascades by the OPA model takes 9214 s, while the simulation with the coupled interaction model takes only 47 s. The time required to generate 3000 original cascades is $3000/30{,}000 \times 9214 \approx 921$ s, and the time for the EM algorithm is 52 s. Therefore, a speedup of $9214/(921 + 52 + 47) \approx 9.03$ can be achieved by the coupled interaction model for simulating 30,000 cascades.
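The speedup arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name and argument names are hypothetical, and the timings are the ones quoted in the text.

```python
def speedup(t_full, n_full, n_orig, t_em, t_sim):
    """Speedup of the interaction-model workflow over direct simulation.

    t_full: time (s) to simulate n_full cascades with the detailed (OPA) model
    n_orig: number of original cascades simulated to estimate the interactions
    t_em:   time (s) spent in the EM algorithm
    t_sim:  time (s) to simulate n_full cascades with the interaction model
    """
    # Time to generate the original cascades, scaled from the full run.
    t_orig = n_orig / n_full * t_full
    return t_full / (t_orig + t_em + t_sim)


value = speedup(t_full=9214, n_full=30_000, n_orig=3000, t_em=52, t_sim=47)
print(round(value, 2))  # 9.03
```

The denominator collects every cost of the surrogate workflow (original cascades, estimation, and simulation), so the ratio is an end-to-end comparison rather than a comparison of simulation times alone.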

11.8 Choosing Critical Links for Mitigation

To mitigate the propagation of cascading failure through link $l: S \to T$, remedial actions can be implemented to reduce the probability of outage $T$ when source outage $S$ occurs. Table 11.2 lists the top 5 links of each type ranked according to $I_l$ in (11.33) with $c^L = 1$ and $c^B = 1.5$. The target outage $T$ in $L \to B$ and $B \to B$ links is load shed at a bus. As the location and amount of load shed in cascading outages are determined manually by the operators or automatically by the relays, the priority of the buses in $L \to B$ and $B \to B$ links for load shed can be adjusted with respect to the source outages. For manual load shedding, the priority of the bus in target outages can be adjusted

Table 11.2 Top 5 critical links ranked according to $I_l$ when $c^L = 1$ and $c^B = 1.5$

| $L \to L$ link | $I_l$ | $L \to B$ link | $I_l$ | $B \to L$ link | $I_l$ | $B \to B$ link | $I_l$ |
|---|---|---|---|---|---|---|---|
| line 268 → line 309 | 2.35 | line 307 → bus 171 | 7.93 | bus 204 → line 309 | 0.32 | bus 120 → bus 100 | 0.19 |
| line 251 → line 252 | 1.96 | line 309 → bus 171 | 6.97 | bus 152 → line 249 | 0.23 | bus 106 → bus 106 | 0.16 |
| line 223 → line 222 | 1.89 | line 268 → bus 171 | 5.67 | bus 99 → line 178 | 0.21 | bus 106 → bus 163 | 0.12 |
| line 309 → line 268 | 1.44 | line 268 → bus 204 | 5.62 | bus 170 → line 268 | 0.19 | bus 146 → bus 146 | 0.12 |
| line 268 → line 307 | 1.31 | line 307 → bus 204 | 4.74 | bus 152 → line 247 | 0.15 | bus 138 → bus 120 | 0.10 |


by increasing its load shed cost in the optimal power flow problem. For automatic load shedding, adaptive load shedding [22], which is supported by the wide-area monitoring system, can adjust the priority of buses online with respect to the disturbances.

To mitigate the load shed as the target outage $T$, the load shed cost of outage $T$ is increased in the optimal power flow problem in the OPA model. After increasing the load shed cost by ten times for the critical $L \to B$ and $B \to B$ links in Table 11.2, an average reduction of 4.4% in the number of occurrences of outage $T$ and 6.1% in the amount of load shed is observed. The load shed of outage $T$ is not reduced effectively, mainly because in cascading failure models such as OPA the load shed is mostly caused by generation shortage in islands or insufficient transmission capacity following the outage of critical lines, and increasing the load shed cost cannot prevent most of this load shedding. As the load shed cannot be effectively mitigated by directly reducing its probability after the occurrence of source outage $S$, another way to indirectly mitigate load shedding is to reduce the probability of the line outages that act as source outages in critical $L \to B$ links.

On the other hand, the target outage $T$ in $L \to L$ and $B \to L$ links is a line outage. To mitigate line outages as the target outage $T$, the probability of outage $T$ can be reduced by blocking the Zone 3 relay [6] or increasing the transfer margin of the targeted line [23]. It is seen in Table 11.2 that the indices of $B \to L$ links are much smaller than those of $L \to L$ links. Moreover, critical $B \to L$ links tend to have $L \to L \to B$ link chains pointing to them. Take the link “bus 204 → line 309” as an example, which is the top-ranked $B \to L$ link in Table 11.2. The link chain “line 268 → line 307 → bus 204” points to “bus 204 → line 309.” Mitigating the critical link “line 268 → line 307” reduces the probability of load shedding at bus 204 and, in turn, of the line 309 outage.

In addition, as the line 309 outage is likely to follow both the critical link “line 268 → line 309” and the link chain “line 268 → bus 204 → line 309”, reducing the probability of the line 309 outage through the $B \to L$ link “bus 204 → line 309” can be covered by the $L \to L$ link. Based on the above analysis, it is practical to mitigate the propagation of cascading failures by only considering the critical $L \to L$ links. In Sect. 11.9, we will only consider mitigation based on critical $L \to L$ links.

11.9 Cascading Failure Mitigation

Here mitigation strategies are implemented based on the identified $L \to L$ critical links by blocking the Zone 3 relay [6]. For example, for critical link $l$: line $i$ → line $j$, when line $i$ fails, the relay of line $j$ is blocked to reduce its tripping probability to 10% of its original probability so that the control center can perform remedial control and stop failure propagation.
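The relay-blocking rule described above can be sketched as follows. This is a minimal illustration, not the OPA implementation: the function name, the dictionary-based data layout, and the tripping probabilities in the example are all hypothetical; only the 10% reduction factor and the line numbers come from the text.

```python
def apply_relay_blocking(base_prob, critical_links, failed_lines, factor=0.1):
    """Return tripping probabilities after blocking relays of target lines.

    base_prob:      dict mapping line -> original tripping probability
    critical_links: iterable of (source_line, target_line) pairs
    failed_lines:   set of lines that have already failed
    factor:         fraction of the original probability kept after blocking
    """
    prob = dict(base_prob)  # leave the caller's dictionary untouched
    for source, target in critical_links:
        # Block the target relay only once its source line has failed.
        if source in failed_lines and target not in failed_lines:
            prob[target] = base_prob[target] * factor
    return prob


# Hypothetical probabilities; the link pairs are taken from Table 11.2.
base = {268: 0.2, 309: 0.5, 307: 0.4}
links = [(268, 309), (268, 307)]
after = apply_relay_blocking(base, links, failed_lines={268})
print(after)
```

Scaling from the original probability, rather than from the current one, keeps a target that appears in several critical links from being reduced more than once.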


Fig. 11.5 Probability distribution of the number of line outages under different mitigation strategies. © [2021] IEEE. Reprinted, with permission, from [14]

Fig. 11.6 Probability distribution of the amount of load shed under different mitigation strategies. © [2021] IEEE. Reprinted, with permission, from [14]

The following four mitigation strategies are implemented:

1. Relay blocking is based on 30 randomly selected links.
2. Relay blocking is based on the top 30 links ranked by $I_l$ with $c^L = 0$ and $c^B = 1$.
3. Relay blocking is based on the top 30 links ranked by $I_l$ with $c^L = 1$ and $c^B = 0$.
4. Relay blocking is based on the top 30 links ranked by $I_l$ with $c^L = 1$ and $c^B = 1.5$.

For each mitigation strategy, 150,000 cascades are generated by the OPA model with relay blocking for the corresponding links [6]. When the source outage $S$ happens in generation $g$ of cascade $m$, the relay is blocked for the target outage $T$ in all remaining generations of cascade $m$. Figures 11.5 and 11.6 compare the probability distributions of the number of line outages and the amount of load shed before and after mitigation. As shown in Fig. 11.5, the number of line outages is mitigated the most by the third mitigation strategy since only the line outage consequence is considered ($c^B = 0$). However, its reduction of load shed is not as effective as that of the strategies that consider load shed consequences. Similarly, as shown in Fig. 11.6, when the second strategy only considers the cost of load shed ($c^L = 0$), the amount of load shed is mitigated the most, but the line outages are not mitigated very well. By properly setting $c^L/c^B = 1/1.5$, the fourth mitigation strategy makes a trade-off and reduces both line outages and load shed to a great extent, leading to the best mitigation effect in terms of both consequences.

To evaluate the mitigation effect, cascades are classified as small, medium, or large in terms of the total number of line outages ($\tau^L$), the total amount of load shed ($\tau^B$), or a comprehensive cost ($\tau$). The comprehensive


Table 11.3 Probabilities of different cascade sizes measured by $\tau^L$

| Cases | Small ($\tau^L \le 15$) | Medium ($15 < \tau^L \le 25$) | Large ($\tau^L > 25$) |
|---|---|---|---|
| No mitigation | 0.8872 | 0.1095 | 0.0033 |
| Random | 0.8932 | 0.1040 | 0.0028 |
| $c^L = 0$, $c^B = 1$ | 0.9856 | 0.0142 | $1.53 \times 10^{-4}$ |
| $c^L = 1$, $c^B = 0$ | 0.9985 | 0.0015 | 0 |
| $c^L = 1$, $c^B = 1.5$ | 0.9982 | 0.0018 | 0 |

Table 11.4 Probabilities of different cascade sizes measured by $\tau^B$

| Cases | Small ($\tau^B \le 1500$) | Medium ($1500 < \tau^B \le 3000$) | Large ($\tau^B > 3000$) |
|---|---|---|---|
| No mitigation | 0.9129 | 0.0798 | 0.0073 |
| Random | 0.9144 | 0.0781 | 0.0075 |
| $c^L = 0$, $c^B = 1$ | 0.9916 | 0.0082 | $2.13 \times 10^{-4}$ |
| $c^L = 1$, $c^B = 0$ | 0.9584 | 0.0405 | 0.0011 |
| $c^L = 1$, $c^B = 1.5$ | 0.9910 | 0.0087 | $3.34 \times 10^{-4}$ |

Table 11.5 Probabilities of different cascade sizes measured by $\tau$

| Cases | Small ($\tau \le 0.15$) | Medium ($0.15 < \tau \le 0.30$) | Large ($\tau > 0.30$) |
|---|---|---|---|
| No mitigation | 0.9483 | 0.0506 | 0.0011 |
| Random | 0.9485 | 0.0502 | 0.0013 |
| $c^L = 0$, $c^B = 1$ | 0.9963 | 0.0037 | 0 |
| $c^L = 1$, $c^B = 0$ | 0.9901 | 0.0098 | $8.67 \times 10^{-5}$ |
| $c^L = 1$, $c^B = 1.5$ | 0.9971 | 0.0029 | 0 |

cost $\tau$ is calculated by assuming that the ratio between the cost of the normalized line outage and that of the normalized load shed is $1/1.5$ as

$$\tau = \frac{\tau^L}{N_l} + 1.5\,\frac{\tau^B}{X^t}. \tag{11.34}$$
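A small sketch of how (11.34) and the size classes of Table 11.5 could be applied to one cascade is given below. The function names are hypothetical, and the normalizers are read here as the number of lines ($N_l$) and the total load ($X^t$), which is an assumption about the notation; the input values are made up for illustration.

```python
def comprehensive_cost(tau_l, tau_b, n_lines, total_load, weight_b=1.5):
    """Comprehensive cost: normalized line outages plus weighted load shed.

    n_lines and total_load stand in for N_l and X^t (assumed meanings).
    """
    return tau_l / n_lines + weight_b * tau_b / total_load


def classify(tau, small_max=0.15, medium_max=0.30):
    """Size class using the thresholds of Table 11.5."""
    if tau <= small_max:
        return "small"
    if tau <= medium_max:
        return "medium"
    return "large"


# Hypothetical cascade: 20 line outages and 2000 units of load shed in a
# system with 400 lines and 20,000 units of total load.
tau = comprehensive_cost(tau_l=20, tau_b=2000.0, n_lines=400, total_load=20000.0)
label = classify(tau)
print(tau, label)
```

Normalizing both terms before weighting makes the 1.5 factor a pure statement of relative cost, independent of the units in which load shed is measured.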

Tables 11.3, 11.4, and 11.5 compare the probabilities of small, medium, and large cascades under the four mitigation strategies. The samples for large cascades are no less than 11 in order to estimate the probabilities of large cascades with a confidence level of 95% [24]. As shown in Tables 11.3 and 11.4, when $c^L = 1$ and $c^B = 1.5$ the mitigation effect in terms of line outages is similar to the case in which $c^L = 1$ and $c^B = 0$, while the mitigation effect in terms of load shed is similar to the case in which $c^L = 0$ and $c^B = 1$. It is seen from Table 11.5 that a proper choice of $c^L$ and $c^B$ enables the fourth strategy to provide the greatest reduction of the medium and large cascades in terms of the comprehensive cost.

In addition to line outages and the load shed, other types of events can also be incorporated into the coupled interaction model. For example, similar to [4], the number of isolated buses can also be analyzed together with line outages and the


load shed. The tripping events of generators reported in recent blackouts [26], which may follow the voltage disturbances after line faults, also deserve attention. The coupled interaction model is promising for capturing the propagation of isolated buses or generator outages, which are discrete events similar to line outages in blackouts.

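Appendix: Discretization Unit for Each Load Bus

Let $\tilde{\lambda}_{iv}^{(k+1)}$ and $\tilde{\sigma}_{iv}^{2(k+1)}$ be the mean and the variance of the discretized load shed at bus $v$ following the outage of line $i$ when $\Delta_v^{(k)}$ is chosen as the discretization unit for bus $v$. If we change the discretization unit for bus $v$ to $\Delta_v^{(k+1)}$, the mean changes to $\lambda_{iv}^{(k+1)} = \tilde{\lambda}_{iv}^{(k+1)} \Delta_v^{(k)} / \Delta_v^{(k+1)}$ and the variance changes to $\sigma_{iv}^{2(k+1)} = \tilde{\sigma}_{iv}^{2(k+1)} \big(\Delta_v^{(k)}\big)^2 / \big(\Delta_v^{(k+1)}\big)^2$.

Since the mean of the Poisson distribution equals its variance, a better choice of $\Delta_v^{(k+1)}$ should make the mean $\lambda_{iv}^{(k+1)}$ as close to the variance $\sigma_{iv}^{2(k+1)}$ as possible. Moreover, the choice of $\Delta_v^{(k+1)}$ should consider all the lines whose outages have interactions with the load shed at bus $v$. Thus, to choose the optimal discretization unit $\Delta_v^{(k+1)}$ for bus $v$, the following optimization problem can be solved:

$$\min_{\Delta_v^{(k+1)}} \; f\big(\Delta_v^{(k+1)}\big) = \sum_{i \in \mathcal{S}_v^{LB}} N_i^L \left( \frac{\lambda_{iv}^{(k+1)}}{\sigma_{iv}^{2(k+1)}} + \frac{\sigma_{iv}^{2(k+1)}}{\lambda_{iv}^{(k+1)}} \right) \tag{11.35}$$

$$\text{s.t.} \quad \lambda_{iv}^{(k+1)} = \frac{\Delta_v^{(k)}}{\Delta_v^{(k+1)}}\, \tilde{\lambda}_{iv}^{(k+1)} \tag{11.36}$$

$$\sigma_{iv}^{2(k+1)} = \left( \frac{\Delta_v^{(k)}}{\Delta_v^{(k+1)}} \right)^{2} \tilde{\sigma}_{iv}^{2(k+1)} \tag{11.37}$$

It is easy to obtain the optimal solution as

$$\Delta_v^{(k+1)} = \Delta_v^{(k)} \sqrt{ \frac{ \displaystyle\sum_{i \in \mathcal{S}_v^{LB}} N_i^L \, \frac{\tilde{\sigma}_{iv}^{2(k+1)}}{\tilde{\lambda}_{iv}^{(k+1)}} }{ \displaystyle\sum_{i \in \mathcal{S}_v^{LB}} N_i^L \, \frac{\tilde{\lambda}_{iv}^{(k+1)}}{\tilde{\sigma}_{iv}^{2(k+1)}} } }. \tag{11.38}$$

If we consider $\tilde{\lambda}_{iv}^{(k+1)} \approx \tilde{b}_{iv}^{LB(k+1)}$ and $\tilde{\sigma}_{iv}^{2(k+1)} \approx S_{iv}^{2(k+1)}$, then $\Delta_v^{(k+1)}$ can be chosen as

$$\Delta_v^{(k+1)} = \Delta_v^{(k)} \sqrt{ \frac{ \displaystyle\sum_{i \in \mathcal{S}_v^{LB}} N_i^L \, \frac{S_{iv}^{2(k+1)}}{\tilde{b}_{iv}^{LB(k+1)}} }{ \displaystyle\sum_{i \in \mathcal{S}_v^{LB}} N_i^L \, \frac{\tilde{b}_{iv}^{LB(k+1)}}{S_{iv}^{2(k+1)}} } }.$$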
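The closed form in (11.38) follows from writing $r = \Delta_v^{(k)}/\Delta_v^{(k+1)}$, so that the objective becomes $f = A/r + Br$ with $A = \sum_i N_i^L \tilde{\lambda}_{iv}/\tilde{\sigma}_{iv}^2$ and $B = \sum_i N_i^L \tilde{\sigma}_{iv}^2/\tilde{\lambda}_{iv}$, which is minimized at $r = \sqrt{A/B}$. The sketch below checks this numerically; the function names and all input values are hypothetical and serve only to illustrate the update.

```python
import math


def objective(delta_next, delta_k, weights, lam_tilde, var_tilde):
    """Objective (11.35) evaluated for a candidate discretization unit."""
    r = delta_k / delta_next
    total = 0.0
    for n, lam, var in zip(weights, lam_tilde, var_tilde):
        lam_new = lam * r        # rescaled mean, as in (11.36)
        var_new = var * r ** 2   # rescaled variance, as in (11.37)
        total += n * (lam_new / var_new + var_new / lam_new)
    return total


def optimal_unit(delta_k, weights, lam_tilde, var_tilde):
    """Closed-form update of the discretization unit, as in (11.38)."""
    num = sum(n * var / lam for n, lam, var in zip(weights, lam_tilde, var_tilde))
    den = sum(n * lam / var for n, lam, var in zip(weights, lam_tilde, var_tilde))
    return delta_k * math.sqrt(num / den)


# Hypothetical data for three lines interacting with one bus.
delta_k = 10.0
weights = [120, 45, 30]      # stand-ins for the weights N_i^L
lam_tilde = [2.0, 1.5, 0.8]  # sample means under the current unit
var_tilde = [5.0, 2.5, 2.4]  # sample variances under the current unit

delta_next = optimal_unit(delta_k, weights, lam_tilde, var_tilde)
f_opt = objective(delta_next, delta_k, weights, lam_tilde, var_tilde)
print(delta_next, f_opt)
```

In this example the variances exceed the means, so the update enlarges the unit, which shrinks the variance faster than the mean and brings the two closer together.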

References

1. I. Dobson, Estimating the propagation and extent of cascading line outages from utility data with a branching process. IEEE Trans. Power Syst. 27(4), 2146–2155 (2012)
2. J. Kim, K.R. Wierzbicki, I. Dobson, R.C. Hardiman, Estimating propagation and distribution of load shed in simulations of cascading blackouts. IEEE Syst. J. 6(3), 548–557 (2012)
3. J. Qi, I. Dobson, S. Mei, Towards estimating the statistics of simulated cascades of outages with branching processes. IEEE Trans. Power Syst. 28(3), 3410–3419 (2013)
4. J. Qi, W. Ju, K. Sun, Estimating the propagation of interdependent cascading outages with multi-type branching processes. IEEE Trans. Power Syst. 32(2), 1212–1223 (2017)
5. I. Dobson, B.A. Carreras, D.E. Newman, A branching process approximation to cascading load-dependent system failure, in Proceedings 37th Hawaii International Conference on System Sciences (2004), pp. 1–10
6. J. Qi, K. Sun, S. Mei, An interaction model for simulation and mitigation of cascading failures. IEEE Trans. Power Syst. 30(2), 804–819 (2015)
7. J. Qi, J. Wang, K. Sun, Efficient estimation of component interactions for cascading failure analysis by EM algorithm. IEEE Trans. Power Syst. 33(3), 3153–3161 (2018)
8. W. Ju, K. Sun, J. Qi, Multi-layer interaction graph for analysis and mitigation of cascading outages. IEEE J. Emerg. Sel. Top. Circuits Syst. 7(2), 239–249 (2017)
9. J. Qi, Utility outage data driven interaction networks for cascading failure analysis and mitigation. IEEE Trans. Power Syst. 36(2), 1409–1418 (2021)
10. P.D. Hines, I. Dobson, E. Cotilla-Sanchez, M. Eppstein, “Dual graph” and “random chemistry” methods for cascading failure analysis, in Proceedings 46th Hawaii International Conference on System Sciences (2013), pp. 2141–2150
11. P.D.H. Hines, I. Dobson, P. Rezaei, Cascading power outages propagate locally in an influence graph that is not the actual grid topology. IEEE Trans. Power Syst. 32(2), 958–967 (2017)
12. K. Zhou, I. Dobson, Z. Wang, A. Roitershtein, A.P. Ghosh, A Markovian influence graph formed from utility line outage data to mitigate large cascades. IEEE Trans. Power Syst. 35(4), 3224–3235 (2020)
13. K. Sun, Y. Hou, W. Sun, J. Qi, Power System Control Under Cascading Failures: Understanding, Mitigation, and System Restoration (Wiley-IEEE Press, New York, 2019)
14. L. Wang, J. Qi, B. Hu, K. Xie, A coupled interaction model for simulation and mitigation of interdependent cascading outages. IEEE Trans. Power Syst. 36(5), 4331–4342 (2021)
15. A. Shandilya, H. Gupta, J. Sharma, Method for generation rescheduling and load shedding to alleviate line overloads using local optimisation. IEE Proc. C Gener. Transm. Distrib. 140(5), 337–342 (1993)
16. D. Novosel, R.L. King, Using artificial neural networks for load shedding to alleviate overloaded lines. IEEE Trans. Power Deliv. 9(1), 425–433 (1994)
17. K. Saito, R. Nakano, M. Kimura, Prediction of information diffusion probabilities for independent cascade model, in KES (Springer, 2008), pp. 67–75
18. M. Vaiman, K. Bell, Y. Chen, B. Chowdhury, I. Dobson, P. Hines, M. Papic, S. Miller, P. Zhang, Risk assessment of cascading outages: methodologies and challenges. IEEE Trans. Power Syst. 27(2), 631 (2012)
19. Power Systems Test Case Archive. Univ. Washington. http://www.ee.washington.edu/research/pstca/
20. I. Dobson, B. Carreras, V. Lynch, D. Newman, An initial model for complex dynamics in electric power system blackouts, in Proceedings of the 34th Hawaii International Conference on System Sciences (2001), pp. 710–718
21. F.J. Massey Jr, The Kolmogorov-Smirnov test for goodness of fit. J. Am. Stat. Assoc. 46(253), 68–78 (1951)
22. H.H. Alhelou, M.E.H. Golshan, T.C. Njenda, N.D. Hatziargyriou, An overview of UFLS in conventional, modern, and future smart power systems: challenges and opportunities. Electr. Power Syst. Res. 179, 106054 (2020)
23. C. Chen, W. Ju, K. Sun, S. Ma, Mitigation of cascading outages using a dynamic interaction graph-based optimal power flow model. IEEE Access 7, 168637–168648 (2019)
24. I. Dobson, B.A. Carreras, D.E. Newman, How many occurrences of rare blackout events are needed to estimate event probability? IEEE Trans. Power Syst. 28(3), 3509–3510 (2013)
25. M. Kimura, K. Saito, H. Motoda, Minimizing the spread of contamination by blocking links in a network, in AAAI, vol. 8 (2008), pp. 1175–1180
26. Australian Energy Market Operator (AEMO), Black System South Australia 28 September 2016 (2017). https://www.aemo.com.au

Chapter 12

Interdependency Between Smart Grid and Transportation Network

© Springer Nature Switzerland AG 2023. J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3_12

12.1 Introduction

Smart grid and transportation network are increasingly interdependent. The majority of modern transportation management, operations, and control systems are powered by electricity [1]; power system supplies, maintenance, and restoration (e.g., crew dispatch and mobile generators) rely on an efficient transportation network [2]. The emergence of transportation electrification (e.g., electric vehicles (EVs)) further strengthens the interdependence between these two systems [3]. However, the majority of literature on planning and operation of transportation and power systems tackles these two systems separately, with outputs of one system serving as exogenous inputs for the other [4].

In smart grid literature, EVs with controllable charging and vehicle-to-grid (V2G) capabilities are considered as part of the smart grid to improve power system efficiency and reliability [5–8]. To estimate power demand, EVs’ arrival and departure rates are typically assumed to be known [4]. For example, [9] investigates the provision of reactive power from EVs to reduce the probability of power quality violation given EV plug-in and departure time; [10] includes a discussion on EVs participating in frequency regulation in the power system, treating travel demand as given; [3] uses robust optimization and reformulates transportation equilibrium as a charging demand uncertainty set for distribution network optimization; and [11] investigates the potential benefits of dynamic network reconfiguration to the distribution network, with EVs’ spatial–temporal availability and their charging demand estimated from transportation network models.

In transportation literature, range anxiety of EV drivers and their charging behaviors have been integrated into transportation system modeling and charging infrastructure planning [12], which typically treat power systems as exogenous. For example, [13] studies the coordinated parking problem of EVs to support V2G services, assuming that EV parking can be centrally coordinated and the EV parking demand at each parking facility is given. The literature on charging infrastructure deployment in the transportation network [14, 15] typically treats

locational electricity prices as exogenous variables influencing the facility choice decisions of EV drivers. Separating smart grid and transportation analyses is justifiable when EV penetration level is low and the feedback effects between these two systems are negligible. With an increasing EV penetration level in the future, the EV travel and charging patterns will largely affect the spatial and temporal distribution of power demand as well as ancillary service availability; on the other hand, charging costs, power availability, and ancillary service incentives from power system will influence the charging behaviors of EVs and impact transportation mobility. Therefore, alternative modeling strategies are urgently needed to capture the close couplings and feedback effects between the interdependent smart grid and transportation network. To address this issue, some recent studies aim to model the two systems simultaneously using network modeling techniques [4]. Most of them have a bilevel structure where a central planner at the upper level optimizes system welfare; traffic equilibrium and optimal power flow are presented at the lower level as constraints. For example, [16] determines the optimal allocation of public charging stations to maximize social welfare considering transportation and power transmission system equilibrium, [17] aims to decide optimal charging prices to minimize system cost, including power loss in the distribution grid and travel time in the transportation network, with traffic equilibrium and distribution optimal power flow as constraints, [18] minimizes total power and transportation costs by choosing optimal power and transportation expansion decisions. 
The problem is linearized and reformulated as an exact mixed-integer convex program. Finally, [19] proposes detailed models for the two systems separately, where a third non-profit entity is responsible for the cooperation of the systems by minimizing the total social costs associated with both of them. However, smart grid and transportation network planning is not centrally controlled by a single decision entity. Therefore, co-optimizing these two systems only provides a lower bound of the cost, which may not be achievable due to the decentralized nature of the decision making in power and transportation systems. In addition, mathematical programming with complementarity constraints, whose mathematical properties and computation algorithms have been extensively investigated in the past [20, 21], is still extremely challenging to solve, especially for large-scale transportation and power networks facing uncertainties.

To model a decentralized transportation and power interaction and mitigate the computational challenges, [22] uses a simulation-based approach that iteratively solves least-cost vehicle routing problems in the transportation network and optimal power flow problems in the power network, communicating locational marginal prices and charging demand between the two systems. However, the convergence of the proposed algorithm may not be guaranteed. This work is further extended in [23] to consider power, transportation, and social systems as different layers coupled in an Internet of Things (IoT) framework. In [24], it is assumed that each charging station has a renewable energy source installed at the same location, and the electricity price is determined based on forecasting the generation of these sources and the feed-in tariff. However, in [23, 24], power system operation (e.g., power flow constraints) is ignored.


Few studies consider endogenous flow-dependent travel time and charging prices for charging decision making in the context of decentralized multi-agent transportation and power system modeling. For example, in [25–27], charging location choice is modeled based only on travel distance. In [28], a more realistic approach is proposed for modeling the path selection of EV drivers by allowing en route charging. However, charging cost is not involved in the decision making. In [18, 29], a linearized transportation and power distribution system model is presented for EV charging station and transportation system expansion planning. The charging station selection of EVs is based only on travel time, without considering other factors such as the preferability of the charging station and the charging price. In [11, 16], charging prices as well as flow-dependent travel time are considered, but charging prices are exogenous.

Most existing research suffers from one or more limitations: centralized decision making, exogenous charging prices, deterministic agent decision making, and simplified intra- and inter-system interactions. Each of these limitations hampers further investigation of the interdependent smart grid and transportation system under the increasing trend of transportation electrification, in terms of both planning and operation. First, only if EV behavior is properly modeled in an analytical framework can we analyze how to plan and operate transportation infrastructure systems to influence the EV behavior and system interaction outcomes. Second, some of these characteristics are fundamental to both systems, such as the uncertainty of renewable generation and the decentralized decision-making structure. Ignoring these characteristics will only provide a lower bound of system costs, which can cause significant bias for decision making and may not be implementable in reality.
Third, one of the critical interdependencies between smart grid and transportation system is charging demand. Modeling endogenous prices can capture a feedback loop of charging demand and charging costs in spatially dependent transportation and power networks. Modeling endogenous prices also allows for further incentive design of charging infrastructure investment and charging costs to optimize both transportation and power systems. Addressing the abovementioned limitations in a holistic modeling framework faces significant challenges from both modeling and computational perspectives. First, modeling decentralized decision making of smart grid and transportation system with endogenous prices typically leads to a highly non-convex problem, where equilibrium existence/uniqueness and algorithm convergence are generally not guaranteed. Second, the curse of dimensionality brought by high-dimensional uncertain parameters, non-convex system interactions, and large-scale transportation and power systems requires coordination of novel modeling approaches and algorithm design. In this chapter, a multi-agent optimization framework [30] is presented for studying the interdependency between smart grid and transportation system considering uncertain renewable generation and EV routing and charging location choices. Decentralized key decision-makers are modeled with prices and travel time endogenously determined by the model in a unified framework, with equilibrium


existence and uniqueness proved. An exact convex reformulation of the non-convex equilibrium problem based on strong duality further leads to both scenario and system decomposition and parallel computing.

12.2 Mathematical Modeling

Five types of stakeholders are explicitly modeled: renewable energy investors, conventional generators, the independent system operator (ISO), EV drivers, and conventional vehicle drivers, who interact with one another in a unified framework. A perfectly competitive market is assumed for the power supply and charging market, i.e., individual decision-makers on both the supply and demand sides (e.g., generators, drivers) do not have market power to influence equilibrium prices by unilaterally altering their decisions. For example, power suppliers will decide their investment and production quantities to maximize their own profits, considering the locational electricity prices in the wholesale market. Electricity and charging prices are endogenously determined within the model. The notations used in this section are listed in Tables 12.1–12.3.

12.2.1 Renewable Energy Investor Modeling

In a perfectly competitive market, since prices are considered exogenous by investors, the investment decisions can be calculated by aggregating a large number of investors into a dummy investor, whose cost functions are the aggregated costs of all investors [14]. The decision making problem of renewable energy investors is formulated as a two-stage stochastic programming problem:

$$\max_{u^S,\, g^S \ge 0} \quad \mathbb{E}_{\xi} \sum_{i \in \mathcal{I}^S} \left[ \rho_{i,\xi}\, g^S_{i,\xi} - C_i^{S,O}\big(g^S_{i,\xi}\big) \right] - \sum_{i \in \mathcal{I}^S} C_i^{S,I}\big(u_i^S\big) \tag{12.1a}$$

$$\text{s.t.} \quad g^S_{i,\xi} \le \xi_i u_i^S, \quad \forall i \in \mathcal{I}^S,\ \xi \in \Xi \tag{12.1b}$$

$$\sum_{i \in \mathcal{I}^S} c_i^S u_i^S \le B \tag{12.1c}$$

Table 12.1 Notations for sets and indices

| Symbol | Description |
|---|---|
| $\Xi$ | Set of uncertain capacity factors, indexed by $\xi$ |
| $\mathcal{G}(\mathcal{N}, \mathcal{A})$ / $\mathcal{G}(\mathcal{I}, \mathcal{E})$ | Transportation/power graphs, with $\mathcal{N}, \mathcal{I}$ (indexed by $n, i$) being vertex sets and $\mathcal{A}, \mathcal{E}$ (indexed by $a, e$) being edge sets |
| $\mathcal{I}^S$ / $\mathcal{I}^C$ / $\mathcal{I}^T$ | Node sets of renewable/conventional generators/charging stations, indexed by $i$ |
| $\mathcal{I}_i$ | Node sets connecting to node $i$, indexed by $j$ |
| $\bar{R}$ / $\bar{S}$ ($R$ / $S$) | Set of origins/destinations of conventional (electric) vehicles, indexed by $r, s$ |

Table 12.2 Notations for parameters and functions

| Symbol | Description |
|---|---|
| $\alpha$ | Coefficient for augmented Lagrangian function |
| $\beta$ / $\varepsilon$ | Coefficients/error terms in drivers’ utility function |
| $\nu$ | Iteration number |
| $A$ | Node-link incidence matrix of transportation network |
| $B$ | Investment budget for renewable generators |
| $b_{ij}$ | Susceptance of transmission line $(i, j)$ |
| $C_i^{S,I}(\cdot)$ / $C_i^{S,O}(\cdot)$ | Aggregated investment/operation cost functions of renewable generators at location $i$ |
| $C_i^{C}(\cdot)$ | Aggregated production cost functions of conventional generators at location $i$ |
| $E^{rs}$ | O–D incidence vector of O–D pair $rs$ with $+1$ at origin and $-1$ at destination |
| $e_{rs}$ | Average EV charging demand from $r$ to $s$ |
| $l_i^C$, $u_i^C$ | Lower and upper bound of conventional generation at location $i$ |
| $l_i$ | Power load (excluding charging load) at node $i$ |
| $P(\xi)$ | Probability measure of scenario $\xi$ |
| $Q_r$ | EV initial quantity at location $r$ |
| $\bar{q}_{rs}$ | Conventional vehicle travel demand from $r$ to $s$ |
| $tt_a(\cdot)$ | Link travel time function of link $a$ |
| $tt_{rs}$ | Equilibrium path travel time from $r$ to $s$ |
| $U_{rs}$ | Deterministic component of utility measures for drivers going from $r$ to $s$ |
| $v_a$ | Aggregated traffic flow on link $a$ |
| $z_i^{\nu}$ | Renewable investment at $i$ at iteration $\nu$ |

Table 12.3 Notations for variables

| Symbol | Description |
|---|---|
| $\gamma$ | Dual variables for non-anticipativity constraints |
| $\theta_i$ | Phase angle at node $i$ |
| $\lambda_i$ | Charging price at location $i$ |
| $\rho_i$ | Wholesale electricity price at location $i$ |
| $\tau_n^{rs}$ | Dual variable of $rs$ flow conservation at node $n$ |
| $d_i$ | Energy purchased by ISO at node $i$ |
| $f_{ij}$ / $v_a$ | Transmission/transportation link flow |
| $g_i^S$ / $g_i^C$ | Electricity generation from renewable/conventional generators at node $i$ |
| $p_i$ | Charging load at node $i$ |
| $q_{rs}$ | EV travel demand from node $r$ to node $s$ |
| $u_i^S$ | Capacity of renewable generator at node $i$ |
| $x_a^{rs}$ | Link traffic flow on link $a$ that travels from node $r$ to node $s$ |

In the first stage, investors decide the renewable energy generation capacity $u^S$ considering the uncertain future renewable capacity factors $\xi$ at each location, which are determined by uncertain factors such as radiance intensity, weather, and temperature [31]. In the second stage, given a realization of the renewable generation uncertainties $\xi$ and the first-stage decision variable $u^S$, renewable generators determine the generation quantities $g^S$. The second-stage model is at the hourly level. The investors aim to maximize their long-term expected profits [32], calculated as the total expected net revenue (revenue minus operational cost) in the second stage minus the investment cost in the first stage, as shown in (12.1a). Without loss of generality, assume $C_i^{S,I}(\cdot)$ has a quadratic form with a positive quadratic coefficient, reflecting an increasing marginal cost of land procurement, and $C_i^{S,O}(\cdot)$ is a linear function [14]. Constraint (12.1b) guarantees that the power output of a renewable generator does not exceed the capacity $u_i^S$ times the uncertain renewable intensity parameter $\xi_i$. Constraint (12.1c) is the budget constraint for the total investment. Note that since we focus on long-term planning, model (12.1) could be adapted to model different renewable energy resources, such as solar panels and wind turbines, whose total output capacity is uncertain and influenced by natural resource availability.
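For intuition, the two-stage structure of (12.1) can be sketched for a single location with a finite set of scenarios. The sketch below is an illustration under simplifying assumptions that go beyond the text: prices are taken as given in every scenario, the operating cost is `c_op * g`, the investment cost is `a * u**2 + b * u` with `a > 0`, and all names and numbers are hypothetical. Under these assumptions the second stage is solved by generating at the capacity limit whenever the price exceeds the marginal operating cost, and the first stage reduces to a concave quadratic in the capacity.

```python
def solve_investor(scenarios, c_op, a, b, c_inv, budget):
    """Single-location sketch of the two-stage investor problem.

    scenarios: list of (probability, capacity_factor, price) tuples
    c_op:      linear operating cost per unit of generation
    a, b:      coefficients of the investment cost a*u**2 + b*u (a > 0)
    c_inv:     unit cost charged against the budget
    budget:    total investment budget
    """
    # Second stage: generate xi*u whenever price exceeds operating cost,
    # so the expected net revenue is linear in the capacity u.
    margin = sum(p * xi * max(rho - c_op, 0.0) for p, xi, rho in scenarios)

    # First stage: maximize margin*u - (a*u**2 + b*u) on [0, budget/c_inv].
    u_free = (margin - b) / (2.0 * a)
    u = min(max(u_free, 0.0), budget / c_inv)

    generation = [xi * u if rho > c_op else 0.0 for _, xi, rho in scenarios]
    profit = margin * u - (a * u ** 2 + b * u)
    return u, generation, profit


# Two equally likely scenarios with hypothetical capacity factors and prices.
scenarios = [(0.5, 0.2, 30.0), (0.5, 0.6, 50.0)]
u, generation, profit = solve_investor(
    scenarios, c_op=10.0, a=0.5, b=4.0, c_inv=50.0, budget=1000.0
)
print(u, generation, profit)
```

In the full model the prices are not inputs but are determined at equilibrium, so a closed form of this kind applies only to the investor's price-taking subproblem.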

12.2.2 Conventional Generators

For each scenario $\xi \in \Xi$, conventional generators solve the following optimization problem to determine their generation quantity $g_i^C$, $\forall i \in \mathcal{I}^C$. Although the decision variables for conventional generators, ISO, and drivers are all scenario dependent, the notation $\xi$ is omitted for brevity.

$$\max_{g^C \ge 0} \quad \sum_{i \in \mathcal{I}^C} \left[ \rho_i g_i^C - C_i^C\big(g_i^C\big) \right] \tag{12.2a}$$

$$\text{s.t.} \quad g_i^C \le u_i^C, \quad \forall i \in \mathcal{I}^C \tag{12.2b}$$

$$g_i^C \ge l_i^C, \quad \forall i \in \mathcal{I}^C \tag{12.2c}$$

Objective (12.2a) maximizes the profits of conventional generators in each scenario $\xi$, calculated as the total revenue $\sum_{i \in \mathcal{I}^C} \rho_i g_i^C$ minus the total production cost $\sum_{i \in \mathcal{I}^C} C_i^C(g_i^C)$. Assume $C_i^C(\cdot)$ has a quadratic form, which is consistent with the settings in IEEE test systems.¹ Constraints (12.2b)–(12.2c) are the upper and lower bounds for power generation at each conventional generator location $i \in \mathcal{I}^C$.
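Because problem (12.2) is separable by generator and strictly concave when the quadratic cost coefficient is positive, each generator's best response to a given price has a closed form. The following is a minimal sketch under that assumption; the function name, the cost form $C(g) = a g^2 + b g$, and all numerical values are illustrative and not taken from the chapter.

```python
def best_response_generation(price, a, b, lower, upper):
    """Profit-maximizing output for an assumed cost C(g) = a*g**2 + b*g, a > 0.

    Solves max price*g - C(g) subject to lower <= g <= upper,
    i.e., a single-generator instance of (12.2a)-(12.2c).
    """
    if a <= 0:
        raise ValueError("quadratic coefficient a must be positive")
    # First-order condition: price - (2*a*g + b) = 0
    g_unconstrained = (price - b) / (2.0 * a)
    # The objective is strictly concave in g, so clipping to the bounds is optimal
    return min(max(g_unconstrained, lower), upper)


if __name__ == "__main__":
    # Hypothetical prices: low, intermediate, high
    for rho in (5.0, 14.0, 30.0):
        g = best_response_generation(rho, a=0.01, b=10.0, lower=50.0, upper=500.0)
        print(f"price={rho:5.1f} -> generation={g:7.2f}")
```

At a low price the lower bound (12.2c) binds, at a high price the upper bound (12.2b) binds, and in between the generator produces where marginal cost equals the price.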

12.2.3 ISO Modeling

While conventional Level 1, Level 2, and DC fast charging infrastructure is connected to the low-voltage distribution network, recently proposed 350 kW–1 MW ultra-fast chargers may need to be connected to transmission or sub-transmission systems [4]. In this chapter we focus on inter-city travel with ultra-fast charging stations that are directly connected to the sub-transmission network. The ISO monitors, controls, and coordinates the operation of electric power systems. While the ISO has many specific tasks, we focus on its daily operation to determine the power purchase and transmission plan that maximizes system efficiency, which implicitly determines locational marginal prices. Denote a power network by $\mathcal{G}^P = (\mathcal{I}, \mathcal{E})$. The ISO decision making can be described by model (12.3). The objective function (12.3a) minimizes the total energy purchasing cost from both renewable and conventional generators, $\sum_{i \in \mathcal{I}^S \cup \mathcal{I}^C} \rho_i d_i$, minus the total energy revenue from charging stations, $\sum_{i \in \mathcal{I}^T} \lambda_i p_i$. Notice that when $p_i < 0$, the ISO purchases $-p_i$ energy from charging station $i$, and the total energy cost for power systems will be minimized. Constraints (12.3b)–(12.3d) are power flow constraints, where (12.3b) gives line flow patterns under DC power flow assumptions, (12.3c) guarantees the power balance at each location $i$, and (12.3d) is the transmission capacity constraint. Model (12.3) guarantees that the ISO will prefer purchasing power from cheaper generators and supplying power to charging locations with higher demand, given grid topological and physical constraints.

¹ Source: https://matpower.org/docs/ref/matpower5.0/.



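\begin{align}
\min_{p,\, d \ge 0,\, \theta,\, f} \quad & \sum_{i \in \mathcal{I}^S \cup \mathcal{I}^C} \rho_i d_i + \sum_{i \in \mathcal{I}^T} \lambda_i (-p_i) \tag{12.3a} \\
\text{s.t.} \quad & b_{ij}(\theta_i - \theta_j) = f_{ij}, \quad \forall (i,j) \in \mathcal{E} \tag{12.3b} \\
& \sum_{j \in \mathcal{I}_i} f_{ij} = d_i - p_i - l_i, \quad \forall i \in \mathcal{I} \tag{12.3c} \\
& -u_{ij}^P \le f_{ij} \le u_{ij}^P, \quad \forall (i,j) \in \mathcal{E} \tag{12.3d}
\end{align}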
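To make constraints (12.3b)–(12.3c) concrete, the sketch below computes DC power flows on a hypothetical three-node triangle network for fixed net injections $d_i - p_i - l_i$. Node 1 is taken as the angle reference. The function name, the susceptances, and the injections are assumptions chosen for illustration; they are not the parameters of the chapter's test system, and the optimization over $d$ and $p$ in (12.3a) is not performed here.

```python
def dc_power_flow_3node(b12, b13, b23, inj2, inj3):
    """Solve (12.3b)-(12.3c) on a triangle network with theta_1 = 0.

    inj2 and inj3 are the net injections (d_i - p_i - l_i) at nodes 2 and 3;
    the injection at node 1 is implied by system balance.
    """
    # Reduced susceptance matrix (row and column of the reference node removed)
    a11, a12 = b12 + b23, -b23
    a21, a22 = -b23, b13 + b23
    det = a11 * a22 - a12 * a21
    # Cramer's rule for the 2x2 system A * [theta2, theta3]^T = [inj2, inj3]^T
    theta2 = (inj2 * a22 - a12 * inj3) / det
    theta3 = (a11 * inj3 - a21 * inj2) / det
    theta = {1: 0.0, 2: theta2, 3: theta3}
    # Line flows from (12.3b): f_ij = b_ij * (theta_i - theta_j)
    flows = {
        (1, 2): b12 * (theta[1] - theta[2]),
        (1, 3): b13 * (theta[1] - theta[3]),
        (2, 3): b23 * (theta[2] - theta[3]),
    }
    return theta, flows


if __name__ == "__main__":
    # Node 1 supplies 100 units; nodes 2 and 3 withdraw 40 and 60 units
    theta, flows = dc_power_flow_3node(1.0, 1.0, 1.0, inj2=-40.0, inj3=-60.0)
    for line, f in flows.items():
        print(line, round(f, 3))
```

A capacity check against (12.3d) would simply compare each absolute flow with the corresponding limit $u_{ij}^P$.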

12.2.4 Driver Modeling

EV drivers need to determine their charging locations and travel routes in a transportation graph, denoted as $\mathcal{G}(\mathcal{N}, \mathcal{A})$. EVs departing from $r \in \mathcal{R}$ select a charging station $s \in \mathcal{S}$, with the utility function defined in (12.4). A similar utility function has been adopted in [11] to describe charging facility location choices and can be extended to include other relevant factors based on evolving EV charging behaviors without interfering with the fundamental modeling and computational strategies presented in this chapter. Utility function (12.4) reflects the trade-off of EV drivers between four components: locational attractiveness $\beta_{0,s}$, travel time $-\beta_1 tt_{rs}$, charging cost $-\beta_2 e_{rs} \lambda_s$, and an error term $\epsilon$. For example, an EV driver may not choose a charging facility on the shortest path or with the lowest charging cost. Instead, the EV driver will balance locational preference, travel time, and charging costs. Average charging demand is used for each $rs$ pair. This modeling framework can be naturally extended to incorporate the variation of charging demands by creating dummy $rs$ pairs. For home charging or workplace charging, where travelers have fixed destinations, the corresponding locational attractiveness (i.e., $\beta_{0,s}$) will dominate the other utility components.

\begin{equation}
U_{rs} = \beta_{0,s} - \beta_1 tt_{rs} - \beta_2 e_{rs} \lambda_s + \epsilon \tag{12.4}
\end{equation}

Different assumptions on the probability distribution of $\epsilon$ result in different discrete choice models. Here a multinomial logit model is adopted, in which $\epsilon$ follows an extreme value distribution. The outputs of the discrete choice models have two interpretations. First, discrete choice models calculate the probability of a vehicle choosing from different destinations. Second, the results describe the traffic distribution to different destinations at an aggregated level. The utility function (12.4) partially depends on travel time $tt_{rs}$, which is determined by the destination and route choices of all the drivers (including EVs and conventional vehicles). Assume conventional vehicles have known destination choices, with origin–destination flow $\bar{q}_{rs}$. The destination choice of EVs $q_{rs}$ and path travel time $tt_{rs}$ are coupled. On the one hand, the selection of charging location $s$ will increase the travel demands on certain paths from $r$ to $s$ and influence the travel time of the transportation network; on the other hand, path travel time will affect destination choices as travel time is a factor in the utility function (12.4). To capture these couplings, a combined distribution and assignment (CDA) model [33] is adopted to model their destination choices and route choices, as shown in (12.5).



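\begin{align}
\min_{x,\, q \ge 0} \quad & \sum_{a \in \mathcal{A}} \int_0^{v_a} tt_a(u)\, du + \frac{1}{\beta_1} \sum_{r \in \mathcal{R}} \sum_{s \in \mathcal{S}} q_{rs} \left( \ln q_{rs} - 1 + \beta_2 e_{rs} \lambda_s - \beta_{0,s} \right) \tag{12.5a} \\
\text{s.t.} \quad & v = \sum_{r \in \mathcal{R},\, s \in \mathcal{S}} x^{rs} + \sum_{r \in \bar{\mathcal{R}},\, s \in \bar{\mathcal{S}}} \bar{x}^{rs} \tag{12.5b} \\
(\tau^{rs}) \quad & A x^{rs} = q_{rs} E^{rs}, \quad \forall r \in \mathcal{R},\ s \in \mathcal{S} \tag{12.5c} \\
& A \bar{x}^{rs} = \bar{q}_{rs} E^{rs}, \quad \forall r \in \bar{\mathcal{R}},\ s \in \bar{\mathcal{S}} \tag{12.5d} \\
& \sum_{s \in \mathcal{S}} q_{rs} = Q_r, \quad \forall r \in \mathcal{R} \tag{12.5e}
\end{align}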
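For fixed travel times and charging prices, the multinomial logit assumption on utility (12.4) gives destination shares proportional to the exponential of the deterministic utility. The following sketch illustrates that destination-choice step in isolation; the function name and all parameter values are hypothetical, and the coupling with route choice in (12.5) is deliberately left out.

```python
import math


def logit_shares(beta0, beta1, beta2, travel_time, energy, price):
    """Choice probabilities over candidate stations s for one origin r.

    Deterministic utility: V_s = beta0[s] - beta1*tt_rs - beta2*e_rs*lambda_s,
    i.e., utility (12.4) without the error term.
    """
    v = [
        b0 - beta1 * tt - beta2 * e * lam
        for b0, tt, e, lam in zip(beta0, travel_time, energy, price)
    ]
    v_max = max(v)  # shift utilities for numerical stability
    weights = [math.exp(x - v_max) for x in v]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two stations with equal attractiveness; station 2 is farther and pricier
    shares = logit_shares(
        beta0=[0.0, 0.0], beta1=1.0, beta2=0.5,
        travel_time=[1.0, 1.5], energy=[1.0, 1.0], price=[2.0, 3.0],
    )
    demand = 50.0  # hypothetical total EV demand Q_r from the origin
    print([round(demand * s, 2) for s in shares])
```

Multiplying the shares by $Q_r$ gives the aggregated flows $q_{rs}$, which is the second interpretation of the discrete choice output mentioned above.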

The Bureau of Public Roads (BPR) function [34] is used here to determine the travel time $tt_a(\cdot)$. The term $\sum_{a \in \mathcal{A}} \int_0^{v_a} tt_a(u)\, du$ in the objective function (12.5a) is the sum of the areas under all the link travel cost functions $tt_a(\cdot)$, which is the total travel time $\sum_{a \in \mathcal{A}} v_a\, tt_a(v_a)$ minus the externalities caused by route choices. The second part consists of the entropy of traffic distribution $q_{rs}(\ln q_{rs} - 1)$ and utility terms in (12.4). The objective function (12.5a) does not have a physical interpretation [33]; it is a potential function constructed to guarantee that the optimal solutions of (12.5) are consistent with the first Wardrop principle (a.k.a. user equilibrium²) [35] and the multinomial logit destination choice assumption. These conditions can be guaranteed by the sufficient and necessary Karush–Kuhn–Tucker conditions of model (12.5) regarding $x^{rs}$ and $q$. For detailed proofs, one can refer to [33]. Constraint (12.5b) calculates link flows by summing link flows of EVs and conventional vehicles over all origin and destination pairs. Constraints (12.5c)–(12.5d) are the vehicle flow conservation at each node for EV travel demand $q_{rs}$ and conventional vehicle travel demand $\bar{q}_{rs}$, respectively. Constraint (12.5e) guarantees that the summation of EV traffic flow distribution to each $s$ equals the total EV travel demand from $r$, $Q_r$. The equilibrium travel time for each OD pair $rs$ can be calculated as $tt_{rs} = \tau_r^{rs} - \tau_s^{rs}$, where $\tau_i^{rs}$ is the dual variable for constraint (12.5c).

An implicit assumption of model (12.5) is that the EV drivers will charge at their destinations. In [36], an extension of the CDA model is introduced, denoted as the generalized combined distribution and assignment (GCDA) model, where vehicles can charge at their origins, destinations, or en route. In addition, the GCDA model considers both one-way and round-trip travel. Here, that model is not included, to avoid overcomplicating the presented modeling framework and to keep a clear focus on developing a multi-agent power transportation interaction model considering renewable energy and effective computational strategies. Replacing the CDA model with the GCDA model is a relatively straightforward process and does not make a major difference to the modeling framework, and the proposed computational strategies in Sect. 12.3 still apply.
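The link travel time and the area term in (12.5a) can be illustrated with the widely used BPR form $tt_a(v) = t_0 \left(1 + \alpha (v/c)^{\beta}\right)$. The sketch below assumes the common default coefficients $\alpha = 0.15$ and $\beta = 4$, which the chapter does not state explicitly; the function names and the numerical values are likewise illustrative.

```python
def bpr_time(v, t0, cap, alpha=0.15, beta=4.0):
    """BPR link travel time at flow v, free-flow time t0, capacity cap."""
    return t0 * (1.0 + alpha * (v / cap) ** beta)


def bpr_integral(v, t0, cap, alpha=0.15, beta=4.0):
    """Closed-form integral of bpr_time from 0 to v (area term in (12.5a))."""
    return t0 * (v + alpha * cap / (beta + 1.0) * (v / cap) ** (beta + 1.0))


if __name__ == "__main__":
    v, t0, cap = 10.0, 10.0, 10.0  # hypothetical link loaded to capacity
    total_time = v * bpr_time(v, t0, cap)   # v_a * tt_a(v_a)
    area = bpr_integral(v, t0, cap)         # integral of tt_a from 0 to v_a
    # The difference is the congestion externality not perceived by drivers
    print(total_time, area, total_time - area)
```

The gap between total travel time and the area term is what distinguishes a user equilibrium from a system optimum, which is why (12.5a) is a potential function rather than a cost that any agent actually pays.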

12.2.5 Market Clearing Conditions

The power purchased and supplied by ISO at each location $i$, $d_i$ and $p_i$, needs to be balanced with locational power generation and charging demand, respectively, in a stable market. Otherwise, some of the supply or demand cannot be fulfilled and the market clearing prices will be adapted accordingly. To provide a steady-state modeling framework for long-term planning problems, real-time agent deviation from the equilibrium will not be considered and the hourly market clearing conditions can be stated as (12.6), where (12.6a) guarantees that the total energy purchased by ISO is equal to the total energy generated at each location; Eq. (12.6b) enforces the balance between charging supply and charging demand of EVs. $i(s)$ denotes the node index in the power graph that charging location $s$ connects to. Equation (12.6b) does not include regular power demand, denoted by $l_i$, which has been considered in the ISO power balancing constraint (12.3c).

² The journey times on all routes used are equal, and less than those that would be experienced by a single vehicle on any unused route.

Locational prices of electricity $\rho_i$ and charging $\lambda_s$ can be interpreted as the dual variables for the market clearing conditions (12.6a) and (12.6b), respectively. Notice that there is a market clearing condition (12.6) for each scenario $\xi$, which could result in different prices for peak and off-peak hours.

\begin{align}
(\rho_i) \quad & d_i = g_i^S + g_i^C, \quad \forall i \in \mathcal{I}^S \cup \mathcal{I}^C \tag{12.6a} \\
(\lambda_s) \quad & \sum_{r \in \mathcal{R}} e_{rs} q_{rs} = p_{i(s)}, \quad \forall s \in \mathcal{S} \tag{12.6b}
\end{align}
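As a small illustration of condition (12.6b), the sketch below computes the charging-market residual $\sum_{r} e_{rs} q_{rs} - p_{i(s)}$ at each station. The function name, the dictionary layout, and the numbers are assumptions made for this example only.

```python
def charging_imbalance(e, q, p, station_node):
    """Residual of (12.6b) per station s: sum_r e[r][s]*q[r][s] - p[i(s)].

    e[r][s]: energy charged per vehicle on pair rs
    q[r][s]: EV flow from origin r to station s
    p[i]:    charging load supplied by the ISO at power node i
    station_node[s]: power node i(s) that station s connects to
    """
    residual = {}
    for s, node in station_node.items():
        demand = sum(e[r][s] * q[r][s] for r in q if s in q[r])
        residual[s] = demand - p[node]
    return residual


if __name__ == "__main__":
    q = {1: {2: 30.0, 3: 20.0}}   # hypothetical EV flows from origin 1
    e = {1: {2: 1.0, 3: 1.0}}     # one energy unit per vehicle
    p = {2: 30.0, 3: 25.0}        # ISO charging supply at power nodes 2 and 3
    print(charging_imbalance(e, q, p, station_node={2: 2, 3: 3}))
```

A zero residual at every station means the charging market clears; a nonzero residual signals that the charging price at that station is not yet at its equilibrium value.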

12.3 Computational Approach

The system formulations for each stakeholder need to be solved simultaneously, as the decision processes of one agent depend on the decisions of the others. For example, locational prices are determined by the collective actions of all agents, as described in the market clearing conditions (12.6). Since each stakeholder’s optimization problem is convex under mild assumptions on the cost functions of travel, investment, and generation, one can reformulate each optimization problem as sufficient and necessary complementarity problems (CPs) and solve all the CPs together [37]. However, solving CPs directly is challenging because of the non-convexity and high dimensionality. An exact convex reformulation is developed for this problem, as stated in Theorem 12.1.

Theorem 12.1 (Convex Reformulation) If $tt_a(\cdot)$, $C_i^{S,I}(\cdot)$, $C_i^{S,O}(\cdot)$, and $C_i^C(\cdot)$ are convex, the equilibrium states of agents’ interactions in a perfectly competitive market, i.e., (12.1), (12.2), (12.3), (12.5), and market clearing conditions (12.6), are equivalent to solving a convex optimization problem, as formulated in (12.7).

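\begin{align}
\min_{(u,g,p,d,x,q) \ge 0} \quad & \sum_{i \in \mathcal{I}^S} C_i^{S,I}(u_i^S) + \mathbb{E}_{\xi} \Bigg[ \sum_{i \in \mathcal{I}^S} C_i^{S,O}(g_{i,\xi}^S) + \sum_{i \in \mathcal{I}^C} C_i^C(g_{i,\xi}^C) + \frac{\beta_1}{\beta_2} \sum_{a \in \mathcal{A}} \int_0^{v_{a,\xi}} tt_a(u)\, du \notag \\
& \quad + \frac{1}{\beta_2} \sum_{r \in \mathcal{R}} \sum_{s \in \mathcal{S}} q_{rs,\xi} \left( \ln q_{rs,\xi} - 1 - \beta_{0,s} \right) \Bigg] \tag{12.7a} \\
\text{s.t.} \quad & \text{(12.1b)–(12.1c), (12.2b)–(12.2c), (12.3b)–(12.3d), (12.5b)–(12.5e), (12.6a)–(12.6b)} \notag
\end{align}

Proof See the Appendix.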

The intuition behind the reformulation (12.7) is the reverse procedure of Lagrangian relaxation, where the penalty terms (e.g., $\rho_i d_i$, $\rho_i g_i^S$, and $\rho_i g_i^C$) are moved from the objective function back to constraints (e.g., (12.6a)–(12.6b)). This convex reformulation allows applying the alternating direction method of multipliers (ADMM), which leads to decomposition with guaranteed convergence properties [38], in contrast to heuristic diagonal methods. The existence and uniqueness of the system equilibrium is stated in Corollary 1.

Corollary 1 (Existence and Uniqueness of Systems Equilibrium) If $tt_a(\cdot)$, $C_i^{S,I}(\cdot)$, $C_i^{S,O}(\cdot)$, and $C_i^C(\cdot)$ are strictly convex functions, the system equilibrium exists and is unique.

Proof See the Appendix.

To develop a decomposition-based algorithm, first relax the investment decision $u_i^S$ to be scenario dependent and introduce non-anticipativity constraints,³ as shown in (12.8).

\begin{equation}
(\gamma_{i,\xi}) \quad u_{i,\xi}^S = z_i, \quad \forall i \in \mathcal{I}^S,\ \xi \in \Xi \tag{12.8}
\end{equation}

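Then, problem (12.7) can be reformulated as (12.9).

\begin{align}
\min_{(u,g,p,d,x,q) \ge 0} \quad & \mathbb{E}_{\xi} \Bigg[ \sum_{i \in \mathcal{I}^S} \left( C_i^{S,I}(u_{i,\xi}^S) + C_i^{S,O}(g_{i,\xi}^S) \right) + \sum_{i \in \mathcal{I}^C} C_i^C(g_{i,\xi}^C) + \frac{\beta_1}{\beta_2} \sum_{a \in \mathcal{A}} \int_0^{v_{a,\xi}} tt_a(u)\, du \notag \\
& \quad + \frac{1}{\beta_2} \sum_{r \in \mathcal{R}} \sum_{s \in \mathcal{S}} q_{rs,\xi} \left( \ln q_{rs,\xi} - 1 - \beta_{0,s} \right) \Bigg] \tag{12.9a}
\end{align}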

s.t. (12.1b)–(12.1c), (12.2b)–(12.2c), (12.3b)–(12.3d), (12.5b)–(12.5e), (12.6a)–(12.6b), (12.8)

Notice that (12.9) can be decomposed by scenarios and by systems if the non-anticipativity constraints (12.8) and charging market clearing conditions (12.6b) are, respectively, relaxed. A solution algorithm based on ADMM is summarized in Algorithm 12.1. Since Algorithm 12.1 is an application of the general ADMM approach to a convex optimization problem, convergence is theoretically guaranteed [38].

³ Decisions made before uncertainties are revealed should not depend on a specific scenario $\xi$.

Algorithm 12.1 ADMM-Based Decomposition Algorithm
1: $u^{\nu}, g^{\nu}, p^{\nu}, d^{\nu}, x^{\nu}, q^{\nu}, z^{\nu}, \lambda^{\nu}, \gamma^{\nu}$
2: initialization: $u^0, g^0, p^0, d^0, x^0, q^0, z^0, \lambda^0, \gamma^0$, $\alpha = 1$, $\varepsilon = 0.001$, $\nu = 1$, $gap = \infty$
3: define: $y^1 = (u_{\xi}^{\nu}, g_{\xi}^{\nu}, p_{\xi}^{\nu}, d_{\xi}^{\nu})$, $y^2 = (x_{\xi}^{\nu}, q_{\xi}^{\nu})$, $K_1 = \{ y^1 \mid y^1 \ge 0$, (12.1b)–(12.1c), (12.2b)–(12.2c), (12.3b)–(12.3d), (12.6a)$\}$, $K_2 = \{ y^2 \mid y^2 \ge 0$, (12.5b)–(12.5e)$\}$
4: while $gap \ge \varepsilon$ do
5:   Step 1: $\forall \xi \in \Xi$, $u_{\xi}^{\nu}, g_{\xi}^{\nu}, p_{\xi}^{\nu}, d_{\xi}^{\nu} \in \arg\min_{y^1 \in K_1}$ (12.13a)
6:   Step 2: $\forall \xi \in \Xi$, $x_{\xi}^{\nu}, q_{\xi}^{\nu} \in \arg\min_{y^2 \in K_2}$ (12.14a)
       $z^{\nu} = \sum_{\xi \in \Xi} P(\xi)\, u_{\xi}^{\nu}$   (12.10)
7:   Step 3:
       $\forall i \in \mathcal{I}^S$, $\gamma_{i,\xi}^{\nu} = \gamma_{i,\xi}^{\nu-1} + \alpha \left( u_{i,\xi}^{S,\nu} - z_i^{\nu} \right)$   (12.11)
       $\forall i \in \mathcal{I}^T$, $\lambda_{i,\xi}^{\nu} = \lambda_{i,\xi}^{\nu-1} + \alpha \left( -p_{i,\xi}^{\nu} + \sum_{r \in \mathcal{R}} q_{ri,\xi}^{\nu} \right)$   (12.12)
8:   Step 4:
       $gap_1 = \max_{\xi \in \Xi,\, i \in \mathcal{I}^S} \left| u_{i,\xi}^{S,\nu} - z_i^{\nu} \right| / z_i^{\nu}$
       $gap_2 = \max_{\xi \in \Xi,\, i \in \mathcal{I}^T} \left| -p_{i,\xi}^{\nu} + \sum_{r \in \mathcal{R}} q_{ri,\xi}^{\nu} \right| / p_{i,\xi}^{\nu}$
       $gap = \max\{gap_1, gap_2\}$, let $\nu := \nu + 1$
9: end while

An augmented Lagrangian approach is used to relax constraints (12.6b) and (12.8), with dual variables $\lambda$ and $\gamma$, respectively. The decision variables of model (12.9) can be divided into two groups, $(u, g, p, d)$ and $(x, q, z)$, which are updated iteratively. When fixing $(x, q, z)$ and adding the augmented Lagrangian terms $\sum_{i \in \mathcal{I}^S} \left[ \gamma_{i,\xi}^{\nu} (u_{i,\xi}^S - z_i^{\nu}) + \frac{\alpha}{2} \| u_{i,\xi}^S - z_i^{\nu} \|_2^2 \right]$ and $\sum_{i \in \mathcal{I}^T} \left[ \lambda_{i,\xi}^{\nu} \left( -p_{i,\xi} + \sum_{r \in \mathcal{R}} q_{ri,\xi}^{\nu} \right) + \frac{\alpha}{2} \left\| -p_{i,\xi} + \sum_{r \in \mathcal{R}} q_{ri,\xi}^{\nu} \right\|_2^2 \right]$ to model (12.9), we obtain model (12.13) to update $(u, g, p, d)$; likewise, when $(u, g, p, d)$ are fixed, we obtain model (12.14) to update $(x, q)$, and the $z$ updates can be derived analytically, see equation (12.10), using the fact that $\sum_{\xi \in \Xi} \gamma_{i,\xi}^{\nu} = 0$ [39]. Notice that the updates of both $(u, g, p, d)$ and $(x, q)$ can be decomposed by scenarios, as described in Step 1 and Step 2 in Algorithm 12.1. Step 3 updates the dual variables $\lambda$ and $\gamma$, with step size equal to the augmented Lagrangian parameter $\alpha$, so that the dual feasibility with respect to Step 2 is guaranteed in each iteration (see [38]).

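\begin{align}
\min_{(u,g,p,d) \ge 0} \quad & \sum_{i \in \mathcal{I}^S} \left[ C_i^{S,I}(u_{i,\xi}^S) + C_i^{S,O}(g_{i,\xi}^S) \right] + \sum_{i \in \mathcal{I}^C} C_i^C(g_{i,\xi}^C) \notag \\
& + \sum_{i \in \mathcal{I}^S} \left[ \gamma_{i,\xi}^{\nu} (u_{i,\xi}^S - z_i^{\nu}) + \frac{\alpha}{2} \| u_{i,\xi}^S - z_i^{\nu} \|_2^2 \right] \notag \\
& + \sum_{i \in \mathcal{I}^T} \left[ \lambda_{i,\xi}^{\nu} \Big( -p_{i,\xi} + \sum_{r \in \mathcal{R}} q_{ri,\xi}^{\nu} \Big) + \frac{\alpha}{2} \Big\| -p_{i,\xi} + \sum_{r \in \mathcal{R}} q_{ri,\xi}^{\nu} \Big\|_2^2 \right] \tag{12.13a} \\
\text{s.t.} \quad & \text{(12.1b)–(12.1c), (12.2b)–(12.2c), (12.3b)–(12.3d), (12.6a)} \notag
\end{align}

\begin{align}
\min_{(x,q) \ge 0} \quad & \frac{\beta_1}{\beta_2} \sum_{a \in \mathcal{A}} \int_0^{v_{a,\xi}} tt_a(u)\, du + \frac{1}{\beta_2} \sum_{r \in \mathcal{R}} \sum_{s \in \mathcal{S}} q_{rs,\xi} \left( \ln q_{rs,\xi} - 1 - \beta_{0,s} \right) \notag \\
& + \sum_{i \in \mathcal{I}^T} \left[ \lambda_{i,\xi}^{\nu} \Big( -p_{i,\xi}^{\nu+1} + \sum_{r \in \mathcal{R}} q_{ri,\xi} \Big) + \frac{\alpha}{2} \Big\| -p_{i,\xi}^{\nu+1} + \sum_{r \in \mathcal{R}} q_{ri,\xi} \Big\|_2^2 \right] \tag{12.14a} \\
\text{s.t.} \quad & \text{(12.5b)–(12.5e)} \notag
\end{align}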
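The scenario-decomposition part of Algorithm 12.1 (the $z$ update (12.10), the dual update (12.11), and the relative gap in Step 4) can be illustrated on a toy problem. The sketch below is not the chapter's model: it replaces subproblem (12.13) with a hypothetical scalar problem, $\min_u \mathbb{E}_\xi[\tfrac{a}{2} u_\xi^2 - r_\xi u_\xi]$ subject to $u_\xi = z$, whose scenario subproblems have a closed form. The function name and all data are assumptions, the $\lambda$ update (12.12) is omitted, and the multipliers here satisfy the probability-weighted condition $\sum_\xi P(\xi)\gamma_\xi = 0$ of standard progressive hedging.

```python
def progressive_hedging(revenue, prob, a=2.0, alpha=1.0, eps=1e-6, max_iter=1000):
    """Consensus iteration on a toy problem with u_xi = z for all scenarios.

    Scenario subproblem (closed form):
        min_u 0.5*a*u**2 - r*u + gamma*(u - z) + 0.5*alpha*(u - z)**2
        =>  u = (r - gamma + alpha*z) / (a + alpha)
    """
    n = len(revenue)
    gamma = [0.0] * n
    u = [0.0] * n
    z = 0.0
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Step 1 analogue: solve every scenario subproblem independently
        u = [(revenue[k] - gamma[k] + alpha * z) / (a + alpha) for k in range(n)]
        # z update as in (12.10): probability-weighted average
        z = sum(prob[k] * u[k] for k in range(n))
        # Step 3 analogue of (12.11): dual update with step size alpha
        gamma = [gamma[k] + alpha * (u[k] - z) for k in range(n)]
        # Step 4 analogue: relative non-anticipativity gap
        gap = max(abs(u[k] - z) for k in range(n)) / max(abs(z), 1e-12)
        if gap < eps:
            break
    return z, u, gamma, iterations


if __name__ == "__main__":
    z, u, gamma, it = progressive_hedging([2.0, 4.0, 6.0], [1 / 3, 1 / 3, 1 / 3])
    print(f"z = {z:.6f} after {it} iterations; duals = {[round(g, 4) for g in gamma]}")
```

For this toy problem the consensus value converges to $\mathbb{E}[r_\xi]/a$, and the scenario subproblems in each iteration are independent, which is what makes the parallel implementation mentioned later in the chapter possible.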

12.4 Results on Three-Node Test System

A three-node test system is shown in Fig. 12.1. The solid and dashed lines are transportation and power links, respectively. A traffic flow of 50 departing from node 1 has two possible charging destinations, nodes 2 and 3. Each node has 50 units of existing power load in addition to the charging load. All the other link parameters are shown in Fig. 12.1. Figure 12.2 shows the distribution of uncertain renewable capacity factors for 10 scenarios.

Fig. 12.1 Three-node test system. © [2021] IEEE. Reprinted, with permission, from [30]

Fig. 12.2 Variation in $\xi$. © [2021] IEEE. Reprinted, with permission, from [30]

The models and decomposition algorithms are implemented in Pyomo 5.6.6 [40], and the sub-problems are solved using CPLEX 12.8 and IPOPT 3.12.13, with a 0.1% optimality gap. All numerical experiments are run on a 2.3 GHz 8-core Intel Core i9 with 16 GB of RAM, under the Mac OS X operating system, with parallel computing enabled.

Three cases are compared. Case 1 is the base case, as shown in Fig. 12.1. In case 2, the transportation capacity of link 1–3 is reduced from 10 to 5. In case 3, the transmission capacity of link 1–3 is reduced from 200 to 50. The reduction of transportation or transmission capacity may have multiple conflicting impacts on the system, including changes in equilibrium prices, renewable energy investment, and system costs, and a redistribution of traffic or power flow. Due to the interconnection and interdependence between the smart grid and the transportation network, these effects cannot be properly quantified without a modeling framework that includes the decision making of both systems as endogenous variables and captures the feedback effects between the two systems. The results aim to demonstrate the capability of the models to capture the systems' interdependence and to quantify the impacts of capacity reduction on system interaction outcomes.

12.4.1 Effects on Equilibrium Prices

From Fig. 12.3a, with sufficient transmission link capacity (cases 1 and 2), the energy prices at nodes 2 and 3 are identical; otherwise, there would be incentives for the generators and ISO to supply more energy to the node with higher prices. Notice that the reduction of transportation link capacity does not impact the energy prices when there is sufficient transmission capacity and flexibility. When we reduce the transmission capacity of link 1–3, prices at both nodes 2 and 3 increase. The reason is that the transmission capability of the whole network cannot improve when the capacity of any link is reduced. With a limitation on overall transmission capacity, power may have to be generated at a more expensive location, which leads to higher marginal prices. But the price at node 3 increases to a larger extent than at node 2, because node 3 is directly impacted by the transmission capacity limitation, which makes it more difficult for node 3 to receive energy.

Fig. 12.3 Impacts of link capacity on energy prices and system costs. (a) Energy prices. (b) Investment. (c) System costs. © [2021] IEEE. Reprinted, with permission, from [30]

12.4.2 Effects on Renewable Investment

Since nodes 2 and 3 have the same cost parameters, the locational investment amount of renewable generators is determined by the equilibrium prices of energy $\rho$ and the distribution of capacity factors $\xi$, see model (12.1). For example, a location with a higher equilibrium price and a higher capacity factor will be strictly preferred. But when these two factors conflict with each other, the investment amount will depend on which influence dominates. The investment results are shown in Fig. 12.3b. In cases 1 and 2, all investments are made at node 2. This is because node 2 has slightly higher capacity factors on average (see Fig. 12.2), and the locational energy prices are the same at nodes 2 and 3 in cases 1 and 2 (see Fig. 12.3a). But in case 3, the energy prices at node 3 are higher than the prices at node 2. The influence of energy prices on investment outweighs the influence of renewable capacity factors, which leads to more investment at node 3 in case 3. Comparing cases 1 and 3, we can see that a reduction of transmission capacity for link 1–3 increases the energy price at node 3 more than at node 2, which leads to a relocation of 46 units of investment from node 2 to node 3. This observation indicates that renewable energy investment could offset some negative impacts of high energy costs and energy scarcity due to limited transmission capacity. Notice that the influence of energy prices on renewable investment can only be quantified numerically, because prices are endogenously determined by the models, and calculating the marginal impacts of prices on renewable investment based on model (12.1) alone may be misleading.


12.4.3 Effects on System Costs

System costs include travel costs and energy costs. Travel costs depend on traffic distribution and the capacity of transportation infrastructure. Figure 12.3c shows the distribution of travel costs and energy costs. Both cases 2 and 3 have higher travel time than case 1, but for different reasons. For case 2, the increased total travel time results from the reduced transportation capacity on link 1–3, which leads to more congestion for the whole transportation network. Travel time is higher in case 3 than in case 1 because node 2 has a cheaper energy price than node 3 in case 3, so more traffic will choose to travel through link 1–2. Since the total travel demand is fixed (i.e., 50), an imbalanced traffic pattern will cause more congestion. Energy costs are the total costs of conventional energy and renewable energy production. Since renewable energy is cheaper and the total energy demand is fixed, cases with more renewable energy utilized will have lower total energy costs. Cases 1 and 2 do not have transmission congestion, so all the renewable energy can be utilized. The total energy costs in cases 1 and 2 will be identical and lower than in case 3, where transmission constraints prevent the effective usage of renewable energy. These observations are consistent with common expectations, which provides evidence that the modeling framework effectively describes the main interactions between transportation and power systems.

12.4.4 Effects on Flow Distribution

Comparing Fig. 12.4a and b, transportation congestion on link 1–3 shifts 8.1 units of travel demand from link 1–3 to link 1–2. The increase in charging demand at node 2 leads to a redistribution of power flow on each transmission line to preserve power flow conservation and power physics laws. Similarly, comparing Fig. 12.4a and c, constraining the transmission capacity of link 1–3 results in an increase of the power flow on links 1–2 and 2–3. In addition, the reduced transmission capacity on link 1–3 will increase the energy prices at node 3 more than the prices at node 2 (see Sect. 12.4.1), which will discourage EVs from traveling to node 3. Notice that few vehicles are charging at node 3 in case 3 (see Fig. 12.4c), and the traffic flow from node 1 to node 2 splits into two paths, 1–2 and 1–3–2, to avoid traffic congestion. Again, these results illustrate the interdependency between transportation and power systems. Without properly considering the complicated interaction and feedback effects, the analysis may be biased.

Fig. 12.4 Impacts of link capacity on link flow. (a) Case 1. (b) Case 2. (c) Case 3. © [2021] IEEE. Reprinted, with permission, from [30]

12.5 Results on Sioux Falls Road Network and IEEE 39-Bus Test System

The convergence properties of the algorithms are demonstrated on larger networks. We use the Sioux Falls road network⁴ (see Fig. 12.5) and the IEEE 39-bus test system⁵ (see Fig. 12.6), two test systems widely used in the transportation and power system literature, respectively. The correspondence between the node indexes in the transportation and power systems is shown in Table 12.4. To avoid infeasibility due to increased power demand from charging, we scale down the travel demand and road capacity to 1% of the original values. The Sioux Falls network is at the city level, while the IEEE 39-bus system is a regional-level transmission network. To better reflect inter-city travel, we scale up the free-flow link travel time in the Sioux Falls network to 10 times its original value. A similar approach is adopted by [14, 16]. The candidate EV charging locations and PV investment locations are marked in green in Fig. 12.5. The charging load accounts for 18.5% of the total load. The uncertain renewable generation factors $\xi$ are randomly sampled from the uniform distribution on $[0.5, 1.5]$. The optimality gap is set to 1%.

Fig. 12.5 Sioux Falls transportation network. © [2021] IEEE. Reprinted, with permission, from [30]

⁴ Data: https://github.com/bstabler/TransportationNetworks/.
⁵ Data: https://matpower.org/docs/ref/matpower5.0/case39.html.


Fig. 12.6 IEEE 39-bus test system. © [2021] IEEE. Reprinted, with permission, from [30]

Table 12.4 Node correspondence between systems System Transportation Power

Node index 4 2 1 6 1 4

5 11

10 13

11 16

13 19

14 2

15 23

19 25

20 27

21 32

Fig. 12.7 Convergence patterns and computing time. © [2021] IEEE. Reprinted, with permission, from [30]

The convergence patterns and computing time are shown in Fig. 12.7. Algorithm 12.1 converges reliably within 100 iterations for up to 100 scenarios. The computing time increases almost linearly with the number of scenarios (from 1.1 min to 301.0 min), but at a higher rate when the whole problem is solved directly using IPOPT. IPOPT cannot solve instances with 10 or more scenarios within 24 h. Algorithm 12.1 has the potential to handle more scenarios thanks to scenario decomposition and parallel computing, although one may need to increase the number of CPU cores to take full advantage of parallel computing.


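Appendix: Proofs

Proof (Theorem 12.1) The Lagrangian of (12.7) can be written as (12.15) after relaxing equilibrium constraints (12.6).

\begin{align}
\mathcal{L} = {} & \sum_{i \in \mathcal{I}^S} C_i^{S,I}(u_i^S) + \mathbb{E}_{\xi} \Bigg[ \sum_{i \in \mathcal{I}^S} C_i^{S,O}(g_{i,\xi}^S) + \sum_{i \in \mathcal{I}^C} C_i^C(g_{i,\xi}^C) + \frac{\beta_1}{\beta_2} \sum_{a \in \mathcal{A}} \int_0^{v_{a,\xi}} tt_a(u)\, du \notag \\
& + \frac{1}{\beta_2} \sum_{r \in \mathcal{R}} \sum_{s \in \mathcal{S}} q_{rs,\xi} \left( \ln q_{rs,\xi} - 1 - \beta_{0,s} \right) + \sum_{i \in \mathcal{I}^S \cup \mathcal{I}^C} \tilde{\rho}_{i,\xi} \left( d_{i,\xi} - g_{i,\xi}^S - g_{i,\xi}^C \right) \notag \\
& + \sum_{s \in \mathcal{S}} \tilde{\lambda}_{s,\xi} \Big( \sum_{r \in \mathcal{R}} e_{rs} q_{rs,\xi} - p_{i(s),\xi} \Big) \Bigg] \notag \\
= {} & \sum_{i \in \mathcal{I}^S} C_i^{S,I}(u_i^S) + \mathbb{E}_{\xi} \Bigg[ \sum_{i \in \mathcal{I}^S} \left( C_i^{S,O}(g_{i,\xi}^S) - \tilde{\rho}_{i,\xi} g_{i,\xi}^S \right) + \sum_{i \in \mathcal{I}^C} \left( C_i^C(g_{i,\xi}^C) - \tilde{\rho}_{i,\xi} g_{i,\xi}^C \right) \notag \\
& + \frac{\beta_1}{\beta_2} \sum_{a \in \mathcal{A}} \int_0^{v_{a,\xi}} tt_a(u)\, du + \frac{1}{\beta_2} \sum_{r \in \mathcal{R}} \sum_{s \in \mathcal{S}} q_{rs,\xi} \left( \ln q_{rs,\xi} - 1 + \beta_2 \tilde{\lambda}_{s,\xi} e_{rs} - \beta_{0,s} \right) \notag \\
& + \sum_{i \in \mathcal{I}^S \cup \mathcal{I}^C} \tilde{\rho}_{i,\xi} d_{i,\xi} + \sum_{s \in \mathcal{S}} \tilde{\lambda}_{s,\xi} \left( -p_{i(s),\xi} \right) \Bigg] \tag{12.15}
\end{align}

Define the feasible set $\mathcal{X} = \{ u, g, p, d, x, q \ge 0 \mid$ (12.1b)–(12.1c), (12.2b)–(12.2c), (12.3b)–(12.3d), (12.5b)–(12.5e)$\}$. Models (12.1), (12.2), (12.3), (12.5), and (12.7) are all convex optimization problems and satisfy linearity constraint qualifications, so strong duality holds. Model (12.7) is equivalent to (12.16) due to strong duality.

\begin{equation}
\min_{(u,g,p,d,x,q) \in \mathcal{X}} \ \max_{\tilde{\lambda}, \tilde{\rho}} \ \mathcal{L} = \max_{\tilde{\lambda}, \tilde{\rho}} \ \min_{(u,g,p,d,x,q) \in \mathcal{X}} \ \mathcal{L} \tag{12.16}
\end{equation}

$\min_{(u,g,p,d,x,q) \in \mathcal{X}} \mathcal{L}$ is equivalent to (12.1), (12.2), (12.3), and (12.5) for any given $\tilde{\lambda}, \tilde{\rho}$. In addition, (12.6) holds for the optimal $\tilde{\lambda}, \tilde{\rho}$.

Proof (Corollary 1) If $tt_a(\cdot)$, $C_i^{S,I}(\cdot)$, $C_i^{S,O}(\cdot)$, and $C_i^C(\cdot)$ are strictly convex functions, model (12.7) is a strictly convex optimization problem, which has a unique optimal solution. Following Theorem 12.1, the system equilibrium therefore exists and is unique.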


References

1. S.B. Miles, N. Jagielo, H. Gallagher, Hurricane Isaac power outage impacts and restoration. J. Infrastruct. Syst. 22(1), 05015005 (2016)
2. S. Lei, J. Wang, C. Chen, Y. Hou, Mobile emergency generator pre-positioning and real-time allocation for resilient response to natural disasters. IEEE Trans. Smart Grid 9(3), 2030–2041 (2016)
3. W. Wei, S. Mei, L. Wu, J. Wang, Y. Fang, Robust operation of distribution networks coupled with urban transportation infrastructures. IEEE Trans. Power Syst. 32(3), 2118–2130 (2016)
4. W. Wei, W. Danman, W. Qiuwei, M. Shafie-Khah, J.P. Catalao, Interdependence between transportation system and power distribution system: a comprehensive review on models and applications. J. Mod. Power Syst. Clean Energy 7(3), 433–448 (2019)
5. W. Tushar, C. Yuen, S. Huang, D.B. Smith, H.V. Poor, Cost minimization of charging stations with photovoltaics: an approach with EV classification. IEEE Trans. Intell. Transp. Syst. 17(1), 156–169 (2015)
6. H. Liu, J. Qi, J. Wang, P. Li, C. Li, H. Wei, EV dispatch control for supplementary frequency regulation considering the expectation of EV owners. IEEE Trans. Smart Grid 9(4), 3763–3772 (2016)
7. H.-M. Chung, W.-T. Li, C. Yuen, C.-K. Wen, N. Crespi, Electric vehicle charge scheduling mechanism to maximize cost efficiency and user convenience. IEEE Trans. Smart Grid 10(3), 3020–3030 (2018)
8. H. Liu, K. Huang, N. Wang, J. Qi, Q. Wu, S. Ma, C. Li, Optimal dispatch for participation of electric vehicles in frequency regulation based on area control error and area regulation requirement. Appl. Energy 240, 46–55 (2019)
9. A.C. Melhorn, K. McKenna, A. Keane, D. Flynn, A. Dimitrovski, Autonomous plug and play electric vehicle charging scenarios including reactive power provision: a probabilistic load flow analysis. IET Gener. Transm. Distrib. 11(3), 768–775 (2017)
10. C. Peng, J. Zou, L. Lian, Dispatching strategies of electric vehicles participating in frequency regulation on power grid: a review. Renew. Sust. Energ. Rev. 68, 147–152 (2017)
11. Z. Guo, Z. Zhou, Y. Zhou, Impacts of integrating topology reconfiguration and vehicle-to-grid technologies on distribution system operation. IEEE Trans. Sustain. Energy 11, 1023–1032 (2019)
12. H. Zheng, X. He, Y. Li, S. Peeta, Traffic equilibrium and charging facility locations for electric vehicles. Netw. Spat. Econ. 17(2), 435–457 (2017)
13. A.Y. Lam, J. James, Y. Hou, V.O. Li, Coordinated autonomous vehicle parking for vehicle-to-grid services: formulation and distributed algorithm. IEEE Trans. Smart Grid 9(5), 4356–4366 (2017)
14. Z. Guo, J. Deride, Y. Fan, Infrastructure planning for fast charging stations in a competitive market. Transp. Res. Part C Emerg. Technol. 68, 215–227 (2016)
15. Z. Chen, W. Liu, Y. Yin, Deployment of stationary and dynamic charging infrastructure for electric vehicles along traffic corridors. Transp. Res. Part C Emerg. Technol. 77, 185–206 (2017)
16. F. He, D. Wu, Y. Yin, Y. Guan, Optimal deployment of public charging stations for plug-in hybrid electric vehicles. Transp. Res. Part B Meth. 47, 87–101 (2013)
17. F. He, Y. Yin, J. Wang, Y. Yang, Sustainability SI: optimal prices of electricity at public charging stations for plug-in electric vehicles. Netw. Spat. Econ. 16(1), 131–154 (2016)
18. W. Wei, L. Wu, J. Wang, S. Mei, Expansion planning of urban electrified transportation networks: a mixed-integer convex programming approach. IEEE Trans. Transp. Electrif. 3(1), 210–224 (2017)
19. S. Xie, Z. Hu, J. Wang, Two-stage robust optimization for expansion planning of active distribution systems coupled with urban transportation networks. Appl. Energy 261, 114412 (2020)


20. S. Dempe, Foundations of Bilevel Programming (Springer Science & Business Media, Berlin, 2002)
21. B. Colson, P. Marcotte, G. Savard, Bilevel programming: a survey. 4OR 3(2), 87–107 (2005)
22. M.H. Amini, O. Karabasoglu, Optimal operation of interdependent power systems and electrified transportation networks. Energies 11(1), 196 (2018)
23. M.H. Amini, J. Mohammadi, S. Kar, Distributed holistic framework for smart city infrastructures: tale of interdependent electrified transportation network and power grid. IEEE Access 7, 157535–157554 (2019)
24. S. Zhou, Y. Qiu, F. Zou, D. He, P. Yu, J. Du, X. Luo, C. Wang, Z. Wu, W. Gu, Dynamic EV charging pricing methodology for facilitating renewable energy with consideration of highway traffic flow. IEEE Access 8, 13161–13178 (2019)
25. X. Huang, J. Chen, H. Yang, Y. Cao, W. Guan, B. Huang, Economic planning approach for electric vehicle charging stations integrating traffic and power grid constraints. IET Gener. Transm. Dis. 12(17), 3925–3934 (2018)
26. A. Shukla, K. Verma, R. Kumar, Multi-objective synergistic planning of EV fast-charging stations in the distribution system coupled with the transportation network. IET Gener. Transm. Dis. 13(15), 3421–3432 (2019)
27. D. Mao, J. Tan, J. Wang, Location planning of PEV fast charging station: an integrated approach under traffic and power grid requirements. IEEE Trans. Intell. Transp. Syst. 22, 483–492 (2020)
28. X. Zhang, P. Li, J. Hu, M. Liu, G. Wang, J. Qiu, K.W. Chan, Yen’s algorithm-based charging facility planning considering congestion in coupled transportation and power systems. IEEE Trans. Transport. Electrific. 5(4), 1134–1144 (2019)
29. X. Wang, M. Shahidehpour, C. Jiang, Z. Li, Coordinated planning strategy for electric vehicle charging stations and coupled traffic-electric networks. IEEE Trans. Power Syst. 34(1), 268–279 (2018)
30. Z. Guo, F. Afifah, J. Qi, S. Baghali, A stochastic multiagent optimization framework for interdependent transportation and power system analyses. IEEE Trans. Transp. Electrification 7(3), 1088–1098 (2021)
31. C. Xiao, X. Yu, D. Yang, D. Que, Impact of solar irradiance intensity and temperature on the performance of compensated crystalline silicon solar cells. Sol. Energy Mater. Sol. Cells 128, 427–434 (2014)
32. S. Bruno, S. Ahmed, A. Shapiro, A. Street, Risk neutral and risk averse approaches to multistage renewable investment planning under uncertainty. Eur. J. Oper. Res. 250(3), 979–989 (2016)
33. Y. Sheffi, Urban Transportation Networks, vol. 6 (Prentice-Hall, Englewood Cliffs, NJ, 1985)
34. B. of Public Roads, Traffic Assignment Manual (US Department of Commerce, Washington, DC, 1964)
35. J. Wardrop, Some theoretical aspects of road traffic research. Proc. Inst. Civ. Eng. 1(Part II), 325–378 (1952)
36. Z. Guo, Critical Infrastructure Systems: Distributed Decision Processes over Network and Uncertainties (University of California, Davis, 2016)
37. Z. Guo, Y. Fan, A stochastic multi-agent optimization model for energy infrastructure planning under uncertainty in an oligopolistic market. Netw. Spat. Econ. 17(2), 581–609 (2017)
38. S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
39. Y. Fan, C. Liu, Solving stochastic transportation network protection problems using the progressive hedging-based method. Netw. Spat. Econ. 10(2), 193–208 (2010)
40. W.E. Hart, C.D. Laird, J.-P. Watson, D.L. Woodruff, G.A. Hackebeil, B.L. Nicholson, J.D. Siirola, Pyomo–optimization modeling in python, vol. 67, 2nd edn. (Springer, Berlin, 2017)

Index

A AC OPA, 40, 45 Algorithm, 262, 263, 271 Alternating direction method of multipliers (ADMMs), 271 Architecture, 153, 154, 157, 166, 176 Attack detection, 202, 206, 209–211 Attack scenarios, 160, 162, 163 Average voltage, 201, 203, 204

B Bilinear, 57, 58 Blackout model, 31 Branching process, 31, 44, 217, 221, 224, 225, 227–229, 233, 237

C Cascades, 31, 32, 36–38, 40–45, 218–220, 223–230, 232, 239, 241–247, 250–252, 255 Cascading failure, 4, 17, 22, 23, 26, 237, 239, 250, 252 Cascading failure model, 3, 4, 16, 17 Cascading failure simulation, 31, 46, 244, 245 Circuit breaker, 183 Communication, 153, 155, 156, 161, 166, 170, 171, 201, 203 Communication network, 201, 203, 204, 208 Computational efficiency, 57, 63, 64 Convex optimization, 270, 271, 279 Coupled interaction matrix, 237, 239–241, 245, 250 Coupled interaction model, 238, 245, 246, 251, 252, 257 Couplings, 262, 268 Criticality, 218, 219, 227, 228, 233 Critical links, 238, 249, 252–254 Cubature Kalman filter (CKF), 99, 107, 111, 113, 114, 116, 117, 120, 121 Curse of dimensionality, 263 Cyber attacks, 99, 100, 102–104, 111, 118, 121, 154, 157, 159–162, 164, 165, 171, 173–176, 181, 182 Cyber-physical power system, 154, 160 Cybersecurity, 72, 78, 154, 157–159, 161, 163, 166, 168, 173, 174

D Data integrity attack, 99, 103, 113, 116, 119 Decision-making, 31 Decomposition, 55, 57, 264, 271, 274, 278 Deep learning, 202, 206, 207, 210 Denial of Service (DoS), 103, 111, 113, 115 Design principles, 158, 167 Detection filter, 81, 84, 93 Discretization units, 242–244, 250 Discretized load shed, 238–243, 245, 246, 251, 257 Distributed control, 201, 203–205, 208, 209 Distributed energy resources (DER), 153–157, 159–172, 174–176 Distributed generators (DG), 181, 182, 184–186, 195, 201–203, 205, 206, 209 Distributed load sharing, 183, 187, 190 Dynamic line rating, 4, 9, 10, 18, 19, 21, 24, 26 Dynamic observers, 99, 107, 121 Dynamic state estimation (DSE), 71, 79, 89, 92, 93, 99, 101, 103–107, 110–112, 119, 120

© Springer Nature Switzerland AG 2023 J. Qi, Smart Grid Resilience, https://doi.org/10.1007/978-3-031-29290-3


E Electric vehicles (EVs), 261, 263, 264, 267, 269 EM algorithm, 31, 33–36, 40–43, 220, 226–228, 233, 237, 241, 244, 250–252 Emergency response, 47–49, 52, 53, 55, 59, 60, 62, 63 Equilibrium, 261–264, 269–271, 274, 275, 279 Expectation Maximization, 218 Extended Kalman filter (EKF), 99, 105–107, 111, 113, 114, 116, 117, 120, 121 Extreme weather, 3

F False data injection (FDI), 99, 207 FDI attacks, 181, 187, 190, 192, 194, 202 Framework, 48–50, 55, 267, 269, 274, 276 Frequency, 202, 203, 206, 209

G Generator re-dispatch, 48, 59, 61 Greedy algorithm, 147 Greedy heuristic algorithm, 139, 140, 142–147

H Heuristic algorithm, 125, 139, 142, 145

I Independent system operator (ISO), 264, 266, 267, 269, 274 Integer linear programming (ILP), 125, 131, 133, 138–140, 142–147 Integrated resilience response, 48, 49, 58, 60, 62–64 Interaction estimation, 31, 40–45 Interaction model, 31, 43, 45 Interactions, 263, 270 Interdependency, 218, 237, 239, 263, 277 Internet of Things (IoT), 262 Inverter, 182, 184, 186 Isolated buses, 256

J Joint distribution, 219, 223–226, 229, 230, 233

K Kalman filters, 99, 104, 112 Key components, 31, 32, 41–43 Key links, 31, 32, 41, 42, 44

L Lagrangian function, 203, 204 Lagrange-Good inversion, 218, 221 Largest eigenvalue, 218, 219, 227, 228, 233 Linear matrix inequality (LMI), 81, 91, 95 Line outages, 237, 239–241, 243, 245, 246, 249–251, 254–256 Load shed, 237–239, 241, 243–245, 247–252, 254–256 Load shedding, 48, 50, 51, 53, 59

M Mean, 239–241, 245, 247, 251, 257 Microgrids, 181, 182, 201–203 Mitigation, 31, 44, 45, 238, 239, 244, 246, 249, 254–256 Model uncertainty, 99, 100, 102, 105, 115, 116, 121 Multi-agent, 263, 269 Multi-type branching process, 218, 219, 226, 229, 230

N Natural disasters, 47–49, 51, 52, 55, 60, 62

O Observability, 71, 72, 80, 85, 86, 88, 89, 125, 127–130, 133, 134, 137–139, 141–144, 147 Offspring mean matrix, 218–221, 224, 226–228, 230, 233 Operator re-dispatch, 12, 18–22, 26 Optimal power flow, 262 Optimization, 261, 263, 266, 270 Outage propagation, 31, 40, 42–44, 217, 227

P Parallel computing, 264, 274, 278 Penetration, 153, 155, 157, 159, 164, 167 Phasor data concentrator (PDC), 103, 115, 125–128, 130, 131, 134–136, 139–142, 144–146 Phasor measurement unit (PMU), 71, 72, 74, 76, 78, 79, 81–83, 85, 86, 88–90, 92, 95, 99, 103, 104, 111, 113, 115, 125, 130–133, 135–137, 139–142, 145, 147 Photovoltaic (PV), 161, 164, 166, 167, 170, 172, 175 PMU network, 125–130, 137 Poisson distribution, 239–242, 245, 247, 251, 257 Power grid resilience, 47, 48, 50, 60–63 Power sharing, 201, 203, 205, 209 Power systems, 261, 263, 267, 274, 276, 277 Preventive response, 47, 49–51, 53, 59, 60, 62, 63 Primal-dual gradient, 202, 205

R Renewable generators, 266, 275 Replay attack, 103, 104, 111, 115 Resilience metrics, 158 Risk mitigation, 71, 72, 81, 82, 84–88, 93

S Secondary control, 202, 203 Security, 153, 156–159, 163, 167–169, 174 Self-healing, 125, 126, 128, 129, 131, 133, 137–139, 142, 144, 146 Situational awareness, 48 Sliding-mode observers (SMOs), 79–85, 87, 89–91, 93, 95 Smart grid, 263 Software-defined networking (SDN), 125–128, 130, 137, 142 Stability, 182, 189, 190 Stable region, 182, 191 Static state estimation (SSE), 99 Supervisory control and data acquisition (SCADA), 49

T Temperature disturbance, 3, 4, 15, 17, 18, 21, 24–26 Threats, 153, 157, 161, 166, 173 Topology switching, 48, 50, 53, 59, 60, 62–64 Transportation network, 261, 268, 276, 277

U Uncertainties, 262, 265, 271 Undervoltage load shedding, 12 Unknown inputs, 71, 72, 76, 77, 82 Unscented Kalman filter (UKF), 99, 105–107, 111, 121 Utilization level, 182, 186, 189, 190

V Variance, 243, 257 Voltage, 202, 203, 205–208 Voltage instability, 15 Voltage regulation, 201

W Wide-area measurement systems (WAMS), 49, 78 Wide-area monitoring, protection, and control (WAMPAC), 78, 103