Data Science and Applications for Modern Power Systems (ISBN 3031290992, 9783031290992)

This book offers a comprehensive collection of research articles that utilize data, in particular large data sets, in modern power systems.


English · 445 [446] pages · 2023


Table of contents :
Foreword
Preface
Acknowledgements
Contents
1 Data Perspective on Power Systems
1.1 What Is an Electric Grid?
1.2 A Data-Driven Perspective on Grid
1.2.1 Data-Driven Modeling and Monitoring
1.2.2 Data-Driven Control
1.2.3 Data-Driven Planning
1.3 Grid Data Availability
1.3.1 Outline of This Book
2 Basics of Power Systems
2.1 Participants
2.1.1 Generators
2.1.2 Prosumers
2.1.3 Aggregators
2.1.4 Utilities
2.1.5 System Operators
2.2 Flow of Power
2.3 Flow of Money
2.4 Flow of Information
2.4.1 Big Data Era
2.4.2 Challenges in Big Data Analytics
2.4.3 Look into the Future
3 Emerging Technology for Distributed Energy Resources
3.1 Distributed PV Generation
3.1.1 Introduction
3.1.2 Results
3.1.2.1 Scalable Deep Learning Model for Solar Panel Identification
3.1.2.2 Nationwide Solar Installation Database
3.1.2.3 Correlation Between Solar Deployment and Environmental/Socioeconomic Factors
3.1.2.4 Predictive Solar Deployment Model
3.1.3 Discussion
3.1.4 Experimental Procedures
3.1.4.1 Massive Satellite Imagery Dataset
3.1.4.2 System Detection Using Image Classification
3.1.4.3 Size Estimation Using Semi-supervised Segmentation
3.1.4.4 Distinguish Between Residential and Non-residential Solar
3.1.4.5 Predictive Solar Deployment Models
3.2 The Impact of Electric Vehicle Penetration
3.2.1 Introduction
3.2.2 The Impact of EV Charging Locations to the Power Grid
3.2.2.1 The Benefits of Problem Convexification
3.2.2.2 Sensitivity Analysis for the Optimization Variables
3.2.2.3 Sensitivity Analysis for Different Cost Components
3.2.3 The Impact of Choosing EV Routes for Charging
3.2.3.1 Numerical Results on Different EV Routes
3.2.3.2 Numerical Results on EV Numbers and Charging Time
3.2.4 Conclusion
4 Adapt Load Behavior as Technology Agnostic Solution
4.1 Consumer Segmentation
4.1.1 Introduction
4.1.1.1 Prior Work
4.1.2 Methodology
4.1.2.1 Total Daily Consumption Characterization
4.1.2.2 Encoding System Based on a Preprocessed Dictionary
4.1.2.3 Adaptive K-Means on Normalized Data
4.1.2.4 Hierarchical Clustering
4.1.3 Experiments on Data
4.1.3.1 Description of Smart Meter Data
4.1.3.2 Dictionary Generation on Real Usage Data
4.1.3.3 Dictionary Reduction via Hierarchical Clustering
4.1.3.4 Load Shape Analysis
4.1.4 Segmentation Analysis
4.1.4.1 Entropy Analysis
4.1.4.2 Shape Analysis
4.1.4.3 Multidimensional Segmentation
4.1.4.4 Spatial Locality Analysis
4.1.4.5 Temporal Locality Analysis
4.1.5 Impacts on Load Forecasting
4.1.6 Conclusion and Future Work
4.2 Consumer Targeting
4.2.1 Introduction
4.2.2 Methodology
4.2.2.1 Maximizing Demand Response Reliability
4.2.2.2 Response Modeling
4.2.3 Algorithm
4.2.3.1 Optimization Problem Transformation
4.2.3.2 Previous Approaches to Solve the SKP
4.2.3.3 Stochastic Knapsack Problem-Solving
4.2.4 Experiment on Data
4.2.4.1 Description of Data
4.2.4.2 Consumption Model Fitting Result
4.2.4.3 Targeting Result Analysis
4.2.5 Conclusion
4.3 Demand Response
4.3.1 Introduction
4.3.2 Probabilistic Baseline Estimation for Residential Customers
4.3.3 Probabilistic Baseline Estimation via Gaussian Process Regression
4.3.4 Feature Extraction: Covariance Function Design
4.3.4.1 Embedding Distance-Based Correlation
4.3.4.2 Embedding Periodic Pattern
4.3.4.3 Embedding Piecewise Linear Pattern in Temperature
4.3.4.4 Embedding More Functions
4.3.5 Utilizing Probabilistic Estimate for Fair Payment to Residential Customers
4.3.6 Simulation Result
4.3.6.1 Improved Daily Accuracy Without Day Aggregation
4.3.6.2 Reduced Relative Confidence Intervals with Day Aggregation
4.3.6.3 Reduced Relative Error with Day Aggregation
4.3.6.4 Computational Time
4.3.7 Conclusion
4.4 Energy Coupon as Demand Response
4.4.1 Introduction
4.4.2 System Overview
4.4.3 Experimental Algorithms
4.4.3.1 Price Prediction
4.4.3.2 Baseline Estimate
4.4.3.3 Individualized Target Setting and Coupon Generation
4.4.3.4 Lottery Algorithms
4.4.4 Experimental Design
4.4.4.1 Brief Summary of Experiment ('16)
4.4.4.2 Subject in Experiment ('17)
4.4.4.3 Procedure in Experiment ('17)
4.4.5 Results and Discussion
4.4.5.1 Energy-Saving for the Treatment Group
4.4.5.2 Comparison Between Active and Inactive Subjects in Treatment Group
4.4.5.3 Comparison Between Subjects in Treatment Group Facing Fixed/Dynamic Coupons
4.4.5.4 Financial Benefit Analysis
4.4.5.5 Influence of the Lottery on Human Behavior
4.4.5.6 Comparison with Previous CPP Experiment
4.4.5.7 Cost-Saving Decomposition
4.4.6 Conclusion
5 Use of Energy Storage as a Means of Managing Variability
5.1 Adding Storage to the Mix
5.1.1 Introduction
5.1.2 Formulation
5.1.2.1 Solar Generation
5.1.2.2 Load
5.1.2.3 Storage
5.1.2.4 Reliability Value
5.1.3 Optimal Investment Problem
5.1.4 Main Results
5.1.4.1 Reliability Value and Optimal Investment Decision
5.1.4.2 Example: Deterministic Case
5.1.5 Case Studies
5.1.5.1 A Benchmark Model
5.1.5.2 Data Description
5.1.5.3 Theoretical Estimates
5.1.5.4 Realistic Results
5.1.5.5 Optimal Investment Decision
5.1.5.6 Discussions
5.1.6 Conclusion
5.2 Long-Term Planning via Scenario Approach
5.2.1 Introduction
5.2.2 Probabilistic Storage Planning
5.2.2.1 Deterministic Storage Planning
5.2.2.2 Storage Planning with Probabilistic Guarantees
5.2.2.3 Structure of the Storage Planning Problem
5.2.3 Solving Probabilistic Storage Planning
5.2.3.1 Introduction to the Scenario Approach
5.2.3.2 Solving Probabilistic Storage Planning via the Scenario Approach
5.2.3.3 Sub-gradient Cutting-Plane Method
5.2.4 Case Study
5.2.4.1 Settings
5.2.4.2 Numerical Results
5.2.4.3 Discussions
5.2.5 Conclusion
5.3 Utility's Procurement with Storage and/or Demand Response
5.3.1 Introduction
5.3.2 Problem Formulation
5.3.2.1 System Model
5.3.2.2 Optimization Problem
5.3.2.3 Model-Based Solution for Benchmarking
5.3.2.4 Problem Statement
5.3.3 Model-Free Privacy-Preserving Optimization and Control Framework
5.3.3.1 Stage 1: Optimization
5.3.3.2 Stage 2: Private Control Implementation
5.3.4 Case Study
5.3.5 Conclusion and Future Work
6 Forecast for the Future
6.1 Forecasting
6.1.1 Introduction
6.1.2 Theoretical Analysis
6.1.2.1 Notations
6.1.2.2 Security-Constrained Economic Dispatch
6.1.2.3 SCED Analysis via MLP
6.1.2.4 An Illustrative Example
6.1.3 SPRs with Varying Parameters
6.1.3.1 Dynamic Line Rating
6.1.3.2 Ramping Constraints
6.1.4 A Data-Driven Approach to Identifying SPRs
6.1.4.1 The SPR Identification Problem
6.1.4.2 A Data-Driven Approach
6.1.5 Case Study
6.1.5.1 Performance Metrics
6.1.5.2 Static SCED with Static Line Ratings
6.1.5.3 Static SCED with Dynamic Line Ratings
6.1.5.4 Case Studies with Ramp Constraints
6.1.6 The Impact of Nodal Load Information
6.1.6.1 On Nodal Load Levels
6.1.6.2 Incomplete Load Information
6.1.7 Discussions
6.1.7.1 On Posterior Probabilities
6.1.7.2 On the Computational Cost
6.1.7.3 On Generation Offer Prices
6.1.7.4 LMPs with Loss Components
6.1.8 Conclusions
6.2 Price Prediction
6.2.1 Introduction
6.2.2 Problem Formulation
6.2.2.1 Direct Method (Price-to-Price Method)
6.2.2.2 Rerouted Method (Two-Stage Method)
6.2.3 Machine Learning Methods
6.2.3.1 Overview of Methods
6.2.3.2 Performance Evaluation Metric
6.2.4 Numerical Results
6.2.4.1 Data Preparation
6.2.4.2 Benchmark
6.2.5 Conclusion
6.3 Residential Appliances
6.3.1 Introduction
6.3.1.1 Related Work
6.3.1.2 Summary of Contributions
6.3.2 Appliance Load Characterization
6.3.2.1 Discrete Operating States
6.3.2.2 Duration Analysis
6.3.3 Hidden Semi-Markov Model
6.3.4 Appliance Load Model
6.3.4.1 Conditional HSMM
6.3.4.2 Parameter Estimation
6.3.4.3 State-Specific Model
6.3.4.4 Weighted Logistic Regression
6.3.5 Short-Term Load Forecasting
6.3.6 Case Studies
6.3.6.1 Data
6.3.6.2 Parameter Specification
6.3.6.3 Performance Metric
6.3.6.4 Load Forecasting for Individual Appliances
6.3.6.5 Load Aggregation and Model Refinements for A/Cs
6.3.6.6 Scalability and Performance
6.3.7 Conclusion
7 Design New Markets
7.1 Scenario-Based Stochastic Dispatch
7.1.1 Introduction
7.1.2 Taxonomy of Look-Ahead Economic Dispatch Under Uncertainty
7.1.2.1 Deterministic, Stochastic and Robust LAED
7.1.2.2 Scenario Approach LAED
7.1.3 Computational Algorithm to Solve the Scenario Approach Economic Dispatch
7.1.3.1 The A Priori Scenario Approach Method
7.1.3.2 Sampling and Discarding Approach in Sc-LAED
7.1.3.3 The A Posteriori Scenario Approach Method
7.1.4 Case Study
7.1.4.1 Extreme Ramping Test: Scenario vs. Deterministic and Robust LAED
7.1.4.2 Risk and Complexity: Considering All Constraints in the Sc-LAED
7.1.5 Conclusion
7.2 ISO Dispatch
7.2.1 Introduction
7.2.2 Problem Formulation
7.2.2.1 DRP as a Supplier in Day-Ahead Market
7.2.2.2 Decision Curve for DRP
7.2.2.3 Uncertainty of DR
7.2.3 Economic Dispatch Methods in Day-Ahead Market
7.2.3.1 Deterministic Model
7.2.3.2 Stochastic Model
7.2.3.3 Robust Model
7.2.3.4 Scenario Approach Model
7.2.3.5 Realization Cost
7.2.4 Numerical Examples
7.2.4.1 3-Bus System with One DRP
7.2.4.2 Simulation Results for Economic Dispatch
7.2.4.3 Trade-Off Between Feasibility and Performance
7.2.4.4 Influence of δ on DR Acceptance
7.2.4.5 IEEE 14-Bus System with Two DRPs
7.2.5 Conclusion
8 Streaming Monitoring and Control for Real-Time Grid Operation
8.1 Learning the Network
8.1.1 Introduction
8.1.2 Probabilistic Modeling of Network Voltages via Graphical Modeling
8.1.2.1 Problem Definition
8.1.3 Mutual Information-Based Algorithm for Distribution Grids
8.1.3.1 Why Mutual Information-Based Algorithm Works?
8.1.3.2 Adaptation for Distribution Grid with a Loop
8.1.3.3 Adaptation for Smart Meter with Voltage Magnitude Data
8.1.3.4 Limitations of the Method
8.1.4 Simulations
8.1.4.1 Tree Networks without DERs
8.1.4.2 Tree Networks with DERs
8.1.4.3 Networks with a Loop
8.1.4.4 Algorithm Sensitivities
8.1.5 Conclusion
8.2 State Estimation of the Steady-State
8.2.1 Introduction
8.2.2 Graphical Modeling
8.2.3 Distributed Joint State Estimation
8.2.3.1 An Objective Prior Probability
8.2.3.2 Embedding Physical Laws in the Conditional Probability
8.2.3.3 Marginalization for Interested State in Tree-Structured Networks
8.2.3.4 From Tree Structure for Distribution Grids to Mesh Structure for Transmission Grids
8.2.3.5 Improvement over Convergence, Optimality, and Memory Requirement
8.2.3.6 Algorithm Summary
8.2.4 Illustration Using an Example
8.2.5 Numerical Results
8.2.6 Error Domain Comparison Based on Mean Estimate
8.2.7 Variance Estimate
8.2.8 Computational Cost
8.2.8.1 Improvement over Convergence, Optimality, and Memory
8.2.8.2 The Impact of PMU Measurements
8.2.9 Conclusion and Future Research
8.3 Voltage Regulation Based on RL
8.3.1 Introduction
8.3.2 Preliminaries
8.3.3 Markov Decision Process and Reinforcement Learning
8.3.4 Voltage Regulation as an RL Problem
8.3.4.1 State Space
8.3.4.2 Action Space
8.3.4.3 Transition Model
8.3.4.4 Reward Function
8.3.5 Control Policy Architecture and Optimization
8.3.6 Numerical Simulation
8.3.6.1 Simulation Setup
8.3.6.2 Case Study on a Smaller (16-bus) Subsystem
8.3.6.3 Case Study on a Larger (194-bus) Subsystem
8.3.7 Conclusion
9 Using PMU Data for Anomaly Detection and Localization
9.1 Dynamics from PMU
9.1.1 Introduction
9.1.2 Linear Analysis of Synchrophasor Dimensionality
9.1.3 Online Event Detection Using PMU Data
9.1.3.1 Adaptive Training
9.1.3.2 Robust Online Monitoring
9.1.4 Numerical Examples
9.1.4.1 Dimensionality Reduction of Synchrophasor Data
9.1.4.2 Dimensionality Reduction of Realistic Texas Data
9.1.4.3 Online Event Detection Using the Early Event Detection Algorithm
9.1.5 Conclusion
9.2 Asset Management
9.2.1 Introduction
9.2.2 Localization of Forced Oscillations and Challenges
9.2.2.1 Mathematical Interpretation
9.2.2.2 Main Challenges of Pinpointing the Sources of Forced Oscillation
9.2.3 Problem Formulation and Proposed Methodology
9.2.3.1 Problem Formulation
9.2.3.2 FO Localization Algorithm for Real-Time Operation
9.2.4 Theoretical Interpretation of the RPCA-Based Algorithm
9.2.4.1 PMU Measurement Decomposition
9.2.4.2 Observations on the Resonance Component and the Resonance-Free Component
9.2.4.3 Low-Rank Nature of Resonance Component Matrix
9.2.5 Case Study
9.2.5.1 Performance Evaluation of the Localization Algorithms in Benchmark Systems
9.2.5.2 Algorithm Robustness
9.2.5.3 Impact of Noise on Algorithm Performance
9.2.5.4 Comparison with Energy-Based Localization Method
9.2.6 Conclusion
References
Index


Power Electronics and Power Systems

Le Xie Yang Weng Ram Rajagopal

Data Science and Applications for Modern Power Systems

Power Electronics and Power Systems

Series Editors:
Joe H. Chow, Rensselaer Polytechnic Institute, Troy, NY, USA
Alex M. Stankovic, Tufts University, Medford, MA, USA
David J. Hill, Department of Electrical and Electronic Engineering, University of Hong Kong, Pok Fu Lam, Hong Kong

The Power Electronics and Power Systems book series encompasses power electronics, electric power restructuring, and holistic coverage of power systems. The series comprises advanced textbooks, state-of-the-art titles, research monographs, professional books, and reference works related to the areas of electric power transmission and distribution, energy markets and regulation, electronic devices, electric machines and drives, computational techniques, and power converters and inverters. The series features leading international scholars and researchers within authored books and edited compilations. All titles are peer reviewed prior to publication to ensure the highest quality content. To inquire about contributing to the series, please contact:

Dr. Joe Chow
Administrative Dean of the College of Engineering and Professor of Electrical, Computer and Systems Engineering
Rensselaer Polytechnic Institute
Jonsson Engineering Center, Office 7012
110 8th Street, Troy, NY, USA
Tel: 518-276-6374
[email protected]

Le Xie • Yang Weng • Ram Rajagopal

Data Science and Applications for Modern Power Systems

Le Xie Texas A&M University College Station, TX, USA

Yang Weng Arizona State University Tempe, AZ, USA

Ram Rajagopal Stanford University Stanford, CA, USA

ISSN 2196-3185  ISSN 2196-3193 (electronic)
Power Electronics and Power Systems
ISBN 978-3-031-29099-2  ISBN 978-3-031-29100-5 (eBook)
https://doi.org/10.1007/978-3-031-29100-5

© Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Foreword

Power system engineers have collected data, some continuously and some on-demand, since the early days of electrification. In the past, the main purpose of collecting data has been to support reliable system operation and to verify power system models and parameters based on physical phenomena. Such data are used for refining the models of large equipment, such as a steam turbine generator, and for validating transmission system data used in state estimation. A more accurate and realistic model will improve the fidelity of dynamic simulation programs and stability analysis.

With the availability of new types of data and fast internet connections, data can now be used to go beyond model-based approaches. Fifteen-minute data from smart meters installed at individual homes can be used to make demand response programs more responsive and rewarding for participants, without requiring physical models. High-sampling-rate phasor measurement data synchronized by GPS signals can be used in power system monitoring tasks such as online event detection and localization of forced oscillations.

Professors Le Xie, Yang Weng, and Ram Rajagopal have made pioneering contributions to power system data analytics research and have highlighted in this monograph their results in applying data science techniques to address uses of new data in modern power systems. They discuss a variety of issues including distributed energy resources, demand response, energy storage, load and energy price forecasting, and synchrophasor data streaming. While these topics are addressed in many research monographs, the distinguishing feature here is the use of appropriate data science techniques to segregate disparate data and end-users into meaningful subsets or clusters. Such groupings offer insights and paths to develop more effective and reliable control actions.

To address the broad spectrum of power system data and problems, the authors have introduced a large variety of analytical and machine learning techniques in this monograph. Readers may find the discussion of their approaches and the lessons obtained from their results useful in their own research endeavors.

Troy, NY, USA
October 16, 2022

Joe H. Chow


Preface

Measurement data has accompanied the operation and planning of large power grids since their inception. However, it was only in the early 2010s that the confluence of massive new data sets, the proliferation of advanced computing capabilities, and tremendous progress in machine learning technologies propelled the rapid development of data science in power systems. It was around the mid-2010s that the concept of a holistic monograph treating the issues of data science in modern power systems was conceived.

The objective of this book is twofold. First, it provides new graduate-level students in power and energy systems with a data science perspective on the applications they might work on during their graduate study. Second, and perhaps more importantly, it aims to give experts well versed in data science a quick entry into power system problem formulations at multiple voltage levels.

The organization of this book strives to provide both perspectives as articulated above. Chapter 1 provides a data perspective overview of the power system. It introduces the change from the traditional electric grid to the modern power grid. Chapter 2 describes how power, money, and information flow in a modern electricity system. The power system is becoming more and more complex, with various participants, including generators, prosumers, aggregators, utilities, and system operators. Chapter 3 focuses on renewable energy, including photovoltaic (PV) generation and electric vehicle (EV) charging. With the increase in installations of residential PV systems and EVs, utilities need to gain visibility into the PV/EV systems. The chapter provides a review of the emerging technologies that are transforming the operation of the electric grid. Chapter 4 discusses the electricity market, mainly giving an AI perspective on the relationship between utilities and customers. This chapter covers consumer segmentation and targeting, theoretical analysis of demand response, and a specific demand response program application. Chapter 5 presents deterministic and stochastic solutions for energy storage planning. Additionally, by taking into account both the variability of renewables and the many elements in which a system operator is interested, we broaden the scope to a probabilistic formulation with a flexible two-stage optimization to demonstrate a diverse optimization framework. Chapter 6 presents how AI works for forecasting tasks in managing the future grid, one of the critical operational difficulties in the power system. This chapter discusses data-analytic techniques for predicting locational marginal prices, wind energy prices, and residential appliance loads. Chapter 7 uses two dispatch examples to show how data analytics can solve technical challenges and provide situational awareness for the future. Chapter 8 presents the monitoring and control of power system operation. This chapter provides the power grid's system modeling and state estimation methods. Moreover, it also builds a reinforcement learning framework for the challenges of control. Chapter 9 discusses the use of the large-scale data obtained from Phasor Measurement Unit (PMU) devices. The discussion is based on two use cases: early event detection and localization of forced oscillations.

During the preparation of this book, we have benefited tremendously from the support and technical feedback of a large array of collaborators. We would like to thank Jiafan Yu, Zhecheng Wang, Arun Majumdar, Jungsuk Kwac, June Flora, Hao Ming, Bainan Xia, Ki-Yeob Lee, Adekunle Adepoju, Srinivas Shakkottai, Jianxiao Wang, Junjie Qin, Haiwang Zhong, Qing Xia, Chongqing Kang, Chao Yan, Xinbo Geng, Zhaohong Bie, S. Sivaranjani, P. R. Kumar, Shuman Luo, Yuting Ji, Elizabeth Buechler, Mohammad Sadegh Modarresi, Marco Claudio Campi, Simone Garatti, Algo Care, Anupam A. Thatte, Marco Campi, Yizheng Liao, Rohit Negi, Marija D. Ilic, Rayan El Helou, Dileep Kalathil, Yang Chen, Tong Huang, and Nikolaos M. Freris. We thank Julie Castro for proofreading this book. We thank Springer Series Editor Prof. Joe H. Chow for his support and patience during the writing of this book. We thank the National Science Foundation and the Department of Energy for their generous support of the research summarized in this book. Last but not least, we thank our families for their sacrifice and love while we focused on the writing of this book.

College Station, TX, USA
Tempe, AZ, USA
Stanford, CA, USA
November 2022

Le Xie
Yang Weng
Ram Rajagopal

Acknowledgements

First of all, we would like to thank Jiaqi Wu and Muhammad Bilal Saleem, who helped with the preparation of figures and typesetting. Some material in this book is based on articles we have written. Many people helped by proofreading draft material and providing comments and suggestions. We would like to thank the people whose technical contributions are directly reflected within the chapters; their assistance has been invaluable. For Chap. 3, we would like to thank Jiafan Yu, Zhecheng Wang, and Arun Majumdar. For Chap. 4, we would like to thank Jungsuk Kwac, June Flora, Jiafan Yu, Hao Ming, Bainan Xia, Ki-Yeob Lee, Adekunle Adepoju, and Srinivas Shakkottai. For Chap. 5, we would like to thank Jianxiao Wang, Junjie Qin, Haiwang Zhong, Qing Xia, Chongqing Kang, Chao Yan, Xinbo Geng, Zhaohong Bie, S. Sivaranjani, and P. R. Kumar. For Chap. 6, we would like to thank Xinbo Geng, Shuman Luo, Yuting Ji, and Elizabeth Buechler. For Chap. 7, we would like to thank Mohammad Sadegh Modarresi, Marco Claudio Campi, Simone Garatti, Algo Care, Anupam A. Thatte, P. R. Kumar, Hao Ming, and Marco Campi. For Chap. 8, we would like to thank Yizheng Liao, Rohit Negi, Marija D. Ilic, Rayan El Helou, and Dileep Kalathil. For Chap. 9, we would like to thank Yang Chen, P. R. Kumar, Tong Huang, and Nikolaos M. Freris. Finally, we would like to thank Julie Castro for proofreading this book.



Chapter 1

Data Perspective on Power Systems

1.1 What Is an Electric Grid?

Electrification started in the late 1880s and has powered much of human civilization for the past 150 years. The electric interconnection of North America is considered one of the greatest engineering achievements of the twentieth century, according to the US National Academy of Engineering. The design and operation of the electric grid have evolved as a hierarchy of spatial and temporal decision-making processes. Rigorous physics-based modeling, analysis, and control have been developed to support the secure and efficient operation of this complex grid.

It is expected that the electricity sector will be transformed in the coming decades by three forces: decarbonization, electrification of other forms of energy demand, and digitization. Between 2010 and 2018, US power sector emissions were cut by 45%, largely driven by the transformation of electricity supply from fossil fuels to renewable energy resources [1]. In addition to the existing electrical demand, it is projected that a substantial amount of non-electrical demand, such as transportation, will be electrified in the next decade or so. As an important means of making the electric grid smarter, many more sensors, communication capabilities, and high-performance computing capabilities have been added for grid monitoring and control. Because of these massive secular changes in decarbonization, electrification, and digitization, the electric grid infrastructure will require a whole new paradigm of modeling, analysis, and control. We perceive such a paradigm as one that integrates data-driven and first-principles-based analytics, as illustrated in Fig. 1.1.


[Fig. 1.1 Integrating data-driven and physics-based analytics for grid operation. The figure spans time scales from microseconds to minutes. Data layer: point-on-wave data (microseconds), synchrophasor data (milliseconds), and SCADA/AMI/renewable/market data (seconds to minutes), matched to physics-based and human behavioral models of electromagnetic transient dynamics, transient machine dynamics and interaction, and power flow with ramp constraints. Analytics layer: faster-than-real-time simulation, anomaly detection and localization, cyber security monitoring, nested RL-based protective relays in solar-rich grids, scenario-based risk-tunable dispatch, power electronics intelligence at the grid edge, energy coupon retail demand response, DR revenue adequacy, electricity-as-a-service business models, and water-energy joint flexibility in nanogrids. SCADA: Supervisory Control and Data Acquisition; AMI: Advanced Metering Infrastructure. Source: L. Xie, "Data Technology (DT): The New Normal," IEEE Power and Energy Magazine, May/June 2018]

1.2 A Data-Driven Perspective on Grid

This book departs from many classical books on power system analysis by highlighting a data-driven perspective on the operation and planning of the electric grid. This perspective is, in our view, complementary to the classical physics-based approach.

1.2.1 Data-Driven Modeling and Monitoring

Modeling of electric energy conversion devices builds upon physical principles. Modeling the interconnection of these components likewise builds upon electrical circuit principles such as Kirchhoff's laws. A data-driven modeling framework would instead start from the high-dimensional observations available at the timescale of interest (e.g., electromechanical oscillations at the timescale of seconds) and discover patterns of interest in both supervised and unsupervised manners. A variety of high-dimensional data science tools can be employed to identify patterns, detect anomalies, and localize the sources of such anomalies, as the sketch below illustrates.
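As a minimal illustration of this idea (our sketch, not one of the book's case studies), the following Python snippet fits a low-dimensional principal component subspace to synthetic "normal" high-dimensional measurements and flags new snapshots whose reconstruction error is unusually large; the data, subspace dimension, and alarm threshold are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 1000 snapshots of 50 correlated signals
# (e.g., bus voltage magnitudes) under normal operating conditions.
latent = rng.normal(size=(1000, 3))            # 3 underlying system modes
mixing = rng.normal(size=(3, 50))
X_train = latent @ mixing + 0.01 * rng.normal(size=(1000, 50))

# Fit a 3-dimensional principal subspace via SVD of the centered data.
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
basis = Vt[:3]

def reconstruction_error(x):
    """Distance from a snapshot to the learned normal subspace."""
    centered = x - mean
    projected = centered @ basis.T @ basis
    return np.linalg.norm(centered - projected)

# Set the alarm threshold from the empirical training-error distribution.
errors = np.array([reconstruction_error(x) for x in X_train])
threshold = np.percentile(errors, 99.5)

# A snapshot that violates the learned correlation structure is flagged.
x_new = X_train[0].copy()
x_new[:5] += 0.5                               # local disturbance on 5 signals
print(reconstruction_error(x_new) > threshold) # True -> anomaly

The same low-dimensionality intuition is developed rigorously for synchrophasor data in Chap. 9.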


1.2.2 Data-Driven Control

Typical power system control is designed and simulated based upon physical models of the apparatus and their interactions through the grid. Because the underlying dynamics that govern these interactions are nonlinear, typical control design and testing rely on a model linearized around a given operating condition, with a certain operational margin reserved for the wide range of operating conditions. Given the increasing complexity, coupling, and nonlinearity of the large electric grid, data-driven control offers a potentially scalable alternative. Many recent successes in multi-agent reinforcement learning can lend themselves to a variety of control applications in the power grid.
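As a toy illustration of this direction (not the book's method; a full reinforcement learning framework for voltage regulation appears in Chap. 8), the sketch below runs tabular Q-learning on an invented one-bus voltage regulation caricature. The discretized dynamics, reward, and hyperparameters are assumptions made purely for demonstration.

import numpy as np

rng = np.random.default_rng(1)
n_states = 21                                  # discretized voltage-deviation bins
actions = np.array([-1, 0, 1])                 # tap down / hold / tap up
Q = np.zeros((n_states, len(actions)))
alpha, gamma, eps = 0.1, 0.95, 0.1             # learning rate, discount, exploration

def step(state, action):
    """Hypothetical dynamics: the tap move shifts the voltage bin, plus a
    small random load-driven drift; the reward penalizes deviation from
    the center bin, which represents nominal voltage."""
    drift = rng.integers(-1, 2)
    nxt = int(np.clip(state + actions[action] + drift, 0, n_states - 1))
    return nxt, -abs(nxt - n_states // 2)

state = rng.integers(n_states)
for _ in range(50_000):
    a = rng.integers(len(actions)) if rng.random() < eps else int(Q[state].argmax())
    nxt, r = step(state, a)
    Q[state, a] += alpha * (r + gamma * Q[nxt].max() - Q[state, a])
    state = nxt

# The learned greedy policy pushes voltage back toward nominal:
# tap up (action index 2) below the center bin, tap down (index 0) above it.
print(Q.argmax(axis=1))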

1.2.3 Data-Driven Planning

Compared to short-run operations, planning practice has long been more dependent upon data analytics. Typical approaches evaluate the longer-term growth of electric loads and conduct massive offline network expansion simulations to identify potential areas for transmission and generation expansion. A unique challenge for the future, driven by the decarbonization of energy sources, lies in how to use historical data to provide risk-tunable performance guarantees for longer-term capacity planning. The rise of high-performance computing, together with modern machine learning techniques, provides opportunities for formulating the planning problem as a multidimensional nonlinear optimization that can lend itself to many state-of-the-art analytics tools.
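The planning-as-optimization view can be made concrete with a deliberately simplified sketch: sizing new capacity against a set of load scenarios as a linear program that trades capital cost against a load-shedding penalty. The costs, load distribution, and scenario construction below are hypothetical; risk-tunable guarantees via the scenario approach are treated properly in Chaps. 5 and 7.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
S = 100                                          # number of load scenarios
existing = 90.0                                  # existing capacity (MW)
loads = rng.normal(100, 15, size=S)              # hypothetical peak loads (MW)
cap_cost, shed_cost = 5.0, 100.0                 # $/MW build vs $/MW shed penalty

# Decision vector x = [new_capacity, shed_1, ..., shed_S]; minimize the
# build cost plus the scenario-averaged shedding penalty.
c = np.r_[cap_cost, np.full(S, shed_cost / S)]
# Feasibility per scenario: existing + new_capacity + shed_s >= load_s,
# written as -new_capacity - shed_s <= existing - load_s.
A_ub = np.hstack([-np.ones((S, 1)), -np.eye(S)])
b_ub = existing - loads
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (1 + S))
print(f"build {res.x[0]:.1f} MW; shedding occurs in "
      f"{np.sum(res.x[1:] > 1e-6)} of {S} scenarios")

At the optimum, capacity is added only until the marginal shedding penalty avoided no longer justifies the marginal build cost, which is exactly the kind of risk/cost trade-off that the probabilistic planning formulations later in the book make rigorous.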

1.3 Grid Data Availability

Power system behaviors fall into three categories according to their time range of response: electromagnetic transient dynamics, electromechanical transient dynamics, and quasi-steady-state behaviors. For system behaviors in different categories, the sampling rates of the field measurements used to monitor and analyze them differ. In a 60-Hz power grid, field measurements used for monitoring electromagnetic transient dynamics have sampling rates typically ranging from 720 Hz to 30.72 kHz. These field measurements are called Point-on-Wave (POW) data, and they can capture extra-fast dynamics such as lightning propagation and sub-synchronous resonance. The POW data are generally obtained by digital fault recorders (DFRs), and they can be used to drive local decision-making processes such as transmission/distribution line protection [2]. More recently, DFRs have been equipped with additional communication functions, which enables them to transmit time-stamped POW data to control centers in real time.


The POW data can be leveraged to compute phasors, i.e., complex numbers that characterize the fundamental frequency components of an electrical measurement, e.g., an instantaneous voltage or current. Phasors at different locations in the power grid are uploaded to the system control center in real time at a rate ranging from 30 to 120 Hz, and they are synchronized by the Global Positioning System (GPS). These real-time phasors are termed synchrophasors, and they are measured by phasor measurement units (PMUs). Synchrophasor data can be utilized to monitor electromechanical phenomena in the power grid, such as natural and forced oscillations. Accordingly, ongoing research aims to develop PMU-based decision-aid tools for purposes such as early event detection [3] and forced oscillation localization [4]. The quasi-steady-state behaviors, such as electric energy transactions, are sampled at slow rates ranging from every few seconds to every few minutes. For transmission system-level energy transactions, wide-area measurements are streamed to the control center every 2–4 seconds by the Supervisory Control and Data Acquisition (SCADA) system [5]. In current practice, the SCADA data are integrated into real-time energy management applications, such as grid visualization tools and automatic generation control (AGC). The energy transactions between the electric grid and customers are measured by the Advanced Metering Infrastructure (AMI). The AMI reports customers' electricity usage to utilities every 15 minutes, mainly for billing purposes. Based on AMI data, some promising human-in-the-loop mechanisms have been designed to reshape the quasi-steady-state profile of the grid [6, 7]. Besides field measurements of electrical variables, it is worth noting that some non-electrical data sampled at slow rates also participate in the daily operation of the power grid. Data of this type include information on the electricity market, weather, and the Geographic Information System (GIS) (Fig. 1.2).

[Fig. 1.2 Available datasets in power systems. The figure arranges datasets by spatial extent (local, regional, wide-area/intra-enterprise) versus time resolution (milliseconds to hours): field measurements, weather, market, and GIS data]
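To make the POW-to-phasor step concrete, the following sketch estimates a phasor from one cycle of point-on-wave samples using a single-bin DFT at the fundamental frequency. This is an illustrative calculation, not an implementation of the IEEE synchrophasor standard; the 1920 Hz sampling rate and the test waveform are assumptions chosen for clarity.

import numpy as np

f0, fs = 60.0, 1920.0                  # grid frequency and sampling rate (Hz)
n = int(fs / f0)                       # samples per cycle (32 here)
t = np.arange(n) / fs

# Hypothetical point-on-wave window: 120 V RMS at a -30 degree angle.
v = 120 * np.sqrt(2) * np.cos(2 * np.pi * f0 * t - np.pi / 6)

# Single-bin DFT at f0, scaled so |phasor| equals the RMS magnitude.
phasor = np.sqrt(2) / n * np.sum(v * np.exp(-2j * np.pi * f0 * t))
print(abs(phasor), np.degrees(np.angle(phasor)))   # ~120.0, ~-30.0

A PMU repeats such an estimate continuously, time-stamps each result against GPS, and streams the synchronized phasors to the control center at the reporting rates noted above.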


1.3.1 Outline of This Book

This book treats the market and physical operations of the evolving power system through the lens of data science. It starts with a data perspective overview of the power system in Chap. 1. This is followed by Chap. 2, which describes the basic flows of power and money in a modern electricity system. Chapter 3 provides a review of the emerging technologies that are transforming the operation of the electric grid. This is followed by Chaps. 4 and 5, in which demand flexibility and storage are presented as two crucial aspects of the solution for the evolving grid. Chapters 6 and 7 present the forecasting and market operational challenges and opportunities enabled by large datasets. Chapters 8 and 9 present the physical operational challenges and opportunities driven by large amounts of streaming data.

Chapter 2

Basics of Power Systems

2.1 Participants

An electric power grid is a complex network composed of participants from the generation, transmission, and distribution systems. During the power transfer process, a system operator works with utilities and aggregators to maintain the stability of the power grid and to reduce economic losses and damage to electricity facilities. For example, Supervisory Control and Data Acquisition (SCADA) systems are used by the participants to collect real-time data for monitoring and operational tasks such as bus voltage control, load balancing, circulating current control, overload control, fault protection, and self-healing.

2.1.1 Generators

Traditionally, power production is carried out mainly in power plants, where different energy resources are converted into electricity, a form that is easy to transmit between geographically distant areas. For example, fossil, hydro, and nuclear energy can be converted into electricity by propelling turbines that produce AC current. Utility records from the past decades show that numerous distributed power plants (wind, solar, etc.) now primarily serve local areas. They are connected to the existing grid, and their presence has raised great concern about possible reliability problems in future electric power grids [8].

The electric industry is therefore undergoing structural changes as distributed energy resources (DERs) are integrated into the distribution grids. DERs are small power sources such as photovoltaic (PV) devices (renewable generation), energy storage devices (consumption flexibility), and electric vehicles (vehicle-to-grid services). The deployment of DERs is gaining momentum on a worldwide scale, as they have the potential to offer end-consumers more choices, cleaner power, and more control over energy bills. For example, providing sustainable and economical energy is one of the key missions of smart cities [9], because large-scale integration of DERs not only creates more sustainable energy sources but also reduces electricity cost and transmission loss. To achieve this goal, many cities have made plans to integrate DERs such as photovoltaics, electric vehicles, and energy storage devices into the grid. Examples include the Smart Cities San Diego Project, the Amsterdam Smart City Project, and the Zurich 2000 Watts Society.

While DERs in principle offer substantial benefits in the transition to a sustainable electric grid, successful and reliable integration of these energy resources poses fundamental challenges for system operations. On the generation side, reverse power flow can render existing protective systems inadequate. Without appropriate monitoring and control, even small-scale DER integration could destabilize the local grid and cause reliability issues for customers [10]. On the demand side, frequent plug-in charging of electric vehicles will impact distribution grid power quality, causing problems such as voltage unbalance and transformer overload [8]. These uncertainties can lead to power outages or blackouts in distribution grids, which may cause losses of thousands to millions of dollars within an hour [11]. Therefore, the robustness of these new architectures will have to be studied. Because of their unconventional characteristics, these technologies must be modeled anew for data analytics and data-driven control in smart grids.

2.1.2 Prosumers An important change to traditional power generation is that it is being replaced in part by many small-scale generators in the distribution grids. As power is generated locally, e.g., at one's own house or at nearby solar farms, customers become prosumers and avoid long-distance power transfer. Specifically, a prosumer can produce and consume power simultaneously. With more prosumers in the field, the electricity grid blurs the line between power generation and consumption. Six types of prosumers have been identified so far: Do It Yourself (DIY) prosumers, self-service prosumers, customizing prosumers, collaborative prosumers, monetized prosumers, and economic prosumers [12]. These prosumers accelerate the adoption of renewable energy to preserve the environment, provide more energy choices, support economic development, and drive technological advances. At the distribution level of the electric grid, the customer side has ever deeper penetration of customer-owned intermittent energy resources, such as photovoltaic (PV) panels, backup power storage batteries, and vehicle-to-grid demand response from electric vehicles. In addition, customers are encouraged to participate in demand-side management programs to further improve sustainability. While the benefits are many, the proliferation of prosumers raises great concern about the resilience of the power grids. The unconventional characteristics of the
new grid make it difficult, if not impossible, for conventional power system analysis tools to monitor and control power flows accurately. Dynamic fluctuations of voltage profiles, voltage instability, islanding, and operation of the distribution system near stability boundaries are some of the troubling issues for distribution grid operation [13]. Therefore, highly accurate power state analytical tools at both the transmission and distribution levels are critical for operational planning purposes to mitigate risks, e.g., to avoid over-voltages in the near future.

2.1.3 Aggregators With new players like prosumers, it is important to see how these individuals can participate in the market. Unfortunately, individual prosumers are too small compared to traditional generation and ancillary service providers. Therefore, aggregators become an important bridge, organizing individual consumers for new services and for participation in the markets. Such aggregators take the form of utilities, commercial companies, or community entities [14]. Specifically, an aggregator serves as a broker for transactions between energy entities and a group of houses sharing the same interest [14]. For example, aggregators coordinate energy flow and transactions with transmission system operators (TSOs) and distribution system operators (DSOs) to provide better solutions for grid congestion management. Additionally, aggregators are members of electricity systems, so they can influence many grid-connected entities via new communication infrastructures, especially in the distribution grids [15]. Broadly speaking, aggregators can participate in various electricity networks to provide trading and ancillary services. For example, [16] and [17] show that aggregating customers can make rewards more accurate in demand response problems, handle large numbers of users at the same time, and pay individual demand response users over a longer period of time. In sum, by using probabilistic estimates, smart grid stakeholders ranging from end users to system operators can discover opportunities and risk reduction strategies.

2.1.4 Utilities The growing integration of distributed energy resources (DERs) in urban areas provides opportunities while raising various reliability issues. To ensure robust distribution grid operation, different types of utilities need to work together. In the United States, for example, there are generally three types of electric power utility ownership structures: public power utilities, rural electric cooperatives, and investor-owned utilities (IOUs) [18]. Some cities receive power supplies from municipal utility companies, and customer-owned cooperatives provide power to
rural areas, but the majority of electricity customers are served by investor-owned utilities [19]. Public utilities are not-for-profit entities structured and regulated by the local government; they must ensure social fairness and service quality [20]. Therefore, one important focus of public utilities is to reduce the risk of insufficient service where a private entity has limited incentive or insufficient profit. Such private entities are the investor-owned utilities, which operate as for-profit companies for their investors.

2.1.5 System Operators System operators play a critical role, as they operate the electric power grid reliably and efficiently to deliver electricity to both urban and suburban areas. System operators can have various roles, such as interchange operator, balancing operator, transmission operator, reliability coordinator, and market operator [21]. For example, generation and transmission have certain operating requirements for normal secure operation, beyond which power system stability may be violated. Nowadays, the grid exhibits vulnerability to renewable penetration and disruptive events, such as blackouts. Interruptions of electricity service and extensive blackouts are known contributors to physical hardware damage, unexpected interruption of normal work, and subsequent economic loss [22]. So, it is the system operator's responsibility to monitor and control the grid to avoid contingencies, blackouts, etc. [23, 24]. To do so, system operators typically use computer consoles in a control center and interact with field crews, general personnel, substation engineers, and neighboring utilities for close monitoring and coordination of reliable power delivery [21]. With the structural changes underway, traditional monitoring methods need improvement to track and manage the increasing uncertainties inherent in new technologies, such as the recent and ongoing massive penetration of renewable energy, distribution intelligence, and plug-in electric vehicles. Because of the unconventional characteristics of these new technologies, system operators also need to quickly understand and adopt new tools, e.g., data analytics and machine learning tools, for robust operation of the smart grid.

2.2 Flow of Power With the massive penetration of intermittent components such as green energy, traditional power grids are strained by numerous uncertainties in the way power flows. Specifically, electric grids are undergoing profound changes globally, both in scale and in how power flows, to meet the rapidly
increasing demand for electricity. In 2013, total global energy consumption was 12.7 billion tons of oil equivalent, with renewable resources accounting for 19% [25]; by 2030, it is projected to increase to 16 billion tons of oil equivalent with 77% from renewables [26]. During these changes, the pattern of traditional power flow, e.g., one-way and stable, will no longer hold in the smart grid, where intermittent generation (such as wind generation) or topological changes can lead to significant state shifts in power flow. Yet reliable and efficient operation of the electric power system remains critical, as each interruption of electricity service or widespread blackout is known to cause a mix of physical hardware damage, unexpected interruption of normal work, and subsequent economic loss. For example, on the generation side, reverse power flow can render the existing protective systems inadequate. Without appropriate monitoring and controls, even a small-scale DER integration can destabilize the local grid and cause reliability issues for customers. On the demand side, frequent plug-and-play electric vehicles will affect the distribution grid's power quality, such as through voltage unbalance and transformer overload. Additional issues arise from frequent distribution grid reconfiguration, which is hard to detect with traditional approaches. Wrong topology information causes wrong control signals, making the fast-changing smart grid prone to crossing stability boundaries and collapsing. While producing new opportunities, the large-scale penetration of distributed generation (DG) is also posing new challenges. Unlike a transmission power grid, whose topology changes only rarely, a distribution grid can undergo regular topology changes due to the ad hoc connection of many plug-and-play components. Even worse, a distribution system operator usually lacks specific topology information, e.g., DG connection status, as many of the DERs do not belong to the utility. Incorrect topology estimates can cause critical issues such as miscalculation of dangerous reversals of power flow, an incorrect description of fast dynamic variations of voltage profiles, line work hazards, etc. Figure 2.1 illustrates such challenges in the distribution grid.

Fig. 2.1 Big data techniques used in power flow analysis


2.3 Flow of Money There are different types of electricity markets that allow money to flow between electricity entities; let us use US markets as examples. Most investor-owned electric utilities in the United States charge end-user prices under the regulation of state public utilities commissions. These utilities can also operate in deregulated markets, with prices set by the wholesale market. For instance, independent system operators (ISOs) encourage competition in electricity generation among wholesale market participants [27], and the Federal Energy Regulatory Commission regulates transaction operation in most places within the United States [28]. Resale entities (also known as aggregators) buy electricity from generators and resell the power to buyers, who then provide electricity to the end users. When a source has a successful bid to provide generation for meeting demand, the market is "cleared." In the retail market, many customers can choose retailers that fit their needs. In addition to the existing money flow, the rapidly expanding smart grid motivates new ways for money to flow between different market stakeholders. This is due to increasing distributed energy resources (DERs), such as photovoltaic and storage devices, which are rapidly being integrated into the power grid for renewable generation and services. Many states have policies in place to channel financial flows in ways that promote a long-term transition to cleaner renewable sources of energy, like wind and solar power. This is because generating power inside the distribution grid can not only create more sustainable energy sources but also provide cheaper electricity and reduce losses, thanks to the shortened path between generation and the end-consumer. As renewable generators become a larger portion of the grid's resources, complications may arise with the existing wholesale market structure in deregulated states. Renewable energy sources do not require fuel inputs to run, since they use energy from the sun, wind, and other natural sources. Consequently, they can offer bids of $0 into the energy and capacity markets [19]. As these sources make up a larger portion of the grid over time, these $0 bids can significantly reduce wholesale prices for energy and capacity and could discourage long-term investment for all resources [19]. As a result, wholesale markets may need to adapt in the future to better accommodate different types of resources. The market needs new tools to route money to the right entities and to deal with the increasing uncertainty in smart grid monitoring that accompanies this penetration. One suggested approach is to use storage devices and standby coal generators to smooth out power fluctuations; however, these devices alone are currently too expensive to be economically feasible for large-scale renewable penetration. So, we need a new market with new services that provide cost-effective solutions. For example, demand response is a recent technology aimed at utilizing flexible loads to operate power systems in an economically efficient way [29]. Successful enrollment of customers in intervention-based demand-side energy management (DSM) programs, such as energy efficiency and installation of PV panels, depends on having accurate
estimates of the benefits of these programs available and communicated to the customers. These benefits may include long-term financial savings and contributions to managing supply and demand in the transition to a sustainable grid. For customers to commit to an intervention program, such as installing rooftop PV panels or joining a long-term energy efficiency program, a key factor is providing them with accurate estimates of the potential short-term and long-term benefits.

2.4 Flow of Information The success of the ongoing evolution of today's electric power grids into smart grids greatly depends on good knowledge of data as conditions vary. For managing transmission systems, engineers began using Supervisory Control and Data Acquisition (SCADA) in the 1960s [30]. SCADA works with energy management systems (EMS) [31] on the transmission grid to supervise, optimize, and control power transmission. On the distribution grid, SCADA works with distribution management systems (DMS) to perform the same function, a practice that began in the 1980s. SCADA/EMS and SCADA/DMS both retrieve data for analytics, and the data points can cover all network locations where communication and sensing systems are available. Based on the collected data and the system model, SCADA/EMS and SCADA/DMS can monitor system states, correct system models, identify faults, simulate future scenarios, and prevent cascading failures or faults [32]. Utilities can also use such systems to participate in energy trading, etc.

2.4.1 Big Data Era While adding new capabilities, the proliferation of distributed energy resources raises great concern about challenges such as dynamic fluctuations of system states. Fortunately, recent advances in communications, sensing, computing, and control, as well as targeted investments toward deploying advanced metering infrastructures (AMIs) and synchrophasors, have become drivers and sources of data that were previously unavailable in the electric power industry. These technological advances have resulted in the automatic collection of huge amounts of data with small, low-cost, and efficient sensor devices. Data collection is continuous, and the data often need to be processed as they arrive. Moreover, such databases are expected to exhibit exponential growth. The fundamental goal of data management is to maintain data quality, manage information, and conduct data analytics. With vast amounts of data being generated in the power grids, researchers and engineers need to address questions such as which patterns and trends need to be extracted and how to use them to improve power systems' reliability, security, sustainability, efficiency, and flexibility. For example,
in the power grid system, a large amount of measurement data is collected at specific intervals from sensors deployed throughout the system. The collected data are then processed to provide the operator with a snapshot of the current status of the grid. In the era of big data, various sensors create diverse contexts from users and abundant training examples. Together they enable innovative, previously impossible data-driven approaches to ensure grid robustness. Based on such data, for example, one can use a Bayesian approach based on historical data search to solve the problem of state estimation [33]. Specifically, a group of historically similar measurement sets and their corresponding state estimates are used in combination with the current measurement to assess whether the system is healthy [34]. The need for data-driven methods is particularly acute in distribution grids. For example, the traditional approach to power outage assessment in a distribution grid relies on customers making phone calls to report incidents to the Customer Information System, from which the Outage Management System obtains the information and dispatches crews to the field to identify outage areas. Operators often receive delayed and imprecise outage information, making outage detection and power restoration slow and inefficient. The recent deployment of advanced metering infrastructure (AMI) enables smart meters to send a "last gasp" message when there is a loss of power [35]. In addition, fault location, isolation, and service restoration (FLISR) technologies and systems have been adopted by utilities in the United States, which can further reduce the impact and duration of power interruptions [36]. With the integration of DERs, however, the approaches above will have limited performance. For example, if rooftop solar panels are installed on a customer's premises, the customer can still receive power from the renewable generators when there is no power flow in the distribution circuit connecting to the premises, so the AMI smart meter at the customer premises cannot detect a power outage. Also, in metropolitan areas such as New York City, Chicago, and San Francisco, the secondary distribution grids are mesh networks [37]. Hence, a branch outage will not necessarily cause a power outage, yet distribution grid system operators still need to detect, localize, and identify the out-of-service branches.
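To make the historical-data-search idea concrete, below is a minimal Python sketch of such a nearest-neighbor estimator; the function name, data shapes, similarity metric, and inverse-distance weighting are illustrative assumptions, not the exact method of [33, 34].

import numpy as np

def historical_state_estimate(z_now, Z_hist, X_hist, k=10):
    # z_now: current measurement vector, shape (m,)
    # Z_hist: historical measurement sets, shape (N, m)
    # X_hist: corresponding historical state estimates, shape (N, n)
    d = np.linalg.norm(Z_hist - z_now, axis=1)   # similarity via Euclidean distance
    idx = np.argsort(d)[:k]                      # k most similar historical snapshots
    w = 1.0 / (d[idx] + 1e-9)                    # closer snapshots get larger weight
    x_hat = (w[:, None] * X_hist[idx]).sum(axis=0) / w.sum()
    # A large distance to even the nearest neighbor can flag an unhealthy state.
    return x_hat, d[idx].min()

In practice, the historical database would need to be indexed (e.g., with a space-partitioning tree) to keep the search fast, which anticipates the computational concerns discussed in the next subsection.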

2.4.2 Challenges in Big Data Analytics Currently, there is a significant gap between the performance of existing research and the desirable "informative" data exploration. The challenges for the smart grid are enormous as the power industry paradigm shifts from the traditional, complex, physical-model-based monitoring architecture to data-driven resource management. Predictably, this gap will widen in the new smart grid, where a multitude of new devices and technologies bring not only economic and climate rewards but also new problems [38–40]. According to utility records, over the past 10 years, distributed generator plants (wind, solar, etc.) have undergone exponential growth and raised great concerns about the burden that their large numbers and unreliability impose on the future electric power grid. Such
unconventional characteristics of the grid make it difficult, if not impossible, for conventional power system analysis tools to improve their performance. Another important issue is computational speed. Big data analytics takes time, and a large computational burden prevents a new method from being applied online. For example, similarity evaluation over high-dimensional power system measurement vectors is time-consuming in a large electric power grid, and the time required to exhaust all historical measurements is also costly. Therefore, exploring the structure of power system measurement data and systematically organizing them for data-driven state estimation (SE) play key roles in providing better streaming estimation with real-time guarantees for sustainable grid services [41]. While computational time is important, the scalability of an algorithm is also key for data-driven analysis of large-scale systems, especially when analyzing edge devices altogether; used properly, scalable algorithms let us model the future large-scale smart grid. For example, the state estimators used by industry today are hard to scale up and computationally complex [42]. To avoid excessive computational complexity, only extra-high-voltage (EHV), high-voltage (HV), and occasionally medium-voltage (MV) representations of the complex multi-voltage-level power grids are included. The low-voltage (LV) distribution networks are neither modeled nor supported by today's online SE. This, in turn, makes it difficult to estimate the status and states of the many new, diverse resources and users connected at the LV distribution level. More generally, the operators of traditional power grids face inherent difficulties in managing the effects of small-scale generation and loads, including but not limited to renewable energy generators, such as wind and solar generators; responsive small electricity users; and electricity users that can offer storage to the utility, such as electric cars. While having the potential to reduce environmental impact, increase fuel diversity, and bring economic benefits, these new components also raise tremendous concerns regarding the secure and reliable operation of the backbone EHV/HV power grids; in particular, their state needs to be estimated to account for their effects on the state of the backbone power grid. This requires estimating the online state of the entire electric power grid, making it even more difficult than before to manage all data in a centralized way. A multilayered, distributed implementation of state estimators for future electric energy systems is likely to become the preferred approach; this requires a systematic design of distributed algorithms whose performance does not degrade relative to centralized methods. Additionally, there are non-traditional topics, such as customer behavior analysis. For example, the smart meters installed at households enable a new opportunity to utilize time-series data, previously unavailable in the electric power industry [34, 43], to solve new problems such as distribution grid outage detection. Understanding consumer flexibility and behavior patterns is becoming increasingly vital to the design of robust and efficient energy-saving programs.
Accurate prediction of consumption is a key part of this understanding.
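As a toy illustration of consumption prediction from smart meter time series, the sketch below fits a regressor on one day of lagged hourly readings; the synthetic data, feature construction, and choice of model are illustrative assumptions rather than a method advocated in this book.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def lagged_features(meter, day=24):
    # Predict each hour's consumption from the preceding 24 hourly readings.
    X = np.stack([meter[i:i + day] for i in range(len(meter) - day)])
    y = meter[day:]
    return X, y

meter = np.random.rand(24 * 60)              # stand-in for 60 days of AMI data
X, y = lagged_features(meter)
model = GradientBoostingRegressor().fit(X[:-24], y[:-24])
next_day_forecast = model.predict(X[-24:])   # forecast the held-out final day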


Fig. 2.2 Artificial intelligence methods can be used to solve problems that occur in the power grid

2.4.3 Look into the Future More sensors [44, 45], high-performance computers, and storage devices will be deployed over the power grid in the future, creating an enormous number of opportunities for improved grid operation. Utilizing these data is a tremendous opportunity to test for and resolve problems as they occur. Notably, learning from data to deal with uncertainties has been widely recognized as a key enabler of the core design of Wide Area Monitoring, Protection, and Control (WAMPAC) systems, which centers on efficient and reliable operations [41]. Figure 2.2 illustrates this big picture for the future smart grid. The remainder of the book introduces use cases for such big data analytics.

Chapter 3

Emerging Technology for Distributed Energy Resources

3.1 Distributed PV Generation In this section, we explain DeepSolar, a tool for solar analysis. It is a deep learning framework that analyzes satellite imagery to identify the GPS locations and sizes of solar photovoltaic panels. Leveraging its high accuracy and scalability, we constructed a comprehensive high-fidelity solar deployment database for the contiguous United States. We demonstrated its value by discovering that residential solar deployment density peaks at a population density of 1000 capita/mile², increases with annual household income (saturating at ~$150k), and is inversely correlated with the Gini index, a measure of income inequality. We uncovered a solar radiation threshold (4.5 kWh/m²/day) above which solar deployment is "triggered." Furthermore, we built an accurate machine learning-based predictive model to estimate solar deployment density at the census tract level. We offer the DeepSolar database as a publicly available resource for researchers, utilities, solar developers, and policymakers to further uncover solar deployment patterns, build comprehensive economic and behavioral models, and ultimately support the adoption and management of solar electricity.

3.1.1 Introduction Deployment of solar photovoltaics (PVs) is accelerating worldwide due to rapidly reducing costs and significant environmental benefits compared with electricity generation based on fossil fuels [46]. Because of their decentralized and intermittent nature, cost-effective integration of solar panels on existing electricity grids is becoming increasingly challenging [47, 48]. What is critically needed and currently unavailable is a comprehensive high-fidelity database of the precise locations and sizes of all solar installations. Recent attempts such as the Open PV Project [49]
rely on voluntary surveys and self-reports. While these have been quite impactful for our understanding of solar deployment, they run the risk of being incomplete, with no guarantee of the absence of duplication. Furthermore, with the rapid pace of solar deployment, such a database can quickly become outdated. Machine learning combined with satellite imagery can be utilized to overcome the shortcomings of surveys [50]. Satellite imagery with spatial resolution finer than 30 cm, updated annually, is available for the majority of the United States and offers a rich data source for machine learning-based solar installation detection. Existing pixel-wise machine learning methods [51, 52] suffer from poor computational efficiency and relatively low precision and recall (they cannot reach 85% simultaneously), while existing image-wise approaches [53] cannot provide system size or shape information. Google's Project Sunroof utilizes a proprietary machine learning approach to report locations without any size information; it has so far identified far fewer systems (0.67 million) than the Open PV database (~1 million) in the contiguous United States. Leveraging the development of convolutional neural networks (CNNs) [54] and large-scale labeled image datasets [55] for automatic image classification and semantic segmentation [56], here we present an efficient and accurate deep learning framework called DeepSolar that uses satellite imagery to create a comprehensive high-fidelity database (the DeepSolar database) containing the GPS locations and sizes of solar installations in the contiguous United States. To demonstrate the value of DeepSolar, we correlate environmental and socioeconomic factors with solar deployment data and uncover interesting trends. We utilize these insights to build SolarForest, the first high-accuracy machine learning predictive model that can estimate solar deployment density at the census tract level from local environmental and socioeconomic features.

3.1.2 Results

3.1.2.1 Scalable Deep Learning Model for Solar Panel Identification

Generating a national solar installation database from satellite images requires a method that can accurately identify panel location and size from very limited and expensive-to-obtain labeled imagery while being computationally efficient enough to run at a nationwide scale. DeepSolar is a novel semi-supervised deep learning framework featuring computational efficiency, high accuracy, and label-free training for size estimation (Fig. 3.1). Traditionally, training a CNN to classify images requires large numbers of samples with true image-level class labels, and training a CNN to segment objects additionally requires ground-truth pixel-wise segmentation annotations, which are extremely expensive to construct. Furthermore, fully supervised segmentation has relatively poor computational efficiency [51, 52]. To enable efficient solar panel identification


Fig. 3.1 Schematic of DeepSolar image classification and segmentation framework (a) Input satellite images are obtained from Google Static Maps. (b) Convolutional neural network (CNN) classifier is applied. (c) Classification results are used to identify images containing systems. (d) Segmentation layers are executed on positive images and are trained with image-level labels rather than actual outlines of the solar panel, so it is “semi-supervised.” (e) Activation maps generated by segmentation layers where whiter pixels indicate a higher likelihood of solar panel visual patterns. (f) Segmentation is obtained by applying a threshold to the activation map, and finally, both panel size and system counts can be obtained

and segmentation, DeepSolar first utilizes transfer learning [57] to train a CNN classifier on 366,467 images sampled from over 50 cities/towns across the United States with merely image-level labels indicating the presence or absence of panels. Segmentation capability is then enabled by adding an additional CNN branch directly connected to the intermediate layers of the classifier, trained on the same dataset to greedily extract visual features and generate clear boundaries of solar panels without any supervision from actual panel outlines. Such a "greedy layer-wise training" technique greatly enhances the semi-supervised segmentation capability, making its performance comparable with fully supervised methods. The output of this network is an activation map, to which a threshold is applied to produce panel outlines. Segmentation is not applied to samples predicted to contain no panels, which greatly enhances computational efficiency. Details can be found in Sect. 3.1.4 and the Supplemental Information of [58]. The performance of our model is evaluated on a test set containing 93,500 randomly sampled images across the United States. We utilize precision (the fraction of correct decisions among all positive decisions) and recall (the fraction of correct decisions among all positive samples) to measure classification performance. DeepSolar achieves a precision of 93.1% with a recall of 88.5% in residential areas and a precision of 93.7% with a recall of 90.5% in non-residential areas, significantly higher than previous reports [51–53, 59]. Furthermore, our
performance evaluation is far more robust, since previous test sets were obtained from only one or two cities, whereas ours is sampled from nationwide imagery. Mean relative error (MRE), the area-weighted relative error, is used to measure size estimation performance. The MRE of DeepSolar is 3.0% for residential areas and 2.1% for non-residential areas. The errors are independent and nearly unbiased, so the MRE decreases even further when measured over larger regions. See the Supplemental Information of [58] for more details.
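For reference, precision and recall follow their standard definitions, and one plausible area-weighted form of the MRE consistent with the description above (we do not reproduce the exact definition here) is, in LaTeX notation:

\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{MRE} = \frac{\sum_i |\hat{A}_i - A_i|}{\sum_i A_i},

where TP, FP, and FN count true positives, false positives, and false negatives, and \hat{A}_i and A_i denote the estimated and ground-truth panel areas of region i.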

3.1.2.2 Nationwide Solar Installation Database

Over one month, DeepSolar was used to scan over one billion image tiles covering all urban areas as well as locations with reasonable nighttime lights, constructing the first complete solar installation profile of the contiguous United States with exact locations and sizes of solar panels (see the Supplemental Information of [58] for details). The number of detected solar systems in the contiguous United States is (1.4702 ± 0.0007) million, which exceeds the 1.02 million installations without accurate locations in Open PV [49] and the 0.67 million installations without size information in Project Sunroof. In our detected installation profile, a solar system is a set of solar panels on top of a building or at a single location such as a solar farm. We built a complete resource density map of the contiguous United States from the state level down to the household level (Fig. 3.2). Solar installation densities vary dramatically at the state level (e.g., 1.34–224.1 m²/mile²) and the county level (e.g., 255–7490 m²/mile² in California). Distributed residential-scale solar systems account for 87% of the total system count but 34% of the total panel area in our database, and 23.4% of the census tracts contain 90% of the residential-scale installations (Fig. 3.3a). Only 2998 census tracts (4%) have more than 100 residential-scale systems (Fig. 3.3b). Across tracts with different levels of residential solar system counts, the median tract-average system size is consistently between 20 and 27 m² (Fig. 3.3b). Due to the distributed nature of residential solar systems and the small variability in their sizes, in this work we focus on residential solar deployment density, defined as the number of residential-scale systems per thousand households at the census tract level. Leveraging our database, non-residential solar deployment can also be extensively analyzed in the future.

3.1.2.3 Correlation Between Solar Deployment and Environmental/Socioeconomic Factors

We correlate the residential solar deployment with environmental factors such as solar radiation and socioeconomic factors from US census data to uncover solar deployment trends. We also collect and consider possible financial indicators reflecting the cumulative effects of energy policies, including the average electricity retail rate over the past 5 years, the number of years since the start of net metering, and other types of financial incentives.


Fig. 3.2 Solar resource density (solar panel area per unit area [m²/mile²]) at state, county, and census tract levels, with examples of detected solar panels. Darker colors represent higher solar resource density. Several census tracts in Hudson County, New Jersey, have solar resource density higher than 30,000 m²/mile², while the five northern states (Montana, Idaho, Wyoming, North Dakota, and South Dakota) have solar resource density less than 1.34 m²/mile², indicating extremely heterogeneous spatial distributions. The red-line rectangles denote the predicted bounding boxes of solar power systems in image tiles, and the values denote the estimated area of solar systems

Fig. 3.3 Residential solar deployment statistics at census tract level (a) Cumulative distribution of residential solar area over census tracts. (b) Tract counts versus the number of solar systems. The left y-axis is the number of census tracts, corresponding to the bar plot. The right y-axis is the tract mean solar system size, corresponding to the purple error bar plot. The purple dots are the medians of tract-average system size within each category; the error bars represent the 25th and 75th percentiles. (c) Mean system size of a tract varies with the number of residential solar systems in the tract. Each point represents one census tract. As the number of systems increases, the mean size converges to 25 m²


Results show that solar deployment density increases sharply when solar radiation exceeds 4.5–5 kWh/m²/day (Fig. 3.4a), which we define as an "activation" threshold triggering the increase of solar deployment. When we dissect this trend according to electricity rates (Fig. 3.4c), we find that the activation threshold is clear in low-electricity-rate regions but unclear in high-electricity-rate regions, indicating that this threshold may reflect a financial break-even point for deep penetration of solar deployment. Since significant variation of solar deployment density is observed with solar radiation (see the Supplemental Information of [58] for details), we split all tracts into three groups according to radiation level (low, medium, and high) and analyze the trends with other factors based on this grouping. Population/housing density has been observed to be positively [60] or negatively [61, 62] correlated with solar deployment. Figure 3.5a shows that both trends hold, with peak deployment density at a population density of 1000 capita/mile². Rooftop availability is not the limiting factor, as the trend persists when we compute the number of systems per thousand rooftops (see Section 3.2 of the Supplemental Information of [58]). Annual household income is a substantial driver of solar deployment (Fig. 3.5b). Low- and medium-income households have low deployment densities despite

Fig. 3.4 Correlation between solar radiation and solar deployment (a) Solar deployment density has a nonlinear relationship with solar radiation. Two thresholds (4.5 and 5.0 kWh/m²/day) are observed for all percentiles. Shaded areas represent the cumulative maximum of percentile scatters. Census tracts are grouped according to 64 bins of solar radiation. Curves are fitted utilizing locally weighted scatterplot smoothing (LOWESS). (b) US map colored according to the three levels of average solar radiation defined by the thresholds identified in (a). (c) Solar deployment density correlation with solar radiation, conditioning on the level of retail electricity rate


Fig. 3.5 Residential solar deployment density correlates with socioeconomic factors conditional on radiation Census tracts are grouped according to 64 bins of the target factor. Curves are fitted utilizing LOWESS. Blue/green/brown labels denote the county that the median census tract in the bin belongs to. Here we only show tracts with high solar radiation (>5.0 kWh/m²/day). Complete trends are shown in Figure 14 in the Supplemental Information of [58]. (a) Solar deployment density increases with population density, with a peak at 1000 capita/mile². (b) Solar deployment density increases with average annual household income but saturates at incomes of $150k. (c) Solar deployment density increases with the average years of education. (d) Solar deployment density decreases with income inequality in a tract, and a critical Gini index of 0.4 saturates solar deployment

solar systems being profitable at high radiation levels, indicating that the lack of financial capability to cover the upfront cost is likely a major barrier to solar deployment. Surprisingly, we observe that solar deployment in high-radiation regions saturates at annual household incomes above $150,000, indicating other limiting factors. Solar deployment density also shows an increasing trend with average education level (Fig. 3.5c); however, when conditioning on income, this trend does not hold in regions with high radiation but still holds in regions with poor solar radiation and lower income levels (Figure 15 in the Supplemental Information of [58]). Moreover, solar deployment density in census tracts with high radiation is strongly and negatively correlated with the Gini index, a measure of income inequality (Fig. 3.5d). Additional trends, for example those illustrating racial and cultural disparities, can be extracted utilizing this database. We expect that routinely updating the large-scale DeepSolar database and making it publicly available will empower the community to uncover further insights.
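As a minimal sketch of the bin-and-smooth procedure behind the trend curves in Figs. 3.4 and 3.5, the snippet below groups tracts into 64 bins of a target factor and fits a LOWESS curve through the bin medians; the synthetic data and smoothing bandwidth are illustrative assumptions.

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Illustrative stand-ins for per-tract radiation and deployment density.
radiation = np.random.uniform(3.0, 6.5, 70000)
density = np.maximum(0.0, 30 * (radiation - 4.5)) + np.random.rand(70000)

# Group tracts into 64 bins of the target factor and take bin medians.
edges = np.linspace(radiation.min(), radiation.max(), 65)
bin_of = np.digitize(radiation, edges) - 1
medians = [np.median(density[bin_of == b]) for b in range(64)]
centers = 0.5 * (edges[:-1] + edges[1:])

# Fit a smooth trend through the bin medians with LOWESS.
trend = lowess(medians, centers, frac=0.3, return_sorted=True)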

3.1.2.4 Predictive Solar Deployment Model

Models that estimate deployment from socioeconomic and environmental variables are key for decision-making by regulatory agencies, solar installers, and utilities. Previous studies have relied either on surveys [63–69] or on data-driven approaches [60–62, 70–74] at spatial scales ranging from county- to state-level models, achieving in-sample R² values between 0.04 and 0.71. These models are typically linear [74] or log-linear [73] and utilize fewer than 10,000 samples for regression. Our results instead reveal that socioeconomic trends are highly nonlinear, and our database, generated by DeepSolar, offers abundant data points for developing elaborate nonlinear models. Hence, we build and compare several accurate predictive models that estimate solar deployment at the census tract level utilizing data from more than 70,000 census tracts (see details in Experimental Procedures). Each model takes 94 environmental and socioeconomic factors as inputs, such as solar radiation, average electricity retail rate over the past 5 years, number of years since the start of different types of financial incentives, average household income, etc. (see details in the Supplemental Information of [58]). These 94 factors are the largest set of factors we could collect for all census tracts, and some of them have also been utilized and reported in previous works [60–62, 70–74]. Among all predictive models, the random forest-based model, called SolarForest, achieves an out-of-sample R² value of 0.72 in tenfold cross-validation, which is even higher than the in-sample R² values of any models in previous works [60–62, 70–72, 74]. SolarForest is a novel machine learning-based hierarchical predictive model that postulates census tract level solar deployment as a two-stage process: first, whether a tract contains solar panels at all, and second, if it does, how many systems per household are deployed (Fig. 3.6a). Each stage utilizes a random forest [75] that takes all 94 factors into account. By ranking feature importance at both stages, we observe that population density is the most significant feature for deciding whether a census tract contains solar systems (Fig. 3.6b); for a census tract containing solar systems, environmental features such as solar radiation, relative humidity, and the number of frost days serve as the most important predictors of solar deployment density (Fig. 3.6c).
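The two-stage structure lends itself to a compact hurdle-style implementation. The sketch below, using scikit-learn with the tree counts reported in Fig. 3.6, is a minimal illustration of the idea and not the authors' exact SolarForest code.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

class TwoStageSolarModel:
    # Stage 1: does the tract contain any solar systems?
    # Stage 2: if so, how many systems per thousand households?
    def __init__(self):
        self.clf = RandomForestClassifier(n_estimators=100)
        self.reg = RandomForestRegressor(n_estimators=200)

    def fit(self, X, y):
        has_solar = y > 0
        self.clf.fit(X, has_solar)
        self.reg.fit(X[has_solar], y[has_solar])  # regress on solar tracts only
        return self

    def predict(self, X):
        out = np.zeros(len(X))
        mask = self.clf.predict(X).astype(bool)
        if mask.any():
            out[mask] = self.reg.predict(X[mask])  # zero for predicted solar-free tracts
        return out

Feature importances analogous to Fig. 3.6b, c are then available from clf.feature_importances_ and reg.feature_importances_.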

3.1.3 Discussion DeepSolar is a novel approach to create, publish, update, and maintain a comprehensive open database on the location and size of solar PV installations. Our aim is to continuously update the database to generate a time-history of solar installations and increase coverage to include all of North America, including remote areas with utility-scale solar and non-contiguous US states. Eventually, the database will include all regions in the world that have high-resolution imagery. In this work, we only estimated the horizontal projection areas of solar panels from satellite imagery. In the future, based on existing GPS location information, we aim to continue using


Fig. 3.6 Architecture and feature importance of SolarForest (a) SolarForest combines two random forests: a random forest binary classifier (blue) to predict whether a census tract contains at least one solar system and a random forest regression model (magenta) to estimate the number of solar systems per thousand households if the tract contains solar systems. The classifier consists of 100 decision trees, and the regressor consists of 200 decision trees. Each circle of the decision trees represents a node for binary partitioning according to the value of one feature. If the output of the classifier is "Yes," the final prediction of solar deployment density is the output of the regressor; otherwise, the solar deployment density is predicted to be zero. (b) The relative feature importance in the SolarForest model, classification stage. (c) The relative feature importance in the SolarForest model, regression stage

deep learning methods to infer roof orientation and tilt information from street view images, enabling a more accurate estimation of solar system size and solar power generation capacity. In addition, the database is linked to US demographic data, solar radiation, utility rates, and policy information. We demonstrated that this rich database led to the discovery of previously unobserved nonlinear socioeconomic trends in solar deployment density. It also enabled the development of state-of-the-art machine learning-based predictive models of solar deployment. As we update the database annually, such predictive models can be further improved to forecast the annual increment of solar installations in census tracts according to local environmental and socioeconomic factors. In the near future, this database can be utilized to develop granular adoption models relying on richer information on electricity rates and incentives, conduct causal inference, and gain a nuanced understanding of peer effects, inequality, and other sociocultural trends in solar
deployment. It can serve as a starting point to develop engineering models for solar generation in power distribution systems. The DeepSolar database closes a significant gap for the research and policy community while at the same time advancing methods in semi-supervised deep learning on satellite data and solar deployment modeling.

3.1.4 Experimental Procedures

3.1.4.1 Massive Satellite Imagery Dataset

A massive number of image samples is essential for developing a CNN model, since a CNN gains good generalization ability only with a large number of labeled training samples. Bradbury et al. [76] built a manually labeled dataset based on US Geological Survey ortho-imagery, but it is sampled from only four cities in California, which fails to capture nationwide diversity; thus, there is no guarantee that a model developed on it will perform well in other regions. In comparison, we have built a large-scale satellite image dataset based on the Google Static Maps API, with images collected from 50 cities/towns to cover the contiguous United States comprehensively. Our dataset consists of a training set (366,467 samples), a validation set (12,986 samples), and a test set (93,500 samples). The images in this model-development dataset amount to 0.043% of the total number of images we have scanned so far in the United States. Images in the test set are randomly sampled by generating random latitudes and longitudes within rectangular regions entirely different from those in the training set. To train both classification and segmentation capabilities, an image-level label indicating positive (containing solar panels) or negative (not containing solar panels) is annotated for all samples in the dataset. To evaluate size estimation, each test sample is also annotated with ground-truth regions of solar panels in addition to image-level labels. This dataset is made public for the research community to drive model development and testing on specific computer vision tasks. See the Supplemental Information of [58] for more details.

3.1.4.2 System Detection Using Image Classification

We utilize a state-of-the-art CNN architecture called Inception-v3 [77] as our basic classification framework. The Inception-v3 model is pre-trained with 1.28 million images covering 1000 classes from the 2014 ImageNet Large Scale Visual Recognition Challenge [55] and achieves 93.3% top-5 accuracy on that dataset. We start from the pre-trained model since the diversity of that massive dataset helps the CNN learn basic image patterns across multiple domains. The model is then developed on our training set by retraining the final affine layer from randomly initialized parameters and fine-tuning all other layers starting from the
well-trained parameters. This process, called transfer learning [57], is becoming common practice in deep learning and computer vision. The outputs of our model are two probabilities indicating positive (containing solar) and negative (not containing solar). The distribution of binary solar panel labels in the training set is extremely skewed (46,090 positives out of 366,467 total) since solar panels are very rare compared with the whole territory. We address this with a cost-sensitive learning framework [78–80], which automatically assigns larger penalties to misclassifications of positive samples than of negative samples (see details in the Supplemental Information of [58]).
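A minimal PyTorch sketch of this transfer learning and cost-sensitive setup follows; the class-weight ratio, optimizer, and learning rate here are illustrative assumptions rather than the reported training configuration.

import torch
import torch.nn as nn
from torchvision import models

# Start from Inception-v3 pre-trained on ImageNet (transfer learning).
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)

# Retrain the final affine layers from random initialization for two
# classes: positive (contains solar) and negative (no solar).
model.fc = nn.Linear(model.fc.in_features, 2)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, 2)

# Cost-sensitive loss: heavier penalty on misclassified positives,
# mirroring the skew of 46,090 positives among 366,467 training images.
class_weights = torch.tensor([1.0, (366467 - 46090) / 46090])
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Fine-tune all layers at a small learning rate.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)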

3.1.4.3 Size Estimation Using Semi-supervised Segmentation

In addition to identifying whether an image tile contains solar panels, we also developed a semi-supervised method to accurately localize solar panels in images and estimate their sizes. Compared with fully supervised approaches, which suffer from low computational efficiency and require a large number of training samples with ground-truth segmentation annotations, our semi-supervised segmentation model requires only image-level labels (containing solar or not) for training, which is achieved by greedily extracting visual patterns from intermediate results of classification. Roughly speaking, in a CNN, the output of each convolutional layer is a stack of feature maps, each representing different feature activations. With a linear combination of these visual patterns, we can obtain a class activation map (CAM) [81] indicating the most activated regions of our target object, a solar panel. Furthermore, in a CNN, features learned at upstream layers represent more general patterns such as edges and basic shapes, while features learned at downstream layers represent more specific patterns. As a result, upstream feature maps are more complete but noisy, while downstream feature maps are more discriminative but incomplete. By greedily extracting features at upstream layers, we can generate a CAM that is both complete and discriminative for segmentation. To achieve this, we repeatedly train a series of layers for classification in a greedy manner and add new layers after each round of training (see details in the Supplemental Information of [58]). Such a greedy layer-wise training mechanism is proposed here for the first time for semi-supervised object segmentation. The code for system detection and size estimation is available here: http://web.stanford.edu/group/deepsolar/home.
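To make the CAM construction concrete, here is a minimal NumPy sketch of forming and thresholding a class activation map; the min-max normalization and the 0.5 threshold are illustrative assumptions.

import numpy as np

def class_activation_map(feature_maps, fc_weights, cls=1):
    # feature_maps: (C, H, W) activations of a chosen convolutional layer
    # fc_weights:   (num_classes, C) weights of the final affine layer
    cam = np.tensordot(fc_weights[cls], feature_maps, axes=1)  # (H, W)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # scale to [0, 1]
    return cam

# Panel outline and size: threshold the map, count pixels, and scale by
# the ground area covered by each pixel (all values illustrative).
# mask = class_activation_map(F, W) > 0.5
# panel_area_m2 = mask.sum() * m2_per_pixel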

3.1.4.4 Distinguish Between Residential and Non-residential Solar

Our database contains both residential and non-residential solar panel data. We distinguish between residential and non-residential solar panels since they have different usages, scales, and economic natures. Due to the size, shape, and location differences of these two types of solar panels, we utilize a logistic regression model and train it with four basic features of each solar system: solar system area, nightlight intensity, the ratio between the solar system area and its bounding box
area, and a Boolean indicating whether the system is merged from a single image tile. Since non-residential solar systems account for only a small proportion, we also assign misclassification weights to the two types during training that are inversely proportional to their quantity ratio. The training set size is 5000, and the test set size is 1078. Out-of-sample tests show a recall of 81.3% for the residential type and 98.5% for the non-residential type, and a precision of 96.8% for the residential type and 90.6% for the non-residential type on the test set. These results are in terms of area.
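A minimal sketch of this classifier with scikit-learn is shown below; the toy feature rows are invented for illustration, and class_weight="balanced" approximates the inverse-ratio weighting described above.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Four features per detected system, as described above: [area_m2,
# nightlight_intensity, area / bounding_box_area, merged_single_tile].
X_train = np.array([[25.0, 0.9, 0.72, 1.0],      # toy residential-like row
                    [4800.0, 0.2, 0.55, 0.0]])   # toy non-residential-like row
y_train = np.array([1, 0])                       # 1 = residential

clf = LogisticRegression(class_weight="balanced").fit(X_train, y_train)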

3.1.4.5 Predictive Solar Deployment Models

We have developed and compared several nonlinear machine learning models that estimate the census tract level solar deployment rate utilizing 88 environmental and socioeconomic factors as inputs. The models are linear regression with quadratic and interaction terms, multivariate adaptive regression splines (MARS), a one-stage random forest, two-stage models with a second stage of linear regression or MARS, a two-stage random forest (SolarForest), and a feedforward neural network (SolarNN). We utilize tenfold cross-validation to estimate their out-of-sample performance. The results in Table 3.1 summarize performance using cross-validation R² values (out-of-sample estimates) to allow easy comparison between models. SolarForest achieves R² = 0.722 and SolarNN achieves R² = 0.717, the highest accuracies to date. SolarForest is an ensemble random forest [75] framework with a random forest classifier and a random forest regression model (Figure S12). It aims to capture a two-stage decision process at the census tract level: the classifier identifies whether a census tract has at least one system installed, and the regressor estimates the number of systems installed in the tract if it contains solar systems. Both models utilize the 88 socioeconomic and environmental census tract level variables listed in the Supplemental Information of [58]. Gini importance is used to measure feature importance for both the classifier and the regressor in SolarForest; it is

Table 3.1 Comparison of the cross-validation R² values of different solar deployment predictive models. Tenfold cross-validation is carried out utilizing the census tract data. LR linear regression, MARS multivariate adaptive regression splines, RF random forest. The hierarchical SolarForest proposed in this section was the best-performing model

Model                                            Cross-validation R²
LR (quadratic + interaction)                     0.181
MARS                                             0.267
RF regressor                                     0.412
RF classifier + LR (quadratic + interaction)     0.643
RF classifier + MARS                             0.592
SolarForest (RF classifier + RF regressor)       0.722
SolarNN (feedforward neural network)             0.717


calculated by adding up the Gini impurity decreases during the fitting process for each individual feature. SolarNN is a feedforward neural network model with five hidden, fully connected layers. Each hidden layer contains 88 neurons. It has a scalar output of the estimated value of solar deployment density. The activation function used in SolarNN is ReLU [82].
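A minimal PyTorch sketch of the SolarNN architecture as described, with the input dimension assumed equal to the 88 factors, is:

import torch.nn as nn

# Five hidden fully connected layers of 88 neurons each with ReLU
# activations, and a scalar output for solar deployment density.
layers, in_dim = [], 88            # one input per census tract factor
for _ in range(5):
    layers += [nn.Linear(in_dim, 88), nn.ReLU()]
    in_dim = 88
layers.append(nn.Linear(88, 1))    # scalar density estimate
solar_nn = nn.Sequential(*layers)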

3.2 The Impact of Electric Vehicle Penetration Having introduced the DeepSolar learning tool on the generation side, we now analyze the impact of electric vehicles on the load side. Specifically, the questions of where to charge and how to charge are critical for accommodating more electric vehicles (EVs) in the fight against fossil fuel emissions. On the utility side, charging station placement is critical due to factors such as voltage regulation, protection device upgrades, etc. To understand the global optimum in large-scale analyses, we study the sensitivity of charging costs with respect to different factors while preserving a convex formulation. On the customer side, strategic coordination of EV charging would benefit both the power system and the transportation system. Therefore, we examine the factors that influence EV users' choices of charging routes and stations. This analysis is based on a congestion game setup to optimize and reveal the relationship between the grid and its users.

3.2.1 Introduction With the ongoing installation of cyber-infrastructure, interconnecting cyber-information across different areas is regarded as an efficient way to exchange energy system information for efficient energy integration. Motivated by the success of the Internet, the combination of the existing energy network and its cyber network creates the Energy Internet [83, 84]. This Energy Internet is expected to accommodate electric vehicles and form an efficient grid for integrating highly distributed and scalable alternative generating sources and storage with the existing power systems [84]. Traditionally, the concept of the Energy Internet focused more on the static grid [85]. However, the US electric vehicle (EV) market saw a 32% annual growth rate between 2012 and 2016, and the growth rate is now approaching 40%. With EVs becoming ubiquitous in the future, the information within the Energy Internet needs to be expanded to include data from both the static grid and the dynamic components. Figure 3.7 illustrates this concept, displaying the energy information of the static power grid and of the EVs with mobility. Specifically, the information is not only exchanged among different areas of the static grid but also jointly optimized in the data cloud together with the EV information. This information fusion process is realized by the cyber layer of the Energy Internet.


Fig. 3.7 Energy Internet for electric vehicles

Such an Energy Internet can answer industrial questions from utilities and private companies. Being at the forefront of hosting new components, many utilities find it beneficial to know how to efficiently plan their system upgrades for more EVs [86, 87]. EV manufacturers are eager to know the best charging plan for improving user experience. To see the impact of EVs on the grid, one can model residential power consumption [88–95]. For example, [96] models the residential and personal transportation energy demands in a single framework to capture an individual household's full energy footprint [92, 97, 98]. Muratori [96] observes that uncoordinated PEV charging could significantly alter the shape of aggregate residential demand, potentially posing a threat to the electricity infrastructure even at low adoption levels. At the local level, clustering effects in vehicle adoption may result in high PEV concentrations even if overall adoption remains low, causing distribution transformers to experience much higher peak demand and necessitating modifications to the energy distribution system. When higher in-home charging power is used, this impact is amplified (e.g., level 2 as opposed to level 1 charging) [96]. In contrast to the fixed charging locations and unchanged customer routes assumed above, this section aims to understand the impact of charging station placement and of rerouting electric vehicles, as shown in Fig. 3.8.


Fig. 3.8 The cyber-physical network for electric charging grid and the transportation grid

3.2.2 The Impact of EV Charging Locations on the Power Grid

Under the Paris Agreement signed in 2016, Singapore, a model of a sustainable urban city, pledged to cut emissions intensity by 36% below 2005 levels by 2030 [99]. To meet this commitment, emissions reduction in the transport sector is crucial, and large-scale electric vehicle (EV) adoption is therefore essential to Singapore and many other cities and countries. Singapore has taken several important steps in this direction, such as (1) announcing a new Vehicular Emissions Scheme [100] and (2) launching an electric vehicle car-sharing program [101]. However, one of the major barriers to successful large-scale adoption of EVs is the limited number of available charging stations. Thus, it is important to properly deploy EV charging infrastructure to enhance the adoption of EVs efficiently.

EV charging station placement has therefore been an active research area for intercity and urban infrastructure planning. In freeway charging infrastructure planning, [102] tackles the EV charging station placement problem on a simple round freeway, whereas [103] proposes a capacitated-flow refueling location model to capture PEV charging demands in a more complicated meshed transport network. Both papers, however, focus on driving distance along the freeway. In contrast, driving distance constraints are not prominent in urban-area charging infrastructure planning since charging stations are easily accessible; therefore, researchers have considered various aspects dedicated to urban-area charging station placement. For example, [104] finds the optimal way to recharge electric buses with long continuous service hours under two scenarios, with and without limited batteries, but it is applicable only to public bus systems. R-Ghahnavieh and S-Barzani [105] considers urban traffic circulation and the hourly load change of private EVs, but it ignores the geographical land and labor cost variations that are of high importance in urban areas.

Focusing on the specific techniques deployed and the realistic factors considered, the problem under study can be examined from various technical angles. For example, [106] includes the annual cost of battery swapping, and [107] considers vehicle-to-grid technology. Furthermore, researchers and engineers explore many realistic factors such as investment and energy losses [108], quality of service [109], service radius [110], etc. The work in [111] considers the impact of EV integration on the grid. In fact, when load profiles change, the electrical demand at particular points can exceed the rated value of the local T&D infrastructure. A study in the United States has put the value of deferring network upgrade work at approximately $650/kW for transmission and $1050/kW for distribution networks [112]. Beyond the techniques in the previously mentioned papers, infrastructure upgrades may be necessary under large-scale EV integration, and some papers discuss such upgrades explicitly. For example, [107] considers the loading limits of the distribution transformer and distribution lines, while [113] considers minimizing the voltage deviation cost. However, realistic factors such as the upgrade of protective devices and its effect on the overall planning have not been addressed in the context of EV charging station integration in past research.

The aforementioned urban planning and technical issues are mainly formulated as optimization problems. Based on the nature of the equations involved, these formulations include both linear and nonlinear programming problems [107, 108]. Based on the permissible values of the decision variables, integer programming and real-valued programming usually coexist in the same EV charging station problem [114]. Based on the number of objective functions, both single-objective [103] and multi-objective [108, 115] problems have been proposed. Alternatively, the optimization problems are sometimes cast in a game-theoretic framework [109, 116]. Solutions to these optimization problems include the greedy algorithm [104, 114], the genetic algorithm [107], the interior point method [110], gradient methods [109], etc. However, these solutions do not consider the convexification of the constraints and consequently cannot guarantee a global optimum.

Figure 3.9 shows the flowchart of the proposed EV charging station placement method. It considers the integrated electrical and transportation networks as well as their associated infrastructure costs.


Fig. 3.9 Flowchart of the proposed EV charging station placement method (assumptions: urban setting with large-scale EV integration; objective-function costs: distribution expansion, voltage regulation, protective device upgrade; analysis steps: optimization, convexification, and sensitivity analysis over EV station cost, siting, and sizing)

Therefore, the costs related to distribution expansion, voltage regulation, protective device upgrade, and EV station construction are incorporated in the objective function. This study is assumed to be conducted for urban cities with large-scale EV integration in the future. Since a great number of stations need to be installed in this circumstance, an EV charging station could be integrated at any bus along the distribution feeder as long as the operation constraints permit. In this section, the objective function and constraints are first formulated and then explained in detail. The objective function aims to minimize the metrics in Fig. 3.9. There are four terms (also viewed as four constraints) in this objective function, related to different aspects of EV charging and network upgrade costs. Afterward, the sensitivity analysis is provided from a mathematical angle. In the following, we show the numerical results after quantifying with convexification techniques [117]. Extensive sensitivity analysis is also demonstrated.

3.2.2.1 The Benefits of Problem Convexification

The purpose of convexifying the nonlinear constraints is to guarantee a global optimum without jeopardizing the cost evaluation. Table 3.2 compares the scenarios with and without convexification. First, 23 cases are tested under different EV flows and station capacity limits to obtain the percentage of cases that fail to find a global minimum. Since the initial points also affect whether the optimization converges to a global minimum, 11 initial feasible locations are tested for each case. As a result, 50 tests in total fail to converge to their corresponding global minimums. As seen from Table 3.2, 50/(23 × 11) = 19.8% of the cases fail to find a global minimum due to the non-convex constraints. Not surprisingly, all of the cases with convexified constraints successfully find the global minimum.
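To make the benefit concrete, the following minimal sketch (a toy problem of our own, not the chapter's actual placement formulation) shows how a nonconvex cost solved from several initial points can terminate at different local minima, while a convexified surrogate reaches the same global minimum from every start. Both objective functions and all names here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def nonconvex_cost(x):
    # Hypothetical stand-in for a cost with nonconvex constraints folded in:
    # a convex quadratic plus a sinusoidal ripple that creates local minima.
    return (x[0] - 3.0) ** 2 + 4.0 * np.sin(3.0 * x[0])

def convexified_cost(x):
    # Convexified surrogate: only the convex quadratic term is kept.
    return (x[0] - 3.0) ** 2

initial_points = np.linspace(-5.0, 10.0, 11)  # 11 initial points, as in the study

nonconvex_ends = {round(minimize(nonconvex_cost, [x0]).x[0], 2) for x0 in initial_points}
convex_ends = {round(minimize(convexified_cost, [x0]).x[0], 2) for x0 in initial_points}

print("nonconvex endpoints:", sorted(nonconvex_ends))  # several distinct local minima
print("convexified endpoint:", sorted(convex_ends))    # a single global minimum
```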


Table 3.2 Optimization results with and without constraint convexification in the 123-bus system

Constraints | Without convexification (50) | With convexification (0)
Percent of cases that failed to find a global minimum | 19.8% | 0.0%
Average total cost in cases with a global minimum ($) | 7.92 × 10^7 (8) | 7.96 × 10^7 (11)
Average total cost in cases with a local minimum ($) | 8.47 × 10^7 (3) | Unavailable
Computational time in cases with a global minimum (s) | 112.4 (8) | 105.6 (11)
Computational time in cases with a local minimum (s) | 2115.7 (3) | Unavailable

Note: the numbers in brackets denote the numbers of tests under the corresponding constraints

Second, convexifying the constraints does not affect the total cost much. To maintain fair comparisons under the same EV demand and computational complexity, the average total cost and computational time need to be calculated within the same case. In Table 3.2, the demonstrated case has an EV flow of 5185 EVs/h and a 25-spot limit per station; in this case, the solutions of three tests reach local minimums, and eight tests reach global minimums. Furthermore, to demonstrate the system-level performance after accumulating all errors due to convexification, Fig. 3.10 is presented. The error is defined as the cost difference with and without convexification divided by the cost without convexification. As shown in Fig. 3.10, the total cost increases at the early stage when the EV flow is low, since the optimization constraints force the charging stations to be built at high cost. At a later stage, when the EV flow rises, the overall cost reduces because fewer stations are built and more spots can be installed in the same stations, where costs are low. Third, the total optimization computational time with convexified constraints is comparable to the one without convexification. However, the computational time is significantly higher in the cases where a local minimum is found. In summary, convex preservation contributes only a limited amount of extra cost to the total and always provides a global minimum with small computational time. Therefore, convexifying the constraints in this scenario is more beneficial than disadvantageous for this optimization problem.

3.2.2.2 Sensitivity Analysis for the Optimization Variables

This section examines the optimization variables to illustrate the station distribution of the entire system when the constraint coefficients change, and then draws some conclusions from the observations. The spot limit of each station in this network is assumed to be 25.¹ Assuming the hourly EV flow requires only 30%² of the maximum station capacity of the whole system, Fig. 3.11 shows the charging station distribution from bus 2 to bus 123. We further focus on five specific nodes at buses 33–37 to see how the constraints affect the EV charging station placement, as shown in Fig. 3.12. These five end nodes are selected as a representative region to exhibit the sensitivity of the placement distribution to changes in the constraint coefficients; extensive observation of the entire system's placement distribution shows that they represent the overall characteristics well. From the overall placement results in the 123-bus system, we conclude that the constraint on voltage regulation pushes the EV charging station placement toward the end of the distribution feeder.

Fig. 3.10 The system-level performance after accumulating all errors due to convexification (total cost and error of cost after convexification vs. the number of EVs per hour)

¹ Adding one extra charging spot requires 15 m²–20 m² of land [118, 119]; the land use of a 25-spot station is then close to 375 m²–500 m². Since more space can be provided for non-EVs, this land use range easily fits parking lot design requirements such as those in [120].
² Given 122 potential stations in the system, the theoretical maximum station capacity of the whole system is 122 × 25 = 3050 spots. In reality, the hourly EV flow may not be saturated to the point that each available station needs to be fitted with the maximum of 25 spots. A 30% capacity indicates 3050 × 0.3 = 915 spots, which is a feasible scenario for a medium-voltage distribution network as evidenced in [103, 119].


Fig. 3.11 The station distribution of the entire system when the coefficient c5 changes (number of EV charging spots per bus for c5 = 0, 1e4, 5e5, 1e6, and 1e6 with cprot = 0)

Fig. 3.12 The partial topology of the distribution system and the highlight of the area under study

In contrast, the cost derived from the constraint on the protective devices is lower when the EV charging stations are located near the feeder trunk.

3.2.2.3 Sensitivity Analysis for Different Cost Components

In this subsection, we investigate three issues. First, how does the amount of EV flow per unit time affect the number of charging stations and the total cost? As the number of charges per hour grows, the number of spots in demand is proportional to the number of EVs per hour. As for the number of stations, it reaches its maximum of 122 (assuming no station is built on the slack bus) in Fig. 3.13a when the number of EVs per hour is 6913.

Fig. 3.13 EV charging station placement in the 123-bus system under different EV magnitudes. c5 = 5e5. (a) EV stations and spots, EV station capacity 25. (b) EV stations and spots, EV station capacity 10. (c) EV costs (total, station, distribution system, voltage regulation, and protection costs), EV station capacity 25. (d) EV costs, EV station capacity 10


Fig. 3.14 The effect of the coefficient c4 on the station number and the total cost. c5 = 5e5. EV station capacity 10. (a) The impact of electric vehicle penetration on the station number. (b) The impact of electric vehicle penetration on the total cost

In Fig. 3.13b, given an EV station capacity of 10 spots per station, the number of stations saturates at 122 (the maximum number of stations that the current system can hold) when the number of EVs per hour reaches 3500. The cost diagrams under the two EV station capacities are depicted in Fig. 3.13c and d. Second, what is the effect of distribution expansion costs on the total cost? Due to differing labor and land costs across areas, these costs vary immensely (Fig. 3.14). Third, what are the relations between the amount of EV flow per unit time and the distribution grid operation costs for voltage regulation and protection upgrades? When the EV charging station and distribution expansion costs are not dominantly high, the voltage regulation and protection costs affect the total cost. The sensitivity of the voltage regulation cost and protection cost with respect to the number of EVs per hour is presented in Fig. 3.15.

Fig. 3.15 The sensitivity of the voltage regulation cost and protection upgrade cost in terms of the number of EVs per hour. c5 = 5e5. EV station capacity 10

The voltage regulation cost rises quadratically and levels off once the number of EVs per hour exceeds the system station capacity of 3500 EVs per hour.

3.2.3 The Impact of Choosing EV Routes for Charging

When considering charging, EV drivers choose routes with charging station(s) that are (1) closer to their location, (2) associated with less charging time, and (3) consistent with their travel direction. We therefore take minimizing the charging time as the optimization goal, defined as the time consumed in the traveling, waiting, and charging process. This time is determined by the selected charging station and driving path as well as by the number of vehicles that choose the same charging station at the same time. Thus, there is a clear competitive relationship between EVs during the charging process. When the charging price is low, congestion inevitably occurs as EVs compete for limited charging resources to save costs. Similarly, during peak charging periods, EVs compete to save waiting time. This competitive, or noncooperative, relationship may cause EV users to consume a substantial amount of time for charging and may place a vast burden on the local grid where the charging station is located. This competition is not obvious when charging resources are sufficient. However, during rush hour or in circumstances of limited charging resources, the competition between EVs is


particularly intense. Therefore, our goal is to guide the EV charging process to avoid congestion and excessive load on the power grid. The competition between EVs can be modeled as a congestion game, which is commonly used to model noncooperative interactions among players who share resources. In noncooperative games, each player decides which resources to utilize; the individual decisions of the players then result in a resource allocation at the population scale. Resources that are highly utilized become congested, so the corresponding players incur higher losses. The characteristics of the EV charging process are thus highly consistent with the congestion game setting, so we use a congestion game to analyze the EV charging process. To ensure that the analysis closely portrays reality, we studied the traffic patterns and distribution of charging stations in Tempe, AZ, where Arizona State University (ASU) is located. The transportation and power distribution network topologies near ASU are shown in Figs. 3.16 and 3.17. Notice that most charging stations are located near ASU. Therefore, congestion is an important factor that must be considered when a large number of EVs enter this area for charging. The traffic congestion in Fig. 3.16 is based upon Google Traffic at 4 pm in Tempe, AZ.

Fig. 3.16 Transportation system topology near Arizona State University

Fig. 3.17 Power distribution network topology near Arizona State University

3.2.3.1 Numerical Results on Different EV Routes

We now illustrate the simulation and optimization process used to study the characteristics of EV charging station and route choice. As an example, we use a portion of the traffic topology in Fig. 3.16 to demonstrate the simulation; the resulting traffic topology is shown in Fig. 3.18. There are three charging stations in the scenario, cs1, cs2, and cs3. For the convenience of later analysis, assume that each charging station has only one charging spot. The two EVs, ev1 and ev2, have low batteries and can each select one of the three charging stations. For ev1, its travel time toward the selected charging station cs_i (i ∈ {1, 2, 3}) is t_{1f_i}, and the travel time from charging station cs_i to its final destination d1 is t_{1r_i}. In the same way, the travel time of ev2 toward the selected charging station cs_i is t_{2f_i}, and the travel time from cs_i to its final destination d2 is t_{2r_i}. We also assume that each EV has an equal charging time t_u = 30 min. Since several roads reach the selected charging station and the EV's final destination, we should first choose the best road. To reduce the power consumption of the EVs, we choose the road that reaches the charging station and destination as quickly as possible. We calculate the driving time using Google Maps and obtain an optimal route. The selected roads are shown in Fig. 3.18, and the specific driving times are shown in Table 3.3.


Fig. 3.18 Traffic topology used for simulation (ev1 and ev2, charging stations cs1–cs3, destinations d1 and d2, and the preferred paths of ev1 and ev2)

Table 3.3 Traveling time data for the selected route
t_{1f_1} = 12 min, t_{2f_1} = 16 min, t_{1r_1} = 13 min, t_{2r_1} = 15 min
t_{1f_2} = 17 min, t_{2f_2} = 11 min, t_{1r_2} = 8 min, t_{2r_2} = 10 min
t_{1f_3} = 21 min, t_{2f_3} = 15 min, t_{1r_3} = 9 min, t_{2r_3} = 15 min

Table 3.4 Time including traveling, charging, and waiting time (each cell lists the total time of ev1 / ev2)
ev1 \ ev2 | cs1 | cs2 | cs3
cs1 | 55 / 87 min | 55 / 51 min | 55 / 60 min
cs2 | 55 / 61 min | 79 / 51 min | 55 / 60 min
cs3 | 60 / 61 min | 60 / 51 min | 84 / 60 min

Next, all the possible charging times, including traveling, charging, and waiting time, are estimated using the model developed in Sect. 3.2 and are shown in Table 3.4. Finally, we optimize the charging station selection using the congestion game introduced above. The results show that the Nash equilibrium solution to this problem is for EV ev1 to select charging station cs1 and for EV ev2 to select charging station cs2. In this way, an optimal charging strategy is obtained, at which the charging times of the two EVs reach a good balance. The corresponding charging times for ev1 and ev2 are 55 and 51 min, respectively.
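The equilibrium can be verified by brute force. The sketch below recomputes Table 3.4 from the Table 3.3 travel times under our reading of the waiting model (one spot per station, first come first served, t_u = 30 min) and checks every strategy profile for a pure Nash equilibrium; it is an illustrative reconstruction, not the chapter's solver.

```python
import itertools

T_U = 30  # charging time per EV (min)
# Travel times from Table 3.3: minutes to each station and onward to the destination.
to_cs   = {1: {1: 12, 2: 17, 3: 21}, 2: {1: 16, 2: 11, 3: 15}}   # t_{kf_i}
to_dest = {1: {1: 13, 2: 8,  3: 9},  2: {1: 15, 2: 10, 3: 15}}   # t_{kr_i}

def times(profile):
    """profile = (station of ev1, station of ev2) -> (total time ev1, total time ev2)."""
    out = {}
    for ev, other in ((1, 2), (2, 1)):
        cs = profile[ev - 1]
        start = to_cs[ev][cs]                        # arrival at the chosen station
        if profile[other - 1] == cs and to_cs[other][cs] < start:
            # Single spot, first come first served: wait for the earlier arrival to finish
            start = max(start, to_cs[other][cs] + T_U)
        out[ev] = start + T_U + to_dest[ev][cs]
    return out[1], out[2]

for p in itertools.product((1, 2, 3), repeat=2):     # all 9 profiles of Table 3.4
    t1, t2 = times(p)
    is_ne = (all(times((d, p[1]))[0] >= t1 for d in (1, 2, 3)) and
             all(times((p[0], d))[1] >= t2 for d in (1, 2, 3)))
    if is_ne:
        print(f"Nash equilibrium: ev1 -> cs{p[0]}, ev2 -> cs{p[1]} ({t1} min, {t2} min)")
# Prints: Nash equilibrium: ev1 -> cs1, ev2 -> cs2 (55 min, 51 min)
```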

Fig. 3.19 Effect of EV number on charging time

Fig. 3.20 Effect of charging spot number on charging time

3.2.3.2 Numerical Results on EV Numbers and Charging Time

To illustrate the effect of the number of EVs on charging time, the number of EVs waiting to be charged is gradually increased from 2 to 8. The corresponding charging times are shown in Fig. 3.19. The median charging time increases significantly, from 53 to 82.5 min. The maximum charging time reaches 114 min, of which the waiting time is 54 min (Fig. 3.20).


3.2.4 Conclusion

It is critical to adequately examine the impact of EV charging on the electric power infrastructure as efforts to electrify personal transportation continue. For such an evaluation, we created a charging station placement plan for the utility side and a charging route selection scheme for electric vehicles (EVs) on the customer side. Specifically, our charging station placement analyzes the sensitivity of the solution to the costs of distribution expansion, EV stations, and voltage regulation. We also evaluate the importance of convexification in the problem formulation phase. For the EV route evaluation, we study the EV charging strategy with experimental evaluation. The evaluations for the utility company and the customers show the importance of jointly optimizing the static infrastructure placement and the dynamic EVs in the grid.

Chapter 4

Adapt Load Behavior as Technology Agnostic Solution

4.1 Consumer Segmentation

For the demand side, we investigate a household electricity segmentation methodology that uses an encoding system with a preprocessed load shape dictionary, applied to California smart meter data. Structured approaches using features derived from the encoded data drive five sample programs and policy-relevant energy lifestyle segmentation strategies. We also ensure that the methodologies developed scale to large datasets.

4.1.1 Introduction

The widespread deployment of advanced metering infrastructure (AMI) has made available concrete information about user consumption from smart meters. Household load shapes reveal significant differences among large groups of households in the magnitude and timing of their electricity consumption [121]. Hourly smart meter data offers a unique opportunity to understand a household's energy use lifestyle. Further, this consumption lifestyle information has the potential to enhance the targeting and tailoring of demand response (DR) and energy efficiency (EE) programs as well as to improve energy reduction recommendations. According to the Federal Energy Regulatory Commission, DR is defined as: "Changes in electric use by end-use customers from their normal consumption patterns in response to changes in the price of electricity over time, or to incentive payments designed to induce lower electricity use at times of high wholesale market prices or when system reliability is jeopardized." EE means using less power to perform the same tasks on a continuous basis or whenever that task is performed.


In this section, an electricity customer segmentation methodology that uses an encoding system with a preprocessed load shape dictionary is examined. Energy consumers' load shape information is then used to classify households according to extracted features such as the entropy of the shape code, which measures the amount of variability in consumption. Load shape information enhances our ability to understand individual consumers as well as groups of consumers. For example, time-of-day building occupancy and energy-consuming activities can be interpreted from these shapes. In the proposed segmentation system, we use a structured approach that utilizes features derived from the encoded data to drive the segmentation. We also develop segmentation strategies that are aligned with specific application purposes, such as household targeting for EE programs or recommendations for time-of-use shifts. In addition, we ensure that the methods readily scale to large datasets. We test our approach on a 220K-household data sample from a large utility.

4.1.1.1 Prior Work

Much of the previous research on audience segmentation takes place in psychology, marketing, and communication. Almost all segmentations in those fields rely on surveys of individuals regarding their self-reported values, attitudes, knowledge, and behaviors [122, 123]. In the last decade, utility companies have increasingly used these psychographic segmentation strategies to support program targeting, recruitment message tailoring, and program design for DR and EE programs [124]. Rarely, however, is actual energy use part of the segmentation strategy [125, 126]. Recently, the widespread dissemination of electricity smart meters has offered the opportunity to create segmentation strategies based on 15-minute, 30-minute, or hourly household energy use. Understanding a household's time-of-day energy consumption, the stability of its daily usage pattern over time, and the actual volume of energy use offers insights into household energy use [121]. Further, these consumption features can be relevant to marketing and program design tasks. For example, high usage volume or particular load shapes may signal the potential for certain energy efficiency messages, whereas household load shape stability may be more relevant for time-of-use reduction messages.

Existing literature on the analysis of smart meter data focuses on forecasting and load profiling, such as [127–129]. Some significant contributions in segmentation are [121, 130–134]. Self-organizing maps (SOM) and K-means are used to find load patterns in [132] and to present an electricity consumer characterization framework in [130]. A two-stage pattern recognition of load curves based on various clustering methods, including K-means, is described in [134]. Various clustering algorithms (hierarchical clustering, K-means, fuzzy K-means, SOM) are used to segment customers with similar consumption behavior in [131]. Similarly, [133] checks the capacity of SOM to filter, classify, and extract load patterns. As an alternative to distance-based clustering (K-means, SOM), [129] introduces a class of mixture models and random effects mixture models, with its own EM algorithm to fit them. The current section proposes a different approach that decomposes the daily usage patterns into a total daily usage and a normalized daily load shape. Representative load shapes are found utilizing adaptive K-means and summarized utilizing hierarchical clustering, so that a stable encoding mechanism can be designed. Various metrics are computed based on the encoding we propose. The section is also distinguished from previous work by analyzing the results of applying the method to more than 66 million load shapes from a population of 220K residential consumers. This massive data analysis reveals various important features of the data, including that a consumer's lifestyle is captured by their typical load shapes. We propose five simple segmentation schemes and illustrate how these segmentation strategies can be selected for certain program development, pricing, and marketing purposes. We also test that the proposed segmentation strategies can be scaled to large databases by using a load shape dictionary. The robustness of these approaches is verified by contrasting dictionaries learned from different groups in the population.

4.1.2 Methodology

The proposed methodology for usage-based segmentation consists of three stages, as shown in Fig. 4.1. The methodology relies on a simple decomposition of the load profiles. Given a daily consumption profile $l(t)$, we decompose it as $l(t) = a\, s(t)$, where

$$a = \sum_{t=1}^{24} l(t) \quad \text{and} \quad s(t) = \frac{l(t)}{a}. \tag{4.1}$$

Fig. 4.1 User segmentation flow


Fig. 4.2 Daily consumption distribution at Zone 3 and Zone 13

Here $a$ is the total daily consumption and $s(t)$ is the normalized load profile, which we call the load shape. The first stage creates a dictionary of representative load shapes by modeling the distribution of $a$ and clustering the load shapes $s(t)$ across the population. The second stage extracts proper dynamic features from the data encoded with the preprocessed dictionary. The last stage performs a second level of clustering depending on segmentation criteria such as lifestyle or usage variability. The methodology is designed to scale to very large datasets.
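In code, the decomposition in (4.1) is a one-liner over a matrix of daily profiles; the array names below are illustrative and the random data is a placeholder for smart meter readings.

```python
import numpy as np

profiles = np.random.rand(1000, 24)        # hourly kWh; placeholder for smart meter data

a = profiles.sum(axis=1, keepdims=True)    # total daily consumption a, per (4.1)
shapes = profiles / a                      # normalized load shapes s(t); each row sums to 1
```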

4.1.2.1 Total Daily Consumption Characterization

The simplest characterization of total daily usage is accomplished by inferring the probability distribution of values across the population. The empirical distribution exhibits a long tail, as shown in Fig. 4.2 for two climate zones. Irwin et al. [127] utilize a Weibull distribution to model this distribution for a small number of consumers. Instead, we find that a mixture of log-normal distributions best fits the actual data. The density function for a mixture with $n$ elements is given by

$$f(a) = \sum_{i=1}^{n} \lambda_i g_i(a), \qquad g_i(a) = \frac{1}{\sqrt{2\pi\sigma_i^2}}\, e^{-\frac{(\log(1+a)-\mu_i)^2}{2\sigma_i^2}}, \tag{4.2}$$

where $(\mu_i, \sigma_i)$ are the mean and standard deviation of each mixture element and $\lambda_i$ is the proportion of each element in the population. Figure 4.3 shows the fit for the population of one zip code during the summer. The data fit well with a one-, two-, or three-element mixture depending on the climate zone. In particular, we find that dry and hot areas require two to three elements, while cooler, coastal areas require a single element. The parametric model fits well for different zones and for different seasonality or timing choices (e.g., the winter season or a specific day), implying that it is not limited by temporal or spatial locality.
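A practical way to fit (4.2) is to note that a log-normal mixture in $a$ (under the $\log(1+a)$ parameterization above) is a Gaussian mixture in $y = \log(1+a)$. The sketch below uses scikit-learn's GaussianMixture with BIC-based model selection; that tooling choice and the synthetic data are our assumptions, not the chapter's procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_lognormal_mixture(daily_totals, max_components=3):
    """Fit (4.2) by fitting a Gaussian mixture to y = log(1 + a),
    choosing the number of components (1-3) by BIC."""
    y = np.log1p(np.asarray(daily_totals, dtype=float)).reshape(-1, 1)
    return min((GaussianMixture(n_components=k, random_state=0).fit(y)
                for k in range(1, max_components + 1)),
               key=lambda gm: gm.bic(y))

# Synthetic stand-in for one zip code: totals drawn from two log-normal components
rng = np.random.default_rng(0)
a = np.concatenate([np.expm1(rng.normal(2.0, 0.4, 5000)),
                    np.expm1(rng.normal(3.2, 0.3, 2000))])
gm = fit_lognormal_mixture(a)
print(gm.n_components, gm.weights_.round(2))  # (mu_i, sigma_i^2) live in gm.means_, gm.covariances_
```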


Fig. 4.3 Mixture of log-normal distribution fitting on one zip code area for June–August 2011 data

4.1.2.2 Encoding System Based on a Preprocessed Dictionary

We focus on the normalized load shapes $s(t)$, from which many features can be extracted. For DR programs, the peak usage fraction, peak time, and peak duration can be important features for better controlling demand at peak time. For EE programs, features can serve as proxy variables for the existence of specific appliances and their efficiency; for example, load sensitivity to temperature during the summer can be a proxy for the existence of an air conditioner. Many other features can be extracted from the raw usage data depending on the interests of possible programs. However, the data generated by sampling a large population hourly is enormous (e.g., for the 220K-household dataset, we have 66 MM load profiles), creating difficulties for any approach that segments consumers or investigates potential features to be extracted. To address this difficulty, we propose an encoding system using a preprocessed dictionary. The dictionary contains $K$ representative load shapes $C_i(t)$, and every load shape in the data is mapped to the closest shape code: load shape $s(t)$ is assigned to the center $i^*(s)$ that minimizes the squared error:


$$E(s, i) = \sum_{t=1}^{24} \left(C_i(t) - s(t)\right)^2, \tag{4.3}$$

$$i^*(s) = \arg\min_i E(s, i). \tag{4.4}$$

The encoding procedure also records the minimum squared error $E(s, i^*(s))$ for each encoded shape. The total energy $a$ is characterized by its quantile according to the distribution $f(a)$ from the previous subsection. Various properties can be computed directly on the load shape dictionary. Note that given a load shape $s_n^k(t)$ for day $n$ and household $k$, we can identify a sequence of shape codes $C_{i^*(s_n^k)}$, a sequence of total consumption values $a_n^k$, and a sequence of errors $E(s_n^k, i^*(s_n^k))$. To reduce the notational burden, whenever possible we omit the household index $k$. Since no prior dictionary exists, a procedure to uncover one is required. A good dictionary needs to have good coverage, meaning that every load shape in the data is sufficiently close to some representative shape. A good dictionary is also consistent, meaning that executing the learning procedure on different subsets of the population returns representative load shapes that are not too far from each other. The next subsection addresses this issue.
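A vectorized sketch of the encoding step in (4.3)–(4.4) follows; the array names are illustrative.

```python
import numpy as np

def encode(shapes, dictionary):
    """shapes: (n, 24) normalized profiles; dictionary: (K, 24) centers C_i.
    Returns the codes i*(s) of (4.4) and the minimum errors E(s, i*(s)) of (4.3)."""
    err = ((shapes[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)  # (n, K)
    codes = err.argmin(axis=1)
    return codes, err[np.arange(len(shapes)), codes]
```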

4.1.2.3 Adaptive K-Means on Normalized Data

Finding representative shapes that minimize the sum of the mean squared errors $E(s, i^*(s))$ over all shapes $s$ is a standard clustering problem. The K-means algorithm is the most popular statistical clustering approach, and to populate a dictionary of representative shapes it is a good starting point, as attempted in [121, 128–132]. However, the classical K-means algorithm needs the number of clusters to be determined before running the algorithm, and it is hard to decide on a proper $K$ given the number of different load shapes. It is also not appropriate to follow statistical methods that set $K$ without reasoning that provides a basis for its adoption. Instead, we propose an adaptive K-means algorithm with a threshold to construct the shape dictionary [135]. The algorithm starts with a set of cluster centers initialized by a standard K-means run with an initial $K = k_0$. Adaptive K-means then adds additional cluster centers whenever a load shape $s(t)$ in the dataset violates the mean squared error threshold condition:

$$E(s, i^*(s)) = \sum_{t=1}^{24} \left(s(t) - C_{i^*(s)}(t)\right)^2 > \theta, \tag{4.5}$$

where $\theta$ is the threshold choice. The threshold provides the flexibility to cope with various practitioners' needs and to control the statistical properties of the load shapes in the same group. Since load shapes are normalized, each cluster center resulting from K-means is also normalized, as centers are the averages of the member shapes.


Algorithm 1 Adaptive K-means algorithm based on threshold
Require: Daily load shapes for all users s_n(t); min. and max. numbers of clusters (min.k, max.k)
Set K = min.k
while 1 do
    Run K-means with the initial centers (if given)
    for all clusters do
        Check the threshold condition in (4.5) for all s_n(t) belonging to the cluster and count the number of clusters violating the condition, N_v (meaning: no data point assigned to a cluster is farther from the cluster center than the given threshold θ)
    end for
    if N_v = 0 then
        return the clustering results and K
    else if K + N_v > max.k then
        return message: failure to converge
    end if
    K = K + N_v
    for all clusters violating the threshold condition do
        Run K'-means with K' = 2
    end for
    Update the set of cluster centers, including all split clusters
end while

This guarantees that the distances on both sides of (4.5) are bounded, and it is easy to show that the range $0 \le \theta \le 2$ is required for non-trivial solutions. The main differentiation of the proposed algorithm from previous approaches is that the threshold test is utilized to dynamically split clusters that do not satisfy the condition. Together with the normalization of the load shapes, this results in more robust dictionaries and better properties for the algorithm. The detailed procedure is shown in Algorithm 1.
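A compact rendering of Algorithm 1 on top of scikit-learn's KMeans is sketched below. The split step re-runs 2-means inside each violating cluster, per the pseudocode; initialization details and degenerate cases are simplified relative to the original.

```python
import numpy as np
from sklearn.cluster import KMeans

def adaptive_kmeans(shapes, theta, min_k=10, max_k=2000, seed=0):
    """Sketch of Algorithm 1: grow K until no shape violates (4.5)."""
    centers = KMeans(n_clusters=min_k, random_state=seed, n_init=10).fit(shapes).cluster_centers_
    while True:
        # One K-means pass warm-started from the current centers
        km = KMeans(n_clusters=len(centers), init=centers, n_init=1).fit(shapes)
        centers, codes = km.cluster_centers_, km.labels_
        err = ((shapes - centers[codes]) ** 2).sum(axis=1)          # E(s, i*(s)), per (4.5)
        violating = [c for c in range(len(centers)) if (err[codes == c] > theta).any()]
        if not violating:
            return centers, codes                                    # threshold met everywhere
        if len(centers) + len(violating) > max_k:
            raise RuntimeError("failure to converge")
        kept = [centers[c] for c in range(len(centers)) if c not in violating]
        for c in violating:                                          # split each violating cluster
            members = shapes[codes == c]
            if len(members) < 2:
                kept.append(members[0])      # lone violating shape becomes its own center
            else:
                kept.extend(KMeans(n_clusters=2, random_state=seed, n_init=10)
                            .fit(members).cluster_centers_)
        centers = np.vstack(kept)
```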

4.1.2.4 Hierarchical Clustering

The representative shape dictionary resulting from adaptive K-means can be highly correlated, as the algorithm does not guarantee an optimal distance between cluster centers and instead only meets the threshold $\theta$ for every cluster. For interpretability and analysis, it is interesting to relax this condition for some clusters. We propose a simple hierarchical clustering algorithm to merge clusters whose centers are too close (Algorithm 2). The algorithm reduces the dictionary to a target size $T$ by merging clusters; the size-weighted average is exactly the new cluster mean. It is important to understand the purpose of the two-stage clustering for generating the dictionary. If the dictionary size $T$ is set directly, the performance is similar to classical K-means from a threshold-condition-violation perspective, which is addressed in Sect. 4.1.3.3. However, classical K-means does not guarantee that every load shape is within a certain range of the cluster center. Adaptive K-means is needed to find a proper $K$ satisfying the desired threshold condition, except that,


Algorithm 2 Hierarchical clustering
Require: Adaptive K-means result (C_i: cluster centers (i = 1, ..., K); n_i: size of the i-th cluster)
Set the target dictionary size, T (< K)
while K > T do
    Find the closest two cluster centers, C_i and C_j
    Set C_i = (n_i C_i + n_j C_j)/(n_i + n_j) and delete C_j
    K = K − 1
end while

under this hard constraint, several small clusters can arise. Hierarchical clustering is utilized to filter and consolidate these small clusters to result in a small and stable dictionary that is meaningful in practice.
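Algorithm 2 takes only a few lines in code: repeatedly merge the two closest centers, replacing them with their size-weighted mean (the exact mean of the merged cluster). The O(K²) pair search per step is acceptable at dictionary scale; names are illustrative.

```python
import numpy as np

def merge_to_target(centers, sizes, target):
    """Algorithm 2: merge the closest pair of centers until only `target` remain."""
    centers = [np.asarray(c, dtype=float) for c in centers]
    sizes = list(sizes)
    while len(centers) > target:
        # Closest pair (i, j) under squared Euclidean distance
        i, j = min(((i, j) for i in range(len(centers)) for j in range(i + 1, len(centers))),
                   key=lambda ij: float(np.sum((centers[ij[0]] - centers[ij[1]]) ** 2)))
        # The size-weighted average is exactly the merged cluster mean
        centers[i] = (sizes[i] * centers[i] + sizes[j] * centers[j]) / (sizes[i] + sizes[j])
        sizes[i] += sizes[j]
        del centers[j], sizes[j]
    return np.vstack(centers), sizes
```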

4.1.3 Experiments on Data

4.1.3.1 Description of Smart Meter Data

The data used in this section is provided by the Pacific Gas and Electric Company (PG&E). The data contains the electricity consumption of residential PG&E customers at 1-hour intervals. There are 218,090 smart meters, and the total number of 24-hour load profiles is 66,434,179. The data corresponds to 520 different zip codes and covers climate Zones 1, 2, 3, 4, 11, 12, 13, and 16, according to the California Energy Commission (CEC) climate zone definition. The data ranges from April 2008 to October 2011; however, the range differs depending on the type of smart meter used. For example, a small number of smart meters have data before July 2010 and after August 2011. Only data from the 123,150 households whose data range from August 2010 to July 2011 (44,949,750 load profiles) are used in experiments comparing individual households; otherwise, the whole dataset is utilized.

4.1.3.2 Dictionary Generation on Real Usage Data

It is noteworthy that daily usage data whose sum is very small are ignored in populating the dictionary. This is because small usage patterns are usually irregular, and after being scaled up by normalization, they perturb the overall representative shapes in the dictionary generation process. Moreover, typical demand management programs require at least some amount of usage to be applicable. After the dictionary is populated, small usage data can be assigned to a separate code or encoded with the dictionary, since it remains visible in the daily usage sum.


Fig. 4.4 Relation between threshold choice and number of clusters applying Algorithm 1

Fig. 4.5 Example of adaptive K-means result with θ = 0.2: normalized daily usage patterns (load shapes) and cluster centers

Considering that the empirical average total daily consumption is 18.68 kWh and the empirical 10% quantile is 4.47 kWh, load patterns with a total energy lower than 3 kWh (the 6% quantile) are ignored. We evaluate the performance of the adaptive K-means clustering-based dictionary construction. Figure 4.4 shows the relation between the threshold setting and the size of the dictionary for 1 year of data (144,147 load patterns) at a chosen zip code. The dictionary size increases as the threshold decreases, as expected. Setting the threshold at 0.2 is a good choice: there the number of clusters is not large, while below 0.2 the marginal gain in explanatory power is small yet requires a large number of additional cluster centers, and these extra centers do not enable a stable dictionary. The resulting typical clusters are shown in Fig. 4.5, with the cluster centers shown as red circles. In Fig. 4.5, "#number" identifies the load shape code in the corresponding dictionary. Figure 4.6 summarizes cluster information by plotting all shapes assigned to a cluster; the clusters can be seen to be consistent. We can check the robustness of the dictionary by estimating the coverage of the constructed dictionary in other zip codes and weather zones. Table 4.1 shows that the dictionary has good coverage, implying that the clusters uncovered by the algorithm possess a stable structure.


Fig. 4.6 Example of adaptive K-means result with θ = 0.2: heat map of normalized data under the same shape code

Table 4.1 Dictionary coverage from Zone 13
 | Zone 13 coverage | Zone 2 coverage
A dictionary with θ = 0.2 populated from Zone 13 | 141,876 (of 143,915) load shapes (98.36%) | 85,393 (of 88,771) load shapes (96.2%)

4.1.3.3 Dictionary Reduction via Hierarchical Clustering

The dictionary size for θ = 0.2 is on the order of thousands of shapes, making it hard to interpret them and create meaningful metrics. The correlation between cluster centers reveals that some of the clusters are strongly connected, and the size distribution shows that many of them are small (Fig. 4.7). Hierarchical clustering (Algorithm 2) is applied so that clusters merge while sacrificing the least possible quality. The target number of clusters is set to T = 1000, as it is the smallest dictionary size for which the threshold condition is violated on less than 5% of the sample load profiles. The quality of the reduced dictionary can be evaluated by encoding all the load shapes in the dataset. For each encoded load shape, we can compute the ratio

$$\hat\theta = \frac{\sum_{t=1}^{24} \left(s(t) - C_{i^*(s)}(t)\right)^2}{\sum_{t=1}^{24} \left(C_{i^*(s)}(t)\right)^2}, \tag{4.6}$$

which would always be smaller than the threshold (θ = 0.2) in the original adaptive K-means dictionary. Figure 4.8 shows the distribution of the estimated thresholds. Only a small portion of the load shapes (5.13%) violate the threshold condition ($\hat\theta > 0.2$). It is worth highlighting that a shape dictionary populated from one area covers 95% of all load shapes over all areas and periods, which means the representative load shapes are consistent regardless of spatial and temporal locality.


Fig. 4.7 Cluster correlation and size distribution

Fig. 4.8 Estimated threshold distribution

Fig. 4.9 Distribution of $\sigma(s(t) - C_{i^*(s)}(t))$

Another important point to consider is whether all of a given consumer's daily consumption profiles are close to their assigned cluster shapes. Figure 4.9 displays the distribution of the standard deviation of the residuals for each household, defined as the deviations from each of their daily shapes. Notice that, at any given hour, the error in cluster representation is up to 7% of the total daily consumption (assuming a 3σ bound).

Fig. 4.10 Covered usage patterns and number of load shapes (the fraction of usage patterns covered by each shape in the sorted dictionary, with 90%, 95%, and 99% coverage levels marked)

Additional statistics on coverage are provided by the empirical and cumulative distributions of cluster size for the whole population, shown in Fig. 4.10. Notice that 90% of the whole data (66 MM load shapes) is covered by 272 representative load shapes (cluster centers). This enormous reduction in representation enables a principled analysis of household lifestyle based on load shapes, which we introduce later.

4.1.3.4 Load Shape Analysis

We now describe basic observations on the encoded load shapes from the population data. Figure 4.11 shows the 16 most frequent load shapes, each accounting for more than 1% of the load shapes in the entire data encoded with the final dictionary. Notice that most of the high usage happens in the late afternoon or evening, which reflects the lifestyle of typical households: a sample lifestyle may be that occupants leave home in the morning, return home after school or work, and consume electricity until they sleep. Notice also that many shapes can be differentiated by the timing of peak consumption, indicating this might be a good variable to design programs around. We can compare the number of households that have the top 16 load shapes among their top 5 load patterns to confirm that the top 16 patterns represent the population and not just a small set of consumers. Figure 4.12 shows that, on average, each of the top 16 load shapes appears in the top 5 of 15.3% of households.


Fig. 4.11 Sixteen most frequent load shapes of the entire households

Fig. 4.12 Number of users who have top 16 load shapes within their top 5 load patterns

4.1.4 Segmentation Analysis

Based on the load shape encoding developed in this section, various analyses are developed.

4.1.4.1 Entropy Analysis

Typical analysis of load shapes for households has focused on average load shapes. Yet two households with identical average load shapes could have significantly different daily load shapes, and this variability is a very important factor in program targeting and customer engagement. For example, it could potentially be easier to target DR to a stable household that consumes the same load shape every day than to one that is highly variable. On the other hand, it might be better to target behavioral modification and energy efficiency programs at households with a more diverse set of behaviors. Figure 4.13 displays the histogram and cumulative distribution of the number of (encoded) load shapes observed per household with a full calendar year of data (365 days). It clearly shows that there are households that follow a limited set of load shapes as well as more variable households. Since the time horizon in our present analysis is 365 days, the maximum number of load shapes for a customer is 365 (although the dictionary has T = 1000 codes). Figure 4.13 shows that the most variable household has 285 load shapes, and 45% of all households have fewer than 100 shapes a year. We propose utilizing the notion of entropy to create a metric that captures customer variability. For each household $n$, we record the relative frequency $p_n(C_i)$ of each encoding cluster center $C_i$ in its daily series. Then, the entropy of household $n$ is given by

$$S_n = -\sum_{i=1}^{K} p_n(C_i) \log p_n(C_i). \tag{4.7}$$

The entropy is highest if all the cluster centers are equally likely in the dataset (i.e., $p_n(C_i) = 1/K$) and lowest (i.e., $S_n = 0$) if the household follows a single cluster center. We can also compute the entropy of households for each day of the week by computing the relative frequency of codes on each day of the week separately. Figure 4.14 displays the results. The weekday distribution does not differ much from the overall distribution. The average entropy during weekends is higher than during

Fig. 4.13 Number of load shapes per user


Fig. 4.14 Load shape entropy distribution

weekdays, which is reasonable because households have a more regular lifestyle on weekdays. Households can be segmented by their positions in the distributions in the top left plot of Fig. 4.14. For example, if a household's entropy is above the 75% quantile, it can be classified as a variable household, and if it is below the 25% quantile, it can be classified as a stable household. Also notice that there is a group of households with very low entropy (6480 households with entropy between 1.3 and 2).
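Computing (4.7) from a household's sequence of daily shape codes takes a few lines. The logarithm base in (4.7) is unspecified; we use base 2 here, which is consistent with reading 2^{S_n} as an effective number of codes (cf. the segment entropy of 1.76 corresponding to 3.38 segments in Sect. 4.1.4.2). The example codes are made up.

```python
import numpy as np
from collections import Counter

def shape_entropy(daily_codes):
    """Entropy (4.7) of a household's sequence of daily shape codes (base 2)."""
    counts = np.array(list(Counter(daily_codes).values()), dtype=float)
    p = counts / counts.sum()           # relative frequencies p_n(C_i)
    return float(-(p * np.log2(p)).sum())

# One household's year: 200 days of one shape, 100 of a second, 65 of a third
print(round(shape_entropy(["C12"] * 200 + ["C7"] * 100 + ["C45"] * 65), 3))
```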


Fig. 4.15 Most frequent load shapes in the low entropy group

Figure 4.15 shows the four most frequent load shapes in this group, accounting for 88% of its load shapes. The average daily usage of this group is 8.13 kWh, which is much lower than the average daily usage (19 kWh) over the entire dataset. This group likely corresponds to smaller homes that are empty during the day, reflecting either a particular lifestyle or vacant homes.

4.1.4.2 Shape Analysis

Wong and Rajagopal [121] suggest six representative average load shapes for households: "evening," "night," "afternoon," "morning," "daytime," and "dual peak" segments. We develop the methodology further here. The encoding dictionary is reduced further by hierarchical clustering with T = 100. Figure 4.16 shows the 20 most frequent load shapes for all households. The top 20 load shapes provide coverage of 87.5% of all shapes, and each has a frequency higher than 1%. These top 20 shapes can be segmented according to the timing of consumption:

Morning peak (M: 4:00–10:00): Load shapes #9, 11, 13, and 14 belong to this segment. Except for #11, these load shapes have a relatively low daily average consumption of up to 18.68 kWh, as well as small peak values. The main difference among the four load shapes is the peak time (6 am, 9 am, 8 am, and 10 am, in corresponding order). This segment has low potential for targeting DR programs.

Daytime peak (D: 10:00–16:00): Load shapes #7, 10, and 17 belong to this segment. These load shapes have relatively high average usage (above 20 kWh).

Evening peak (E: 16:00–22:00): Load shapes #1, 3, 5, 6, and 16 are included in this segment, which accounts for the largest share of load shapes (about 40%) among all segments. Load shape #6 has a very large average usage (33.42 kWh) with a 4–6 pm peak time. This segment can be a potentially significant target for DR programs.

Night peak (N: 0:00–4:00, 22:00–24:00): Load shapes #4, 8, 12, and 19 belong to this segment. Except for #12, these load shapes show low average usage. Specifically, #4 has a very low average total daily usage (8.45 kWh), similar to the low entropy load shapes in Fig. 4.15. This segment has low potential for targeting DR programs.


Fig. 4.16 Twenty most frequent load shapes of the entire households using the dictionary of size = 100

Dual peak morning and evening (Du M&E): Load shapes #2 and 15 are in this segment; one could also say that #9 and 13 have weak dual peaks. The load shapes in this segment have an average usage a little below the empirical average, and the morning peak is the primary peak for these load shapes except #2.

Dual peak evening and night (Du E&N): Load shape #18 represents this segment. A sample lifestyle in this segment would be cooking dinner with electric appliances, taking a rest, and then using a computer or some electronic device before bedtime.

Dual peak daytime and evening (Du D&E): Load shape #20 has two peaks, at 4 and 10 pm, and borders between the "evening and night peak" and "daytime and evening peak" segments. If #20 were shifted 1 hour to the right, it would be very similar to #18, which has two peaks at 5 and 11 pm, so a sample lifestyle can be similar to that of the previous segment with a 1-hour shift.

We can define three additional segments among the dual peak shapes: morning and daytime (M&D), morning and night (M&N), and daytime and night (D&N). However, these segments contain less than 0.1% of all load shapes.


Fig. 4.17 Characteristics of seven load shape segments

Fig. 4.18 Number of load shape segments per user

Figure 4.17 plots aggregate statistics of all the segments, including the percentage of load shapes and the percentage of the total energy usage. Daytime peak and evening peak segments can be high-potential targets for DR programs, while morning peak and night peak segments have low potential. We can encode the daily household shapes relying on the ten segments defined above. The distribution of the number of segments each household belongs to is shown in Fig. 4.18. Notice that most households have load shapes that fall within seven segments: 87.4% of households can be explained by the seven load shape segments in Fig. 4.17. We recompute the user entropy over the relative rates of each segment, obtaining the result in Fig. 4.19. The average entropy is 1.76, which means the typical household's load shapes belong to 3.38 segments throughout the year. There are also significant numbers of households that have low entropy.


Fig. 4.19 Load shape segment entropy distribution

4.1.4.3 Multidimensional Segmentation

In this subsection, we show how to segment the households using a combination of multiple clustering criteria. The segmentation based on consumption timing developed in the previous subsection indicates that suitable subsets of the household population can be identified for different programs, such as DR; we select all users with at least one load profile in the desired segment. Yet that segmentation does not include two important dimensions that need to be considered: quantity and variability. Section 4.1.2 shows that a mixture of log-normal distributions fits the daily consumption distribution well. Using the fitted distribution, the quantile of daily consumption can be calculated for each day; then, using the average quantile, a household can be assigned to a group: heavy, light, or moderate. Variability in consumption can be captured by computing the entropy of each household, which is classified into stable, moderate, or variable. Figure 4.20 displays a scatterplot, where each point corresponds to a household's entropy and average usage quantile. Based on this plot, nine classes of households are created. The average quantile consumption is divided into three groups since, for this subset of households, the mixture model had two distributions, which naturally expresses three classes: light (households whose consumption is mostly drawn from the first mixture component), moderate (households whose consumption is drawn from either component), and heavy (households whose consumption is mostly drawn from the second component). For example, to target households for an automated DR program, the focus can be on heavy and stable users in the appropriate time-based segment (e.g., daytime peak). Figure 4.21 shows the four most frequent load shapes among the heavy and stable households corresponding to Fig. 4.20. Those four load shapes explain about half (47%) of all usage patterns of the filtered users. The first two load shapes could be very good candidates for DR since they have large relative peaks.
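A sketch of the nine-class assignment follows. The entropy cut points mirror the 25%/75% quantile rule from Sect. 4.1.4.1; the usage cut points are illustrative terciles, whereas the chapter derives its light/moderate/heavy boundaries from the fitted mixture components. All data here is placeholder.

```python
import numpy as np

def segment_households(avg_usage_quantile, entropies):
    """Nine-class labels from average usage quantile and load-shape entropy."""
    usage = np.digitize(avg_usage_quantile, [1 / 3, 2 / 3])   # 0=light, 1=moderate, 2=heavy
    cuts = np.quantile(entropies, [0.25, 0.75])               # 25%/75% rule of Sect. 4.1.4.1
    variability = np.digitize(entropies, cuts)                # 0=stable, 1=moderate, 2=variable
    u = np.array(["light", "moderate", "heavy"])[usage]
    v = np.array(["stable", "moderate", "variable"])[variability]
    return [f"{a}/{b}" for a, b in zip(u, v)]

# Placeholder inputs for 777 households; real values come from f(a) and (4.7)
rng = np.random.default_rng(1)
labels = segment_households(rng.random(777), rng.gamma(4.0, 0.5, 777))
dr_candidates = [i for i, lab in enumerate(labels) if lab == "heavy/stable"]
print(len(dr_candidates))   # households worth a deeper look for automated DR
```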


Fig. 4.20 Combination of usage volume and shape variability

Fig. 4.21 Most frequent load shapes in the filtered users

Table 4.2 The number of households in Fig. 4.20

             Heavy         Moderate      Light         Total
Stable       79 (10.2%)    73 (9.4%)     40 (5.1%)     192 (24.7%)
Moderate     103 (13.2%)   187 (24.1%)   100 (12.9%)   390 (50.2%)
Variable     38 (4.9%)     106 (13.6%)   51 (6.6%)     195 (25.1%)
Total        220 (28.3%)   366 (47.1%)   191 (24.6%)   777 (100%)

different forms of interventions. Potentially, the analyst could focus on heavy or moderate users in the variable class. Table 4.2 summarizes the number of users in different classes and shows that filtering can significantly reduce the number of households that require a deeper and potentially much more time-intensive analysis.
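To illustrate the quantity dimension, the sketch below fits a two-component mixture of log-normals (a Gaussian mixture in log space, here via scikit-learn) and computes a household's average daily-consumption quantile; the synthetic data and the 1/3 and 2/3 class boundaries are illustrative assumptions, since the chapter's class boundaries come from the fitted mixture itself:

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical daily consumption (kWh) for one household over a year
daily_kwh = rng.lognormal(mean=2.3, sigma=0.4, size=365)

# Two-component mixture of log-normals = Gaussian mixture in log space
gm = GaussianMixture(n_components=2, random_state=0).fit(
    np.log(daily_kwh).reshape(-1, 1))
w, m = gm.weights_, gm.means_.ravel()
s = np.sqrt(gm.covariances_.ravel())

# Mixture CDF evaluated at each day's consumption, then averaged
q = sum(wi * norm.cdf((np.log(daily_kwh) - mi) / si)
        for wi, mi, si in zip(w, m, s))
avg_q = q.mean()
usage = "light" if avg_q < 1/3 else "moderate" if avg_q < 2/3 else "heavy"
print(f"average quantile = {avg_q:.2f} -> {usage}")
```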

4.1.4.4 Spatial Locality Analysis

EE and DR programs are managed on the assumption that household consumption patterns are affected by climate and other factors determined by spatial locality. In fact, the usual practice is to focus DR and EE programs on particular zip codes or climate zones. In this subsection, we validate whether consumption patterns are indeed influenced by such locality. Total daily consumption clearly is: coastal and cool climate zones exhibit only a single mixture component with a

Table 4.3 Two-sample t-test for comparing load shape frequencies among groups

t-test to check $P(C_i \mid \text{condition A}) = P(C_i \mid \text{condition B})$:

$N_1$: sample size satisfying condition A; $N_2$: sample size satisfying condition B

$\bar{X}_1 = \dfrac{\#\text{ of } C_i \text{ among } N_1}{N_1}$, $\bar{X}_2 = \dfrac{\#\text{ of } C_i \text{ among } N_2}{N_2}$, $S_1^2 = \bar{X}_1(1 - \bar{X}_1)$, $S_2^2 = \bar{X}_2(1 - \bar{X}_2)$

$$T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{1}{N_1} + \frac{1}{N_2}\right)\frac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}}}, \qquad \text{d.f.} = N_1 + N_2 - 2$$

1) $T < t_{0.025}$: $P(C_i \mid \text{condition A}) < P(C_i \mid \text{condition B})$
2) $T > t_{0.975}$: $P(C_i \mid \text{condition A}) > P(C_i \mid \text{condition B})$
3) Otherwise: $P(C_i \mid \text{condition A}) = P(C_i \mid \text{condition B})$
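A direct implementation of the test in Table 4.3 might look as follows; the counts in the usage example are hypothetical:

```python
import numpy as np
from scipy import stats

def shape_freq_ttest(k1, n1, k2, n2, alpha=0.05):
    """Two-sample t-test from Table 4.3: does load shape C_i occur with a
    different frequency under condition A (k1 of n1) vs. B (k2 of n2)?"""
    x1, x2 = k1 / n1, k2 / n2
    s1, s2 = x1 * (1 - x1), x2 * (1 - x2)
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    t = (x1 - x2) / np.sqrt((1 / n1 + 1 / n2) * pooled)
    df = n1 + n2 - 2
    if t < stats.t.ppf(alpha / 2, df):        # t_{0.025}
        return "P(Ci|A) < P(Ci|B)"
    if t > stats.t.ppf(1 - alpha / 2, df):    # t_{0.975}
        return "P(Ci|A) > P(Ci|B)"
    return "P(Ci|A) = P(Ci|B)"

# Hypothetical counts: shape seen on 120 of 5000 Zone 3 days
# vs. 60 of 5000 Zone 12 days
print(shape_freq_ttest(120, 5000, 60, 5000))
```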

Fig. 4.22 More frequent load shapes in Zone 3

lower mean during the summer, while inland and hot climate zones have mixtures with at least two components; the higher component corresponds to cooling energy consumption during summer. It is also interesting to examine whether load shapes exhibit locality effects as well. For example, we compare Zone 3 (a cool coastal area) and Zone 12 (a hot inland area). The frequency of each load shape is compared between the two zones. Since each frequency is an estimate of the true frequency of a load shape, we utilize the two-sample t-test in Table 4.3, under the assumption that load shapes are drawn independently from a multinomial distribution. Table 4.4 shows that the two zones have different frequencies for 80% of load shapes and similar frequencies for the remaining 20%. More detailed comparisons can be drawn by looking at the frequent load shapes in both zones. Figures 4.22 and 4.23 show that Zone 3 has more frequent load shapes with late evening or night peaks and moderate consumption, while Zone 12 has afternoon or early evening peak shapes with heavy consumption. Zone 3 has a relatively mild climate and uses electricity primarily for heating at night. Zone 12 may have many customers using air conditioners during the afternoon and early evening due to its hot climate. Figure 4.24 shows that no common load shape covers more than 0.8% of shapes in both zones, confirming that the two zones have different load shape distributions.


Fig. 4.23 More frequent load shapes in Zone 12

Fig. 4.24 Common load shapes in both zones

Table 4.4 T-test result (Zone 3 vs. Zone 12)
$P(C_i \mid \text{Zone 3}) > P(C_i \mid \text{Zone 12})$: 272
$P(C_i \mid \text{Zone 3}) < P(C_i \mid \text{Zone 12})$: 532
$P(C_i \mid \text{Zone 3}) = P(C_i \mid \text{Zone 12})$: 196
Total: 1000

Table 4.5 T-test result (weekdays vs. weekends)
$P(C_i \mid \text{Weekdays}) > P(C_i \mid \text{Weekends})$: 322
$P(C_i \mid \text{Weekdays}) < P(C_i \mid \text{Weekends})$: 496
$P(C_i \mid \text{Weekdays}) = P(C_i \mid \text{Weekends})$: 182
Total: 1000

4.1.4.5 Temporal Locality Analysis

It is interesting to analyze whether there exists temporal locality in load shape choice. For example, we can compare load shape frequency distributions between weekdays and weekends, again using the two-sample t-test. Table 4.5 shows that 82% of load shapes are distributed distinctly between the two groups.


Fig. 4.25 More frequent load shapes on weekdays

Fig. 4.26 More frequent load shapes on weekends

Figure 4.27 shows that no common load shape explains more than 0.6% of shapes for weekdays and weekends at the same time. Figure 4.25 confirms that load shapes corresponding to regular working lifestyles (#644, 742, 869) are frequent during weekdays. Figures 4.25 and 4.26 show that morning peak, night peak, and dual peak (morning and evening) load shapes with moderate consumption occur more on weekdays, while daytime and afternoon peak shapes with heavy consumption (#811, 844, 868) occur more on weekends. Possibly, households are empty during weekday daytimes and afternoons and consume actively during the morning or evening. During weekends, households can start the day a bit later than on weekdays, as in #527, or stay home with various activities requiring substantial electricity consumption (e.g., cooking, doing the laundry, or cooling), as in #811, 844, and 868. Winters and summers have different load shape choices as well, but we omit that analysis, which is performed in a similar manner. In an actual region-based DR program design process, this temporal locality analysis can be refined by conditioning on specific times and combining it with the spatial locality investigation.


Fig. 4.27 Common load shapes on weekdays and weekends

4.1.5 Impacts on Load Forecasting

The methodology can be used to drive improvements in peak load forecasting for a power system zone. The key observation comes from the analysis in Fig. 4.17: when predicting the total peak load for a particular hour, only the subset of households in a relevant class can influence the forecast. Therefore, additional information collected about such households could significantly increase prediction accuracy. Moreover, the proposed approach can inform load forecasting for individual households. Such forecasting is important for the design of micro-grids and intelligent distribution systems. The methodology suggests that different consumer classes might require different forecasting approaches. In particular, Figs. 4.19 and 4.20 show that households can be classified according to entropy. Low entropy consumers are easier to forecast at an individual level, while high entropy consumers are harder to forecast since they have significantly more variability. Moreover, in analyzing forecasting performance, it is important to distinguish between the various classes. Our method could also drive algorithms for forecasting individual loads or load shapes. After the encoding procedure, each household has a sequence of load shape codes and another of daily consumption values. After reducing the size of the load shape dictionary, the load shape sequence can be forecasted using Markov chain-type methods or advanced classification algorithms. Combined with any daily consumption prediction method, these results yield a forecast of the load at a specific time.
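As a sketch of the Markov chain-type forecasting mentioned above, the following assumes a household's year of daily shape codes (synthetic here) and a reduced dictionary of ten shapes; the Laplace smoothing is an added assumption:

```python
import numpy as np

def fit_transition_matrix(codes, n_shapes):
    """First-order Markov model over a household's daily load shape codes."""
    P = np.ones((n_shapes, n_shapes))        # Laplace smoothing
    for a, b in zip(codes[:-1], codes[1:]):
        P[a, b] += 1
    return P / P.sum(axis=1, keepdims=True)  # row-normalize to probabilities

rng = np.random.default_rng(1)
codes = rng.integers(0, 10, size=365)        # hypothetical shape-code sequence
P = fit_transition_matrix(codes, n_shapes=10)
print("most likely shape tomorrow:", np.argmax(P[codes[-1]]))
```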

4.1.6 Conclusion and Future Work

The methods described here have implications for utility policy and programs such as DR and EE. Using customers' load shape profiles, we can


effectively target residents who have the highest potential to benefit from DR programs. Load shape-based high-potential targeting can have significant benefits: an increased likelihood of success, energy-savings, and public relations benefits from successful engagement in utility programs. Load shape-based energy use profiles that incorporate the level of use and entropy offer other potential benefits. For example, recommendations for energy reduction or critical peak pricing that are "lifestyle"-based would be very different from the appliance- and device-based recommendations currently used by most utilities. Lifestyle recommendations include focusing on shapes such as morning and afternoon or afternoon-only peaks and suggesting that customers move activities earlier or later in the day. Since a single load shape rarely represents a lifestyle, lower-energy or off-peak load shapes within a household's repertoire of shapes could also be recommended as a means of energy reduction and savings. Beyond load shape segmentation, the extent of entropy within a household could yield further understanding of the potential success of targeting and design recommendations. For example, high entropy households, indicating variability in occupancy and energy-using activity, may have low potential for DR targeting but high potential for energy reduction programs such as appliance rebates. Future experimental research is needed to validate the targeting and energy-saving potential of the segmentation methods.

4.2 Consumer Targeting

Having characterized customer behaviors and their potential gains, we now need to select customers for demand response (DR) programs. This selection is challenging: existing methodologies are hard to scale and perform poorly, limited by the lack of temporal consumption information at the individual customer level. We propose a scalable methodology for DR program targeting that utilizes novel data available from individual-level smart meters. The approach relies on formulating the problem as a stochastic knapsack problem involving predicted customer responses. A novel and efficient approximation algorithm is developed so that the method scales to problems involving millions of customers. The methodology is tested experimentally using real smart meter data from more than 58,000 residential households.

4.2.1 Introduction

Demand response (DR) programs have been expanded to match peak load growth and increases in supply-side uncertainty. The goal of a DR program is to elicit flexibility from loads by reducing or shifting consumption in response to external


signals such as prices or curtailment indicators. Typically, a program is designed to extract a targeted level of energy or power from all the participating loads. The program operation yield is the ratio between the energy extracted from the participants and the target level; current program yields are low, in the range of 10–30%. Customer recruitment is essential for the success of DR programs. Existing customer recruitment mechanisms are designed to identify customers who are likely to enroll. They utilize predictive marketing models (e.g., discrete choice) whose inputs are household-related variables such as family income, enrollment in other programs, and household size. The experimental results in [136] show that recruitment rates for time-based rate programs differ markedly by offer type: an opt-in offer achieves only an 11% recruitment rate, while an opt-out offer achieves 84%. Yet the lack of individual-level consumption information has prevented these recruitment efforts from achieving high operation yields. Besides yield, a DR program needs to ensure reliable performance. Customer enrollment directly influences reliability, as each consumer might provide a different level of certainty on the demand curtailment they can offer during a DR event. The widespread deployment of advanced metering infrastructure can significantly change the approach to customer recruitment. This section investigates how to target customers for DR programs based on this data. The significant recruitment cost and the progressive, scalable deployment of DR programs require an efficient way to select a small number of appropriate customers from a large population. Since enrollment decisions are made in advance of the actual consumption period, only a prediction of per-customer DR potential is available. The prediction can be made by analyzing high-resolution historical consumption data for each customer. Given a prediction, customers need to be chosen in a way that balances the magnitude of the DR potential (reward) against the uncertainty in the prediction (risk). In fact, customers should be viewed as a portfolio, with an optimal trade-off between risk and reward being desirable. This section develops a methodology for large-scale targeting that combines data analytics and a scalable selection procedure. The contribution of this section can be summarized in three parts:

• Formulation of the customer selection problem as a stochastic discrete optimization program
• A novel approximation algorithm with guarantees for this class of discrete optimization problems
• Experiments using real smart meter data from more than 58,000 residential households

The literature on DR programs is vast (see [137–139]) and includes definitions of DR programs [140], estimation of energy-savings (baselining) [141], and other related areas. Recent papers have focused on designs for operating DR, including distributed pricing mechanisms that adapt to data [142–145], simplified operation mechanisms with full information [141, 146], and operations with partial information [147]. The integration of DR as a system reliability resource has also been well investigated [148]. Finally, the hardware implementation of these solutions


is being developed in [149]. Some of the previous studies above already utilized smart meter data [138, 141, 149]. However, customer recruiting for many DR programs still relies on segmentation of customers based on their monthly billing data or surveys [124, 150]. The growing availability of smart meter data has shown that such approaches are highly inaccurate since segments obtained from actual consumption differ from those obtained by alternative methods [151, 152]. This section is organized as follows. Section 4.2.2 provides the problem description, including a stochastic knapsack problem (SKP) setting and response modeling. Section 4.2.3 presents a review of SKP and develops a novel heuristic algorithm to solve the proposed SKP. Section 4.2.4 presents the experimental evaluation and validation of the methodology. Section 4.2.5 summarizes the conclusion and discusses future work.

4.2.2 Methodology

Given a large amount of data, the scalability of the approach is very important. Thus, we propose a quick linear response model and a novel heuristic to solve the SKP (4.8), which is NP-hard. Figure 4.28 shows the overall DR program targeting flow in this section with brief computational complexity information. The remainder of the section details the methodology. It first proposes a reliability-constrained selection algorithm that uses a probabilistic estimate of the DR response. It then demonstrates how such a response can be learned from existing smart meter data. Notice that the specific approach to predicting individual responses would differ for different types of DR, but each still provides a probabilistic estimate. For completeness, we illustrate the procedure for thermal DR.

4.2.2.1 Maximizing Demand Response Reliability

There are K potential customers that can provide DR service. Given a customer's recorded data, the response of customer k is a random variable $r_k$ corresponding to the energy saved during a DR event. The distribution of $r_k$ is determined by fitting a response model corresponding to the type of DR and has a known joint probability distribution.

Fig. 4.28 Overall DR program targeting flow


The cost for customer k to participate in the program is $c_k$. During planning, this cost represents the cost of marketing the program to a customer and rebates for a customer to purchase the resources to perform DR. The program operator has a budget C and desires an aggregate target response T (in kWh) from the program with the maximum reliability possible. DR availability is captured by the control variable T. DR reliability is naturally captured by the probability of the response exceeding the target T. The optimal DR program selection problem can then be stated as

$$\max_x \; P\left(\sum_{k=1}^{K} r_k x_k \ge T\right)$$
$$\text{s.t.} \quad \sum_{k=1}^{K} c_k x_k \le C \;\left(\Leftrightarrow \sum_{k=1}^{K} x_k \le N \text{ if } c_k \text{ is the same for all } k\right),$$
$$x_k \in \{0, 1\}, \quad k = 1, \dots, K \tag{4.8}$$

where $x_k$ represents whether a customer is recruited or not. Note that, if $c_k$ is the same for all customers, the budget constraint is equivalent to limiting the number of participating customers to N. The program maximizes the reliability of the DR program by recruiting customers within the program budget. The optimal reliability for budget C and target T is given by the objective function value $p^*(C, T)$. This function captures the trade-off between DR availability and DR reliability for a budget C, and it has some important properties that conform to our intuition about the trade-off. The objective function is monotonically decreasing in T, so $p^*(C, T_1) \le p^*(C, T_2)$ if $T_1 \ge T_2$. The budget determines the constraints, so $p^*(C_1, T) \ge p^*(C_2, T)$ if $C_1 \ge C_2$. The proposed optimization problem is an SKP. SKPs are stochastic integer programs known to be NP-hard. The goal of this section is to develop a novel, efficient approximation algorithm that scales to K in the millions of customers. The efficient algorithm is then used to compute the function $p^*(C, T)$. An important additional assumption is that K is large and C is sufficiently large, so a significant number of customers are included. In that case, given the set of random response variables $r_k$, the total group response is approximately Gaussian distributed due to the central limit theorem:

$$\sum_{k=1}^{K} r_k x_k \sim \mathcal{N}\left(\mu^T x, \; x^T \Sigma x\right) \tag{4.9}$$

where $x$ is the recruitment vector, $\mathcal{N}$ indicates a normal distribution, $\mu$ is the vector of individual response means $\mu_k$, and $\Sigma$ is the covariance matrix with covariances $\Sigma_{jk}$ between responses. In practice, if the number of customers selected is on the order of 50, the response distribution is very close to normal.
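Under the approximation in (4.9), the reliability of any candidate portfolio x can be evaluated in closed form with the normal survival function; a minimal sketch with hypothetical response parameters:

```python
import numpy as np
from scipy.stats import norm

def dr_reliability(x, mu, Sigma, T):
    """P(sum_k r_k x_k >= T) under the Gaussian approximation in (4.9)."""
    mean = mu @ x
    std = np.sqrt(x @ Sigma @ x)
    return norm.sf((T - mean) / std)   # survival function = 1 - CDF

# Hypothetical portfolio: 100 recruited customers with independent responses
rng = np.random.default_rng(3)
mu = rng.uniform(0.5, 2.0, 100)                 # mean responses (kWh)
Sigma = np.diag(rng.uniform(0.01, 0.25, 100))   # diagonal covariance
x = np.ones(100)
print(f"P(response >= 100 kWh) = {dr_reliability(x, mu, Sigma, 100.0):.3f}")
```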


Note that we cannot guarantee the i-th customer's enrollment in the DR program even though $x_i = 1$ in $x$. Thus, the optimization problem in (4.8) provides the upper limit of DR program performance from a technical perspective, on the assumption that all the selected customers will be recruited successfully. Even if the recruitment rate is low, this problem setting easily extends to a model¹ that includes a recruitment probability $p_k$ learned as a discrete choice model. Another interpretation of this problem is efficiently selecting good customer candidates to be recruited.

4.2.2.2 Response Modeling

Estimating the response $r_k$ from the provided data is an important challenge in practical applications. We illustrate here a model for estimating customer response in a specific program, but we note that our methodology applies in general, with the ability to define general models. A highlight of the methodology is that $r_k$ is a random variable, so models that are not accurate can still be useful. The customer response model specification depends on the design of the DR program. Consider a global temperature adjustment program for heating, ventilation, and air conditioning (HVAC) systems [137]. Such a program increases the temperature setpoint of the air conditioner for each customer by a fixed amount to reduce cooling power consumption. Selecting customers with high energy-saving potential during a DR event day and hour requires an accurate model of the total energy consumed at each setpoint level. If HVAC consumption is independently metered, a simple model can be built utilizing the observed consumption, the external temperature, and the utilized setpoint [153]. In general, though, only the total home consumption and the zip-code-level external temperature are observed. The main proportion of temperature-sensitive energy consumption is from HVAC systems [137]. It has been observed that the relationship between energy consumption and outside temperature is piecewise linear, with two or more breakpoints [141, 154–156]. These descriptive models are typically fit utilizing past consumption data, especially for the summer season. The power consumption of customer k at time t on day d, $l_k(t, d)$, is modeled as

$$l_k(t, d) = a_k(t)\,(To_k(t, d) - Tr_k)^+ + b_k(t)\,(Tr_k - To_k(t, d))^+ + c_k(t) + e_k(t) \tag{4.10}$$

where $To_k(t, d)$ is the outside temperature, $Tr_k$ is the breakpoint temperature, which serves as a proxy for the HVAC temperature setpoint, $a_k(t)$ is the cooling sensitivity, $b_k(t)$ is the heating sensitivity (or the temperature sensitivity before the AC system turns on), $c_k(t)$ is the baseload, and $e_k(t)$ is a random variability. Typically, in the summer, $b_k(t)$ is close to zero. In some cases, $b_k(t) \approx a_k(t)$ since

¹ We can replace $r_k$ with $r'_k = r_k p_k$ and change (4.9) accordingly.


either the variability is large or only high temperatures are observed in the summer; thus, a reliable estimate of $b_k(t)$ cannot be obtained. The model above is closely related to the equivalent thermal parameter model [153], assuming that the system is in thermal equilibrium, so the inner mass temperature and the outside temperature are equal. Note that this linear model in (4.10) may be less accurate and generate more variance in predicting HVAC consumption compared to physics-based models solving differential equations [153, 157, 158] with proper data (e.g., indoor temperature, heater capacity, thermal resistances, and volume of the house). However, these physics-based models cannot be fit with the limited data available (described in Sect. 4.2.4.1). Moreover, they are not scalable to a large number of customers due to computational cost. Thus, (4.10) is chosen, following the previous studies [141, 154–156]. Also, the proposed stochastic optimization problem, the main contribution of this section, can be solved with any estimates of HVAC consumption and variance if better estimates are available. To restrict the computational cost of finding the best breakpoint, $Tr_k$ is assumed to be an integer in the range 68–86 °F (20–30 °C), which is typical. Additionally, to prevent one or two dominant data points from determining the breakpoint and providing an invalid temperature sensitivity, we constrain the breakpoint so that both sides of it contain at least a certain fraction of the data (we set 15% in this section, so there are at least ten samples on each side of the breakpoint). Model learning is performed in two steps. Minimization of the residual sum of squares (RSS) is utilized to learn the model parameters $Tr_k$, $a_k(t)$, $b_k(t)$, and $c_k(t)$, and the distribution of the error $e_k(t)$, from the observed data $l_k(t, d)$ and $To_k(t, d)$. An F-test [159]² is utilized to prevent overfitting, testing whether there is a breakpoint. Without a breakpoint, the model is (4.11), with a different constant bias $c_k^r(t)$:

$$l_k(t, d) = a_k(t)\,To_k(t, d) + c_k^r(t) + e_k(t). \tag{4.11}$$

The overall computation needed to fit the consumption model is to solve at most 20 linear regression models: one for each potential value of the breakpoint (at most 19) and one for fitting (4.11). The coefficients associated with the breakpoint with the smallest RSS are selected as the estimate. The regression provides estimates of the parameter values and errors, in particular an estimate of the cooling sensitivity $\hat{a}_k(t)$ and the standard deviation of the parameter error, denoted by $\sigma(\hat{a}_k(t))$. The distribution of the parameter estimate is not necessarily Gaussian. Covariances $\mathrm{COV}(\hat{a}_j(t), \hat{a}_k(t))$ between the sensitivity estimates of different customers can be obtained by analyzing the residuals. The DR response model is postulated as follows. The savings in the response are obtained by increasing the setpoint temperature by $\Delta Tr_k$. We assume DR is utilized during very hot days, so $To_k(t, d) \ge Tr_k + \Delta Tr_k$. The response

² As (4.11) is a nested model of (4.10), the F-test can be used.


model is then postulated to be $r_k = a_k(t)\,\Delta Tr_k$ kWh. The random variable $r_k$ has mean $\mu_k = \hat{a}_k(t)\,\Delta Tr_k$ and standard deviation $\sigma_k = \sqrt{\Sigma_{kk}} = \sigma(\hat{a}_k(t))\,\Delta Tr_k$. The covariance between any two responses $r_j$ and $r_k$ is given by $\Sigma_{jk} = \mathrm{COV}(\hat{a}_j(t), \hat{a}_k(t))\,\Delta Tr_j\,\Delta Tr_k$. The aggregate DR response is then distributed as in (4.9).
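A minimal numpy sketch of this fitting procedure is shown below; the data are synthetic, and the F-test against (4.11) is omitted for brevity:

```python
import numpy as np

def fit_breakpoint_model(load, temp, tr_range=range(68, 87), min_frac=0.15):
    """Fit l = a*(To-Tr)+ + b*(Tr-To)+ + c by least squares for each integer
    breakpoint Tr, keeping the one with the smallest RSS, as in (4.10)."""
    best = None
    n = len(load)
    for tr in tr_range:
        # require at least min_frac of the data on each side of the breakpoint
        if min(np.mean(temp > tr), np.mean(temp <= tr)) < min_frac:
            continue
        X = np.column_stack([np.maximum(temp - tr, 0),
                             np.maximum(tr - temp, 0),
                             np.ones(n)])
        coef, rss, *_ = np.linalg.lstsq(X, load, rcond=None)
        rss = float(rss[0]) if rss.size else float(((X @ coef - load) ** 2).sum())
        if best is None or rss < best[0]:
            best = (rss, tr, coef)
    return best  # (rss, Tr_k, [a_k, b_k, c_k])

# Hypothetical hourly data: cooling kicks in above 75 F
rng = np.random.default_rng(2)
temp = rng.uniform(60, 100, size=90)
load = 0.2 * np.maximum(temp - 75, 0) + 1.0 + rng.normal(0, 0.1, size=90)
rss, tr, (a, b, c) = fit_breakpoint_model(load, temp)
print(f"Tr = {tr} F, cooling sensitivity a = {a:.2f} kWh/F")
```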

4.2.3 Algorithm

4.2.3.1 Optimization Problem Transformation

Utilizing the normality assumption on the total response in the original SKP formulation (4.8), the following equivalence is easy to demonstrate:

$$\rho^* = \max_{x \in \mathcal{C}} P\left(\sum_{k=1}^{K} r_k x_k \ge T\right) \;\Leftrightarrow\; \rho^* = \min_{x \in \mathcal{C}} \frac{T - \mu^T x}{\sqrt{x^T \Sigma x}}. \tag{4.12}$$

The set $\mathcal{C}$ represents the constraint set of the original SKP. The equivalence reveals some qualitative insights about the optimal solution. Consider the case $c_k = 1$, so the budget C is the number of customers. If C is small, it is likely that the target level T exceeds the group response, so $\rho^* \ge 0$; the mechanism mostly selects customers with high-mean, high-variance responses, resulting in low reliability. If C is sufficiently large, then $\rho^* \le 0$, and the mechanism selects high-mean, low-variance customer responses; consequently, DR reliability is high. A simple rule of thumb is that C should usually be set so that the sum of the C highest response means $\mu_k$ exceeds T.

4.2.3.2 Previous Approaches to Solve the SKP

Several approaches address solving the transformed SKP in (4.12) [160, 161]. The basic principle is to utilize Lemma 4.1, which is proved independently for completeness in [162].

Lemma 4.1 The customer selection SKP (4.12) is equivalent to the optimization program

$$\min_{\mu, \sigma^2} \frac{T - \mu}{\sqrt{\sigma^2}} \quad \text{s.t.} \; (\mu, \sigma^2) \in \mathcal{H}, \tag{4.13}$$

where the constraint set is defined as


$$\mathcal{H} = \mathrm{Conv}\left\{(\mu, \sigma^2) : x \in \mathcal{H}_X, \; \mu = \mu^T x, \; \sigma^2 = x^T \Sigma x\right\}$$
$$\mathcal{H}_X = \left\{x : x \in \{0, 1\}^K, \; c^T x \le C, \; \mu^T x \ge T, \; x^T \Sigma x \ge 0\right\}$$

under the assumption that $\mu^* \ge T$. Furthermore, there is a map from the optimum $(\mu^*, \sigma^{*2})$ to $x^*$. Additionally, there exists $0 \le \lambda \le 1$ such that the optimization

$$\max_{\mu, \sigma^2} \; \lambda\mu - (1 - \lambda)\sigma^2 \quad \text{s.t.} \; (\mu, \sigma^2) \in \mathcal{H} \tag{4.14}$$

has the same optimum $(\mu^*, \sigma^{*2})$ as optimization (4.13).

To find the extreme points of $\mathcal{H}$, (4.14) is solved multiple times for various values of $\lambda$. When $\lambda$ is given, the calculation can be done by dynamic programming. However, the overall cost is determined mainly by selecting the small set of $\lambda$ values needed to find all extreme points; setting the various $\lambda$ relies on nonlinear optimization techniques [161]. According to [163], the expected complexity of finding the set of $\lambda$ values is $O(\binom{K}{N}^2)$ (where K is the number of customers and N is the limit on the number of participating customers), which requires heavy computation when K is large. Alternatively, [164] does not find the extreme points of $\mathcal{H}$ but instead converts the SKP into multiple knapsack problems (KPs) under the assumptions that all the $\sigma_k^2$ are integers and all users are independent ($\Sigma$ is diagonal). With these strong assumptions, the method is quite computationally efficient: the complexity is $O(K N^2 \max(\sigma_k)^2)$, as it requires $O(N^2 \max(\sigma_k)^2)$ rounds of finding the maximum value among K values. A more straightforward approach is to extend the concept of a greedy approximation algorithm used to solve specific instances of the KP [165, 166]. The proposed algorithm is described in [162] as Algorithm 4 and has complexity $O(K \log K)$. It utilizes the per-customer risk-reward ratio $\mu_k / \sigma_k$ to sort and rank customers who offer sufficient benefit; this constraint improves the performance of the algorithm, as explained in [162].

4.2.3.3 Stochastic Knapsack Problem-Solving

We propose an efficient heuristic algorithm (Algorithm 3) to solve the SKP by finding the extreme points of $\mathcal{H}$. By the monotonicity property shown in the proof of Lemma 4.1, the maximum must be attained at one of the extreme points found by solving (4.14) with various $\lambda$. Every extreme point of $\mathcal{H}$ has a corresponding point $x \in \mathcal{H}_X$, which can be obtained by solving


$$\max_{x \in \mathcal{H}_X} \; \lambda' \mu^T x - x^T \Sigma x, \qquad \lambda' = \frac{\lambda}{1 - \lambda} \ge 0. \tag{4.15}$$

Fig. 4.29 Idea of the heuristic algorithm

Algorithm 3 Algorithm to solve the SKP in (4.8)

Require: $\mu$ and $\Sigma$ from response modeling. Set an integer constant M (= the number of iterations).
for i from 0 to M do
    $\lambda'_i = \tan\frac{i\pi}{2M}$ (= equally increasing slope angle).
    Solve the problem below and save $x_i$:
    $$\text{(If } \rho^* \le 0\text{)} \quad x_i = \arg\max_x \; \lambda'_i \mu^T x - x^T \Sigma x, \tag{4.16}$$
    $$\text{(If } \rho^* > 0\text{)} \quad x_i = \arg\max_x \; \lambda'_i \mu^T x + x^T \Sigma x, \tag{4.17}$$
    $$\text{s.t.} \quad \sum_{k=1}^{K} c_k x_k \le C, \quad x_k \in \{0, 1\} \; \forall k.$$
end for
return $\bar{x} = \arg\min_{x \in \{x_0, \dots, x_M\}} \frac{T - \mu^T x}{\sqrt{x^T \Sigma x}}$.

As shown in Fig. 4.29, if we consider a 2D scatterplot of $\mathcal{H}$ (x-axis $\mu$, y-axis $\sigma^2$), $\lambda'$ corresponds to a slope. Finding an extreme point by solving (4.15) then corresponds to finding the point in $\mathcal{H}$ on the line of that slope with the minimum intercept. Depending on the slope $\lambda'$, different extreme points are obtained. Thus, the proposed heuristic finds all extreme points by solving (4.15) a constant number of times, M, i.e., increasing the slope angle from 0 to $\pi/2$ in steps of $\pi/2M$. We bound the ratio between the cost obtained by Algorithm 3 and the true optimal cost $\rho^*$ defined in (4.12).


Proposition Let $\mu_i = \mu^T x_i$ and $\sigma_i^2 = x_i^T \Sigma x_i$. When $\rho^*$ in (4.12) is less than zero, Algorithm 3 has the approximation bound in (4.18), which depends only on the $\sigma_i$:

$$\frac{T - \mu^T \bar{x}}{\sqrt{\bar{x}^T \Sigma \bar{x}}} < \rho^* \, \min_i \frac{\sigma_{i-1}}{\sigma_i}. \tag{4.18}$$

We provide the proof of this proposition in [162]. Briefly, the bound depends on the given data and on the number of iterations M, which determines the $\sigma_i$. As an example, we show the relation between M and the approximation bound for Zone 13 in Fig. 4.37 (Sect. 4.2.4.3). When $\rho^*$ in (4.12) is larger than zero, for a given T and N the optimal solution need not be one of the extreme points [161]. However, we assume that the optimal solution is at one of the extreme points and find the extreme points by solving (4.17) instead of (4.16) in Algorithm 3, because the optimization direction is to increase both $\mu^T x$ and $x^T \Sigma x$ as much as possible when $\rho^*$ is larger than zero. When $\Sigma$ is not diagonal, (4.16) must be solved by quadratic programming after relaxing to $0 \le x_k \le 1$. However, if we assume the responses are independent, so that $\Sigma$ is diagonal [$\sigma^2 = (\Sigma_{11}, \dots, \Sigma_{KK})$], solving (4.16) becomes a linear programming problem, as

$$\lambda'_i \mu^T x - x^T \Sigma x = \left(\lambda'_i \mu - \sigma^2\right)^T x. \tag{4.19}$$

In particular, when the cost $c_k$ is the same for each customer, the problem simplifies to selecting the highest (at most) N entries of the vector $\lambda'_i \mu - \sigma^2$, which is computationally the same as sorting K entries, $O(K \log K)$.³ To summarize, under the assumptions of independent responses and equal targeting costs, our customer selection procedure guarantees a near-optimal solution with a computational complexity, $O(K \log K)$, equivalent to that of sorting K entries in a vector. This is the most significant benefit of our heuristic algorithm: it enables customer selection even with a very large number of customers.

³ To be more accurate, it is $O(K \min(\log K, N))$, as the complexity of selecting the highest N values from K values is $O(KN)$. However, on the assumption that we target a certain proportion of all customers, we use $O(K \log K)$.
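Putting the pieces together, a minimal sketch of Algorithm 3 for the $\rho^* \le 0$ case with diagonal $\Sigma$ and equal costs might look as follows (the population parameters are synthetic):

```python
import numpy as np

def solve_skp_heuristic(mu, var, T, N, M=10):
    """Sketch of Algorithm 3 (rho* <= 0 case, diagonal Sigma, equal costs):
    sweep the slope angle, select the top-N customers by the score
    lambda'_i * mu_k - sigma_k^2 from (4.19), and keep the best portfolio."""
    K = len(mu)
    best_x, best_rho = None, np.inf
    for i in range(M + 1):
        lam = np.tan(i * np.pi / (2 * M))      # numerically huge at i = M
        score = lam * mu - var                 # linear objective (4.19)
        sel = np.argsort(score)[::-1][:N]      # O(K log K) per slope
        x = np.zeros(K)
        x[sel] = 1.0
        rho = (T - mu @ x) / np.sqrt(var @ x)  # objective in (4.12)
        if rho < best_rho:
            best_rho, best_x = rho, x
    return best_x, best_rho

# Hypothetical population of 50,000 customers
rng = np.random.default_rng(4)
mu = rng.uniform(0.0, 2.0, 50_000)
var = rng.uniform(0.01, 1.0, 50_000)
x, rho = solve_skp_heuristic(mu, var, T=1000.0, N=1100)
print(f"selected {int(x.sum())} customers, rho = {rho:.3f}")
```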


4.2.4 Experiment on Data

4.2.4.1 Description of Data

The anonymized smart meter data used in this section is provided by the Pacific Gas and Electric Company (PG&E). The data contains the electricity consumption of residential PG&E customers at 1-hour intervals. There are 218,090 smart meters, and the total number of 24-hour load profiles is 66,434,179. The data corresponds to 520 different zip codes and covers climate Zones 1–4, 11–13, and 16 according to the California Energy Commission climate zone definition. The targeting methodology is tested by focusing on a cool climate zone (Zone 3, 32,339 customers) and a hot climate zone (Zone 13, 25,954 customers) during the summer season, May to July 2011. Zone 3 is a coastal area, and Zone 13 is an inland area. Weather data for the same period is obtained from Weather Underground for each zip code in these two zones.

4.2.4.2 Consumption Model Fitting Result

We start the targeting process by fitting the consumption model for each hour of the day. First, we perform model selection utilizing an F-test to choose between the models in (4.10) [model (3)] and (4.11) [model (4)] for each customer. The F-test is conservative toward model (3), as the reference temperature is restricted to be an integer between 68 and 86. The resulting selection by climate zone is shown in Fig. 4.30. Notice that model (3) is selected more frequently in Zone 13 (hot) than in Zone 3 (cool); in fact, in the cool climate zone, model (3) does not significantly outperform model (4). Model (3) is a better fit during higher consumption hours (3–9 pm) for both climate zones. The $R^2$ value distribution for each climate zone is used to check how much of the consumption variance can be explained by the model. Considering that the system peak hours of consumption for a typical utility occur between 4 and 6 pm

Fig. 4.30 Hourly model selection result by F-test [the percentage of customers for which the model in (4.10) is selected]


Fig. 4.31 $R^2$ distributions in both climate zones

during the summer season, we assume these are the hours when DR is most required and provide results for hours in this range (Fig. 4.31); other hours displayed similar results. In hot Zone 13, the model explains about 42% of the variance on average, while in cool Zone 3, it explains 8–9%. Considering that models (3) and (4) are simple linear models whose only independent variable is the zip-code-level external temperature, we can conclude that temperature is an important driver of electricity consumption in Zone 13. Note that the main purpose of these temperature models is to estimate customers' temperature sensitivities during the hours of interest on weekdays in the summer period. In general, to explain customers' energy consumption better and achieve higher $R^2$ values, we could fit customers' consumption on more features if available (e.g., indoor temperature, day of the week, thermal resistance, seasonal effects, and previous interval consumption) with more advanced models [167, 168]. As shown in Fig. 4.32, in the hot climate zone, consumption has positive sensitivity with respect to temperature for most households. In the cool area, many customers are mostly insensitive to temperature, and the absolute sensitivity is relatively small, yet some customers have positive sensitivity. This suggests that the example DR program would not be as effective in the cool area as in the hot area; it also demonstrates that careful selection of customers in cool areas can reveal some potential for DR. Figure 4.33 shows the scatterplots of $\mu_k$ and $\sigma_k$ in (4.9) for both climate zones when $\Delta Tr_k$ is 3 °F (1.67 °C). Notice the data is more scattered in Zone 3 (cool) than in Zone 13 (hot). Though both plots are on the same scale, $(\mu_k, \sigma_k)$ in Zone 3 is much


Fig. 4.32 Temperature sensitivity distributions

Fig. 4.33 $\mu_k$ and $\sigma_k$ scatterplots in both climate zones

more scattered beyond the limits of the plot. As seen in Fig. 4.33, most customers in Zone 13 have positive temperature sensitivity, and $(\mu_k, \sigma_k)$ stays within a relatively small range, while Zone 3 has customers with a very wide range of $(\mu_k, \sigma_k)$.


4.2.4.3 Targeting Result Analysis

Section 4.2.2 developed an SKP for optimal customer selection, and two different algorithms to solve this combinatorial problem were provided in Sect. 4.2.3.3: (1) an efficient heuristic (Algorithm 3) and (2) a greedy algorithm (Algorithm 4). In this section, we show the customer selection results obtained by solving the SKP (4.8) under the assumptions that $c_k$ is the same for all k and $\Sigma$ is diagonal.

Algorithm 4 Gradual greedy algorithm

Require: $\mu$ and $\sigma^2$ vectors in (4.19). $x = 0$; $T_0 = T$.
for i from 1 to N do
    Find j from below and set $x_j = 1$ in $x$:
    $$j = \arg\max_{\{k \,\mid\, x_k = 0\}} \frac{\mu_k}{\sigma_k}, \qquad \text{(if } \rho^* \le 0\text{) s.t. } \mu_k \ge \frac{T_{i-1}}{N + 1 - i} \tag{4.20}$$
    $T_i = T_{i-1} - \mu_j$
end for
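A matching sketch of Algorithm 4 (again for the $\rho^* \le 0$ case) follows; the fallback to the best unconstrained ratio when no candidate satisfies the constraint in (4.20) is an added assumption for robustness:

```python
import numpy as np

def solve_skp_greedy(mu, sigma, T, N):
    """Sketch of Algorithm 4: repeatedly pick the unselected customer with
    the best mu_k / sigma_k ratio among those whose mean response still
    covers an equal share of the remaining target."""
    x = np.zeros(len(mu), dtype=int)
    remaining = T
    for i in range(1, N + 1):
        need = remaining / (N + 1 - i)                 # share in (4.20)
        cand = np.flatnonzero((x == 0) & (mu >= need))
        if cand.size == 0:                             # assumed fallback
            cand = np.flatnonzero(x == 0)
        j = cand[np.argmax(mu[cand] / sigma[cand])]
        x[j] = 1
        remaining -= mu[j]
    return x

rng = np.random.default_rng(5)
mu = rng.uniform(0.0, 2.0, 50_000)
sigma = np.sqrt(rng.uniform(0.01, 1.0, 50_000))
print("selected:", solve_skp_greedy(mu, sigma, T=1000.0, N=1100).sum())
```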

First, we provide the trade-off curve between DR availability and DR reliability with fixed N, which is the most important statistic for a utility manager with a limited budget to run a DR program. Then, we present the relation between varying T (availability) and the minimum N that achieves 95% probability; through this plot, we can provide the cost required to achieve a certain level of energy-savings. Also, we show the relation between varying N and the maximum probability (reliability) in (4.8) given T, which corresponds to the trade-off between DR reliability and cost for a given T. Lastly, we provide the plot of M versus the approximation bound. Figure 4.34 shows the DR availability-reliability trade-off curve for a given limit on the number of participating customers. As an example interpretation, if a utility wants to save 1625 kWh during 5–6 pm in Zone 13, the saving cannot be guaranteed with more than 50% reliability even with the best 2000 customers. A utility manager can generate this plot, setting N according to their budget, and decide how much energy to target with a certain sacrifice in reliability. Figure 4.35 shows the relation between the maximum probability and N given T in (4.8). We assume that DR events can happen in each hour during 4–6 pm, $\Delta Tr_k$ is 3 °F (1.67 °C), and T is 1000 kWh. According to the plots, the heuristic algorithm always achieves equal or higher probability than the greedy algorithm. As designed, the greedy algorithm guarantees more than 50% probability whenever the optimal solution can achieve more than 50%. Note that the reliability changes from 0 to 100% within a span of 30 customers in the hot climate zone, though about 1100 customers are required to achieve 1000 kWh of energy-saving. This implies that an inappropriate


Fig. 4.34 Trade-off curves between DR availability and reliability

Fig. 4.35 Maximum probability (reliability) and N for a DR program with $\Delta Tr_k = 3$ °F (1.67 °C)


Fig. 4.36 T and the minimum N to achieve T with more than 95% probability

number of enrollments can cause a program to fail, reinforcing the need for an analytic targeting mechanism. In contrast, in the cool climate zone, it takes more than 60 customers for the reliability to go from 0 to 100%. As shown in Fig. 4.33, the reason is that the small number of customers with positive mean responses in Zone 3 have relatively large standard deviations compared to the customers in Zone 13. Figure 4.36 shows the relation between varying T (availability) and the minimum N required to achieve the corresponding T while guaranteeing reliability (> 95%). We varied T from 200 to 4000 kWh for both climate zones. Note that energy-savings of 4000 kWh or more are achievable with high probability in Zone 13, while not even 1600 kWh is achievable in Zone 3. This again supports the reasonable conjecture that targeting customers in hot climate zones is more effective for achieving energy-savings than targeting customers in cool climate zones. Also note that the relation between T and N in the figures can be approximated by quadratic equations. Finally, Fig. 4.37 shows the approximation bound of Algorithm 3. For T, we use 2000 kWh, and for N, the minimum N that achieves more than 95% probability in Zone 13 is selected. In Zone 13, from $M = 10$ onward, the approximation bound is 0.983, which means only 11 sorting passes over K values are needed to achieve a near-optimal solution of (4.8).


Fig. 4.37 M and approximation bound

4.2.5 Conclusion

In this section, we investigated an efficient customer selection process for DR programs based on a large amount of hourly consumption data. The proposed methodology scales to large datasets, being composed of a simple linear response model and a fast heuristic selection algorithm for the SKP. Moreover, we proved an approximation bound for the heuristic algorithm's solution, which is nearly optimal in the hot climate zone. As an example, we presented results for a DR program that adjusts the temperature setpoint during the summer season. From the experimental results, we found that there are customers, even in cool climate zones, who are very sensitive to the outside temperature and use their AC systems actively; therefore, when the number of recruited customers, N, is small, the energy-saving potential can be higher in the cool climate zone. The probability of achieving the targeted energy-savings changes rapidly over a small range of N in hot climate zones, which means it is very important to select the right customers so that the targeted energy-savings are achieved with minimum enrollment and cost for the utility. The proposed method can be extended in many ways. For example, the heuristic algorithm is not confined to DR program targeting: any application that can be formulated in the same optimization format can apply it at low computational cost. Also, by changing the DR program and the response model, more refined and practical studies can be carried out. Additionally, it may be important to include other practical constraints or requirements from DR programs.


4.3 Demand Response

In the previous sections, we analyzed different user behaviors and targeted customers for our programs. Residential customers are increasingly participating in demand response programs for both economic savings and environmental benefits, and baseline estimation-based rewarding mechanisms are currently being deployed to encourage customer participation. However, the deterministic baseline estimation methods that work well for commercial users were found to create erroneous rewards for residential consumers. This is due to the uncertainty associated with residential customers and the inability of a deterministic approach to capture it. In contrast to the deterministic approach, we propose in this section to conduct probabilistic baseline estimation and to pay a customer over a period of time, once the customer's prediction error decreases through reward aggregation. To achieve this goal, we analyze data from 12,000 residential PG&E customers and propose a Gaussian process-based rewarding mechanism. Real data from PG&E and OhmConnect are used to validate the algorithm and demonstrate fairer payments to residential customers. Finally, we provide a theoretical foundation showing that the proposed method is always better than the currently used industrial approaches.

4.3.1 Introduction

The Federal Energy Regulatory Commission defines demand response (DR) as electric usage adjustments by consumers from their normal consumption patterns [169]. Such adjustments are in response to (1) changes in the price of electricity over time or (2) incentive payments designed to induce lower electricity consumption at usage peaks or when system reliability is jeopardized [170]. Traditional DR programs are usually designed for large commercial customers, where a baseline is used for rewards. As the power usage of these customers is predictable, current baseline estimation methods for commercial users assume that the uncertainties can be ignored. So, a deterministic baseline estimate of normal electricity consumption is computed from the no-DR period [171, 172], and the difference between the estimated normal consumption and the actual usage is used to calculate the savings [173–175]. For example, deterministic methods such as simple load averaging and temperature-based linear regression have been used for commercial customers with satisfactory results [176–178]. DR programs have had great success with large power consumption users: Greentech Media reported in 2013 that 8.7 million dollars in revenue had been generated within 7 months in the Pennsylvania, Jersey, and Maryland (PJM) Power Pool by conducting demand response in system operation with mostly large customers. While large customers currently create a significant portion of DR program revenue, the smaller residential consumers hold the key to potential growth in


the DR customer number and the DR revenue. For example, 9.3 million customers had participated in DR programs in the United States by March 2016, and more than 90% of them are in the residential sector. In addition to generating profits, DR at the residential level is also becoming an attractive solution for balancing local power flows given radically increased renewable energy. For these reasons, the California Public Utilities Commission (CPUC) issued Electric Rule 24, calling for the direct participation of residential customers in DR programs. Now, consumers can join the Pacific Gas and Electric Company (PG&E)'s Intermittent Renewable Management Pilot Phase 2 program and bid aggregated 15-minute load reductions, earning payments directly from CAISO's Proxy Demand Resource product. In response to this incentive, private companies like OhmConnect have started to expand beyond utility-operated demand response for residential customers.

As these consumers are reluctant to let aggregator companies have total control of the relevant assets [179], a baseline-based rewarding method designed for large customers was initially applied indiscriminately to residential customers. However, the high uncertainty in these consumers' load consumption makes the estimation error rate as high as 50% [180], leading to many complaints from participating consumers. Therefore, a new baseline estimation method and a complementary reward mechanism are needed to retain current consumers and attract new participants. To improve on deterministic baseline estimation, [181] proposes considering the non-DR days preceding the DR event and taking the average load of the highest-consumption days among them as the baseline. Such a method is called HighXofY (used by the New York Independent System Operator (NYISO)). Mohajeryami et al. [182, 183] compare HighXofY, LowXofY, and MidXofY from [181] with an exponential moving average (used by ISO New England) and regression methods with adjustments; an economic analysis of a hypothetical peak time rebate (PTR) program is carried out afterward. To improve accuracy, non-DR participants can be used as a control group [184–186]. However, it may be hard to uniquely define the best control group that properly captures the user behavior of the treatment group (the DR participants). To resolve this problem, [187] presents a clustering-based method, where customers are first divided into groups; within each group, DR participants' baselines are estimated from non-DR participants' loads. While there are some improvements, the drawbacks of deterministic methods lie in (1) their failure to utilize historical data to capture the dynamics of complex user behaviors [178, 188], particularly important for small to medium consumers with more variability [189], and (2) their unfair rewards, since baseline estimation errors can differ greatly across DR participants.

As large utility companies have started to make historical data available to approved third parties, e.g., through the Green Button initiative in California [190], we propose to (1) apply machine learning to historical data to capture residential consumer uncertainties and (2) reward customers at a similar baseline-estimation-error rate for fairness. Specifically, our data analytics of a large residential customer dataset shows Gaussianity, so we propose to use Gaussian process (GP) regression for machine learning [191, 192]. This is because GP regression naturally provides the


prediction of uncertainties inherent in the customer loads. It also has the flexibility of adaptive component design according to customer behavior [191]. Based on the probabilistic estimates, we further propose to reward consumers once most users' aggregated but averaged rewarding uncertainty decreases to a tolerable level, e.g., 5%. Finally, we prove that the Gaussian process-based baselining method's mean estimate is equal to or better than the estimates generated by currently used baseline estimation methods. For simulation, we use an hourly PG&E dataset with 12,000 residential customers [180, 193] and an OhmConnect dataset with 425 users, where the demand response period occurs in the afternoon and evening of summer days. Using these datasets, the proposed method is compared with other state-of-the-art baseline estimation approaches. The results show that the probabilistic estimate not only has a mean estimate better than the currently used deterministic estimates but also provides a new 95% confidence zone estimate, which covers the true load values completely. Notably, we add a machine learning method based on a gradient boosting model for comparison; its worse performance indicates that our data analytics for machine learning modeling is necessary for estimation accuracy. If we further aggregate a user's estimates over days based on the mean and variance estimates, the rewarding error reduces to 5%. This result aligns well with our theoretical expectation, thus validating the correctness of the estimate. While we provide simulation results for all customers, we notice that different customers reach the 5% error tolerance threshold at different speeds, so our suggestion of waiting until most of the consumers reach the threshold is practical for both the aggregators and the consumers. The innovations are the following: (1) motivating the need for probabilistic baselining, (2) using load patterns to justify the Gaussian process (GP) modeling of residential customers, and (3) using real datasets for design and validation. Compared to [17], we use feature extraction to demonstrate why a Gaussian process is appropriate for modeling uncertainty. We show how to embed different covariance functions into the GP platform to model residential users' power consumption patterns. Instead of a simple demonstration of aggregation across users, we also extensively simulate aggregation across days for fairer payments. Different user types are also compared to understand user behavior and its impact on payments. Finally, the computational time is analyzed for large-scale implementation.

4.3.2 Probabilistic Baseline Estimation for Residential Customers

For commercial customers, consumption without a DR signal is quite regular. Therefore, the deterministic baseline estimation shown in Fig. 4.38 is used to calculate the reward: once the baseline estimate is found, the difference between the actual consumption and the baseline estimate is used for the reward calculation.


Fig. 4.38 Deterministic baseline estimation for rewards

Fig. 4.39 Customer 469 (PG&E database): a large variance

As residential customer-level demand response becomes more important, various schemes have been developed to evaluate responsive loads for ancillary services to grids [176]:

1. Simple Average: the average load over the past 10 days.
2. Selected Simple Average: an average of the highest (or median) three of the ten most recent days, e.g., HighXofY [181] or MidXofY.
3. Weighted Average: a weighted average over historical loads in the past 10 days.
4. Morning Usage Adjustment: the methods above can also be accompanied by a morning usage adjustment on the event day to improve performance [176].
5. Regression: an exponential smoothing model in time [188] or a piecewise linear model over temperature [189].

However, such deterministic baseline estimation approaches usually produce large reward errors when baselining residential customers due to their highly uncertain power consumption [180]. Figure 4.39 shows the mean, with a large variance, of a consumer's daily data from a PG&E dataset. The 95% confidence interval for the standard deviation can be as large as the mean, making a deterministic baseline estimation fail to give a fair reward to the consumer when the customer


is rewarded under such high uncertainty. Therefore, in addition to estimating the mean, we propose to estimate the variance of the baseline estimator [194]. We define the probabilistic baseline estimation problem for a residential customer as follows:

• Given: the historical load $y_i$ for the no-DR period and historical temperature data $t_i$ for both the no-DR and DR periods, where i is the time index
• Find: a probabilistic load estimate $y_i^*$ for the DR period

4.3.3 Probabilistic Baseline Estimation via Gaussian Process Regression

To obtain a probabilistic estimate of $y_i^*$, we first analyze the PG&E and OhmConnect data for a proper statistical model. The loads have been normalized and decomposed by removing seasonality components. A repeating pattern within each year is known as seasonal variation; to remove it, the original time series is often decomposed into three sub-series: "seasonal," "trend," and "random" [195]. We remove the seasonality by (1) importing the data, (2) detecting the trend, (3) de-trending the time series, (4) averaging the seasonality, (5) leaving random noise, and (6) reconstructing the original signal. Figure 4.40a and b shows the distributions of the load at two different time stamps; the red dashed lines are Gaussian distributions fitted to the data. Figure 4.40c shows the joint distribution of the same load at two time indices of a day. Together, these three figures show that the loads at different time slots can be modeled as a multivariate Gaussian distribution. As we cannot exhaust all possible joint distributions due to space limits, we conduct D'Agostino's K-squared test [196] and the Jarque-Bera test [197] to check Gaussianity rigorously. The key step of these two tests is to find test statistics with a p-value greater than 0.05. Specifically, we randomly select 200 users and

Fig. 4.40 The Gaussianity of residential customers’ loads at different time indices. (a) The histogram of normalized load distribution at one time index. (b) The histogram of normalized load distribution at another time index. (c) The joint load distribution of the two time indices


Fig. 4.41 D’Agostino’s K-squared test and the Jarque-Bera test

conduct both tests on the residuals at the same time of day over 2 months. More than 70% of them have a p-value greater than 0.05, indicating that they are samples drawn from Gaussian random variables. Notice that, even for users with smaller p-values, we cannot claim that they are non-Gaussian, because the residuals are samples from different days. The result is shown in Fig. 4.41.
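A minimal sketch of this pipeline, seasonal decomposition followed by the two normality tests on the residuals, is shown below with synthetic data; scipy's normaltest implements D'Agostino's K-squared test:

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical hourly load for one customer over two months
rng = np.random.default_rng(6)
idx = pd.date_range("2011-06-01", periods=24 * 60, freq="h")
load = (1.5 + 0.5 * np.sin(2 * np.pi * np.arange(len(idx)) / 24)
        + rng.normal(0, 0.2, len(idx)))

# Decompose into seasonal + trend + residual, then test the residuals
decomp = seasonal_decompose(pd.Series(load, index=idx),
                            model="additive", period=24)
resid = decomp.resid.dropna()
print("D'Agostino K^2 p-value:", stats.normaltest(resid).pvalue)
print("Jarque-Bera p-value:   ", stats.jarque_bera(resid).pvalue)
```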

y y∗



 ∼N

   μX K(X, X), K(X, X∗ ) , K(X∗ , X), K(X∗ , X∗ ) μX∗

where the matrix .X = [x 1 , · · · , x i , · · · ] with the time index i belonging to the noDR period. The matrix .X∗ = [x 1 , · · · , x i , · · · ], where the time index i belongs to the DR period. .m(y) represents the mean of .y and is a function of X. .K(·, ·) is the covariance function. Therefore, the proposed Gaussian process model is specified by a mean function and a covariance function of the data [192]. By using a Bayesian framework for the Gaussian process [192], the mean estimate is


$$E(y^*) = \mu_{X^*} + K(X^*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\,(y - m(X)), \qquad (4.21)$$

where μ_{X*} takes the expectation for each row of X*, and the prior distribution of the noise is N(0, σ_n²). The covariance estimate is

$$\mathrm{Cov}(y^*) = K(X^*, X^*) - K(X^*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\,K(X, X^*). \qquad (4.22)$$

As shown in (4.21) and (4.22), the design of the covariance function K(·,·) is the key to extracting features from highly uncertain residential loads.

Theorem 4.1 Gaussian process regression provides a better estimate than the linear regression-based estimate with respect to the likelihood function.

Proof See [16] for a proof.
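Before turning to covariance design, the following is a minimal NumPy sketch of the posterior computations in (4.21) and (4.22), assuming a zero prior mean and the squared exponential covariance introduced below in (4.23); the hyperparameter values are illustrative, not fitted.

```python
import numpy as np

def sq_exp_kernel(X1, X2, theta1=10.0):
    """Squared exponential covariance, cf. (4.23); rows are input vectors x_i."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * theta1 ** 2))

def gp_posterior(X, y, X_star, sigma_n=0.1, theta1=10.0):
    """Posterior mean (4.21) and covariance (4.22) under a zero prior mean."""
    K = sq_exp_kernel(X, X, theta1) + sigma_n ** 2 * np.eye(len(X))
    K_s = sq_exp_kernel(X_star, X, theta1)        # K(X*, X)
    K_ss = sq_exp_kernel(X_star, X_star, theta1)  # K(X*, X*)
    mean = K_s @ np.linalg.solve(K, y)            # Eq. (4.21) with m(X) = 0
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)  # Eq. (4.22)
    return mean, cov

# Toy usage: each input x_i = [hour index, temperature], output y_i = load (kWh)
X = np.array([[13.0, 72.0], [14.0, 75.0], [15.0, 78.0]])
y = np.array([0.6, 0.8, 1.1])
mean, cov = gp_posterior(X, y, np.array([[16.0, 80.0]]))
print(mean, np.sqrt(np.diag(cov)))  # mean estimate and its standard deviation
```

In practice the hyperparameters (θ1, σ_n) are fitted by maximizing the marginal likelihood, as done by the pyGPs package used later in Sect. 4.3.6.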



4.3.4 Feature Extraction: Covariance Function Design

In the following, we demonstrate how to use the flexibility of the Gaussian process to embed various user behaviors into baseline estimation automatically.

4.3.4.1 Embedding Distance-Based Correlation

A future load equals the sum of a past load and the change in between. This leads to a stronger correlation between load values at two time slots that are closer to each other. As shown in Fig. 4.42a, the correlation decreases from a 1-hour interval to an 11-hour interval for both the PG&E dataset and the OhmConnect dataset. Therefore, we employ a squared exponential covariance based on Euclidean distance, with a hyperparameter θ1 to control the length scale:

$$k_d(x, x') = \exp\left(-\frac{\|x - x'\|_2^2}{2\theta_1^2}\right). \qquad (4.23)$$

The distance-based covariance function k_d depends on the distance between the input variables x and x'. This means that the farther away historical data is from the current input in the input space, the less impact it has on the forecasted output, e.g., y.

4.3.4.2 Embedding Periodic Pattern

From Fig. 4.42a, the correlation becomes high again after 24 hours.


Fig. 4.42 Distance-based correlation and periodicity analysis. We do not plot higher frequencies, although they do exist and will be included if their magnitudes reach the thresholds. (a) Autocorrelation with different intervals: the x-coordinate represents load at time t; the y-coordinate represents load at time t + t0, where t0 is the time interval between loads at two indices. (b) Periodicity analysis of residential load series

As different residential customers may have different periodicities with different weights, we propose to let periodic patterns be automatically detected and embedded into the covariance function. For example, Fig. 4.42b shows the different periodicities of our dataset, where the x-coordinate represents the period. While the 12-hour period in Fig. 4.42b is significant, it is likely to be ignored by a prediction design based on experience rather than on data. By using the data itself, we can set a threshold to extract the periodicity automatically. This highlights the importance of a data-driven approach for automatic feature extraction and the consequent improvement in estimator accuracy. While there are many functions to represent a periodic pattern, different bases can be mapped to each other by algebraic transformation. Here, we propose the following covariance function based on sinusoid functions:


Fig. 4.43 The relationship between residential loads and temperature in the PG&E dataset



$$k_p(x, x') = \exp\left(-2\theta_3^2 \sin^2\!\big(\theta_4 \|x - x'\|_2\big) - \frac{\|x - x'\|_2^2}{2\theta_2^2}\right). \qquad (4.24)$$

The first part of this covariance function, with the sin term, captures the periodicity in power systems with the help of θ3 and θ4. The second part, a squared exponential, penalizes the periodic pattern according to distance: θ2 makes closer points have a larger impact on the estimate.

4.3.4.3 Embedding Piecewise Linear Pattern in Temperature

As loads are usually sensitive to temperature, the temperature is also used in our covariance function. The PG&E dataset shows a strong correlation between temperature and power usage in Fig. 4.43. Therefore, a piecewise linear relationship can be embedded with the covariance function below, where the temperature range is divided into several zones (i.e., six zones in [189]), leading to a vector form x of the input data that fits a piecewise linear regression:

$$k_t(x, x') = \sigma_0^2\, x \cdot x', \qquad (4.25)$$

where x · x' is the dot product of the two vectors. This covariance function embeds the inner product of the temperature features. Unlike the others, the linear kernel is non-stationary: a stationary covariance function depends only on the relative position of its two inputs, not on their absolute locations. Concretely, we first convert the scalar temperature variable t into a six-dimensional vector using six thresholds, T1 = 30, T2 = 45, T3 = 60, T4 = 75, T5 = 90, and T6 = 105, to account for the nonlinear demand-temperature relationship. For each temperature t, the corresponding six-dimensional vector x = [t1, ..., t6] is defined as:

• t1 = min(max(0, t − T1), T2 − T1)
• t2 = min(max(0, t − T2), T3 − T2)
• t3 = min(max(0, t − T3), T4 − T3)
• t4 = min(max(0, t − T4), T5 − T4)
• t5 = min(max(0, t − T5), T6 − T5)
• t6 = max(0, t − T6)

For example, if the temperature is t = 77, then x = [15, 15, 15, 2, 0, 0].
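The mapping is straightforward to implement; a small sketch that reproduces the worked example:

```python
# Temperature thresholds T1..T6 from the text
T = [30, 45, 60, 75, 90, 105]

def temperature_features(t):
    """Map a scalar temperature to the six-dimensional piecewise linear vector."""
    x = [min(max(0, t - T[j]), T[j + 1] - T[j]) for j in range(5)]
    x.append(max(0, t - T[5]))  # the last zone is unbounded above
    return x

print(temperature_features(77))  # -> [15, 15, 15, 2, 0, 0]
```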

4.3.4.4 Embedding More Functions

One can also embed other types of covariance functions by identifying special patterns of an individual load via data mining. For example, one may find a rational quadratic covariance useful for distance-based correlation, where θ5 controls the length scale and θ6 controls the shape:

$$k_r(x, x') = \left(1 + \frac{\|x - x'\|_2^2}{2\theta_5}\right)^{-\theta_6}. \qquad (4.26)$$

For this covariance function, the covariance of two loads depends on the distance between the corresponding time indices.

Remark 4.1 For residential customers with irregular usage during a day, covariance functions such as (4.26) characterize the uncertainties. This is because a covariance is a measure of the joint variability of two random variables. For example, if the irregularity of a load is high, the value of the corresponding covariance function is usually high, which leads to an estimate with high uncertainty.
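To make the construction concrete, the sketch below implements the covariance terms (4.23)–(4.26) and combines them by summation, which preserves positive semidefiniteness; the hyperparameter values are placeholders rather than fitted quantities.

```python
import numpy as np

def k_d(x, xp, th1=2.0):
    """Distance-based squared exponential term, Eq. (4.23)."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * th1 ** 2))

def k_p(x, xp, th2=24.0, th3=1.0, th4=np.pi / 12):
    """Periodic term damped by distance, Eq. (4.24)."""
    r = np.linalg.norm(x - xp)
    return np.exp(-2 * th3 ** 2 * np.sin(th4 * r) ** 2 - r ** 2 / (2 * th2 ** 2))

def k_t(xt, xtp, sigma0=0.5):
    """Linear term on the six-dimensional temperature features, Eq. (4.25)."""
    return sigma0 ** 2 * np.dot(xt, xtp)

def k_r(x, xp, th5=2.0, th6=1.0):
    """Rational quadratic term, Eq. (4.26)."""
    return (1 + np.sum((x - xp) ** 2) / (2 * th5)) ** (-th6)

def k_total(x, xp, xt, xtp):
    # A sum of valid covariance functions is itself a valid covariance function
    return k_d(x, xp) + k_p(x, xp) + k_t(xt, xtp) + k_r(x, xp)
```

Products of kernels are equally valid, so the combination rule can be tailored to the patterns discovered for each customer.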

4.3.5 Utilizing Probabilistic Estimate for Fair Payment to Residential Customers

Baseline estimation on a day-by-day basis usually results in large uncertainties. For example, Fig. 4.44a shows different estimation results for one user during the DR period (1:00–6:00 pm). The error is large for all approaches, and the width of the 95% confidence interval (CI) for the Gaussian process-based approach is as large as the true load value. Since the estimation is represented in probabilistic form, we can aggregate over days, because the variance grows more slowly than the mean with daily aggregation. Although the load reduction is needed on a specific day for demand response, we use aggregated days to reduce the reward error for payments. To formulate the load reduction, we first define L_d as the actual load recorded by the smart meter on day d and L_{d,estimated} as the estimated load from the baseline.


Fig. 4.44 The prediction variance decreases as we aggregate the baseline estimate over days. (a) Load prediction for 1 day based on normalized aggregation over 1 day. (b) Load prediction for 1 day based on normalized aggregations over 6 days. (c) Load prediction for 1 day based on normalized aggregations over 11 days

With L_d and L_{d,estimated}, the load reduction on day d is S_d = L_{d,estimated} − L_d. Without loss of generality, let the reward per unit load reduction be P = 1. The expected total reward with aggregation over D days is P_μ = E(Σ_{d=1}^{D} S_d) · P = μD, where E denotes expectation. On the other hand, the standard deviation of the aggregated usage of a single user over D days is P_σ = √D · σ · P = √D · σ. Therefore, the per-day standard deviation is σ/√D, and as D increases, the relative uncertainty of the payment decreases. When P_μ > P_σ · P, i.e., D ≥ σ²/μ², utilities or private companies can reward customers with relatively high confidence with respect to the standard deviation. Therefore, aggregation over days allows payments or allocation of DR rewards with much higher confidence, offering the reliability desired by consumers.
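As a numeric illustration of this argument, assume a per-day reduction with mean μ = 0.2 kWh and standard deviation σ = 0.8 kWh (made-up values):

```python
import math

mu, sigma = 0.2, 0.8  # assumed per-day reduction mean and standard deviation

D_min = math.ceil(sigma ** 2 / mu ** 2)  # smallest D with P_mu >= P_sigma
for D in (1, 4, D_min):
    P_mu = mu * D                    # expected total reward (P = 1)
    P_sigma = math.sqrt(D) * sigma   # standard deviation of the total reward
    print(D, round(P_mu, 2), round(P_sigma, 2), round(P_sigma / P_mu, 2))
# Relative uncertainty P_sigma / P_mu = sigma / (mu * sqrt(D)) shrinks as D grows
```

With these values, the expected reward overtakes its standard deviation only after D = 16 days of aggregation.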

4.3.6 Simulation Result

We use two datasets for simulations. The first is a PG&E dataset with 12,000 residential customers in Northern California. It contains anonymized and secured hourly smart meter readings of electricity consumption for a period of 1 year, from August 1, 2011, to July 31, 2012, at 1-hour intervals. The second dataset is from OhmConnect. It contains anonymized and secured smart meter readings of electricity consumption for 425 OhmConnect customers for a period of 1 year, from January 1, 2014, to December 31, 2014, at 1-hour intervals. The average hourly demand is 0.62 kWh, and the standard deviation of the average hourly demand over the 425 customers is 0.43 kWh. WeatherUnderground.com is used for temperature information for (1) August 1, 2011, to July 31, 2012, and (2) January 1, 2014, to December 31, 2014. Our sample is fairly large in both space and time and comes from different sources, so we believe it serves as a legitimate basis for this proof-of-concept demonstration. A description of the methodology implementation is as follows:


• Gaussian process-based method: Use the temperature and time stamp as input X and the load at the time stamp as the output y, and feed them into the GP model to obtain the optimized hyperparameters of the covariance matrix. The training length is 2 months, and the model is evaluated over multiple days. The Python package "pyGPs" is used for the implementation.
• Selected simple averaging method: Take the loads of the last 10 days and choose the median eight loads to average.
• Regression model-based method: The model assumes that load is a function of the time of week and assigns a regression coefficient to each 1-hour interval. The model also assumes that demand is a piecewise linear and continuous function of outdoor air temperature [198]. Following [177], we divide the observed temperatures into six equal-sized temperature zones, with a regression coefficient assigned to each bin.
• Gradient boosting model (GBM): For comparison, a machine learning-based method, GBM, is added. It gradually combines simple regression models to form a powerful regression model [199]. The loss function of the gradient descent is the sum of squared residuals on the training data. In the simulation, we use the same features as the regression model, and the GBM obtains an ensemble of 100 simple decision tree regression models, using a gradient descent method to find their relative weights.

For the overall error, [180] used the averaged mean absolute percentage error (MAPE) for commercial customers. However, residential customers have many near-zero consumption periods in a day, which may cause large MAPE with little absolute error in these periods. Therefore, we choose the root-mean-square error (RMSE):

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(y_{i,\text{estimated}} - y_{i,\text{true}}\right)^2},$$

where the time index i iterates over 12–6 pm (the peak hours) for all DR days. For example, there are 6 points in the 6-hour period, and if we aggregate the DR savings over 3 days, there are N = 6 × 3 = 18 points in total.
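For concreteness, here is a sketch of the "selected simple averaging" baseline and the RMSE metric; taking the "median eight" to mean the eight loads remaining after the highest and lowest are dropped is our reading, not a detail confirmed by the text.

```python
import numpy as np

def averaging_baseline(last_10_days):
    """last_10_days: array (10, 24) of hourly loads; average the median eight."""
    s = np.sort(last_10_days, axis=0)  # sort the 10 readings for each hour
    return s[1:-1].mean(axis=0)        # drop min and max, average the rest

def rmse(y_est, y_true):
    """Root-mean-square error over all peak-hour points of the aggregated days."""
    y_est, y_true = np.asarray(y_est), np.asarray(y_true)
    return float(np.sqrt(np.mean((y_est - y_true) ** 2)))
```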

4.3.6.1 Improved Daily Accuracy Without Day Aggregation

For each customer, Theorem 4.1 shows that the GP-based approach provides a better estimate than the linear regression-based estimate with respect to likelihood. Our dataset's customers follow this property. As an illustration, Fig. 4.44a shows a probabilistic estimate of the afternoon loads on day 290 for customer 1003. The result from the probabilistic GP-based approach is plotted, where the red line represents the mean estimate and the light red region represents the confidence interval. For comparison, we also plot the simple averaging method over 10 days and the temperature-based regression method. The x-coordinate represents the 24 hours in a day; the y-coordinate represents the hourly load. The mean estimates of the afternoon loads based on GP are close to the actual loads. Moreover, we can visualize the data randomness together with the error via confidence intervals (CIs).


Fig. 4.45 Error comparison for methods with (1) GP-based (mean estimate), (2) gradient boosting model (GBM)-based, (3) averaging-based, and (4) temperature-based regression

The 95% CIs completely cover the path of the actual loads. In contrast, the averaging and regression methods do not work well. Across all customers, Fig. 4.45 shows the RMSE for the different methods and different days of the week. The figure shows that the GP-based method has smaller errors on all days. The fact that the proposed GP-based method outperforms the GBM-based method shows that our data analysis for choosing a machine learning model is necessary for estimation accuracy.


4.3.6.2 Reduced Relative Confidence Intervals with Day Aggregation

Figure 4.44b and c shows the relative confidence intervals when we average the probabilistic baseline estimation over multiple days. For a 6-day average, the estimated confidence region becomes smaller; for an 11-day average, it becomes smaller still. This means that rewarding a consumer based on his/her normalized estimation uncertainty can reduce the risk of erroneous payments. To compare users with different consumption patterns, we select four users (users 6, 19, 33, and 60 of our datasets, shown as Users 1–4 in Fig. 4.46). Figure 4.46 shows that the normalized aggregated variances differ among users. Users 3 and 4's estimation uncertainties decrease to a pre-defined threshold much faster because their consumption is more predictable. User 2 reaches the threshold more slowly. User 1 is the slowest in this group, but the prediction variance still reduces significantly with day/event aggregation. When it comes to rewarding, one can adopt different strategies. One method is to wait until all customers' aggregated and averaged rewarding uncertainty decreases to a tolerable level, e.g., 5%. However, this may lead to too long a waiting time, as some customers have larger uncertainty than others. By rewarding customers when most of them reach a pre-defined threshold, we not only reward customers quickly but also ensure that the rewarding uncertainty is low for most of them. Therefore, we propose to reward customers when most of them reach a pre-defined threshold.


Fig. 4.46 Prediction variance and error for four different users with day aggregation. (a) Variance estimates of four customers with day aggregation. (b) RMSE for the mean estimates of four customers with day aggregation. (c) User 1. (d) User 2. (e) User 3. (f) User 4

4.3.6.3 Reduced Relative Error with Day Aggregation

To validate what we observed and proposed in the last section, we show the error evaluation for each user in Fig. 4.46b. As predicted by the variance comparison in Fig. 4.46a, the relative error in Fig. 4.46b decreases with day aggregation. Also, the order of the error curves for the different users matches the order of their variance curves in Fig. 4.46c–f. Figure 4.46 presents the user-by-user comparison by showing each user's variance and error together; the error curve of each user essentially follows the variance curve. This validates the proposed rewarding mechanism based on day/event aggregation. Therefore, as uncertainty decreases dramatically with aggregation over days, both the aggregators (utilities and private aggregation companies) and the customers gain more confidence in fair DR rewards.

4.3.6.4 Computational Time

It is known that the computational complexity of GP-based regression is O(n³), which is large when applied to a big dataset. However, our proposed GP-based regression algorithm is trained on one customer at a time.


With 15-minute-interval smart meter records spanning less than 10 years, the baseline for one customer can be calculated in several minutes, and the per-customer computations can easily be implemented as Hadoop/Spark parallel jobs on a computer cluster. Finally, as the reward evaluation is a post-event analysis, it does not require real-time computation. Our method is sufficiently fast for such a task.

Remark 4.2 Although the MapReduce framework is useful for acceleration, it mainly speeds up queries. Our GP-based method is not query-based: it conducts GP regression for each customer, stores the learned mapping parameters, and applies them to forecast/estimate the demand during the response period for each customer.

4.3.7 Conclusion

Demand response helps meet the growing need for energy through the reduction of flexible loads. Although current deterministic methods provide simplicity when estimating baselines for commercial customers, they introduce large errors when applied to residential customers because they fail to capture the various uncertainties arising from different customer behaviors. To characterize uncertainty, we propose a probabilistic Gaussian process (GP)-based baselining method grounded in data analytics. The proposed method can actively discover customer patterns and embed such knowledge into learning in a short time. Using the probabilistic estimate, we further show that the proposed payment mechanism reduces rewarding errors as day aggregation increases. Finally, we prove the improved performance of the proposed GP-based method over current deterministic approaches even without considering the variance estimate and reward aggregation. Simulation results on the PG&E and OhmConnect datasets show that our method not only improves the mean estimate's accuracy but also provides a variance estimate that captures estimation uncertainties for fairer rewarding. In the future, we will analyze the impact of the energy increase that follows the load reduction, in addition to evaluating the load reduction during the demand response hours.

4.4 Energy Coupon as Demand Response

In the previous sections, we introduced different methods to adapt load behavior as technology-agnostic solutions for demand response programs. However, incentives can come in different forms. In this section, we introduce a brand-new method based on coupons. For example, in the deregulated market operated by the Electric Reliability Council of Texas (ERCOT), load-serving entities (LSEs) usually purchase electricity from the wholesale market (either the day-ahead or the real-time market) and sign fixed retail price contracts with their end-consumers.


Therefore, incentivizing end-consumers to shift load from peak to off-peak hours can benefit the LSE by reducing its purchases of electricity at high real-time market prices. As the first-of-its-kind implementation of coupon incentive-based demand response (CIDR), the EnergyCoupon project provides end-consumers with dynamic time-of-use DR event announcements, individualized load reduction targets with EnergyCoupons as the incentive for meeting these targets, and periodic lotteries that use these coupons as lottery tickets for winning dollar-value gifts. Several methodologies are developed for this special type of DR program, including price/baseline prediction, individualized target setting, and a lottery mechanism. This section summarizes the methodologies, experimental design, critical findings, and the potential generalization of such an experiment. A comparison of the EnergyCoupon with a conventional time-of-use (TOU) price-based DR program is also conducted. Experimental results from the year 2017 show that by combining dynamic coupon offers with periodic lotteries, the effective cost to demand response providers of the EnergyCoupon can be substantially reduced while achieving a similar level of demand reduction as conventional DR programs.

4.4.1 Introduction

During the past decade, there has been an increasing penetration of renewable energy resources (such as wind and solar generation) in the power grid. For instance, wind and solar generation in the Electric Reliability Council of Texas (ERCOT) more than doubled in its fuel mix over the past decade, from 7.5% in 2008 to 18.6% in 2018 [200]. At the same time, demand response (DR) has been identified as a potentially flexible resource for addressing the reliability and efficiency issues that renewable penetration creates for the power grid [201]. Demand response is defined as "the changes of end-consumers' electricity consumption in peak hours from their normal patterns" [202]. Many independent system operators in the United States, including ERCOT, New York ISO (NYISO), California ISO (CAISO), and ISO New England, already run several day-ahead and real-time DR programs in their operating areas to provide energy reserves and ancillary services [203–205].

The story of demand response in the United States begins in the 1970s, growing with the popularity of household air-conditioning [206], around which many DR programs have been designed and implemented. After almost 40 years of development, it is generally accepted that DR programs can be categorized along two dimensions: (1) who takes control of devices (direct load control vs. self-controlled, market-based programs) and (2) the scale of the target end-consumers (large industrial/commercial customers vs. small residential customers). The term "direct load control" indicates that the DR operator (such as the utility) can remotely turn on/off or modify the setpoints of customers' equipment. The amount of load shedding can be precisely controlled, at the expense of customers'


comfort and satisfaction (for instance, an air conditioner might be turned off for some hours on a hot summer day). In contrast, market-based DR programs use price signals or other incentives to encourage customers' self-motivated load control. Such programs usually have less impact on customer comfort and satisfaction but are less precise and effective when a specified demand reduction target needs to be achieved.

Because of their profit-seeking character and higher electricity usage, industrial and commercial customers usually have more self-motivation and better performance than small residential customers in DR programs. Energy management systems have been developed to increase energy efficiency in data centers, retail stores, telecom providers, etc. and to coordinate with market-based signals (such as real-time electricity and gas prices) [207]. Residential customers, on the other hand, are often more concerned with their personal comfort. Their acceptance of price-based mechanisms (such as time-of-use (TOU) pricing, critical peak pricing (CPP) [208], and market-index retail plans offered by the utility) remains low, with the majority of residential end-consumers choosing fixed-rate electricity retail plans. Given that residential electricity consumption leads electricity usage in the United States (38%, compared with commercial at 37% and industrial at 25%) [207], the potential of residential DR is far from fully explored. Table 4.6 summarizes some recent research and operational programs using different approaches to demand response.

There has been some academic research [208, 214, 215] and commercial implementation (e.g., EnerNOC (Energy Network Operations Center) [216], OhmConnect [217]) of market-based DR; however, as an alternative to existing market-based solutions, the efficiency gain of coupon incentive-based demand response (CIDR) is still underexplored. CIDR provides coupon-based incentives to reduce the electricity consumption of residential end-consumers during peak hours [218–220]. Compared to traditional DR programs, this mechanism has the following advantages: it is purely voluntary, penalty-free to customers, and compatible with the fixed-rate electricity retail plans that are most popular among residential end-consumers. A program named EnergyCoupon is the first-of-its-kind implementation of CIDR, with additional innovations such as (1) dynamic DR events with individualized reduction targets and (2) periodic lotteries designed to convert coupons earned in DR events into dollar-value prizes.

Table 4.6 Classification of demand response programs

Customer type | Direct load control (centralized) | Market-based
Large commercial and industrial | Research papers [209–211], direct load control programs [212] | Energy management systems [207]
Small residential | Direct load control programs [212] | Variable-rate retail plans [213], CPP [208], EnergyCoupon [214]


A small-scale pilot experiment was conducted in 2016, and the posterior analysis showed substantial load profile changes among the residential participants [214]. Regarding (2) the periodic lottery, many academic as well as commercial studies have shown how "nudge engines," such as games and lotteries, can help encourage desired human behavior. For instance, [221] tries to discover the social value of energy-saving, [220] models the CIDR system as a two-stage Stackelberg game, and [222–224] use the "mean field games" framework to describe end-consumer behaviors in DR programs with lottery-based incentives. Furthermore, lottery-based incentive schemes have already been implemented in platforms that encourage uniform temporal demand on public transportation [225] and relieve congested roadways [226]. However, apart from some ongoing experiments [227, 228], little attention has been given to adapting the lottery idea to electricity DR programs.

Building upon our previous study in 2016, a larger-scale experiment was conducted in 2017, with a much more comprehensive design and critical assessment.4 The improvements of the experiment ('17) include, but are not limited to, (1) an extra comparison group for data analysis; (2) an improved baseline prediction algorithm (named the "similar day" algorithm); and (3) two subgroups of the treatment group facing fixed and dynamic DR events separately. More facts and comparisons between the two experiments are listed in Table 4.7. We will show in later sections that these changes help analyze end-consumers' behaviors in depth. The main contributions of the EnergyCoupon program are as follows:

1. Providing price and baseline prediction algorithms suitable for DR programs
2. Systematically documenting the experimental design, data collection, and posterior analysis for the selected residential customers
3. Experimental results showing load shedding/shifting effects, different behaviors under fixed/dynamic coupon targets, financial benefits for the LSE and end-consumers, the impact of periodic lotteries on human behaviors, and the effective cost-saving of the EnergyCoupon over traditional DR programs

Table 4.7 Overview of EnergyCoupon experiments in years 2016 and 2017

Year | 2016 | 2017
Experiment length (weeks) | 12 | 12
Treatment group size | 8 | 29
Comparison group existence | No | Yes
Baseline algorithm | "Hybrid" | "Similar day"
Active subjects defined by | Energy-saving | Lottery participation
Number of active subjects | 3 | 7

4 Unless otherwise specified, in the remainder of this section, "experiment ('16)" refers to our previous study conducted in 2016, and "experiment ('17)" refers to the new one in 2017.


This section is organized as follows: Sect. 4.4.2 introduces the system architecture and the interface of the EnergyCoupon App. Key algorithms, including price prediction, baseline prediction, individualized target setting, and periodic lottery, are explained in Sect. 4.4.3. Experimental design is described in Sect. 4.4.4, and data analysis is discussed in Sect. 4.4.5. We finally conclude our findings in Sect. 4.4.6.

4.4.2 System Overview

The EnergyCoupon system is designed to inform end-consumers of upcoming DR events along with individualized targets, measure the demand reduction within each DR event, provide statistics and tips for energy-saving, and conduct periodic lotteries. Figure 4.47 exhibits the system architecture of EnergyCoupon. As the core component of the architecture, an SQL database is hosted on a server running 24/7, interacting with the data resources (shown in blue blocks), the mathematical algorithms (green blocks), and the lottery scheme (pink blocks). The EnergyCoupon App (available for both Android and iOS) is installed on the mobile phones of the treatment group. The app (interface shown in Fig. 4.48) receives and shows coupon targets, tips, and statistics from the server and enables the user to participate in periodic lotteries. A brief overview of the other crucial components in Fig. 4.47 is as follows:

1. SmartMeterTexas: This is the source of the electricity consumption of all end-consumers at 15-min resolution [229]. In our study, we received this information each day from a collaborating retail provider. The data is used in both the baseline

Fig. 4.47 System architecture


Fig. 4.48 EnergyCoupon App interface. (a) The main page, coupon targets, and tips. (b) Usage statistics. (c) Lottery interface

prediction and coupon target generation algorithms, which are introduced in Sects. 4.4.3.2 and 4.4.3.3.
2. ERCOT Data: This is the source of day-ahead and real-time market prices, as well as the system load in the ERCOT area [200]. Pricing data is used in the price prediction algorithm described in Sect. 4.4.3.1.
3. Weather Data: This is the source of weather information used in the price (Sect. 4.4.3.1) and baseline prediction algorithms (Sect. 4.4.3.2). Weather information was pulled from the website of Weather Underground (a commercial weather service provider) [230].
4. Price Prediction: This algorithm predicts in advance whether a dynamic DR event should be announced. Our goal is to do so with a lead time of at least 2 hours before the event, providing participants enough time to respond to DR events. The algorithm is introduced in detail in Sect. 4.4.3.1.
5. Baseline Estimate: This algorithm predicts the "normal consumption" of the end-consumer without the impact of DR. It is designed to eliminate the gaming effects described in [231] and tries to balance prediction accuracy against computational cost. Details are included in Sect. 4.4.3.2.
6. Tips and Usage Statistics: The following types of usage statistics and personalized tips are randomly shown on the user app interface: (a) high price alerts based on the price prediction algorithm (Sect. 4.4.3.1) for the upcoming hours; (b) coupons acquired each day and total coupons acquired last week; (c) the user's energy consumption in the past week and an estimated electricity bill based


on retail price; and (d) a gold, silver, or bronze medal as an indicator of the user's saving behavior in the past week compared with other participants. All the above statistics, together with a figure showing the detailed energy consumption curve, were emailed to the user every week, which further helps the user engage in the demand response program.
7. Coupon Generation: DR events are determined according to the price prediction, and personalized targets are generated based on the user's predicted baseline for the time interval in which a DR event is triggered. See Sect. 4.4.3.3 for details.
8. Lottery: Periodic lotteries enable the end-consumer to convert the coupons he/she has earned into dollar-value gifts. See Sect. 4.4.3.4 for more details.

4.4.3 Experimental Algorithms

In this section, we elaborate on the key analytics behind the experiment ('17). The methodologies introduced include price prediction, baseline estimation, coupon generation, and the lottery. These analytics are important not only for this experiment but also for designing other possible demand response mechanisms.

4.4.3.1 Price Prediction

In demand response, end-consumers are incentivized to shed load or shift it from peak (wholesale price) hours to off-peak hours. In order to run our EnergyCoupon system in real time, we must be capable of predicting high price occurrences ahead of time. Much research has been carried out on electricity price prediction. For example, time-series models have been used to predict day-ahead electricity prices in [232, 233]. A combination of the wavelet transform and an autoregressive integrated moving average (ARIMA) model is used in this context in [234]. A hybrid solution using both time series and a neural network is presented in [235]. In [236], spot price prediction is discussed when both load prediction and wind power generation are involved. However, our goals for price prediction differ somewhat from previous work. Since our question is whether or not to trigger a DR event for potential peak prices, precise prediction of the market price is less important. Instead, we only want to predict whether the 30-minute average wholesale market price 2 hours later is likely to exceed a certain threshold (in EnergyCoupon, a "high price" is defined as greater than or equal to $50 per MWh). Furthermore, time-series techniques perform well on data with repeating periods, such as 24 hours, and achieve high accuracy in predicting the next few samples. While the high prices that we target have some relation to the time of day (typically the late afternoon), they do not correlate precisely at a 24-hour period and are more related to events of that day (such as the ambient


temperature). Finally, for an app such as EnergyCoupon, an online algorithm with low computational complexity is preferred. Accounting for all these concerns, we design and deploy a customized decision tree for price prediction in our system.

The decision tree is a well-known classifier, with selected features in non-leaf nodes and labels in leaf nodes. An advantage of the decision tree is its easy interpretability, which enables one to identify which features are most relevant and why. Different from the traditional approach, we have unbalanced error concerns in our EnergyCoupon system: a false high-price alert, which might trigger extra DR events, will not induce much loss to the EnergyCoupon program because of the fixed budget for weekly lottery prizes (though the coupons issued during the event might be slightly depreciated). However, failing to catch an actual high market price has a more significant opportunity cost in terms of the potential demand response savings. Hence, our decision tree should be more tolerant of false positives than of false negatives. This requirement can be captured by adjusting the penalty ratio between the two kinds of errors in the training stage, though one must be careful to avoid overfitting the training set. An exhaustive search was conducted over two parameters, the minimum leaf size and the penalty ratio between the two error types, to address this trade-off; the values were set at 70 and 1:8, respectively. Details are presented in [214] and omitted here given the focus of this section.

Considering the DR procedure in our system, we believe that a 2-hour-advance notification is a reasonable time window for participants to react. Given this goal, we need to select features for our classifier from a large body of data and possible features. Since weather determines air-conditioning usage and dominates household electricity consumption in Texas, and also has a crucial impact on renewable energy availability, five fundamental feature classes are chosen: Price (π), Demand (P), Temperature (T), Humidity (H), and Wind Speed (W). Furthermore, we choose the temporal offsets within each feature class according to the self- and cross-correlation between the feature and the price label. In addition, a numerical study was carried out to choose a proper threshold for labeling data (price) samples in our field experiments. Table 4.7 in [214] shows a prediction accuracy of over 90% on the validation dataset. Full details on training data preparation, feature selection, and performance evaluation are beyond the scope of this section; readers may refer to [214] for more information.
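A sketch of such a cost-sensitive decision tree using scikit-learn is shown below; the 1:8 penalty ratio enters through class_weight and the minimum leaf size through min_samples_leaf, while the feature matrix (lagged price, demand, temperature, humidity, and wind speed values) is mocked with random numbers for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Each row holds lagged features from the five classes: pi, P, T, H, W
# y = 1 if the 30-min average price 2 hours ahead is >= $50/MWh, else 0
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))             # placeholder feature matrix
y = (rng.random(1000) < 0.05).astype(int)  # high prices are rare events

clf = DecisionTreeClassifier(
    min_samples_leaf=70,        # minimum leaf size found by exhaustive search
    class_weight={0: 1, 1: 8},  # missed high prices penalized 8x more
    random_state=0,
)
clf.fit(X, y)
alert = clf.predict(X[:1])[0]   # 1 -> announce a dynamic DR event
```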

4.4.3.2 Baseline Estimate

As defined by the US Department of Energy, the baseline is the “normal consumption pattern” by end-consumers without the impact of DR [202]. A daily baseline prediction algorithm is crucially important to our EnergyCoupon program since it affects the energy reduction measurement, as well as the number of coupons the participant earns during a DR event. Energy reduction for end-consumer i on


interval k of a particular day D, denoted P^D_{DR,i}(k), is calculated as the difference between the consumer's predicted baseline P^D_{base,i}(k) and his/her real electricity consumption P^D_{real,i}(k), which can be measured with high reliability by the smart meter installed in his/her household:

$$P^D_{DR,i}(k) = P^D_{base,i}(k) - P^D_{real,i}(k). \qquad (4.27)$$

There are two major concerns in the design of a baseline algorithm: (i) baseline manipulation, where the end-customer intentionally increases usage during certain periods in advance in order to fabricate the appearance of a reduced load during a DR event, and (ii) the user's dilemma: if targets are set with a baseline that depends on a short window of the recent past (such as the previous few days/weeks), the baseline of a responsive user will continuously decrease, resulting in potentially unattainable reduction targets as the experiment progresses. Several works [214, 231, 237] discuss these issues for the conventional baseline estimation algorithms widely used by major independent system operators (ISOs) in the United States [203, 204]. As one candidate solution, the "hybrid" method adopted in our previous experiment ('16) computes a weighted average of the consumer's own recent consumption and the whole group's consumption [214]. However, in the post-experiment analysis, we discovered that this algorithm neither (i) eliminated gaming effects nor (ii) provided good baseline predictions, because of the large diversity among residential end-consumers. We could not address these issues during the 2016 experiment and had conjectured that a "similar day" algorithm might be a better solution [214].

The proposed "similar day" algorithm derives from the k-nearest neighbors algorithm (k-NN) and kernel regression [238, 239]. The main idea is to build a statistical model of a particular home using a consumption dataset of that specific end-consumer for the year preceding the experiment. Since we empirically observe that the feature best correlated with energy usage is the ambient temperature, the algorithm looks for historical windows whose temperature profiles closely fit that of the target time window. For instance, to predict a given 6-hour time window of a certain end-consumer in the future, the "similar day" algorithm first obtains the historical consumption of the same user for the year before the experiment begins. Candidate "similar" time windows are then selected based on the following criteria:

1. Selected time window(s) should have the same length (6 hours) and time of day as the predicted time period. Weekdays and weekends are treated separately, e.g., only weekday time windows can be selected when the target time window is on a weekday.
2. Selected time window(s) should have ambient temperatures similar to the predicted time period, measured by the Euclidean distance in Eq. (4.28):

$$T^{D,l,t}_{MSE} = \frac{1}{N_t}\sum_{k=1}^{N_t}\left(T^{D,t}(k) - T^{l,t}(k)\right)^2, \qquad (4.28)$$

.

D .Pbase,i k

Ns 1  D = Preal,i k Ns

(4.29)

k=1

Therefore, the “similar day” algorithm (1) predicts the baseline calculating the average consumption of the similar 6-hour time window in the history and (2) effectively eliminates the gaming effect of participants since no recent behavior (consumption data after the experiment begins) of the consumer is considered. Because of the benefits mentioned above, the “similar day” algorithm was implemented in both baseline estimate and data analysis in the recent EnergyCoupon experiment (’17) in 2017. The accuracy of the “similar day” algorithm in baseline prediction was evaluated shortly before the experiment started in 2017. Results show the average mean absolute percentage of error (MAPE) was around 20% on average for all participants, which is about the same level (15–30%) with other machine learning methodologies applied to individual households [168]. The “similar day” algorithm has been extended to other areas such as non-intrusive load monitoring (NILM) and has achieved over 80% accuracy in all tested datasets [240].

4.4.3.3 Individualized Target Setting and Coupon Generation

In the EnergyCoupon program, there are two types of DR events: “fixed” and “dynamic” events. Both types of events last for 30 minutes and can only be triggered between 1 and 7 pm each day. However, these two types of events follow quite different triggering methodologies:

Fixed DR Events
We conducted a statistical analysis of historical prices in ERCOT's real-time market [200] and observed that high wholesale market prices occur more often at certain hours of the day than at others, and that the "high risk" hours vary over the months of the year. Following this observation, no more than three "fixed" DR events are shown


at the fixed “high risk” hours every day, and the fixed hours may be different from month to month and from weekdays to weekends.

Dynamic DR Events
These DR events are triggered when the 2-hour-ahead price prediction algorithm (introduced in Sect. 4.4.3.1) indicates that the price is likely to be higher than $50/MWh. There is no restriction on the number of "dynamic" events in a day. Sometimes we use the term "hybrid event" to denote the situation when both types of DR events can be triggered for the user.

After the time period of a DR event is determined by either methodology, a multilayer coupon target is generated based on the individual predicted baseline (as shown in Fig. 4.49). Depending on the level of reduction reached (such as 30 and 70%) relative to the baseline, the participant is given a different number of coupons. In the EnergyCoupon App, this procedure is visualized by comparing the participant's real-time consumption with differently colored areas (white, yellow, and green) under the baseline. When the consumer's consumption lies at or above 70% of the baseline, no EnergyCoupon is earned for the event; the consumer is awarded one EnergyCoupon when his/her consumption lies between 30 and 70% of the baseline (the yellow area) and two EnergyCoupons when it is under 30% (the green area). We will use "coupon" as a synonym of EnergyCoupon in the rest of this section. Figure 4.50 summarizes the logic flow of coupon target generation based on the algorithms introduced in Sects. 4.4.3.1 to 4.4.3.3.
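The award logic for a single DR event then reduces to comparing consumption against fractions of the predicted baseline; a sketch with the tier sizes as described above:

```python
def coupons_earned(consumption, baseline):
    """Coupons awarded for one 30-minute DR event."""
    if baseline <= 0:
        return 0
    frac = consumption / baseline
    if frac < 0.3:    # green area: deep reduction
        return 2
    if frac < 0.7:    # yellow area: moderate reduction
        return 1
    return 0          # white area: no award

print(coupons_earned(0.4, 1.0))  # consumption at 40% of baseline -> 1 coupon
```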

Fig. 4.49 Individual target setting


Fig. 4.50 EnergyCoupon algorithm flowchart

4.4.3.4 Lottery Algorithms

We use a lottery system to convert EnergyCoupons into monetary rewards. "Prospect theory," which models the behavior of humans exposed to lottery schemes [241–244], has been developed over the years. The general finding


is that humans are much more risk-seeking under large, low-probability rewards of the kind engendered by a lottery system. Hence, lotteries have the potential of attaining larger reductions from the user population than a fixed reward. We observed this same effect in earlier numerical studies [224] and hence employed a lottery-based reward system in all our field trials.

In our experiment, weekly lotteries are conducted to convert end-consumers' coupons earned during DR events into dollar-value prizes. In each lottery, a participant may bid any number of coupons between zero and the total number of coupons in his/her account; the more coupons he/she bids, the higher the probability of winning a prize. A pyramidal lottery scheme is designed, with three Amazon gift cards of face values $20, $10, and $5 as the first, second, and third prizes each week. Briefly, the lottery conducts a top-down drawing at each level of the pyramid, removes the coupons of the winning user, and then moves to the lower level and continues. Hence, each participant has at most three chances of winning a prize, with progressively smaller rewards at each drawing. Note that if a participant chooses to bid only a portion of his/her coupons in a particular lottery game, the remaining coupons are saved in his/her account for future use. A participant can therefore be strategic in choosing the number of coupons to bid in each game.
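A sketch of the weekly pyramidal drawing, assuming the chance of winning each level is proportional to the coupons bid (the text does not pin down the exact drawing distribution):

```python
import random

def pyramidal_lottery(bids, prizes=(20, 10, 5)):
    """bids: {participant: coupons bid}; returns {winner: prize value in $}."""
    pool = {u: b for u, b in bids.items() if b > 0}
    winners = {}
    for prize in prizes:             # top-down: $20, then $10, then $5
        if not pool:
            break
        users = list(pool)
        weights = [pool[u] for u in users]
        w = random.choices(users, weights=weights, k=1)[0]
        winners[w] = prize
        del pool[w]                  # the winner's coupons leave the pool
    return winners

print(pyramidal_lottery({"A": 10, "B": 3, "C": 1}))
```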

4.4.4 Experimental Design

4.4.4.1 Brief Summary of Experiment ('16)

A small-scale preliminary EnergyCoupon experiment was conducted between June and August 2016, with seven end-consumers in a residential area in Cypress, Texas, enrolled in the program. During the 12-week experiment, each participant received a few 30-minute DR events with individualized coupon targets between 1 and 7 pm every day and was allowed to participate in lotteries with a total prize of $35 in Amazon gift cards each week. Peak time estimation, individualized target setting, coupon generation, and the lottery scheme followed the algorithms described in Sect. 4.4.3. A hybrid baseline prediction method was used for the baseline estimate, and the "similar day" algorithm was used in the posterior data analysis. The experiment revealed a load shifting effect from peak to off-peak hours; it yielded substantial savings for the LSE, about $0.44/(week·user) on average and $1.15/(week·user) per active user. Readers may refer to [214] for more details.

4.4.4.2 Subjects in Experiment ('17)

A larger-scale EnergyCoupon experiment was conducted in the summer of 2017, with 29 anonymous residential end-consumers in The Woodlands, Texas, who were


Fig. 4.51 Subjects in experiment (’17). (a) Treatment vs. comparison group. (b) Subgroup 1 vs. Subgroup 2. (c) Active vs. inactive subgroups. Numbers in brackets are group sizes

recruited to form the treatment group. All participants were customers of a local retail electric provider. Their participation was purely voluntary, and participants were free to quit the experiment at any time (although there was no one who quit). In addition, the retail electric provider also provided us with some residential electricity consumption data from another 16 anonymous households for the same period of time. These end-consumers formed the comparison group, and they neither participated in the DR event nor the periodic lotteries. The relationship between the treatment and comparison group is shown in Fig. 4.51a.

4.4.4.3 Procedure in Experiment ('17)

All treatment and comparison group participants had a smart meter installed in their household before the experiment, which made their 15-minute-interval electricity consumption data available on SmartMeterTexas.com, a website endorsed by the Public Utility Commission of Texas [229]. With the permission of all participants, we were able to obtain their Electric Service Identifier IDs (ESIIDs), register accounts for them, and download their historical and real-time electricity consumption data periodically through a secure backend server located on the campus of Texas A&M University.

In test Week 0 (June 10–June 16, 2017), the treatment group subjects were asked to download and install the EnergyCoupon App, get familiar with the interface, practice energy reduction by following individualized coupon targets, and participate in a trial lottery. The electricity consumption data during this period was neither considered experimental data nor used as historical data in the baseline estimate.

During the experiment, the treatment group subjects were able to see all daily "fixed" coupon targets at the beginning of each day and "dynamic" coupon targets at least 2 hours prior to the DR event. A subject who wanted to save energy and


earn coupons could turn off or change the setpoints of his/her appliances during the 30-minute DR event without needing to notify the organizer. The subject's electricity consumption was recorded by the smart meter installed in the house, and the data became available and was downloaded to the server within 36 hours after the DR event. Thereafter, each subject was awarded coupons based on his/her coupon target achievement during the DR events.

In the first 3 weeks (June 17, 2017, to July 7, 2017), all subjects in the treatment group faced "hybrid" coupon targets for their demand response. From Week 4 (July 8, 2017) until the end of the experiment, subjects were randomly assigned to two subgroups (Subgroups 1 and 2, or S1 and S2 for short) of almost the same size (14 subjects in S1 and 15 in S2). Subjects in S1 received only "fixed" coupon targets, while those in S2 received only "dynamic" coupon targets (Fig. 4.51b). The "similar day" algorithm was used in the baseline estimate, and coupon target generation followed the algorithm in Sect. 4.4.3.3. DR events could only be triggered between 1 and 7 pm each day. Weekly lotteries were conducted during the experiment, with each lottery cycle beginning at 12:00 am on Saturday and ending at 11:59 pm on the Friday of the following week. Lotteries were designed according to the scheme explained in Sect. 4.4.3.4.

In the posterior analysis at the end of the experiment, we further categorized the subjects into two subgroups according to their lottery engagement: the "active" subgroup contains subjects who participated in at least 5 of the 11 lotteries, and the remaining treatment group subjects are regarded as "inactive" and assigned to the "inactive" subgroup. Figure 4.51c shows the relationship between the two dimensions of categorization (coupon targets and lottery engagement): there are seven active subjects in total, two belonging to S1 and five to S2; among the remaining 22 inactive subjects, 12 belong to S1 and 10 to S2.

As briefly described in Sect. 4.4.1, some major differences exist between the designs of the EnergyCoupon experiments ('16) and ('17). The change of the baseline algorithm from "hybrid" to "similar day" and the removal of normalization in the baseline estimate help increase the baseline prediction precision and eliminate the gaming effect. The availability of the comparison group provides an alternative means of measuring energy-saving for the treatment group, and the assignment of S1 and S2 helps reveal more intricate behavior of the treatment group subjects.

4.4.5 Results and Discussion

In this section, we present an analysis of the data collected in our experiment ('17).

4.4.5.1 Energy-Saving for the Treatment Group

There are two ways to measure the electricity reduction of the treatment group during the experiment: by comparing their electricity consumption with (i) the comparison group and (ii) their own predicted baseline. Figure 4.52 exhibits the energy consumption ratio (called the "ratio" for short in Sects. 4.4.5.1 and 4.4.5.2) of the treatment and comparison groups following method (i). The ratio is defined as the group's weekly consumption between 1 and 7 pm divided by its own historical consumption during the same period of the previous year (2016). A lower ratio indicates a greater behavior change, i.e., more energy reduction during the experiment than in the previous year. Figure 4.52 also shows the energy-savings of the active subjects during the experiment. While the ratios of the inactive and comparison groups are close to each other in most weeks, there is a clear gap between the active subjects (red curve) and these two groups. The active subjects, who have more lottery engagement, also show significantly better-than-average energy-saving behavior, with a maximum saving of around 40% in Week 8. The disadvantage of method (i) is that variables that differ between the 2 years, such as temperature, are not well controlled; therefore, the energy-saving of the treatment group cannot be characterized precisely.
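A sketch of the ratio computation for one group and one week, assuming pandas Series of kWh readings indexed by timestamp; variable names are illustrative.

```python
import pandas as pd

def peak_ratio(week_2017: pd.Series, same_week_2016: pd.Series) -> float:
    """Weekly 1-7 pm consumption in the experiment week divided by the
    consumption in the same calendar period of the previous year."""
    peak17 = week_2017.between_time("13:00", "19:00").sum()
    peak16 = same_week_2016.between_time("13:00", "19:00").sum()
    return peak17 / peak16
```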

Fig. 4.52 Energy consumption ratio at 1–7 pm for experiment (’17) based on the consumption of the same days in 2016

4.4.5.2 Comparison Between Active and Inactive Subjects in Treatment Group

As introduced above, method (ii) calculates the energy consumption ratio using the subject's own estimated baseline as the denominator. Figure 4.53 shows the ratios of the active and inactive subgroups and of the entire treatment group. As the following paragraphs show, the energy-saving observations under method (ii) largely agree with those under method (i).

The performance of the inactive subgroup is consistent, with the ratio around 1.0 in most weeks and never falling below 0.9. This is in line with our intuition that less engagement in the lottery signals a lack of enthusiasm for energy-saving via the EnergyCoupon program. Since inactive subjects form the majority of the treatment group (as shown in Fig. 4.51), the gap between the inactive subgroup and the treatment group average is minor. In contrast, the curve for the active subgroup is far below the other two, indicating significant energy-saving and load pattern changes for active subjects during the experiment. Energy-savings for the active subgroup gradually increase in the first few weeks and peak at about 40% in Week 8. After Week 9, the saving begins to decline, reaching only 10% in Week 11. The rebound of the ratio can be explained by the arrival of Hurricane Harvey, which was in the area for the end of Week 10 and the whole of Week 11. Flooding and potential house repair

Fig. 4.53 Energy consumption ratio at 1–7 pm for active/inactive subjects by week, based on their baseline


Fig. 4.54 Daily consumption vs. baseline for active/inactive subjects during the week (July 29– August 4, 2017). (a) Active subjects. (b) Inactive subjects

likely distracted many of the subjects from participating in the DR program during that time. To better visualize the load pattern change of the active subjects, 1 week during the experiment (July 29–August 4, 2017) was selected as an example, and the daily average of electricity consumption vs. baseline is illustrated for both the active and inactive subgroups (Fig. 4.54). For this particular week, energy-saving during 1–7 pm was 28.9% for active subjects, while that of inactive subjects was only −0.2%. The close-to-zero energy-saving of the inactive subjects is unsurprising, and it also supports the precision of our baseline estimation algorithm to some extent. The surprising finding in Fig. 4.54a, however, is the load shedding effect in non-peak hours (25.0%). This observation clearly conflicts with the assumption of pure load shifting in our previous paper [214]. One could therefore hypothesize that there is some "inertia" in demand response: incentivized energy reduction in peak hours also influences that of off-peak hours.

4.4.5.3 Comparison Between Subjects in Treatment Group Facing Fixed/Dynamic Coupons

Beginning with Week 4 and until the end of the experiment, the treatment group subjects were randomly assigned to the two subgroups S1 and S2, facing "fixed" and "dynamic" coupon targets, respectively. We aim to discover how different types of coupon targets affect end-consumers' energy-saving. The energy-savings of the two subgroups S1 and S2 during 1–7 pm are shown in Fig. 4.55a. As observed from Fig. 4.55a, the subjects in S1 and S2 cannot be considered homogeneous, as their energy-savings (35% vs. −5%) were quite different in Weeks 1–3, when they faced the same "hybrid" coupon targets. For the following weeks


Fig. 4.55 Behavior comparisons between subjects in S1 and S2. (a) Average energy-saving at 1–7 pm. (b) Coupon target achievement percentage

(4–10), i.e., when subjects were separated with different coupon targets, we see an "activation" phenomenon under the dynamic coupon targets, as S2's savings jump from −5% to 15%, while no such effect is observed for S1 subjects. In Week 11, the energy-saving of S2 returns to its initial level; this can be attributed to the hurricane, as mentioned before. Figure 4.55b illustrates the coupon target achievement ratios for active subjects in the two subgroups. The ratio is defined as the proportion of DR events in which a subject earned at least one coupon (equivalent to a reduction of at least 30% from the baseline). Comparing Fig. 4.55a and b, an interesting finding is that although S1 has overall higher energy-savings than S2 in all periods, both subgroups reach a similar level of coupon achievement. One possible explanation is that S1 subjects facing "fixed" DR events prefer to program their home appliances (such as the AC) in advance to hit all coupon targets and do not change their setpoints frequently, while S2 subjects facing "dynamic" DR events tend to check the app and DR events more frequently and "play" to catch the coupon targets, which only began to appear 2 hours before real time. Figure 4.56 shows the load patterns of two active subjects in S1 and S2 as an example: the subject in S1 reduces consumption for the entire afternoon, whereas the subject in S2 moves his/her consumption to catch the yellow targets.

4.4.5.4 Financial Benefit Analysis

In our earlier analytical model and numerical studies, we assumed that all subjects perform pure load shifting from peak to off-peak hours [214]. Under this assumption, the DR program would lead to a win-win situation with positive financial benefits for both the retail provider and active end-consumers.


Fig. 4.56 Energy consumption curve for two active subjects on July 19, 2017. (a) Subject no. 19 (in Subgroup 1). (b) Subject no. 18 (in Subgroup 2)

Table 4.8 Financial benefit of the retail provider and active subjects

  Subjects          Savings ($/(week·subject))
  Retail provider   2.6 − 2.7 − 4.0 = −4.1
  Active subjects   4.0 + 2.7 = 6.7

The brief explanation is that with pure load shifting in DR, the retailer does not lose retail revenue and can purchase electricity in time periods with lower wholesale market prices, while end-consumers earn a reward for their energy-saving behavior. However, our finding of load shedding behavior in Sect. 4.4.5.2 conflicts with this pure load shifting assumption, so it is no longer obvious that the retail provider and end-consumers can still reach a win-win situation. Below is our analysis using the newly gathered data from experiment ('17). The net benefit of the retail provider consists of three parts: (i) the savings in (wholesale) electricity purchases during high-price hours, (ii) the decrease in sales revenue due to the load shedding effect, and (iii) the cost of rewards issued to lottery winners. Our calculation shows that the three parts amount to $2.6, −$2.7, and −$4.0/(week·subject), respectively. Because of the load shedding effect, the benefit in (i) is not enough even to cover the loss in (ii), and, once the lottery cost in (iii) is included, the retail provider suffers a net loss of around $4.1 per active user per week. Note that this loss is specific to 2017 because of the low oil prices and consequently low electricity peak prices in the summer of that year; had the same DR program been conducted in another year, such as 2019, it might have yielded substantial benefits because of that year's record-breaking high prices [245] (Table 4.8). An active subject, in contrast, on average receives $4.0 per week in lottery rewards from the retail provider; at the same time, the load shedding effect decreases his/her electric bill by around $2.7 per week. Therefore, our EnergyCoupon program brings positive financial benefits to active subjects.
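As a quick arithmetic check of Table 4.8, the sketch below recomputes the weekly per-subject cash flows from the three components named above; all dollar figures are taken directly from the text:

    # Components of the retail provider's net benefit, $/(week*subject).
    wholesale_saving = 2.6      # (i) cheaper wholesale purchases in peak hours
    lost_retail_revenue = -2.7  # (ii) load shedding reduces retail sales
    lottery_cost = -4.0         # (iii) rewards issued to lottery winners

    retailer_net = wholesale_saving + lost_retail_revenue + lottery_cost  # -4.1
    subject_net = 4.0 + 2.7       # lottery rewards plus lower electric bill
    demand_side_welfare = retailer_net + subject_net                      # 2.6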


Although the DR program may not bring a win-win situation for both the retail provider and end-consumers, it still increases social welfare on the demand side, as the sum of benefits is positive ($2.6/(week·active subject)). We can also conclude that the financial benefit of the retail provider in the DR program is closely related to each subject's load shifting/load shedding pattern: in experiment ('17), load shifting is minor and load shedding is major, so the cost-savings from wholesale electricity purchases may not cover the loss of retail revenue, leading to a net financial loss for the retail provider. The retail provider's profitability can also be affected by other factors, such as (1) the real-time electricity price (a high real-time price would increase the value of demand reduction and therefore the retailer's profit); (2) the management of the lottery budget, where a proper choice of prizes could decrease the total cost of the lottery while keeping demand reduction at an acceptable level; and (3) possible subsidies for demand response programs in some countries or grids. It is worth noting that since these factors can vary across human participant groups, physical areas, and even years within the same area, there is no general conclusion as to whether EnergyCoupon (or another DR program) would save the retail provider money.

4.4.5.5 Influence of the Lottery on Human Behavior

As discussed in Sect. 4.4.3.4, the lottery scheme is intended to provide an incentive that promotes desirable behaviors (such as more energy-saving and participation) in the treatment group. Table 4.9 lists statistics showing the influence of the periodic lotteries on participant behavior. The first column of Table 4.9 shows that winning a lottery prize has a positive impact on future energy-savings, as lottery winners achieve an average energy-saving improvement of 10.7% in the next lottery cycle.^5 In contrast, the average energy-saving improvement of participants who win nothing is close to zero (−0.03%). The second and third columns clearly demonstrate that lottery winners, on average, tend to have higher engagement than other participants in the next lottery (56.6% vs. 40.0%) and in the next three lotteries (80.5% vs. 70.0%). Therefore, we can summarize that the lottery prize has a positive impact on both energy-saving and lottery engagement in future lottery cycles.

Table 4.9 Subjects' behavior changes due to lottery

  Subjects            Energy-saving improvement (1–7 pm) (%)   Next lottery participation prob. (%)   Prob. of at least one participation in the next three lotteries (%)
  All winners         10.7                                     56.6                                   80.5
  Other participants  −0.03                                    40.0                                   70.0

^5 As an example, a 1% improvement means that if this week's saving is 10%, next week's will be 11%.


Table 4.10 Comparison between EnergyCoupon and the previous experiment

  Comparison item                              Critical peak pricing    EnergyCoupon
  Category                                     Price-based DR           Incentive-based DR
  Treatment group size                         71                       29
  Experiment length                            June–October             June–August
  Number of DR days                            Certain CPP days (12)    Daily (77)
  Peak hours                                   Noon–6 pm                1–7 pm
  Energy reduction                             12%                      10.7%^a
  Effective cost compared with retail price^b  368%                     58.8%

^a Electricity reductions for active and inactive subjects are 34.8% and 7.36%, respectively
^b We take the retail price in Anaheim in 2005 as $0.095/kWh [246] and the average retail price in The Woodlands, TX, in 2017 as $0.090/kWh [213]


4.4.5.6 Comparison with Previous CPP Experiment

In this subsection, we compare our EnergyCoupon experiment with a typical price-based DR experiment conducted in Anaheim, California, in 2005 [208], which used critical peak pricing (CPP). In that study, CPP days were selected based on a price prediction algorithm, and on CPP days the peak period ran from noon to 6 pm; subjects in the treatment group received $0.35 for every kWh of reduction from the baseline. Comparisons of the two experiments are listed in Table 4.10. We observe that our experiment reached a level of energy reduction similar to that of the CPP experiment (10.7% vs. 12%). Since EnergyCoupon provides DR events every day, compared to only 12 CPP days in the CPP experiment, the EnergyCoupon project saves a much larger amount of energy in total. In addition, the effective cost is calculated as an indicator of cost-saving efficiency for each experiment: it is defined as the average amount the retailer pays for participants' reduction of 1 kWh of electricity during peak hours. In our experiment, this value is calculated as the total value of lottery prizes divided by the energy reduction of all treatment group subjects. Data analysis shows an effective cost in our experiment of $0.053/kWh, which is only about 1/7 of that in the CPP experiment (directly given as $0.35/kWh in the experimental design).

4.4.5.7 Cost-Saving Decomposition

Table 4.10 shows the significant difference between the effective costs of the two demand response experiments. As an indicator of relative cost-effectiveness, the effective cost-saving ratio (ECSR) is defined as the ratio of the effective costs (normalized by retail price) of the two experiments. If we use the CPP experiment [208] as the reference case, the ECSR of EnergyCoupon is calculated as

ECSR = \frac{C_{CPP}}{C_{EnergyCoupon}} = \frac{368\%}{58\%} = 6.34    (4.30)
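A small sketch of the two normalizations behind Eq. (4.30); the retail prices are the ones cited in Table 4.10, while the kWh total used for the effective cost is a hypothetical stand-in for the experiment's aggregate reduction:

    # Effective cost: retailer dollars paid per kWh of peak reduction.
    total_lottery_prizes = 35.0 * 11      # $35/week over 11 weekly lotteries
    total_reduction_kwh = 7265.0          # hypothetical aggregate reduction
    effective_cost = total_lottery_prizes / total_reduction_kwh  # ~0.053 $/kWh

    # Normalize each experiment's effective cost by its retail price
    # (Table 4.10), then take the ratio to obtain the ECSR of Eq. (4.30).
    cpp_normalized = 0.35 / 0.095          # ~368%
    ec_normalized = 0.053 / 0.090          # ~59%
    ecsr = cpp_normalized / ec_normalized  # ~6.3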

ECSR > 1 indicates that EnergyCoupon is more cost-effective than CPP. Different factors may contribute to the high ECSR value in Eq. (4.30), such as (i) the innovative coupon design in the CIDR mechanism, (ii) the EnergyCoupon mobile app, which improves communication with participants, and (iii) the lottery scheme, which encourages more participation because of humans' risk-seeking behavior. In this section, we are interested in how the lottery scheme (factor (iii)) contributes to the cost-savings in our EnergyCoupon program. If we assume that all the factors listed above contribute independently to the ECSR and can be measured by multipliers, then

\alpha \beta = ECSR = \frac{C_{CPP}}{C_{EnergyCoupon}} = 6.34    (4.31)

where \alpha and \beta are multipliers representing the contributions of the lottery scheme (factor (iii)) and of the other factors ((i), (ii), etc.), respectively. The value \alpha can be estimated using cumulative prospect theory, a behavioral game theory that describes individual choice between risky probabilistic alternatives [44]. It models probability weighting and loss aversion, which lead to the overweighting of small probabilities and the underweighting of moderate and high probabilities. In a game with potential outcomes x_1, x_2, ..., x_n and respective probabilities p_1, p_2, ..., p_n, a gain prospect f = (x_1, p_1; x_2, p_2; ...; x_n, p_n) describes a prospect resulting in outcome x_i with probability p_i, i \in {1, 2, ..., n}, where

(i) x_i < x_j if i < j, i, j \in {1, 2, ..., n}, and
(ii) \sum_{i=1}^{n} p_i = 1.

For instance, in EnergyCoupon, each active subject on average has approximately a 7.0% chance of winning each prize ($20, $10, and $5) in the weekly lottery; the probability for an inactive user to win each prize is around 2.3%. Therefore, the prospects that each active/inactive subject faces (f_a and f_b) can be described as

f_a = ($0, 0.79; $5, 0.07; $10, 0.07; $20, 0.07),
f_b = ($0, 0.931; $5, 0.023; $10, 0.023; $20, 0.023),    (4.32)

and n = 4 for both prospects.


Prospect theory defines the utility of a prospect f as

V(f) = \sum_{i=0}^{N} \pi_i V(x_i)    (4.33)

where V is the utility function and \pi_i are decision weights calculated as

\pi_i = \omega(p_i + ... + p_n) − \omega(p_{i+1} + ... + p_n),  0 ≤ i ≤ N    (4.34)

and \omega is the probability weighting function. Equation (4.33) says that the utility of a prospect f equals the sum of all decision weights \pi_i times the utilities of the corresponding outcomes x_i. It is worth noting that the decision weight \pi_i is closely related to the probability p_i, but the two may differ in value; the deviation of \pi_i from p_i captures how the lottery scheme "distorts" human perception of the probabilities. Furthermore, we introduce the equivalent of prospect f, denoted c, defined by

V(c) = V(f),    (4.35)

Therefore, the equivalent c_a of prospect f_a represents the fixed return that would make an active participant indifferent between receiving c_a for certain and playing the lottery f_a. The same interpretation applies to c_b (inactive users). Given the total numbers of active users N_a and inactive users N_b, the total equivalent

c = c_a N_a + c_b N_b    (4.36)

would be an estimate of the total direct cash needed in the experiment to maintain the same level of incentive to the treatment group if no lottery scheme were adopted. In the EnergyCoupon experiment, N_a = 7 and N_b = 22, reflecting the numbers of active and inactive participants. As the next step, we would like to estimate the equivalents. Combining the definition of the fixed-return equivalent (4.35) with Eq. (4.33), we have

V(c) = \sum_{i=0}^{N} \pi_i V(x_i).    (4.37)

Since each lottery prize in our experiment is relatively small (x_i < $200, i \in {1, 2, 3, 4}), the utility function is linear and can be removed from both sides of (4.37) [244], giving

c = \sum_{i=0}^{N} \pi_i x_i.    (4.38)

By combining Eqs. (4.38) and (4.34), we can calculate the equivalent per active/inactive user. Taking a typical active user as an example, the equivalent of the prospect can be calculated as

c_a = \pi_2 × 5 + \pi_3 × 10 + \pi_4 × 20,    (4.39)
\pi_2 = \omega(0.07 × 3) − \omega(0.07 × 2) = \omega(0.21) − \omega(0.14),    (4.40)
\pi_3 = \omega(0.07 × 2) − \omega(0.07) = \omega(0.14) − \omega(0.07),    (4.41)
\pi_4 = \omega(0.07).    (4.42)

The values of \omega can be estimated from Fig. 4.47 in reference [242], as the median c/x in the prospect (0, 1 − p; x, p) is an estimate of \omega(p). From the curve for x < 200, we get \omega(0.21) = 0.26, \omega(0.14) = 0.22, and \omega(0.07) = 0.16. Therefore

c_a = (0.26 − 0.22) × 5 + (0.22 − 0.16) × 10 + 0.16 × 20 = 4.0    (4.43)

Similarly, we can calculate the equivalent c_b = 2.4. According to Eq. (4.36), the total equivalent for all participants is c = 4.0 × 7 + 2.4 × 22 = $80.8; this is the estimate of the direct cash needed in our experiment to maintain the same level of incentive to the treatment group if no lottery scheme were adopted. The multiplier \alpha is therefore estimated as the equivalent cash divided by the total weekly lottery prizes: \alpha = 80.8/35 = 2.3. According to Eq. (4.31), \beta = 2.75, and we can conclude that the lottery scheme and the other EnergyCoupon design elements contribute at similar levels to reducing the effective cost in our experiment.
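The full chain from the weighting function to the multipliers can be reproduced in a few lines. The sketch below uses only the values quoted above (the \omega readings from [242], the prize amounts, and the subgroup sizes):

    # Probability weighting values read from the curve in [242].
    w = {0.21: 0.26, 0.14: 0.22, 0.07: 0.16}

    # Decision weights for an active subject, Eqs. (4.40)-(4.42).
    pi2 = w[0.21] - w[0.14]   # 0.04
    pi3 = w[0.14] - w[0.07]   # 0.06
    pi4 = w[0.07]             # 0.16

    c_a = pi2 * 5 + pi3 * 10 + pi4 * 20   # Eq. (4.43): $4.0 per active subject
    c_b = 2.4                             # analogous computation, inactive users
    c_total = c_a * 7 + c_b * 22          # Eq. (4.36): $80.8 in total
    alpha = c_total / 35.0                # weekly prize pool is $20 + $10 + $5
    beta = 6.34 / alpha                   # from alpha * beta = ECSR, Eq. (4.31)
    print(round(c_a, 1), round(c_total, 1), round(alpha, 1), round(beta, 2))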

4.4.6 Conclusion

This section presents the design of, and critically assesses, an empirical experiment on a coupon incentive-based demand response program for end-consumers conducted over a 2-year period in the Houston, TX, area. Unlike traditional price-based DR programs, EnergyCoupon has the following features: (1) dynamic time-of-use DR events and individualized coupon targets; (2) end-consumers receive coupon targets and usage statistics through a mobile app; (3) participation in demand response events is voluntary; and (4) a periodic lottery allows participants to convert their coupons into dollar-value prizes. Data analysis shows a significant load shedding effect for the treatment group, whereas little load shifting is observed. In addition, we observe the positive impact of lottery prizes on the growth of desirable behaviors, such as energy-saving improvements and lottery participation.


Our posterior analysis also shows that EnergyCoupon has a much lower effective cost (¢5.3/kWh) than the previous CPP project (¢35.0/kWh). Using prospect theory, we estimate that the system architecture design and the lottery scheme contribute at comparable levels to the cost-savings. The approach in this section generalizes to other Internet-of-Things-enabled demand response activities and could shed light on the broader discussion of incentive-based versus price-based demand response. Future work would examine the value added by obtaining consumer behavior data in this experiment. Another possible avenue is to develop a platform that allows end-consumers to aggregate and participate in wholesale-level ancillary services.

Chapter 5

Use of Energy Storage as a Means of Managing Variability

5.1 Adding Storage to the Mix

Distributed energy resources have been proposed as a promising solution for making households self-sufficient and increasing power supply reliability. In this section, we first examine the reliability values of distributed solar+storage systems, considering their chances of surviving a rare weather event or of being rapidly deployed thereafter. To this end, we develop a theoretical basis and rule-of-thumb formulas for estimating the reliability value of a solar+storage system. The analytical results are used to calculate the optimal solar+storage capacities when reliability values are considered on top of the economic values and investment costs. Furthermore, to improve the accuracy of the proposed reliability value, we introduce a data-driven correction term based on realistic simulations using hourly electric load from 600 households in Austin, Texas. Case studies demonstrate that 50% of households can achieve over $196.55 in annual reliability value by installing a solar+storage system with typical capacities. Combining our theoretical model with empirical data, we obtain a fast yet accurate evaluation of the reliability value of solar+storage systems that avoids the need for complex simulations. The proposed theoretical analysis thus enables unsophisticated individual consumers to gauge the reliability benefits of distributed solar+storage systems amidst rare weather events when making an investment decision.

5.1.1 Introduction

Power grids have traditionally been designed to be reliable during normal weather conditions and in response to foreseeable contingencies under adverse weather conditions. However, as high-impact, low-probability events, rare weather events pose great challenges to the secure operation of power grids and the reliability of

the power supply [247]. A list of major power outages in the United States caused by rare weather events is shown in Table 5.1. In the United States, the annual economic losses caused by weather-related damage to transmission and distribution networks range from $20 to $55 billion [248]. Despite their low probability, rare weather events usually cause severe consequences and damage to power grids, leading to inadequate power supplies and even blackouts. Distributed energy resources (DERs) have been advocated as a promising solution for making households self-sufficient and increasing system-wide power supply reliability [249]. With the development of photovoltaic and battery storage technologies, an increasing number of electricity consumers have installed solar+storage systems. These DERs can reduce consumer electric bills, satisfy local electricity load, and increase consumer reliability values against rare weather events.

Several studies have focused on the self-sufficiency of electricity consumers supported by DERs and on evaluating the reliability values of DERs. In [250], a community biomass plant, biogas plant, and solar heater are proposed as an economical solution for the electric power and gas demands of an Indian village. In [251], renewable energy resources are integrated into the supply network of commercial buildings, and a mixed-integer programming model is developed to maximize the self-sufficiency of the buildings; case studies based on a commercial company quantify the reliability values of renewable energy resources. In [252], a realistic upper limit on the household self-consumption level is obtained by scheduling electric appliances in a building; based on hourly solar power and electric load data from 200 Swedish households, case studies demonstrate that optimally scheduling electric appliances can increase solar consumption by approximately 200 kWh per year for a household. In [253], a system comprising combined heat and power, solar, and battery storage is proposed to improve household self-sufficiency; an optimal planning model is formulated to minimize the total operating and investment costs of DERs, and case studies based on 30 households indicate that power supply reliability can be greatly improved by installing these DERs. In [254], a mixed-integer linear optimization model is formulated to minimize the total operating and investment costs of solar+storage systems; the model is employed to study the effects of the temporal resolution of electric load and solar power on household self-sufficiency and optimal sizing results. In [255], a model is proposed for predicting household self-consumption and the optimal sizing of solar+storage systems; case studies based on various European households illustrate that DERs can increase household reliability values, but achieving 100% self-consumption is not cost-efficient. In [256], a model is presented to optimize the sizing of solar+storage systems in a household under different feed-in tariffs; the results show that solar+storage systems can increase consumers' reliability values and economic benefits in Germany but are not yet profitable in Ireland. In [257], the reliability values of battery storage systems are investigated; an optimization model is proposed to maximize household self-sufficiency, and case studies based on 2000 Swedish households are conducted with batteries of different capacities. In [258], the reliability values of renewable energy resources are evaluated in a renewable-powered system; by comparing the system operation with and without renewable power, the loss of load expectation (LOLE) can be calculated under different setups for load profiles.


Table 5.1 Major power outages caused by rare weather events

  Time            Location                     Event              Impacts
  November 2015   Spokane, Washington          A windstorm        Power lines were damaged, leaving over 161,000 people without electricity
  September 2016  Tallahassee, Florida         Hurricane Hermine  Over 350,000 people lived without power for 1 week
  February 2017   Scranton, Pennsylvania       A storm            The storm caused 285,000 people to be without power for more than 3 days
  March 2017      Northeastern United States   A thunderstorm     Ten million people from New York, New Jersey, Maryland, and Pennsylvania lived without electricity
  March 2017      Michigan                     A windstorm        Approximately one million people were without power for 2 days
  September 2017  Southeastern United States   Hurricane Irma     Six million in Florida, 1.3 million in Georgia, and 0.2 million in South Carolina lost power
  September 2017  Puerto Rico, Dominica, etc.  Hurricane Maria    Over 547 people were killed, and total economic losses reached $103.45 billion; many people in Puerto Rico lost power for several months

Additionally, some literature has investigated the impacts of DERs on the resilience and recovery of power grids after rare weather events. In [259], renewable energy resources are dispatched for line hardening, which effectively improves the resilience of distribution grids. In [260], a resilience-oriented planning technique is proposed for hardening distribution lines and deploying DERs and automatic switches. In [261], micro-grids with wind turbines and photovoltaics are used to restore critical load after a natural disaster, enhancing the resilience of a distribution grid. In the existing literature, household self-sufficiency and the reliability values of DERs have been evaluated by formulating and solving optimization programs [262] that minimize household operation costs or maximize DER reliability values. These optimization programs usually consider detailed models of DERs, e.g., the dynamics of stored energy in a battery, forecast errors of behind-the-meter solar power, and so on [263]. However, it is challenging for a real-world household to implement such optimization models to obtain self-sufficiency levels and DER reliability values. With the rapid development of distributed solar and storage resources, it is imperative to develop a simple and straightforward formula for DER reliability that unsophisticated users can use to make investment decisions. In this section, we aim to address the public's interest in how self-sufficient a household can be when installing solar+storage and how to quantify


the reliability value of solar+storage after a rare weather event breaks down the connected power grid. Thus, we examine the reliability values of distributed solar+storage systems and provide guidance for electricity consumers living in areas vulnerable to potential rare weather events. The major contributions are as follows:

1. We develop a theoretical estimate for the reliability value of a distributed solar+storage system after a rare weather event, which enables unsophisticated individual consumers to gauge their self-sufficiency levels and reliability values.
2. We provide analytical expressions for optimal solar+storage sizing, which yield a simple way to calculate the optimal solar+storage capacities when the reliability values are considered on top of the economic values and the investment costs.
3. The accuracy of the proposed theoretical estimates is validated against realistic simulations based on 600 households in Austin, Texas. We introduce a data-driven correction term to our model using hourly load and solar power data.

Our proposed theoretical analysis provides unsophisticated individual consumers with a simple account of the reliability benefits of distributed solar+storage systems amidst rare weather events when making investment decisions.

5.1.2 Formulation

In this section, a household has an individual load, a solar panel, and a battery storage system. We consider the daily operation of such a system. The daytime period (i.e., the time period when solar production is positive) is denoted by T_d, and the nighttime period is denoted by T_n.

5.1.2.1 Solar Generation

The solar radiance during the daytime T_d is modeled as a stochastic process r(t), t \in T_d, with its probability distribution estimated from historical data [264]. For a solar panel with capacity K_s, if not damaged during the rare weather event, the solar generation process s(t), t \in T_d, relates linearly to the solar radiance process:

s(t) = r(t) K_s,  t \in T_d,    (5.1)

where we assume that the radiance process is already scaled by the efficiency parameter related to the cell temperature of the solar panel [265]. The accumulated solar generation during the day is

S(K_s) = \int_{T_d} s(t) dt = K_s \int_{T_d} r(t) dt = K_s R,    (5.2)


where R is a random variable whose distribution can be derived from the distribution of the solar radiance process.

5.1.2.2 Load

In this section, the load is the electrical load of a household, representing the aggregation of its electrical appliances. We model the load as a stochastic process l(t), t \in T_d \cup T_n, with a probability distribution that can be estimated from historical data. The accumulated daytime and nighttime loads are denoted by L_d and L_n, respectively:

L_d = \int_{T_d} l(t) dt,  L_n = \int_{T_n} l(t) dt.    (5.3)

5.1.2.3 Storage

After a power grid breakdown, the reliability value of battery storage lies in its ability to store excess solar energy during the day for use during the night. For simplicity, and with the aim of providing guidance to unsophisticated consumers, we use an idealized storage model that only accounts for the energy capacity of the storage, denoted by K_b. The discrepancy between this model and a more realistic model, with the power limit and charging/discharging losses of the energy storage operating at a finer time granularity, is quantified and corrected with an empirical term in the case study section. It is worth mentioning that the economic value of battery storage is the profit via its arbitrage against time-of-use rates when the power grid operates normally, as described in Sect. 5.1.2-E. As such, the random charge during the day, C(K_s, K_b), and discharge during the night, D(K_s, K_b), are

C(K_s, K_b) = min{(S(K_s) − L_d)^+, \mu_b K_b},    (5.4)
D(K_s, K_b) = min{C(K_s, K_b), L_n},    (5.5)

where (·)^+ = max{·, 0}. In (5.4), the charge during the day C(K_s, K_b) is restricted by the surplus solar energy (S(K_s) − L_d)^+ and the usable storage capacity \mu_b K_b, where \mu_b represents the depth of discharge (DOD) of the battery storage. In (5.5), the discharge during the night D(K_s, K_b) is restricted by the charge during the day and the consumer's night load. Note that the solar capacity K_s and the storage capacity K_b are measured in kW and kWh, respectively.
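Because the idealized model works with daily energy totals only, it reduces to two min operations. A direct transcription with illustrative inputs:

    def daily_charge_discharge(S, Ld, Ln, Kb, mu_b=1.0):
        """Daily charge/discharge (kWh) per Eqs. (5.4)-(5.5)."""
        surplus = max(S - Ld, 0.0)        # excess solar after daytime load
        charge = min(surplus, mu_b * Kb)  # Eq. (5.4): capped by usable capacity
        discharge = min(charge, Ln)       # Eq. (5.5): capped by nighttime load
        return charge, discharge

    # Example: 30 kWh solar, 18 kWh day load, 10 kWh night load, 13.5 kWh battery.
    print(daily_charge_discharge(S=30.0, Ld=18.0, Ln=10.0, Kb=13.5))  # (12.0, 10.0)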


5.1.2.4 Reliability Value

In this section, p denotes the probability of a breakdown of the power grid connected to the households, caused by rare weather events. In practice, the probability p can be estimated using statistical models or simulation methods based on historical data of network failures and weather conditions [266]. Several models have been proposed in the existing literature to depict power system damage, outage duration, and restoration after rare weather events [267]. Among these models, the negative binomial generalized linear model (NBGLM) is one of the most commonly used approaches, as it allows regression analysis of count data. The NBGLM comprises two components: (i) a conditional distribution for the count of rare weather events and (ii) an equation relating the distribution parameters to a function of explanatory variables [268], shown as follows:

f_Y(y | \alpha, \lambda) = \frac{\Gamma(y + \alpha^{-1})}{\Gamma(y + 1)\,\Gamma(\alpha^{-1})} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \lambda}\right)^{\alpha^{-1}} \left(\frac{\lambda}{\alpha^{-1} + \lambda}\right)^{y},    (5.6)

log(\lambda) = \beta X,    (5.7)

where y represents the number of grid outages caused by rare weather events leading to blackouts of the household and f_Y(·) is the probability function of y; \alpha is the overdispersion parameter of the negative binomial distribution; X is the vector of explanatory variables (for hurricanes, X usually includes maximal wind speed, duration of strong winds, number of distribution lines and transformers, and so on [269]); \lambda is a parameter related to the explanatory variables; and \beta is the vector of regression parameters to be estimated. Therefore, given f_Y(y), the cumulative distribution function F_Y(y) can be obtained. Then, the probability p is

p = 1 − F_Y(0),    (5.8)

where F_Y(0) is the probability that there are no grid outages leading to blackouts of the household.
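A sketch of evaluating p from a fitted NBGLM using scipy's negative binomial. The coefficient values here are invented for illustration; the mapping to scipy's (n, p) parameterization uses n = 1/\alpha and success probability n/(n + \lambda), which matches the pmf in Eq. (5.6):

    import numpy as np
    from scipy.stats import nbinom

    alpha = 0.8                    # assumed overdispersion parameter
    beta = np.array([0.02, 0.5])   # assumed regression coefficients
    X = np.array([40.0, 1.2])      # explanatory variables, e.g., wind features

    lam = np.exp(beta @ X)         # Eq. (5.7): log(lam) = beta * X
    n = 1.0 / alpha
    p_success = n / (n + lam)

    p_breakdown = 1.0 - nbinom.cdf(0, n, p_success)   # Eq. (5.8)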

Conditioning on a power grid breakdown and considering the days without the power grid, for a household with solar+storage capacities (K_s, K_b), let the accumulated loss of load be denoted \ell(K_s, K_b). Then, the reliability value of a solar+storage system for a household is defined as follows:

Definition 5.1 The reliability value V_R(K_s, K_b) of the solar+storage system with capacities (K_s, K_b) is defined as the value of the avoided loss of load, i.e.,

V_R(K_s, K_b) = p \pi_R E[\ell(0, 0) − \ell(K_s, K_b)],    (5.9)

where the operator E represents the expectation of a random variable and \pi_R is the household's value of (unit) lost load (VOLL), which can be obtained from consumer surveys [270] and consumer production functions [271]. In (5.9), \ell(0, 0) − \ell(K_s, K_b) represents the accumulated loss of load avoided by using solar+storage with capacities (K_s, K_b).

Fig. 5.1 Illustration for the reliability value of solar+storage systems

The reliability value of solar+storage systems is illustrated in Fig. 5.1. The left panel of Fig. 5.1 shows that without a solar+storage system, all electric load in a household is unserved after a power grid breakdown. In the right panel, by contrast, the daytime load can be served by solar power, and the storage can shift the surplus solar power to satisfy part of the nighttime load, effectively reducing the loss of load. The reliability value of solar+storage systems is therefore defined as the avoided loss of load.

5.1.3 Optimal Investment Problem

The ability to characterize the reliability value allows consumers to make more informed decisions regarding solar+storage investment. Let V_E(K_s, K_b) and V_F(K_s, K_b) be the economic value and investment cost, respectively, of a solar+storage system with capacities (K_s, K_b). In this section, the economic value refers to the profits brought by using and selling solar energy and the profits via a battery's arbitrage. Then, the optimal investment decision is characterized by the optimal capacities that solve the following optimization:

max_{K_s, K_b ≥ 0} J(K_s, K_b) = V_E(K_s, K_b) + V_R(K_s, K_b) − V_F(K_s, K_b),    (5.10)

where J(K_s, K_b) represents a household's total benefit from investing in solar+storage. Solving the problem above would require a detailed model of the economic value of solar+storage and the investment costs, which is outside the scope of this work; see [272, 273] for theoretical treatments and [274] for empirical studies on this topic. For the investment results in this section, we use a simple model for the economic value and investment cost:

V_E(K_s, K_b) = \alpha_s K_s + \alpha_b K_b,    (5.11)
V_F(K_s, K_b) = \beta_s K_s + \beta_b K_b + \gamma_s I_{K_s > 0} + \gamma_b I_{K_b > 0},    (5.12)

where \alpha_s, \alpha_b > 0 are the annual economic values per unit of solar and storage capacity, respectively; \beta_s, \beta_b > 0 are the investment costs per kW and kWh for solar and storage, respectively; and \gamma_s, \gamma_b > 0 are the one-time fixed and installation costs for solar and storage systems, respectively, modeling the costs of inverters and labor for installation [275]. I_x is equal to 1 when the variable x > 0 and 0 when x ≤ 0.
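This value/cost model is a pair of affine functions plus fixed charges, so it is easy to encode directly; V_R would be supplied by Theorem 5.1 below. A minimal sketch (the parameter names are ours):

    def V_E(Ks, Kb, a_s, a_b):
        return a_s * Ks + a_b * Kb                   # Eq. (5.11)

    def V_F(Ks, Kb, b_s, b_b, g_s, g_b):
        fixed = g_s * (Ks > 0) + g_b * (Kb > 0)      # one-time fixed costs
        return b_s * Ks + b_b * Kb + fixed           # Eq. (5.12)

    def J(Ks, Kb, V_R, a_s, a_b, b_s, b_b, g_s, g_b):
        # Eq. (5.10): total household benefit of the investment (Ks, Kb).
        return V_E(Ks, Kb, a_s, a_b) + V_R - V_F(Ks, Kb, b_s, b_b, g_s, g_b)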

5.1.4 Main Results

In this section, we provide a closed-form expression for the reliability value of a solar+storage system. This allows us to provide analytical expressions for the optimal solar and storage capacities when the reliability values are considered on top of the economic values and the investment costs.

5.1.4.1 Reliability Value and Optimal Investment Decision

Theorem 5.1 The reliability value of a solar+storage system with capacities (K_s, K_b) is

V_R(K_s, K_b) = p \pi_R N_rec E[min{S(K_s), L_d} + min{C(K_s, K_b), L_n}],    (5.13)

where N_rec is the number of days in a year and V_R represents the annual reliability value. The storage charge C(K_s, K_b) is defined in (5.4). The first term in the bracket, min{S(K_s), L_d}, is the daily daytime reliability value, and the second term, min{C(K_s, K_b), L_n}, is the daily nighttime reliability value. See [276] for the proof of Theorem 5.1. We then have the following observations regarding the reliability formula.

Remark 5.1 The reliability value can alternatively be expressed as

V_R(K_s, K_b) = p \pi_R N_rec E[min{S(K_s), L_d + L_n} + min{((S(K_s) − L_d)^+ − \mu_b K_b)^+, (L_n − \mu_b K_b)^+}],    (5.14)

where the first term in the bracket, min{S(Ks ), Ld + Ln }, quantifies the ideal reliability value of the solar panel if sufficient storage is available and the second term, min{((S(Ks ) − Ld )+ − μb Kb )+ , (Ln − μb Kb )+ }, quantifies the loss due to insufficient storage capacity. To identify the optimal solar and storage capacities, we note the following structural property of the reliability value as a function of (Ks , Kb ).
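Since (5.13) is an expectation of simple min terms, it can be estimated by Monte Carlo once distributions for R, L_d, and L_n are available. The sketch below uses gamma distributions with made-up parameters purely as placeholders for the distributions estimated from historical data:

    import numpy as np

    rng = np.random.default_rng(0)

    def reliability_value(Ks, Kb, p, pi_R, N_rec=365, mu_b=1.0, n=100_000):
        R = rng.gamma(10.0, 0.45, n)    # daily radiance, kWh per kW (assumed)
        Ld = rng.gamma(8.0, 2.2, n)     # daytime load, kWh (assumed)
        Ln = rng.gamma(6.0, 1.8, n)     # nighttime load, kWh (assumed)

        S = Ks * R                                                # Eq. (5.2)
        charge = np.minimum(np.maximum(S - Ld, 0.0), mu_b * Kb)   # Eq. (5.4)
        day = np.minimum(S, Ld)
        night = np.minimum(charge, Ln)
        return p * pi_R * N_rec * np.mean(day + night)            # Eq. (5.13)

    print(reliability_value(Ks=5.0, Kb=13.5, p=0.0027, pi_R=11.96))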


Lemma 5.1 The function V_R(K_s, K_b) is nondecreasing and concave in (K_s, K_b).

See [276] for the proof of Lemma 5.1. The concavity allows us to solve the optimal investment problem using the first-order conditions of the optimization, which lead to the following equations:

h_s(K_s, K_b) = \alpha_s − \beta_s + p \pi_R N_rec E[R (I_{K_s R ≤ L_d} + I_{K_s R > L_d} I_{K_s R − L_d ≤ min{\mu_b K_b, L_n}})],    (5.15)
h_b(K_s, K_b) = \alpha_b − \beta_b + p \pi_R N_rec \mu_b P(\mu_b K_b ≤ min{L_n, (K_s R − L_d)^+}).    (5.16)

See [276] for the derivations. Given the discrete nature of the investment cost, finding the optimal investment involves comparing several possibilities. In particular, let (K̂_s, K̂_b) be the solution to the equations h_s(K_s, K_b) = h_b(K_s, K_b) = 0, let (K̄_s, 0) be the solution to the equation h_s(K_s, 0) = 0, and let (0, K̄_b) be the solution to the equation h_b(0, K_b) = 0. Then, we find the optimal investment by comparing the values achieved by the candidates in K = {(K̂_s, K̂_b), (K̄_s, 0), (0, K̄_b), (0, 0)}.

Theorem 5.2 The optimal investment decision (K_s^*, K_b^*) is the solution to the following program:

max_{(K_s, K_b) \in K} J(K_s, K_b).    (5.17)

5.1.4.2 Example: Deterministic Case

We consider a simple scenario where the load and solar radiance are deterministic. To better understand the reliability values of solar+storage systems, we assume that the economic benefits of solar+storage systems are not sufficient to cover the investment costs, i.e.,

\alpha_s − \beta_s < 0,  \alpha_b − \beta_b < 0.    (5.18)

Lemma 5.2 In a deterministic case, the optimal investment .(Ks∗ , Kb∗ ) is one of the three decisions, i.e., .(0, 0), .(Ld /R, 0), and .((Ld + Ln )/R, Ln /μb ), shown as follows: ⎧ (0, 0), y1 ≤ 0, y2 ≤ 0, ⎪ ⎪ ⎨ ∗ ∗ (Ld /R, 0), y1 ≥ 0, y1 ≤ y2 , .(Ks , Kb ) = ⎪ ⎪ ⎩ ((Ld + Ln )/R, Ln /μb ), y2 ≤ 0, y2 ≤ y1 ,

(5.19)


where

y_1 = x_1 L_d / R − \gamma_s,    (5.20)
y_2 = x_1 L_d / R + x_2 L_n / R − \gamma_s − \gamma_b,    (5.21)
x_1 = \alpha_s − \beta_s + p \pi_R N_rec R,    (5.22)
x_2 = \alpha_s − \beta_s + [(\alpha_b − \beta_b)/\mu_b + p \pi_R N_rec] R.    (5.23)

See [276] for the proof. Lemma 5.2 indicates that an electricity consumer can compare the net benefits of three investment decisions: (1) no investment, (2) investing in a solar panel with capacity L_d/R, and (3) investing in a solar+storage system with capacities ((L_d + L_n)/R, L_n/\mu_b). The strategy with the largest net benefit (equivalently, the minimal total cost) is then selected as the optimal solution.
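The deterministic rule amounts to computing y_1 and y_2 and picking the largest nonnegative net benefit, which is how we have read the case conditions in Eq. (5.19). A sketch:

    def optimal_deterministic(Ld, Ln, R, a_s, b_s, a_b, b_b, g_s, g_b,
                              p, pi_R, N_rec=365, mu_b=1.0):
        x1 = a_s - b_s + p * pi_R * N_rec * R                         # Eq. (5.22)
        x2 = a_s - b_s + ((a_b - b_b) / mu_b + p * pi_R * N_rec) * R  # Eq. (5.23)
        y1 = x1 * Ld / R - g_s                                        # Eq. (5.20)
        y2 = x1 * Ld / R + x2 * Ln / R - g_s - g_b                    # Eq. (5.21)
        if y1 <= 0 and y2 <= 0:
            return (0.0, 0.0)                  # invest in nothing
        if y1 >= y2:
            return (Ld / R, 0.0)               # solar only, sized to daytime load
        return ((Ld + Ln) / R, Ln / mu_b)      # solar plus storage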

5.1.5 Case Studies

Based on 600 households in Austin, Texas, we simulate the reliability values of solar+storage systems against rare weather events. For realistic simulations, we establish a benchmark model using hourly load and solar power data in Sect. 5.1.5.1. Section 5.1.5.2 elaborates on the data and parameter setups. In Sects. 5.1.5.3–5.1.5.5, we present the theoretical estimates, the realistic results, and the optimal investment decisions.

5.1.5.1 A Benchmark Model

We formulate the following detailed model as a benchmark to estimate the realistic reliability values of distributed solar+storage systems:

max V_R = p \pi_R \int_{t \in T_d \cup T_n} [l(t) − l^N(t)] dt,    (5.24)

subject to

l(t) = l^N(t) + p^{PV}(t) − p^{cha}(t) + p^{dis}(t),  t \in T_d \cup T_n,    (5.25)
X = [l^N(t), p^{PV}(t), p^{cha}(t), p^{dis}(t), e(t)] \in \mathcal{X},    (5.26)

where l(t) is the household electric load at time t; l^N(t) is the net load of the household at time t; p^{PV}(t) is the power from the solar panel at time t; p^{cha}(t) and p^{dis}(t) are the charging and discharging powers of the battery storage at time t; e(t) is the stored energy in the battery at time t; X is the variable vector; and \mathcal{X} is the set of constraints. See [276] for details.
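The benchmark is an optimization program; as a lightweight stand-in, the sketch below simulates a greedy hourly dispatch with a power limit and round-trip efficiency, which captures the finer-granularity effects the idealized model ignores. It is a heuristic illustration, not the exact solution of (5.24)-(5.26):

    def served_load(load, solar, e_cap=13.5, p_cap=5.0, eta=0.90):
        """load, solar: hourly kWh sequences; returns total kWh served."""
        soc, served = 0.0, 0.0
        for l, s in zip(load, solar):
            direct = min(l, s)                    # solar serves load first
            surplus, deficit = s - direct, l - direct
            charge = min(surplus, p_cap, (e_cap - soc) / eta)
            soc += charge * eta                   # charging losses
            discharge = min(deficit, p_cap, soc)
            soc -= discharge
            served += direct + discharge
        return served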

5.1.5.2 Data Description

The yearly electric load of 600 households is collected from [277]. The yearly solar power and radiance data are adopted from the National Renewable Energy Laboratory [278]. According to the solar power data, day hours are set from 6:00 to 19:00, and night hours from 19:00 to 24:00 and from 1:00 to 6:00. The cumulative distribution functions (CDFs) of the daytime and nighttime loads of the 600 households are shown in Fig. 5.2. The hourly and daily solar generation of 1-kW solar panels is shown in Fig. 5.3. The VOLL \pi_R is set to $11.96/kWh [279], and the probability p is set to 0.0027 [280]. The parameters of the battery storage [281] are shown in Table 5.2. Three cases under different setups are shown in Table 5.3.

Fig. 5.2 The CDFs of the daytime and nighttime loads of 600 households

Fig. 5.3 Hourly and daily power generation of 1-kW solar panels


Table 5.2 The parameters of the battery storage

  Power capacity   Energy capacity   Round-trip efficiency   DOD
  5 kW             13.5 kWh          90%                     100%

Table 5.3 The setups for the three cases

  Case   Solar (kW)   Battery (unit)
  BC     5            1
  LBC    5            0–4
  LSC    5–15         1

Fig. 5.4 Annual reliability value of solar+storage systems for BC

In BC (base case), a household is assumed to install one battery and a 5-kW solar panel. In LBC (larger battery case), with the solar panel capacity fixed at 5 kW, the impact of the number of batteries in a household is simulated. In LSC (larger solar case), the impact of the solar capacity in a household with one battery installed is quantified.

5.1.5.3 Theoretical Estimates

For BC, the annual reliability values of solar+storage systems are shown in Fig. 5.4. The annual reliability values of the 600 households range from $27.65 to $250.19, with a median of $196.55, indicating that the annual reliability value of a solar+storage system reaches at least $196.55 for 50% of households. For LBC and LSC, the annual reliability values are shown in Figs. 5.5 and 5.6, respectively. When the battery number increases from 0 to 1, the reliability values increase dramatically, with the median rising from $133.77 to $196.55. However, the reliability values change little as the battery number increases from 1 to 4, which indicates that one battery is largely sufficient for realistic households and that the household reliability values are limited by solar generation. When the solar capacity increases from 5 to 15 kW, the median reliability value keeps increasing, from $196.55 to $238.24.


Fig. 5.5 Annual reliability value of solar+storage systems for LBC

Fig. 5.6 Annual reliability value of solar+storage systems for LSC

The parameters of the solar+storage system can influence the reliability values. In this subsection, we evaluate the impacts of the DOD and the capacity of the battery storage. Figure 5.7 shows the median annual reliability value of solar+storage systems for different levels of DOD and capacity. When a battery's DOD rises from 60% to 100%, the median annual reliability values gradually increase, but the marginal reliability value decreases because some households already have sufficient battery storage. Additionally, the median annual reliability value increases with a battery's energy capacity; for example, at a DOD of 60%, the 13.5-kWh battery increases the median value by $16.43 compared with the 8-kWh battery, and the 20-kWh battery adds another $4.19. On the other hand, the marginal reliability value is further reduced as the battery's capacity increases: with the DOD rising from 60% to 100%, the median reliability values of the three capacity cases increase by $16.18, $4.77, and $0.58, respectively.


Fig. 5.7 Median annual reliability value with different levels of DOD and capacity

Fig. 5.8 The deviations between theoretical and realistic reliability values for BC

5.1.5.4 Realistic Results

In this subsection, the realistic results from the benchmark model in Sect. 5.1.5.1 are compared with those from the proposed theoretical estimates. We aim to introduce a data-driven correction term to improve the accuracy of our theoretical model. For BC, the deviations between the theoretical and realistic reliability values are shown in Fig. 5.8. The results show that the theoretical analysis overestimates the annual reliability values because it ignores the energy losses of the battery storage; the median deviation is $26.84. For BC, the relationship between the deviations and the theoretical reliability values is shown and regressed in Fig. 5.9, where a quadratic function is employed:


Fig. 5.9 The relationship between the deviations and the theoretical reliability values in BC

Fig. 5.10 The relationship between the deviations and the theoretical reliability values in LBC and LSC

D = −0.00217 V_R^2 + 0.718 V_R − 26.4,    (5.27)

where D is the statistical deviation, i.e., the data-driven correction term for the theoretical reliability values, satisfying V_R^* = V_R − D, where V_R^* is the reliability value after correction. For LBC and LSC, the regressions between the deviations and the theoretical reliability values are shown in Fig. 5.10. We use the following quadratic functions to regress the data-driven correction terms in the different cases:

D = −0.001 V_R^2 + 0.484 V_R − 13.5,    (LBC: 0 batteries)
D = −0.0004 V_R^2 + 0.331 V_R − 8.12,   (LSC: 10-kW solar)
D = −0.0001 V_R^2 + 0.216 V_R − 0.190.  (LSC: 15-kW solar)    (5.28)
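Applying the correction is a one-line post-processing step. A sketch using the base-case coefficients from Eq. (5.27):

    def corrected_reliability(v_r, coeffs=(-0.00217, 0.718, -26.4)):
        """V_R* = V_R - D(V_R), with the BC quadratic fit from Eq. (5.27)."""
        a, b, c = coeffs
        deviation = a * v_r**2 + b * v_r + c
        return v_r - deviation

    print(round(corrected_reliability(196.55), 2))  # correct the BC median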


Fig. 5.11 Deviations of the reliability values based on a 5-min and 15-min time resolution compared with 1 h

Table 5.4 The economic parameters of solar and storage systems

  DER      Annual economic value   Investment cost per unit   One-time fixed cost
  Solar    $44.74/kW               $48.28/kW                  $1000
  Storage  $49.27/kWh              $49.65/kWh                 $700

To evaluate the impact of temporal resolution, the reliability values are simulated at 5-min, 15-min, and 1-h resolutions. The deviations of the reliability values at the 5-min and 15-min resolutions relative to 1 h are shown in Fig. 5.11. As the temporal resolution becomes finer, the annual reliability values slightly decrease: the median reliability values at the 5-min, 15-min, and 1-h resolutions are $160.68, $161.05, and $162.87, respectively, and the median percentage deviations of the 5-min and 15-min resolutions relative to 1 h are 1.23% and 0.98%, respectively. The reason for the deviations is that satisfying the load becomes harder as the simulation time resolution grows finer, which raises the loss of load. However, since the deviations are small, it is acceptable to simulate the reliability values at a 1-h resolution.

5.1.5.5 Optimal Investment Decision

In this subsection, the optimal investment decisions of the 600 households are investigated based on our theoretical analysis. The parameters related to the economic and reliability values of the solar [282] and storage systems [281] are shown in Table 5.4. The payback period (PP) is set to 20 years, and the interest rate (IR) is 1.25% [283]. We therefore apply the capital recovery factor IR(1 + IR)^{PP} / [(1 + IR)^{PP} − 1] to calculate the annualized one-time fixed costs for solar and storage:

solar: 1000 × \frac{1.25\%(1 + 1.25\%)^{20}}{(1 + 1.25\%)^{20} − 1} = $56.82,
storage: 700 × \frac{1.25\%(1 + 1.25\%)^{20}}{(1 + 1.25\%)^{20} − 1} = $39.77.    (5.29)
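The annualization in Eq. (5.29) is the standard capital recovery factor; a direct transcription reproduces both figures:

    def annualized_fixed_cost(cost, ir=0.0125, years=20):
        crf = ir * (1 + ir) ** years / ((1 + ir) ** years - 1)
        return cost * crf

    print(round(annualized_fixed_cost(1000), 2))  # 56.82 (solar)
    print(round(annualized_fixed_cost(700), 2))   # 39.77 (storage)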

Note that the annual economic value and the investment cost per unit are annualized. The optimal investment decisions for solar and storage are shown in Fig. 5.12. From the results, the median values of the optimal solar and storage capacities are 8.00 kW and 8.02 kWh, respectively. Under the optimal investment decisions, the annualized reliability values and total revenues of the 600 households are shown in Fig. 5.13. The median reliability value and median total revenue are $216.58 and $65.85, respectively. This indicates that by making strategic investments in solar+storage, 50% of households can receive over $65.85 in annual total revenue.

Fig. 5.12 The optimal investment decisions for solar and storage

Fig. 5.13 The annualized reliability values and total revenues of 600 households


Table 5.5 Probability of a power grid breakdown in different cases

  Case         Base case   Median prob   High prob
  Probability  0.0027      0.005         0.01

Fig. 5.14 The optimal investment decisions obtained by increasing the investment costs per kW and kWh for solar and storage

By considering reliability values in the planning of solar and storage, households can be incentivized to invest in more DERs to hedge against the economic losses caused by rare weather events. To evaluate the investment decisions under different costs for solar and storage, we conduct a sensitivity analysis on the investment costs per unit. The investment decisions based on the parameters in Table 5.4 form the base case. The optimal investment decisions obtained by increasing the investment costs per kW and kWh for solar and storage are shown in Fig. 5.14. As the investment costs for solar and storage increase, the optimal solar and storage capacities decrease. When the annual investment cost for solar reaches $56.28/kW, the median optimal solar capacity is 3.65 kW, a decrease of 54.37% compared with the base case. When the investment cost for storage reaches $54.65/kWh, no household invests in storage, because the economic and reliability values cannot recover the investment costs. In practice, different regions can suffer different levels of rare weather events, leading to different sizing for solar+storage. To evaluate the impact of the probability p on the investment decisions, we investigate the optimal sizing of solar+storage for different values of the breakdown probability; the probabilities of a power grid breakdown in the different cases are shown in Table 5.5, and the corresponding optimal capacities are shown in Fig. 5.15. As the probability of a power grid breakdown increases, households need larger solar+storage capacities to satisfy their individual loads. The median values of optimal solar capacity in the three cases are 8.00 kW, 10.06 kW, and 12.50 kW, respectively; the median values of optimal storage capacity are 8.02 kWh, 11.01 kWh, and 13.63 kWh, respectively.


Fig. 5.15 The optimal investment decisions for solar and storage with different probabilities

5.1.5.6 Discussions

In this subsection, the main results of the case studies are summarized. Based on 600 households in Austin, Texas, the case studies demonstrate the following: (1) Solar and storage are important for increasing a household's self-sufficiency and power supply reliability; fifty percent of households can achieve at least $196.55 in annual reliability value after installing a 5-kW solar panel and a 13.5-kWh battery storage. (2) One 13.5-kWh battery storage is sufficient for realistic households, and household reliability values are limited by solar generation; by increasing solar capacity from 5 kW to 15 kW, the median reliability value can increase by 21.21%. (3) The theoretical results overestimate DER reliability values by ignoring the energy losses of batteries, and the temporal resolution of electric load and solar power has only a slight impact on the simulation results, so it is acceptable to simulate the reliability values at a 1-h resolution instead of 15 min or 5 min. (4) A simple method is provided to calculate the optimal solar+storage capacities when the reliability values are considered on top of the economic values and the investment costs; fifty percent of households would make strategic investments of over 8.00 kW of solar and 8.02 kWh of storage.

5.1.6 Conclusion

This section provides a theoretical foundation for estimating the reliability value of distributed solar+storage systems. The proposed analysis enables unsophisticated individual consumers to gauge the reliability benefits of distributed solar+storage systems amidst rare weather events. Case studies based on 600 households in Austin, Texas, validate the effectiveness of the proposed methods and models. In our future work, three issues deserve in-depth investigation: (1) A realistic and comprehensive model is needed for the optimal sizing of solar and storage


systems considering reliability values against rare weather events. The modeling for the economic values of DER, the life cycle of battery storage, and the rare weather events should be improved. (2) An energy-sharing scheme is needed to increase household self-sufficiency and reliability values by sharing surplus solar and storage. (3) The impacts of distributed solar+storage systems on the resilience and recovery of distribution grids against rare weather events should be further investigated.

5.2 Long-Term Planning via Scenario Approach

After examining the sizing problem of storage devices in a deterministic fashion, in this section we also consider stochastic properties for a new design, because many renewable generation sources, e.g., solar and wind, are naturally intermittent. To account for the impacts of uncertain renewables in storage planning, this section proposes a probabilistic framework for storage planning. It explicitly models the operation risk by incorporating individual chance constraints. The chance constraints are converted to deterministic constraints using historical data via the scenario approach. The probabilistic storage planning problem is further decomposed into an upper-level problem and several lower-level problems and solved via a sub-gradient cutting-plane algorithm. Numerical simulations are conducted on the IEEE 6-bus system.

5.2.1 Introduction

Rapidly growing penetration of renewable generation in power systems has introduced significant opportunities and challenges in the planning practice of the grid. This section introduces a novel data-driven approach to planning the future grid with probabilistic guarantees. At the planning stage, in order to account for the uncertainty and inter-temporal variability brought about by renewables, an increasingly popular approach is to install properly sized energy storage at proper locations [284]. For example, the California Public Utilities Commission has mandated a merchant energy storage procurement goal of 1325 MW by 2020 [285]. With proper sizing and siting, energy storage systems can effectively reduce the potential risks and costs caused by the uncertainty and unpredictability of renewables. To determine the optimal size and location of the storage system, a storage planning problem is often formulated and solved. The storage planning problem is closely related to conventional transmission planning and generation expansion; therefore, most methods for power system planning are applicable to the storage planning problem. Typically, most planning


problems were formulated as deterministic optimization problems, despite the fact that the planning problem involves uncertainties by nature [286]. In recent years, researchers have proposed many power system planning techniques that account for uncertainties. Broadly speaking, they can be categorized as stochastic planning and robust planning, which are based on stochastic programming and robust optimization, respectively. Stochastic planning models uncertainties as probability distributions or as samples drawn from the underlying distribution, and it typically minimizes the expected total system cost [287, 288]. Robust planning takes an alternative approach: it seeks the optimal solution for the worst case within a predefined (possibly deterministic) uncertainty set.

Both stochastic and robust planning approaches have pros and cons. It is well known that stochastic programming can be computationally intractable when many samples are required and is sometimes overoptimistic. Robust optimization is relatively efficient computationally but can be conservative when the uncertainty set is not carefully chosen. To avoid being overly optimistic or conservative, it is desirable to have a framework with adjustable risk levels. Chance-constrained optimization (CCO), which explicitly models the risk level in the optimization problem, is a natural choice for this purpose. Motivated by this observation, this section formulates the planning problem through the lens of CCO and proposes a probabilistic storage planning framework. The main contributions of this section are as follows:

1. We propose a probabilistic storage planning framework, which guarantees that the system risk level stays within acceptable ranges.
2. The probabilistic storage planning problem is solved via the scenario approach, which exploits the value of data and provides rigorous guarantees on the actual risk level of the operation sub-problems.

Before discussing the probabilistic storage planning problem, we define some nomenclature. Sets and indices: $\mathcal{N}_s$ and $\mathcal{N}_n$ denote the subsets of buses with and without energy storage; $I$ and $I(n)$ denote the set of generators and the set of generators at bus $n$; $o(l)$ and $r(l)$ denote the sending and receiving nodes of line $l$; $i$ and $I$ denote the index and number of generators; $t$ and $T$ denote the index and number of hours; $w$ denotes the index of wind farms; $d$ and $D$ denote the index and number of typical days; $n$ and $N$ denote the index and number of buses; $s$ and $S$ denote the index and number of storage systems; and $j$ and $J$ denote the index and number of transmission lines.

Variables: $E_{s,n}$ is the storage energy capacity (MWh); $P_{s,n}$ is the storage power capacity (MW); $p^{ch}_{n,d,t}, p^{dis}_{n,d,t}$ are the charging and discharging powers of the storage; $p^g_{i,d,t}$ is the hourly generator output; $\theta_n$ is the nodal voltage angle; $e^{SOC}_{n,d,t}$ is the state of charge of the storage; $\lambda^{LMP}_{n,d,t}$ is the real-time nodal LMP; and $\phi, \psi, \gamma$ are dual variables.


Parameters are the constants that define the system: $C^E_{s,a}$ is the annualized storage energy investment cost; $C^P_{s,a}$ is the annualized storage power investment cost; $c^g_i$ is the hourly incremental generation cost; $c^{ch}_n, c^{dis}_n$ are the hourly incremental charging/discharging costs; $\underline{p}^g_i, \overline{p}^g_i$ are the generator output lower/upper limits; $\underline{p}^f_j, \overline{p}^f_j$ are the transmission power flow lower/upper limits; $\eta^{ch}_n, \eta^{dis}_n$ are the storage charge/discharge efficiencies; $\overline{p}^{ch}_n, \overline{p}^{dis}_n$ are the storage charge/discharge limits; $\overline{e}^{SOC}$ is the storage energy limit; $C^{Budget}$ is the storage investment budget; $K_d$ is the weight of typical day $d$; $x_l$ is the transmission line reactance; and $\rho^{max}, \rho^{min}$ are the maximum/minimum power-to-energy ratios.

Finally, we define the scenario notation: $\omega$ is the index of scenarios, $P^{w,(\omega)}_{n,d,t}$ is the wind farm output in scenario $\omega$, and $L^{(\omega)}_{n,d,t}$ is the load in scenario $\omega$.

The remainder of this section is organized as follows. Section 5.2.2 formulates the probabilistic storage planning framework. Section 5.2.3 solves the probabilistic storage planning problem via a sub-gradient-based decomposition algorithm and applies the scenario approach to the lower-level operation sub-problems. Numerical results are presented in Sect. 5.2.4, and concluding remarks and future work are given in Sect. 5.2.5.

5.2.2 Probabilistic Storage Planning

5.2.2.1 Deterministic Storage Planning

A typical formulation of storage planning is presented below. It determines the optimal location and capacity of the storage system to be installed at the planning stage.

$\min \ \sum_{n=1}^{N} C^E_{s,a} E_{s,n} + \sum_{n=1}^{N} C^P_{s,a} P_{s,n} + \sum_{d=1}^{D} K_d \sum_{t=1}^{T} \sum_{i=1}^{I} c^g_i\, p^g_{i,d,t} + \sum_{d=1}^{D} K_d \sum_{t=1}^{T} \sum_{n=1}^{N} \big(c^{ch}_n p^{ch}_{n,d,t} + c^{dis}_n p^{dis}_{n,d,t}\big)$  (5.30a)

s.t.

$\sum_{n=1}^{N} \big(C^E_{s,a} E_{s,n} + C^P_{s,a} P_{s,n}\big) \le C^{Budget}$  (5.30b)

$\rho^{min} E_{s,n} \le P_{s,n} \le \rho^{max} E_{s,n}$  (5.30c)

$\sum_{i \in I(n)} p^g_{i,d,t} + p^{dis}_{n,d,t} - p^{ch}_{n,d,t} + p^w_{n,d,t} + \sum_{j | n \in r(j)} p^f_{j,d,t} - \sum_{j | n \in o(j)} p^f_{j,d,t} = L_{n,d,t} \quad (\lambda^{LMP}_{n,d,t})$  (5.30d)

$\underline{p}^g_i \le p^g_{i,d,t} \le \overline{p}^g_i \quad (\psi^g_{i,d,t}, \phi^g_{i,d,t})$  (5.30e)

$-R^{down}_i \le p^g_{i,d,t} - p^g_{i,d,t-1} \le R^{up}_i \quad (\psi^R_{i,d,t}, \phi^R_{i,d,t})$  (5.30f)

$\underline{p}^f_j \le p^f_{j,d,t} \le \overline{p}^f_j \quad (\psi^f_{j,d,t}, \phi^f_{j,d,t})$  (5.30g)

$p^f_{j,d,t} = \frac{1}{x_j}\big(\theta_{o(j),d,t} - \theta_{r(j),d,t}\big) \quad (\gamma^f_{j,d,t})$  (5.30h)

$e^{SOC}_{n,d,t} - e^{SOC}_{n,d,t-1} = p^{ch}_{n,d,t}\,\eta^{ch}_s - p^{dis}_{n,d,t}/\eta^{dis}_s \quad (\gamma^{SOC}_{n,d,t})$  (5.30i)

$0 \le p^{ch}_{n,d,t} \le P_{s,n} \quad (\psi^{ch}_{n,d,t}, \phi^{ch}_{n,d,t})$  (5.30j)

$0 \le p^{dis}_{n,d,t} \le P_{s,n} \quad (\psi^{dis}_{n,d,t}, \phi^{dis}_{n,d,t})$  (5.30k)

$0 \le e^{SOC}_{n,d,t} \le E_{s,n} \quad (\psi^{SOC}_{n,d,t}, \phi^{SOC}_{n,d,t})$  (5.30l)

The objective is to minimize the total investment and operational costs of the power system. Constraint (5.30b) ensures that the total investment is within budget. Constraint (5.30c) limits the power-to-energy ratio of the storage system. Constraints (5.30d)–(5.30h) represent the operational constraints of the power system: nodal load supply balance (5.30d), generation upper and lower bounds (5.30e), generation ramping limits (5.30f), transmission line flow limits (5.30g), and the DC power flow equation (5.30h). Constraints on the energy storage system are given in (5.30i)–(5.30l): constraint (5.30i) represents the state of charge (SOC) transition at time t, (5.30j)–(5.30k) keep the charging and discharging powers within limits, and (5.30l) keeps the SOC within limits.

Since an accurate long-term planning decision must account for its impact on short-term system operations [288, 289], we consider a detailed operation model in (5.30). The energy and power capacities of the sth energy storage system at bus n are denoted by $E_{s,n}$ and $P_{s,n}$. We first assume that an energy storage system could be installed at every bus. After solving the storage planning problem, if the optimal solutions $E^*_{s,n}$ and $P^*_{s,n}$ are close to zero, i.e., $E^*_{s,n} \approx 0$ and $P^*_{s,n} \approx 0$, then no energy storage system will be installed at bus n.^1

The present values of the annualized investment costs $C^E_{s,a}, C^P_{s,a}$ are calculated by (5.31), where $\sigma$ is the annual interest rate and $\Gamma$ denotes the lifetime (in years) of the storage system [290]:

$C^E_{s,a} = C^E_s\, \frac{\sigma (1+\sigma)^{\Gamma}}{(1+\sigma)^{\Gamma} - 1}, \qquad C^P_{s,a} = C^P_s\, \frac{\sigma (1+\sigma)^{\Gamma}}{(1+\sigma)^{\Gamma} - 1}.$  (5.31)

^1 A more accurate model would introduce binary variables, which might lead to computational difficulties. Formulation (5.30) is essentially a relaxation of the integer programming formulation.
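To make the structure of (5.30) concrete, the following minimal sketch solves a single-bus, single-day miniature of the sizing problem with cvxpy. All numerical values (costs, load and wind profiles, limits) are illustrative assumptions rather than values from this chapter, and the network, ramping limits (5.30f), and typical-day weights $K_d$ are omitted for brevity.

```python
# A minimal single-bus, single-day sketch of the deterministic sizing LP (5.30)
# in cvxpy. All numbers below are illustrative assumptions, not values from the
# chapter; the network, ramping limits, and typical-day weights are omitted.
import cvxpy as cp
import numpy as np

T = 24
t = np.arange(T)
load = 60 + 20 * np.sin(2 * np.pi * (t - 6) / 24)   # toy load profile (MW)
wind = 8 + 4 * np.cos(2 * np.pi * t / 24)            # toy wind profile (MW)

C_E, C_P = 10.0, 50.0          # annualized storage costs ($/MWh, $/MW)
c_g = np.array([20.0, 80.0])   # cheap and expensive generator costs ($/MWh)
g_max = np.array([55.0, 60.0])
eta = 0.95                     # charge/discharge efficiency
budget = 2e3

E = cp.Variable(nonneg=True)            # energy capacity (MWh)
P = cp.Variable(nonneg=True)            # power capacity (MW)
g = cp.Variable((2, T), nonneg=True)    # generator outputs
p_ch = cp.Variable(T, nonneg=True)
p_dis = cp.Variable(T, nonneg=True)
soc = cp.Variable(T, nonneg=True)

cons = [C_E * E + C_P * P <= budget,                       # (5.30b) budget
        0.25 * E <= P, P <= E,                             # (5.30c) P/E ratio
        cp.sum(g, axis=0) + p_dis - p_ch + wind == load,   # (5.30d) balance
        g[0] <= g_max[0], g[1] <= g_max[1],                # (5.30e) limits
        p_ch <= P, p_dis <= P, soc <= E]                   # (5.30j)-(5.30l)
# (5.30i) SOC dynamics; soc[-1] makes the day cyclic
cons += [soc[tt] - soc[tt - 1] == eta * p_ch[tt] - p_dis[tt] / eta
         for tt in range(T)]

cost = C_E * E + C_P * P + cp.sum(c_g @ g) + cp.sum(p_ch + p_dis)
cp.Problem(cp.Minimize(cost), cons).solve()
print(f"E* = {E.value:.1f} MWh, P* = {P.value:.1f} MW")
```

Note that the bounds (5.30j)–(5.30l) are jointly linear in the operation and sizing variables, so the sketch remains an ordinary linear program.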


5.2.2.2 Storage Planning with Probabilistic Guarantees

The deterministic version neglects the fact that storage planning is by nature a decision-making problem under uncertainty. We improve the deterministic storage planning problem by including the following D individual chance constraints:

$\mathbb{P}\Big\{ \sum_{i \in I(n)} p^g_{i,d,t} + p^{dis}_{n,d,t} - p^{ch}_{n,d,t} + p^w_{n,d,t} + \sum_{j | n \in r(j)} p^f_{j,d,t} - \sum_{j | n \in o(j)} p^f_{j,d,t} \ge L_{n,d,t};\ \ \underline{p}^f_j \le p^f_{j,d,t} \le \overline{p}^f_j \Big\} \ge 1 - \epsilon_d, \quad d = 1, \cdots, D$  (5.32)

Each chance constraint ensures that there is enough supply to meet demand and that the power flows do not exceed the flow limits on typical day d with probability at least $1 - \epsilon_d$. The pre-defined constant $\epsilon_d$ represents the acceptable risk level: the smaller $\epsilon_d$ (i.e., $\epsilon_d \to 0$), the more secure the system. By including the D chance constraints (5.32) in the storage planning problem, we formulate the following probabilistic storage planning problem:

min (5.30a)
s.t. (5.30b), (5.30c), (5.30e), (5.30f), (5.30h)–(5.30l), and (5.32)

5.2.2.3 Structure of the Storage Planning Problem

The storage planning problem can be written as a bi-level problem [288], featuring one upper-level problem and D lower-level problems. The upper-level problem is given in (5.33); it determines the sizing and siting of the storage system $(E_{s,n}, P_{s,n})$.

$\min \ \sum_{n=1}^{N} C^E_{s,a} E_{s,n} + \sum_{n=1}^{N} C^P_{s,a} P_{s,n}$  (5.33)

s.t. (5.30b), (5.30c)

Each of the D lower-level problems (5.34) considers detailed operations ($p^g_{i,d,t}, p^{ch}_{n,d,t}, p^{dis}_{n,d,t}$) on one of the D typical days. Problem (5.34) is essentially a DC optimal power flow (DCOPF) problem, which minimizes the weighted operation cost of a typical day subject to constraints such as line flow and generation limits.

$\min \ D_1 = K_d \sum_{t=1}^{T} \sum_{i=1}^{I} c^g_i\, p^g_{i,d,t} + K_d \sum_{t=1}^{T} \sum_{n=1}^{N} \big(c^{ch}_n p^{ch}_{n,d,t} + c^{dis}_n p^{dis}_{n,d,t}\big)$  (5.34)

s.t. (5.30d)–(5.30l)


The upper-level decision variables enter the lower-level problem in constraints (5.30j), (5.30k), and (5.30l).

5.2.3 Solving Probabilistic Storage Planning

This section elaborates on the method for solving the probabilistic storage planning problem. Section 5.2.3.1 introduces the scenario approach to handle the chance constraints arising in the lower-level operation problems. Section 5.2.3.2 applies the scenario approach to probabilistic storage planning, and a sub-gradient-based decomposition algorithm is employed to solve the problem.

5.2.3.1 Introduction to the Scenario Approach

The scenario approach is one of the earliest and best-known methods for solving chance-constrained programs [291]. Consider the following chance-constrained program:

$\min_x \ c^\top x$  (5.35a)

s.t. $f(x) \le 0$,  (5.35b)

$\mathbb{P}\{g(x, \omega) \le 0\} \ge 1 - \epsilon$,  (5.35c)

where x is the decision variable and the random variable $\omega \in \Omega$ denotes the uncertainties. All deterministic constraints are represented by $f(x) \le 0$. The chance constraint (5.35c) ensures that the inner stochastic constraint $g(x, \omega) \le 0$ is feasible with probability at least $1 - \epsilon$. To solve the chance-constrained program, the scenario approach reformulates (5.35) as the scenario problem (5.36) with N i.i.d. scenarios $\Omega_N := \{\omega_1, \omega_2, \cdots, \omega_N\}$:

$(SP)_N: \ \min_x \ c^\top x$  (5.36a)

s.t. $f(x) \le 0$,  (5.36b)

$g(x, \omega_1) \le 0,\ \cdots,\ g(x, \omega_N) \le 0.$  (5.36c)

The scenario problem (5.36) seeks the optimal solution $x^*_N$ that is feasible for all N scenarios. To evaluate the feasibility and quality of the solution $x^*_N$, we define the violation probability as follows.

Definition 5.2 (Violation Probability [292]) The violation probability $\mathbb{V}(x)$ of a candidate solution x is defined as

$\mathbb{V}(x) := \mathbb{P}\{\omega : g(x, \omega) > 0\}.$  (5.37)

The most important result of scenario approach theory relates the violation probability $\mathbb{V}(x)$ to the sample complexity N, building on the concept of support scenarios.

Definition 5.3 (Support Scenario [292]) A scenario $\omega_i$ is a support scenario for the scenario problem $(SP)_N$ if its removal changes the solution of $(SP)_N$. $\mathcal{S}$ denotes the set of support scenarios.

Definition 5.4 (Non-degeneracy [292]) Let $x^*_N$ and $x^*_S$ stand for the optimal solutions to the scenario problems $(SP)_N$ and $(SP)_S$, respectively. The scenario problem $(SP)_N$ is said to be non-degenerate if $c^\top x^*_N = c^\top x^*_S$.

The theoretical results of the scenario approach can be classified into two categories: prior and posterior guarantees [291]. Prior guarantees provide conditions on the sample complexity N before obtaining $x^*_N$; more specifically, they typically provide lower bounds on N that ensure $\mathbb{P}(\mathbb{V}(x^*_N) > \epsilon) < \beta$. Posterior guarantees take effect after obtaining $x^*_N$ and typically involve the following steps:

1. Obtain N i.i.d. scenarios ($\Omega_N$) and solve the corresponding scenario problem $(SP)_N$.
2. Find the set of support scenarios $\mathcal{S}$, whose cardinality is denoted by $|\mathcal{S}|$.
3. Calculate the posterior guarantee $\epsilon(|\mathcal{S}|)$ using (5.38) in Theorem 5.3.^2
4. If $\epsilon(|\mathcal{S}|)$ is smaller than the acceptable risk level $\epsilon$, then $x^*_N$ is a feasible suboptimal solution to the original chance-constrained problem (5.35). Otherwise, increase the sample complexity N and repeat the process.

Theorem 5.3 (Wait-and-Judge [292]) Given $\beta \in (0, 1)$, for any $k = 0, 1, \cdots, n$, the polynomial equation in the variable t

$\binom{N}{k} t^{N-k} - \frac{\beta}{N+1} \sum_{i=k}^{N} \binom{i}{k} t^{i-k} = 0$  (5.38)

has exactly one solution $t(k)$ in the interval $(0, 1)$. Let $\epsilon(k) := 1 - t(k)$. If the scenario problem $(SP)_N$ is convex and non-degenerate, it holds that

$\mathbb{P}^N\big\{\mathbb{V}(x^*_N) \ge \epsilon(|\mathcal{S}|)\big\} \le \beta,$  (5.39)

where $|\mathcal{S}|$ is the number of support scenarios.

The main results of this section are based on the posterior guarantees of the scenario approach.

^2 For example, use the ConvertChanceConstraint package in [291].
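To illustrate step 3 above, the posterior bound $\epsilon(k)$ can be computed by finding the unique root of (5.38) on (0, 1), e.g., by bisection. The sketch below is a minimal stand-in for the packaged implementation mentioned in the footnote; log-binomial coefficients are used to avoid overflow.

```python
# A minimal sketch of evaluating the posterior bound eps(k) from (5.38) by
# bisection; log-binomial coefficients keep the evaluation numerically stable.
import math

def epsilon_posterior(k: int, N: int, beta: float) -> float:
    """Return eps(k) = 1 - t(k), where t(k) solves (5.38) in (0, 1)."""
    log_binom = lambda n, r: (math.lgamma(n + 1) - math.lgamma(r + 1)
                              - math.lgamma(n - r + 1))

    def h(t: float) -> float:
        # binom(N,k) t^(N-k) - beta/(N+1) * sum_{i=k}^{N} binom(i,k) t^(i-k)
        lead = math.exp(log_binom(N, k) + (N - k) * math.log(t))
        tail = sum(math.exp(log_binom(i, k) + (i - k) * math.log(t))
                   for i in range(k, N + 1))
        return lead - beta / (N + 1) * tail

    lo, hi = 1e-12, 1.0 - 1e-12   # h is negative near 0 and positive near 1
    for _ in range(100):          # bisection on the unique sign change
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 1.0 - 0.5 * (lo + hi)

# e.g., N = 500 scenarios, |S| = 24 support scenarios, beta = 0.001
print(epsilon_posterior(24, 500, 0.001))   # roughly 0.097 (cf. Table 5.7)
```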

5.2.3.2 Solving Probabilistic Storage Planning via the Scenario Approach

The probabilistic storage planning problem (5.33) features D individual chance constraints (5.32). We apply the scenario approach to each chance constraint and convert it into N deterministic constraints (5.40a)–(5.40b):

min (5.30a)
s.t. (5.30b), (5.30c), (5.30e), (5.30f), (5.30h)–(5.30l)

$\sum_{i \in I(n)} p^g_{i,d,t} + p^{dis}_{n,d,t} - p^{ch}_{n,d,t} + p^{w,(1)}_{n,d,t} + \sum_{j | n \in r(j)} p^{f,(1)}_{j,d,t} - \sum_{j | n \in o(j)} p^{f,(1)}_{j,d,t} \ge L^{(1)}_{n,d,t};\ \ \underline{p}^f_j \le p^{f,(1)}_{j,d,t} \le \overline{p}^f_j$  (5.40a)

$\vdots$

$\sum_{i \in I(n)} p^g_{i,d,t} + p^{dis}_{n,d,t} - p^{ch}_{n,d,t} + p^{w,(N)}_{n,d,t} + \sum_{j | n \in r(j)} p^{f,(N)}_{j,d,t} - \sum_{j | n \in o(j)} p^{f,(N)}_{j,d,t} \ge L^{(N)}_{n,d,t};\ \ \underline{p}^f_j \le p^{f,(N)}_{j,d,t} \le \overline{p}^f_j$  (5.40b)
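The sketch below illustrates the same conversion on a deliberately tiny single-bus analogue: one supply-adequacy chance constraint becomes one deterministic constraint per sampled scenario, and the support scenarios can then be counted from the active constraints. The Gaussian scenario models mirror those used in the case study of Sect. 5.2.4; everything else is an illustrative assumption.

```python
# A toy illustration of the scenario conversion in (5.40): a single-bus
# supply-adequacy chance constraint P{g + w(omega) >= L(omega)} >= 1 - eps
# is replaced by one deterministic constraint per sampled scenario.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N = 300                                # sample complexity
wind = rng.normal(10, 0.7, N)          # wind scenarios, N(mu, 0.07 mu)
load = rng.normal(70, 3.5, N)          # load scenarios, N(mu, 0.05 mu)

g = cp.Variable(nonneg=True)           # scheduled generation (MW)
cons = [g + wind[i] >= load[i] for i in range(N)]   # one constraint/scenario
cp.Problem(cp.Minimize(30.0 * g), cons).solve()

# support scenarios here are those whose constraint is active at the optimum
active = [i for i in range(N) if abs(g.value + wind[i] - load[i]) < 1e-5]
print(f"g* = {g.value:.2f} MW, |S| = {len(active)}")
```

With a single decision variable, at most one scenario (the worst one) can be of support, which is consistent with the small $|\mathcal{S}|$ values reported for the full problem in Table 5.7.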

5.2.3.3 Sub-gradient Cutting-Plane Method

The scenario program above is formulated only for the operation sub-problems, with the planning decision variables $E_{s,n}$ and $P_{s,n}$ held fixed. We therefore use a sub-gradient-based method to decompose the original planning problem into an upper-level planning decision problem and a set of lower-level operation problems. The decomposition algorithm has the additional advantage of solving large-scale power system planning problems with limited computational resources. First, we dualize each of the D lower-level problems:

$\max \ D_2 = \sum_{t=1}^{T} \sum_{i=1}^{I} \sum_{\omega \in \Omega} \lambda^{LMP}_{n(i),d,t,\omega} \big(P^{w,(\omega)}_{n,d,t} - L^{(\omega)}_{n,d,t}\big) + \sum_{t=1}^{T} \sum_{j=1}^{J} \sum_{\omega \in \Omega} \big(\phi^f_{j,d,t,\omega} - \psi^f_{j,d,t,\omega}\big)\, \overline{p}^f_j + \sum_{t=1}^{T} \sum_{i=1}^{I} \big(R^{up}_i \phi^R_{i,d,t} - R^{down}_i \psi^R_{i,d,t}\big) + \sum_{t=1}^{T} \sum_{n=1}^{N} \big(E_{s,n}\, \phi^{SOC}_{n,d,t} + P_{s,n} (\phi^{ch}_{n,d,t} + \phi^{dis}_{n,d,t})\big),$  (5.41a)

s.t. $\gamma^f_{j,d,t,\omega} + \psi^f_{j,d,t} + \phi^f_{j,d,t} + \lambda^{LMP}_{r(j),d,t,\omega} - \lambda^{LMP}_{o(j),d,t,\omega} = 0 \quad \forall\, \omega \in \Omega,$  (5.41b)

$\sum_{j | n \in o(j)} \gamma^f_{j,d,t,\omega} - \sum_{j | n \in r(j)} \gamma^f_{j,d,t,\omega} = 0 \quad \forall\, \omega \in \Omega,$  (5.41c)

$\phi^g_{i,d,t} + \psi^g_{i,d,t} + \phi^R_{i,d,t} - \phi^R_{i,d,t+1} + \psi^R_{i,d,t} - \psi^R_{i,d,t+1} + \sum_{\omega \in \Omega} \lambda^{LMP}_{n(i),d,t,\omega} = c^g_i, \quad t = 1, \cdots, T-1,$  (5.41d)

$\phi^g_{i,d,T} + \psi^g_{i,d,T} + \phi^R_{i,d,T} + \psi^R_{i,d,T} + \sum_{\omega \in \Omega} \lambda^{LMP}_{n(i),d,T,\omega} = c^g_i,$  (5.41e)

$\phi^{SOC}_{n,d,t} + \psi^{SOC}_{n,d,t} + \gamma^{SOC}_{n,d,t} - \gamma^{SOC}_{n,d,t+1} = 0, \quad t = 1, \cdots, T-1,$  (5.41f)

$\phi^{SOC}_{n,d,T} + \psi^{SOC}_{n,d,T} + \gamma^{SOC}_{n,d,T} = 0,$  (5.41g)

$\phi^{ch}_{n,d,t} + \psi^{ch}_{n,d,t} - \gamma^{SOC}_{n,d,t}\, \eta^{ch} - \sum_{\omega \in \Omega} \lambda^{LMP}_{n,d,t,\omega} = c^{ch}_n,$  (5.41h)

$\phi^{dis}_{n,d,t} + \psi^{dis}_{n,d,t} + \gamma^{SOC}_{n,d,t}/\eta^{dis} + \sum_{\omega \in \Omega} \lambda^{LMP}_{n,d,t,\omega} = c^{dis}_n.$  (5.41i)

Then the strong duality condition (5.42) is added as a constraint:

$D_1 = D_2.$  (5.42)

Complete details of the sub-gradient algorithm can be found in [288]. We provide only the key formulas (5.43a)–(5.43b) for updating the sub-gradients; they differ slightly from [288] because a different problem is being solved.

$g^{p,(v)}_n = \begin{cases} C^P_{s,a} + \sum_{d=1}^{D} \sum_{t=1}^{T} \big(\phi^{ch,(v)}_{n,d,t} + \phi^{dis,(v)}_{n,d,t}\big), & n \in \mathcal{N}_s \\ g^{0,(v)}_n\, \dfrac{\rho^{0,(v)}_n}{1 + \rho^{0,(v)}_n}, & n \in \mathcal{N}_n \end{cases}$  (5.43a)

$g^{e,(v)}_n = \begin{cases} C^E_{s,a} + \sum_{d=1}^{D} \sum_{t=1}^{T} \phi^{SOC,(v)}_{n,d,t}, & n \in \mathcal{N}_s \\ g^{0,(v)}_n\, \dfrac{1}{1 + \rho^{0,(v)}_n}, & n \in \mathcal{N}_n \end{cases}$  (5.43b)

where v is the iteration index, $g^{p,(v)}_n$ and $g^{e,(v)}_n$ are the sub-gradients for storage power and energy, and $\rho^{0,(v)}_n$ is the power-to-energy ratio of the storage system.
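For intuition, a single upper-level iteration might look like the schematic below: the sizing variables move along the negative sub-gradients (5.43) and are projected back onto the nonnegative orthant. The diminishing step-size rule and the projection are standard choices assumed here; they are not prescribed by the chapter.

```python
# A schematic (projected) sub-gradient update for the upper-level variables
# (E_{s,n}, P_{s,n}); g_e and g_p stand for the sub-gradients in (5.43).
import numpy as np

def subgradient_step(E, P, g_e, g_p, v, step0=1.0):
    """One iteration v of the upper-level update (diminishing step size)."""
    step = step0 / np.sqrt(v + 1)           # classic diminishing step rule
    E_new = np.maximum(E - step * g_e, 0.0)  # project onto E >= 0
    P_new = np.maximum(P - step * g_p, 0.0)  # project onto P >= 0
    return E_new, P_new
```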

5.2.4 Case Study

5.2.4.1 Settings

This section presents numerical results on the 6-bus system shown in Fig. 5.16, with generator parameters given in Table 5.6. The capacities of the seven lines are 36, 80, 90, 60, 86, 40, and 40 MW. For simplicity, the target year is represented by a single typical day. The 24-h system load data can be found in [293]. Load is assumed to be distributed equally between buses 3 and 4. Storage devices can be deployed at any bus; their costs are $20/kWh and $500/kW, with a lifetime of 20 years and an annual interest rate of 10%.

Fig. 5.16 6-bus system single-line diagram

Table 5.6 Generator parameters

    Parameter       G1     G2    G3    G4
    $C^g$ ($/MW)    14.5   24    60    120
    $G^{max}$ (MW)  100    75    50    50

The major sources of uncertainty are wind fluctuations and load variations. Two wind farms of equal capacity are located at buses 3 and 4; the total installed wind capacity is about 10% of the system load. The wind fluctuations are assumed to follow a Gaussian distribution $N(\mu, 0.07\mu)$, and the load variations follow $N(\mu, 0.05\mu)$. Although we assume Gaussian distributions for simplicity, it is worth noting that the theoretical guarantees of the scenario approach (e.g., Theorem 5.3) hold for any distribution: the scenario approach relies purely on the data (scenarios) without making strong assumptions about the underlying distributions.

Table 5.7 Results with different sample complexities

    Sample complexity                          100      200      300      400      500
    No. of support scenarios ($|\mathcal{S}|$)  21       24       26       24       24
    Out-of-sample $\hat\epsilon$                0.1770   0.0750   0.0580   0.0540   0.0400
    Posterior $\epsilon(|\mathcal{S}|)$         0.4010   0.2334   0.1679   0.1206   0.0971

5.2.4.2 Numerical Results

We solve the probabilistic storage planning problem with varying sample complexities, i.e., $N = 100, 200, \cdots, 500$. The confidence parameter $\beta$ is 0.001. We use an independent test dataset of 10,000 points to calculate the out-of-sample violation probability $\hat\epsilon$. Table 5.7 shows the number of support scenarios, the posterior guarantees $\epsilon(|\mathcal{S}|)$ (Theorem 5.3), and the out-of-sample $\hat\epsilon$.

There are a few interesting observations in Table 5.7 and Fig. 5.17. First, although we were not able to prove a relatively tight upper bound on $|\mathcal{S}|$, the number of support scenarios turns out to be much smaller ($\sim 24$) than the number of decision variables.^3 This indicates that the sample complexity required by the storage planning problem could be much lower than expected; in other words, a moderate number of scenarios suffices to achieve the acceptable risk level. Second, it is worth mentioning that 24 is the number of operation hours considered in storage planning. Indeed, we can prove that $|\mathcal{S}|$ is no more than 24 in the absence of transmission line limits, following logic similar to [295]. According to Table 5.7, $|\mathcal{S}|$ remains around 24 even with transmission constraints, which implies that a moderate number of scenarios might achieve the acceptable risk level even for large systems. Third, the out-of-sample violation probability $\hat\epsilon$ is less than half of the posterior guarantee $\epsilon(|\mathcal{S}|)$. This is expected, since the scenario approach is essentially a conservative method that finds suboptimal and feasible (with high confidence $1-\beta$) solutions to CCO problems [291]; such conservatism is welcome when security and reliability are the main concerns of the system.

^3 The number of decision variables often serves as the upper bound on $|\mathcal{S}|$ for prior guarantees of the scenario approach, e.g., [294].
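The out-of-sample column of Table 5.7 corresponds to a Monte Carlo check of this kind, sketched below on the single-bus toy from Sect. 5.2.3.2; the candidate decision value is hypothetical.

```python
# A sketch of the out-of-sample check behind Table 5.7: estimate the violation
# probability of a fixed candidate decision on an independent test set.
import numpy as np

rng = np.random.default_rng(1)
g_star = 66.0                           # hypothetical candidate decision (MW)
wind = rng.normal(10, 0.7, 10_000)      # independent test scenarios
load = rng.normal(70, 3.5, 10_000)

violations = (g_star + wind < load)     # constraint g + w >= L fails
print(f"out-of-sample eps_hat = {violations.mean():.4f}")
```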


Fig. 5.17 Violation probabilities with varying sample complexities

Fig. 5.18 System costs under different risk levels

Figure 5.18 shows the total system cost vs. the acceptable risk level $\epsilon$. The higher the acceptable risk level, the less conservative the decisions, and thus the lower the operating costs. The total system cost, which includes the investment cost, shows a similar trend, decreasing with a higher acceptable risk level. In addition to Fig. 5.18, we plot the storage planning results for each bus^4 in Fig. 5.19. The storage locations vary with increasing risk levels. For example, in the most conservative case, where $\epsilon = 0.04$, storage is deployed at buses 2 and 4, while in the least conservative case, $\epsilon = 0.075$, storage is deployed at buses 2, 3, 4, and 6. The relationship between storage locations and the risk level is an interesting direction for further exploration.

^4 Bus 1 is omitted since no storage is installed at it in any of the simulations.


Fig. 5.19 Energy capacities of storage of buses under different risk levels

5.2.4.3 Discussions

One major benefit of deploying energy storage is the potential savings in operational costs. To quantify the impact of storage, we define the saving ratio as

$\text{saving ratio} := 1 - \dfrac{\text{storage investment cost}}{\text{operation cost saving}},$  (5.44)


Fig. 5.20 Saving ratios of storage systems under different risk levels

where the operation cost saving is calculated as the difference between the operation cost without storage and the operation cost with storage. An attractive finding in Fig. 5.20 is that the saving ratio becomes higher as the system becomes more secure ($\epsilon \to 0$): the more risk-averse the system operator, the higher the saving ratio achieved through the installation of energy storage devices. In other words, employing energy storage is more beneficial when system security or reliability is the pivotal concern.

5.2.5 Conclusion

This section proposes a data-driven probabilistic storage planning framework with adjustable operational risk levels in the presence of uncertainties from renewables and load fluctuations. The scenario approach and a sub-gradient algorithm are used to solve the probabilistic storage planning problem, and several interesting findings are presented in the numerical results. This section is a first step toward a rigorous and practical framework for power system planning with many energy storage devices. Future work includes (1) case studies on real-world-size test systems; (2) adding more security constraints (e.g., $N-1$ contingencies) to the problem formulation; and (3) exploring the relationship and advantages of the proposed probabilistic framework relative to stochastic and robust planning.


5.3 Utility's Procurement with Storage and/or Demand Response

In the last two sections, we proposed deterministic and stochastic solutions for planning storage devices. However, such analyses expose private consumer information, a concern that is especially acute at the residential level. We therefore aim to accomplish the procurement objective in a completely privacy-preserving and model-free manner, i.e., without direct access to the state variables (temperatures or power consumption) or the dynamical models (thermal characteristics) of individual homes, while guaranteeing the personal comfort constraints of consumers. We propose a two-stage optimization and control framework to address this problem. In the first stage, we use a long short-term memory (LSTM) network to predict hourly electricity prices based on historical pricing data and weather forecasts. Given the hourly price forecast and the thermal models of the homes, the problem of designing an optimal power consumption trajectory that minimizes the total electricity procurement cost for the collection of thermal loads can be formulated as a large-scale integer program (with millions of variables) due to the on-off cyclical dynamics of such loads. We provide a simple heuristic relaxation that makes this large-scale optimization problem model-free and computationally tractable. In the second stage, we translate the results of this optimization problem into distributed open-loop control laws that can be implemented at individual homes without measuring or estimating their state variables, while simultaneously ensuring consumer comfort constraints. We demonstrate the performance of this approach on a large-scale test case comprising 500 homes in the Houston area and benchmark its performance against a direct model-based optimization and control solution.

5.3.1 Introduction

In traditional power grids, uncertainties typically arise on the demand side and are countered by an increase or decrease in the generation of power using operating reserves. However, the large-scale integration of renewables has introduced additional uncertainties into the supply side due to the variability of renewable energy resources. Since generation from renewable energy resources cannot be directly controlled, this new uncertainty on the supply side will need to be offset by tuning the demand via controllable loads [296–298]. This approach, known as demand response, is a rapidly emerging operational paradigm in the modern power grid, wherein an aggregator or load-serving entity (LSE) manages a collection of controllable loads that function as a new type of operating reserve, albeit one that is now on the demand side [299]. Thermal inertial loads such as air conditioners (ACs), heaters, and refrigerators comprise nearly half of the residential demand in the United States [300] and are attractive candidates for demand response due to their ability to store energy and


alter (delay or advance) consumption without causing significant discomfort to the consumer [301, 302]. This demand response potential can be exploited by LSEs to provide ancillary services to the grid while simultaneously reducing energy costs for individual consumers [303–305]. Early instances of demand response from thermal inertial loads typically employed coarse models of the duty cycles of the loads to compute pre-defined trajectories for load curtailment during periods of peak pricing [306–308]. More recent approaches involve estimating the models and states of the loads and utilizing this information to design and track the desired power trajectory that minimizes costs or provides operational support to the grid [309–311]. In this context, it is desirable to develop model-free privacy-preserving approaches for thermal inertial load management for three reasons. First, thermal models can be used to infer information about the size, layout, and construction of the consumers’ homes, which may constitute a violation of consumer privacy. Second, it is challenging to obtain such models for demand response programs involving large-scale participation from thousands of homes, even with intrusive measurement and monitoring. Finally, for privacy reasons, it is not desirable to measure the temperatures or power consumption of individual homes. Recently, learning-based model-free approaches for the optimization and control of thermal loads have been proposed [312, 313]; however, these approaches are typically not privacy-preserving in that they still involve measuring the internal temperatures and power consumption profiles of homes. Alternatively, privacy-preserving approaches to thermal inertial load management, wherein the power consumption of individual homes is not directly measured, have been proposed [314–316]. However, all of these approaches still utilize thermal models of homes to compute and implement optimal control actions for electricity cost minimization. The aim of this section is to bridge the gap by proposing a model-free privacy-preserving approach for the management of thermal inertial loads. Specifically, we consider the problem of minimizing the cost of procuring electricity for a large collection of homes managed by an LSE. The objective is to pre-cool (or pre-heat) homes by controlling residential thermal loads in order to avoid procuring power during periods of peak electricity pricing. Further, we would like to accomplish this objective in a completely privacy-preserving and model-free manner, i.e., without direct access to the state variables (temperatures and power consumption) or models (thermal characteristics) of individual homes. We propose a two-stage optimization and control framework to address this problem. In the first stage, we use a long short-term memory (LSTM)-based recurrent neural network architecture to forecast hourly electricity prices from historical price data and weather forecasts. Given the hourly price forecast and the thermal models of the homes, the problem of designing an optimal power consumption trajectory that minimizes the total electricity procurement cost can be formulated as a large-scale integer program (with millions of variables) due to the on-off cyclical dynamics of such loads. This integer program has typically been solved using linear relaxations or dynamic programming [314, 317], with explicit closed-form solutions available in special cases where prices are assumed to be monotone [318]. 
In this section, we propose a simple heuristic relaxation


to convert this large-scale optimization problem into a model-free optimization problem that can be solved in an explicit and computationally tractable manner. In the second stage, we translate the results of this optimization problem into distributed open-loop control laws that can be implemented at the individual homes without measuring or estimating their state variables while respecting consumer comfort constraints. We demonstrate the performance of this approach on a large-scale test case comprising 500 homes in the Houston area, with pricing data from the Electric Reliability Council of Texas (ERCOT), and benchmark the performance of the proposed approach by comparing it with the direct model-based approach in [314].

Notation $\mathbb{R}$, $\mathbb{R}_+$, and $\mathbb{R}^n$ denote the sets of real numbers, nonnegative real numbers, and n-dimensional real vectors, respectively. Given $a, b \in \mathbb{R}$, $a \wedge b = \begin{cases} a, & a < b \\ b, & a > b \end{cases}$ and $a \vee b = \begin{cases} a, & a > b \\ b, & a < b \end{cases}$. Given two sets A and B, $A \setminus B$ represents the set of all elements of A that are not in B. We denote the Laplace density function with zero mean and scale parameter $a \in \mathbb{R}_+\setminus\{0\}$ by $Lap(a)$. The gamma density function with parameters $a, b \in \mathbb{R}_+\setminus\{0\}$ is denoted by $\Gamma(a, b)$, and the exponential density function with rate $\lambda \in \mathbb{R}_+\setminus\{0\}$ is denoted by $Exp(\lambda)$. We denote by $N(\mu, \sigma, a, b)$ a truncated univariate Gaussian density function with mean $\mu$, standard deviation $\sigma$, and support $[a, b]$.

5.3.2 Problem Formulation

We begin by describing the model of a collection of residential thermal loads and formulate the problem of minimizing the electricity procurement cost. For simplicity, we assume that all the loads are air conditioners (ACs). Note that the same analysis can be carried out for heaters, with the objective of pre-heating, rather than pre-cooling, homes.

5.3.2.1 System Model

Consider a population of N homes with controllable ACs managed by a load-serving entity (LSE). Assume that each home has a temperature setpoint $s_i$ that is private to the consumer and a comfort range $\Delta_i$, $i \in \{1, 2, \ldots, N\}$, which denotes the deviation from the setpoint that the consumer is willing to tolerate. Therefore, the temperature of the ith home at any time $t \in \mathbb{R}_+$, denoted by $\theta_i(t)$, must lie in the comfort band $[L_{i0}, U_{i0}] = [s_i - \Delta_i, s_i + \Delta_i]$. The flexibility of the ith consumer, $i \in \{1, 2, \ldots, N\}$, can be quantified by the width of the consumer's comfort band, i.e., $2\Delta_i$. The temperature dynamics of the ith home, $i \in \{1, 2, \ldots, N\}$, are governed by

$\dot\theta_i(t) = -\alpha_i \big(\theta_i(t) - \theta_a(t)\big) - \beta_i P_i \sigma_i(t),$  (5.45)


where $\theta_a(t)$ represents the ambient temperature at time $t \in \mathbb{R}_+$, $P_i$ represents the power consumption of the ith AC, $\alpha_i$ and $\beta_i$ represent the heating time constant ($h^{-1}$) and thermal conductivity ($^\circ C/kWh$) of the ith home, and $\sigma_i(t) \in \{0, 1\}$ denotes the on/off state of the ith AC at time $t \in \mathbb{R}_+$, where $\sigma_i(t) = 1$ indicates that the AC is on and $\sigma_i(t) = 0$ indicates that it is off. When the AC is off, the temperature of the home rises until it reaches the upper bound of the consumer's comfort band, $U_{i0}$, at which point the AC turns on. Similarly, when the temperature reaches the lower bound of the comfort band, $L_{i0}$, the AC turns off. Therefore, the switching behavior of the ith AC, $i \in \{1, 2, \ldots, N\}$, can be defined as

$\sigma_i(t) = \begin{cases} 1, & \theta_i(t) = U_{i0}, \\ 0, & \theta_i(t) = L_{i0}, \\ \sigma_i(t^-), & \text{otherwise}. \end{cases}$  (5.46)

The total electrical power consumed by the population of ACs is given by $P_{total} = \sum_{i=1}^{N} P_i/\eta_i$, where $\eta_i$ is the coefficient of performance of the ith AC.
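A forward-Euler simulation of (5.45)–(5.46) for a single home makes the hysteretic duty-cycle behavior concrete. The values of $\alpha_i$, $\beta_i$, and $P$ below follow the case study in Sect. 5.3.4 ($R = 2\,^\circ C/kW$, $C = 10\ kWh/^\circ C$, $P = 14$ kW); the ambient profile and comfort band are illustrative assumptions.

```python
# A minimal Euler simulation of the thermostat dynamics (5.45)-(5.46) for one
# home; the ambient profile and comfort band are hypothetical.
import numpy as np

alpha, beta_, P = 0.05, 0.1, 14.0   # 1/RC (1/h), 1/C (degC/kWh), AC power (kW)
L0, U0 = 21.0, 25.0                 # comfort band (degC), assumed
dt = 1.0 / 3600.0                   # 1-second step, expressed in hours

theta, sigma = 23.0, 0
temps = []
for step in range(24 * 3600):       # simulate one day
    theta_a = 30.0 + 5.0 * np.sin(2 * np.pi * step / (24 * 3600))  # ambient
    # hysteretic switching rule (5.46)
    if theta >= U0:
        sigma = 1
    elif theta <= L0:
        sigma = 0
    # temperature dynamics (5.45), forward Euler
    theta += dt * (-alpha * (theta - theta_a) - beta_ * P * sigma)
    temps.append(theta)

print(f"temperature stays in [{min(temps):.2f}, {max(temps):.2f}] degC")
```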

5.3.2.2 Optimization Problem

Define the indicator variable $u_i(t): \mathbb{R}_+ \to \{0, 1\}$, $\forall i \in \{1, 2, \ldots, N\}$, where $u_i(t) = 1$ if the ith AC is on at time $t \in \mathbb{R}_+$ and $u_i(t) = 0$ otherwise. We also denote the total number of ACs that are on at any time $t \in \mathbb{R}_+$ by $n_{ON}(t)$. For simplicity, we assume without loss of generality that all the ACs have identical power consumption and coefficients of performance, i.e., $P_i = P$ and $\eta_i = \eta$, $\forall i \in \{1, 2, \ldots, N\}$. Let the electricity price forecast and ambient temperature forecast at time $t \in \mathbb{R}_+$ be denoted by $\hat\pi(t): \mathbb{R}_+ \to \mathbb{R}_+$ and $\hat\theta_a(t): \mathbb{R}_+ \to \mathbb{R}$, respectively. If these forecasts are known over a T-hour horizon, i.e., $\forall t \in [0, T]$, $T \in \mathbb{R}_+\setminus\{0\}$, then the problem of minimizing the total cost of electricity procured by the LSE for the collection of ACs over the time horizon $[0, T]$ can be formulated as

$\mathcal{P}: \quad \min_{u_1(t), \ldots, u_N(t) \in \{0,1\}^N} \ \frac{P}{\eta} \int_0^T \hat\pi(t) \sum_{i=1}^{N} u_i(t)\, dt,$  (5.47)

s.t. $\dot\theta_i(t) = -\alpha_i \big(\theta_i(t) - \hat\theta_a(t)\big) - \beta_i P u_i(t),$

$\frac{P}{\eta} \int_0^T \sum_{i=1}^{N} u_i(t)\, dt \le E,$

$L_{i0} \le \theta_i(t) \le U_{i0},$

where $E > 0$ is the maximum energy budget of the LSE for the time horizon $[0, T]$.


Assumption We make the following assumptions pertaining to the feasibility of the optimization problem $\mathcal{P}$:

• Without loss of generality, the initial temperatures are within the consumers' comfort constraints, i.e., $\theta_i(0) \in [L_{i0}, U_{i0}]$.
• For every $i \in \{1, 2, \ldots, N\}$, when the state is at the upper or lower bound of the comfort band $[L_{i0}, U_{i0}]$, there exists a control policy that can maintain the state inside the comfort band. In other words, the dynamics (5.45) are such that for all possible $\hat\theta_a(t)$, the temperature $\theta_i(t)$ increases with $\sigma_i(t) = 0$ and decreases with $\sigma_i(t) = 1$, i.e., $\forall t \in \mathbb{R}_+$ and $i \in \{1, 2, \ldots, N\}$:

$-\alpha_i \big(L_{i0} - \hat\theta_a(t)\big) > 0, \qquad -\alpha_i \big(U_{i0} - \hat\theta_a(t)\big) - \beta_i P < 0.$

Note that the control inputs that maintain the temperature at the upper and lower comfort bounds are given by $u^{UP}_i(t) = \frac{\alpha_i}{\beta_i P}\big(\theta_a(t) - U_{i0}\big)$ and $u^{DOWN}_i(t) = \frac{\alpha_i}{\beta_i P}\big(\theta_a(t) - L_{i0}\big)$, respectively.

5.3.2.3 Model-Based Solution for Benchmarking

We now outline a model-based approach that obtains a power reference trajectory for the collection of loads by solving the optimization problem (5.47) and designs a privacy-preserving control law to track this reference [314]. We will later use this approach as a benchmark against which to validate our proposed model-free solution. If the dynamics of the thermal loads (5.45) are known, the optimization problem $\mathcal{P}$ can be discretized in the time variable t and solved directly as a mixed-integer linear program (MILP). However, for N homes with a discretization time step of 1 min, the MILP would involve $N \times 2 \times 24 \times 60$ variables, which is prohibitively large (on the order of millions of variables) for hundreds or thousands of homes. Hence, the typical approach to solving this MILP involves a linear programming (LP) relaxation, where the integer variable $u_i(t)$ is allowed to vary continuously in the interval $[0, 1]$, that is, $u_i(t): \mathbb{R}_+ \to [0, 1]$, $i \in \{1, 2, \ldots, N\}$. Then, $u_i(t)$ can be interpreted as the fraction of time that the ith AC is on during each discretization interval. Let $\{u^*_i(t)\}_{i=1}^{N}$ be the solution to the optimization problem $\mathcal{P}$. The optimal power reference trajectory can then be computed as $P^{ref}_{total}(t) = \frac{P}{\eta} \sum_{i=1}^{N} u^*_i(t)$.

Privacy-Preserving Implementation In order to track this power reference trajectory in a privacy-preserving manner, assume that the LSE does not have access to the states of a home, including its setpoint $s_i$, its temperature $\theta_i(t)$, and the state of its AC, $\sigma_i(t)$. First, the LSE estimates the total demand $P_{total}(t)$ in a privacy-preserving manner as follows. The ith home, $i \in \{1, 2, \ldots, N\}$, reports, with probability $p \in [0, 1]$, a corrupted power consumption $\hat P_i = P_i + n_i$, where $n_i \sim \Gamma\big(\frac{1}{pN}, \frac{P}{\epsilon}\big)$ is chosen independently and identically distributed among homes. With this setup, it can be shown that the total power can be estimated in a differentially private manner as $\hat P_{total}(t) = \frac{N}{\hat N} \sum_{i=1}^{\hat N} \hat P_i + n$, where $n \sim Lap\big(\frac{P}{\epsilon}\big)$ and $\hat N = pN$, with $\hat N$ being the number of homes that report their noise-corrupted power consumption. Next, the LSE measures the deviation of the total power consumption of the homes, $P_{total}(t)$, from the optimal power reference trajectory $P^{ref}_{total}(t)$ and uses a simple PID controller with proportional, integral, and derivative gains $k_p$, $k_i$, and $k_d$, respectively, to compute a velocity control signal $v(t) = k_p e(t) + k_i \int_0^t e(s)\, ds + k_d \frac{de}{dt}$, $e(t) = P_{total}(t) - P^{ref}_{total}(t)$, which is broadcast to all homes. The ith AC, $i \in \{1, 2, \ldots, N\}$, then locally computes its new setpoint as $s_i(t) = \Delta_i v(t)$ and adjusts its comfort band as $[L_{it}, U_{it}] \subseteq [L_{i0}, U_{i0}]$, where

$L_{it} = \min\big(U_{i0}, \max(L_{i0}, s_i(t) - \Delta_i)\big), \qquad U_{it} = \max\big(L_{i0}, \min(U_{i0}, s_i(t) + \Delta_i)\big).$  (5.48)

In this manner, the temperatures of individual homes can be locally regulated in a privacy-preserving manner such that their aggregate power consumption tracks the optimal reference trajectory.
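The following sketch illustrates one round of the reporting-and-aggregation step under the gamma-noise scheme just described. The reporting probability, privacy parameter, and consumption draws are all hypothetical assumptions, and the rescaling by $N/\hat N$ follows the estimator given above.

```python
# Illustrative sketch of the differentially private demand estimate described
# above. All parameter values (p, eps, consumption draws) are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
N, p = 500, 0.8            # number of homes, reporting probability (assumed)
P_ac, eps = 14.0, 1.0      # AC power (kW), assumed privacy parameter

P_true = rng.uniform(0.0, P_ac, N)     # true instantaneous consumption (kW)
reporters = rng.random(N) < p          # homes that report this round
# each value is perturbed with Gamma(1/(p*N), P/eps) noise before reporting
noise = rng.gamma(shape=1.0 / (p * N), scale=P_ac / eps, size=N)
P_reported = P_true + noise

N_hat = reporters.sum()
P_total_hat = (N / N_hat) * P_reported[reporters].sum()   # rescaled estimate
print(f"true total: {P_true.sum():.1f} kW, "
      f"private estimate: {P_total_hat:.1f} kW")
```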

5.3.2.4 Problem Statement

We now state the problem addressed in this section.

Problem Given the historical hourly data of electricity prices and ambient temperatures and the ambient temperature forecast $\hat\theta_a(t)$ over a time horizon $[0, T]$, the aim of this section is to (i) solve the optimization problem $\mathcal{P}$ without explicit knowledge of the values of the thermal parameters $\alpha_i$ and $\beta_i$, $i \in \{1, 2, \ldots, N\}$, in (5.47), and (ii) design $\sigma_i(t)$, $i \in \{1, 2, \ldots, N\}$, that results in the optimal power consumption determined by the solution of (5.47) when implemented locally at each AC $i \in \{1, 2, \ldots, N\}$, without access by the LSE to the state variables $\theta_i(t)$ or $\sigma_i(t)$ or to the power consumption $P_i(t)$ or $P_{total}(t)$.

5.3.3 Model-Free Privacy-Preserving Optimization and Control Framework

In this section, we present a two-stage approach to solve the problem stated in Sect. 5.3.2.4. In the first stage, we forecast hourly electricity prices based on historical price data and ambient temperature forecasts and then propose a heuristic relaxation to solve the optimization problem $\mathcal{P}$ in a model-free manner. In the second stage, we discuss control laws for the implementation of this solution.

5.3.3.1 Stage 1: Optimization

We begin by describing how the price forecast $\hat\pi(t)$ can be obtained from historical data.

LSTM-Based Price Forecasting Given the ambient temperature forecast $\hat\theta_a(t)$ over the horizon $t \in [0, T]$, we use a long short-term memory (LSTM) neural network to forecast the hourly electricity price $\hat\pi(t)$, $t \in [0, T]$. We choose an LSTM-based predictor because its memory structure allows us to capture features such as seasonal and daily variations in prices. Real-time electricity prices vary rapidly on a minute-by-minute basis; however, significant variations are typically observed at the hourly level, and most procurement by the LSE is also carried out at this timescale. Therefore, we begin by averaging intra-hourly historical data to obtain hourly electricity price data for each day, and we similarly obtain historical temperature data at an hourly timescale. These hourly price and temperature datasets serve as the inputs to the LSTM. A minimal sketch of such a forecaster follows the remark below.

Remark 5.2 The prediction window of the LSTM is chosen based on two considerations. First, in our simulations, we determined that highly accurate price predictions can be made over short time windows of less than 4 hours. Second, we require the prediction window to be larger than the sum of the two time windows $T_{ON}$ and $T_{OFF}$, defined as follows:

• $T_{ON}$: the average time required to cool a home from its upper comfort bound to its lower comfort bound, i.e., the average over all $i \in \{1, 2, \ldots, N\}$ of the smallest time $T_{ON,i}$ such that $\theta_i(0) = U_{i0}$ and $\theta_i(T_{ON,i}) = L_{i0}$ with $\sigma_i(t) = 1$, $\forall t \in [0, T_{ON,i}]$.
• $T_{OFF}$: the average "duty cycle" of the residential thermal loads, i.e., the average over all $i \in \{1, 2, \ldots, N\}$ of the smallest time $T_{OFF,i}$ such that $\theta_i(T_{OFF,i}) = U_{i0}$, given that $\theta_i(0) = L_{i0}$ and $\sigma_i(t) = 0$, $\forall t \in [0, T_{OFF,i}]$.

This accounts for the fact that a decision to pre-cool a home must be made at least $(T_{ON} + T_{OFF})$ before the price peak for a feasible implementation.
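As promised above, the following Keras sketch mirrors the forecaster used in the case study of Sect. 5.3.4 (one hidden layer with five LSTM units, a 3-h window, MAE loss); the windowing helper and the placeholder arrays are assumptions for illustration, with the scaled historical hourly prices and temperatures as the real inputs in practice.

```python
# A sketch of the hourly price forecaster, assuming a Keras LSTM with one
# hidden layer of five units; the windowing layout is an assumption.
import numpy as np
from tensorflow import keras

WINDOW = 3   # 3-hour forecast window (see Remark 5.2)

def make_windows(prices, temps, window=WINDOW):
    """Stack (price, temperature) pairs into supervised (X, y) windows."""
    feats = np.stack([prices, temps], axis=-1)
    X = np.stack([feats[i:i + window] for i in range(len(feats) - window)])
    y = prices[window:]
    return X, y

# placeholder data standing in for the scaled historical hourly series
hourly_prices = np.random.rand(1000).astype("float32")
hourly_temps = np.random.rand(1000).astype("float32")
X, y = make_windows(hourly_prices, hourly_temps)

model = keras.Sequential([
    keras.Input(shape=(WINDOW, 2)),
    keras.layers.LSTM(5),    # one hidden layer with five LSTM neurons
    keras.layers.Dense(1),   # next-hour price
])
model.compile(optimizer="adam", loss="mae")   # MAE as in the case study
model.fit(X, y, epochs=10, verbose=0)         # reported to converge in ten epochs
```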

Model-Free Optimization In order to solve the optimization problem $\mathcal{P}$ without knowledge of the dynamics of individual homes, we begin by making an assumption about the price forecast $\hat\pi(t)$, $t \in [0, T]$.

Assumption We assume that the price forecast $\hat\pi(t)$ is unimodal over $t \in [0, T]$, i.e., there exists $t_{PEAK} \in [0, T]$ such that $\hat\pi(t)$ is monotonically increasing $\forall t \le t_{PEAK}$ and monotonically decreasing $\forall t > t_{PEAK}$.


This assumption is not unreasonable, since historical data indicate a strong unimodality in hourly electricity prices, typically correlated with hourly variations in temperature and load profiles over the day, allowing the electricity price forecast $\hat\pi(t)$ to be closely approximated by a unimodal function, as illustrated in Fig. 5.21. We now propose a simple heuristic relaxation of the optimization problem $\mathcal{P}$ based on Assumption 5.3.3.1. If $\hat\pi(t)$ is unimodal, an explicit solution to (5.47) can be written as follows. Intuitively, the optimal solution to (5.47) designs $u_i(t)$ such that the LSE purchases most of its power while the price is low and uses this energy to pre-cool homes to their lower comfort bound $L_{i0}$, allowing the ACs to be switched off during the peak-pricing period until the temperature reaches the upper comfort bound $U_{i0}$. For this pre-cooling operation, we consider the monotonically increasing portion of the unimodal price function, that is, $\hat\pi(t)$ for $t \in [0, t_{PEAK}]$. Additionally, we relax the energy budget constraint by assuming $E = \infty$ (an explicit model-based solution to (5.47) incorporating this constraint and the switching dynamics of the loads can be derived along the lines of [318]). We have the following result on the solution to the optimal control problem $\mathcal{P}$ for the period where the price is monotonically increasing.

Theorem 5.4 If $\hat\pi(t)$, $t \in [0, t_{PEAK}]$, is monotonically increasing, then there exists $t^* < t_{PEAK}$ such that the optimal solution to (5.47) is given by

$u_i(t) = \begin{cases} 1, & t < t^*,\ \theta_i \in (L_{i0}, U_{i0}], \\ u^{DOWN}_i(t), & t < t^*,\ \theta_i = L_{i0}, \\ u^{UP}_i(t), & t \ge t^*,\ \theta_i = U_{i0}, \\ 0, & t \ge t^*,\ \theta_i \in [L_{i0}, U_{i0}), \end{cases}$  (5.49)

where $u^{UP}_i$ and $u^{DOWN}_i$ are as defined in Assumption 5.3.2.2.

Fig. 5.21 Schematic of the optimization and control framework, indicating periods of pre-cooling (PC), off time (OC), and normal cyclical cooling operation (CC)


In order to apply the result of Theorem 5.4 to solve (5.47) with a unimodal price forecast $\hat\pi(t)$ satisfying Assumption 5.3.3.1, it is first necessary to determine the pre-cooling period, denoted by $PC = [0, t^*]$ as shown in Fig. 5.21, such that $u_i(t^*) = u^{DOWN}_i(t^*)$ and $\theta_i(t^*) = L_{i0}$. We begin by noting that we would like to maintain $u_i = 0$ for as long as possible around the peak-pricing period without violating consumers' comfort bounds. We denote this period where $u_i = 0$ as the off cycle (OC), with duration $\hat S_{OFF}$. The longest period for which the off cycle can be maintained is the average duty cycle $T_{OFF}$ defined in Remark 5.2, i.e., $\hat S_{OFF} = T_{OFF}$. Working backward, we can approximate $t^* \approx t_{PEAK} - T_{OFF}/2$. We then have the following result on the solution to the optimization problem (5.47) during the pre-cooling period and the off cycle.

Corollary 5.1 If $\hat\pi(t)$ is monotonically increasing for $t \in [0, t_{PEAK}]$ and monotonically decreasing for $t \in [t_{PEAK}, T]$, then the solution to (5.47) for $t \in [0, t_{PEAK} + T_{OFF}/2]$ is given by (5.49) with $t^* \approx t_{PEAK} - T_{OFF}/2$.

After the off cycle, the price $\hat\pi(t)$, $t \in [t_{PEAK} + T_{OFF}/2, T]$, is monotonically decreasing according to Assumption 5.3.3.1. During this period, two types of control actions are possible:

• Option 1: Maintain $\theta_i(t) = U_{i0}$ for $t \in [t_{PEAK} + T_{OFF}/2, T]$.
• Option 2 (cooling cycle, CC): Allow the collection of ACs to evolve according to their natural dynamics (5.45) with the control action (5.46).

In our approach, we choose the latter, Option 2, for two reasons. First, Option 2 allows greater comfort for residential consumers by keeping the average temperature of the home closer to the setpoint of the consumer's choice. Second, since ambient temperatures during this period are typically cooler, it may not be optimal to maintain the temperature at the upper comfort bound $U_{i0}$. In summary, we solve the optimization problem $\mathcal{P}$ by dividing the day into three time horizons, namely, pre-cooling, off cycle, and cooling cycle, for which the control actions $u^*_i(t)$ are determined by Corollary 5.1.

Remark 5.3 We make the following remarks about the proposed solution to the optimization problem $\mathcal{P}$:

• $T_{OFF}$ for a given ambient temperature profile can be easily inferred by observing the total load profile over a day, without any direct knowledge of the dynamics of the homes. Therefore, the solution to (5.47) can be constructed in a completely model-free manner.
• This solution relies on an accurate forecast of $t_{PEAK}$, which is obtained using the LSTM network described in Sect. 5.3.3.1. Note that the actual magnitude of the peak price is not important to our approach. Therefore, while prediction errors in the price magnitude can be tolerated, it is critical that the LSTM network be tuned so that the time of peak pricing is predicted as accurately as possible.

5.3.3.2 Stage 2: Private Control Implementation

The solution to the optimization problem $\mathcal{P}$ described in Sect. 5.3.3.1 and depicted in Fig. 5.21 can be implemented in a private and distributed manner in each home, without any measurement of the state (temperature and power consumption) of the home by the LSE. At any time $t \in [0, T]$, the LSE broadcasts one of the following commands to the ACs:

$c(t) = \begin{cases} 1, & t \in [0,\ t_{PEAK} - T_{OFF}/2], \\ 0, & t \in [t_{PEAK} - T_{OFF}/2,\ t_{PEAK} + T_{OFF}/2], \\ CC, & t \in [t_{PEAK} + T_{OFF}/2,\ T]. \end{cases}$  (5.50)

The ACs then translate these commands into their private switching state $\sigma_i(t)$, $i \in \{1, 2, \ldots, N\}$, at each time $t \in [0, T]$ as follows:

$\sigma_i(t) = \begin{cases} 1, & c(t) = 1,\ \theta_i(t) \in (L_{i0}, U_{i0}], \\ u^{DOWN}_i(t), & c(t) = 1,\ \theta_i(t) = L_{i0}, \\ 0, & c(t) = 0,\ \theta_i(t) \in [L_{i0}, U_{i0}), \\ u^{UP}_i(t), & c(t) = 0,\ \theta_i(t) = U_{i0}, \\ \sigma_i(t^-), & c(t) = CC,\ \theta_i(t) \in (L_{i0}, U_{i0}), \\ 1, & c(t) = CC,\ \theta_i(t) = U_{i0}, \\ 0, & c(t) = CC,\ \theta_i(t) = L_{i0}. \end{cases}$  (5.51)

In contrast to the PID-based differentially private control implementation described in Sect. 5.3.2.3, the control actions (5.51) can be implemented in a simple manner without any measurements being transmitted to the LSE. The only requirement is that the homes be equipped with a smart thermostat that can receive the instructions broadcast by the LSE. We note that the control inputs $u^{DOWN}_i(t)$ and $u^{UP}_i(t)$ in (5.51), which maintain a particular temperature $\theta_i(t)$ once the home has cooled to its setpoint, are already present as an energy-saving measure in most ACs, where they are implemented by turning off the compressor of the AC and do not require knowledge of the thermal parameters of the home.
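Because (5.51) is a simple case statement, it can be transcribed almost directly into the thermostat logic. The sketch below does this for a single AC; `u_up` and `u_down` stand for the comfort-bound holding inputs of Sect. 5.3.2.2, and exact temperature comparisons stand in for the boundary conditions.

```python
# A direct transcription of the local switching logic (5.51) for one AC;
# u_up / u_down are the comfort-bound holding inputs from Sect. 5.3.2.2.
def switching_state(c, theta, sigma_prev, L0, U0, u_up=0.0, u_down=0.0):
    """Translate the broadcast command c in {1, 0, 'CC'} into sigma_i(t)."""
    if c == 1:                  # pre-cooling period
        return u_down if theta <= L0 else 1
    if c == 0:                  # off cycle around the price peak
        return u_up if theta >= U0 else 0
    # c == 'CC': normal hysteretic cooling, as in (5.46)
    if theta >= U0:
        return 1
    if theta <= L0:
        return 0
    return sigma_prev
```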

5.3.4 Case Study

In this section, we demonstrate the application of the optimization and control framework proposed in Sect. 5.3.3 on a test scenario in the Houston area and benchmark it against the model-based solution described in Sect. 5.3.2.3. We consider $N = 500$ ACs with thermal power $P = 14$ kW and efficiency $\eta = 2.5$, with thermal parameters $\alpha_i$ and $\beta_i$, $i = 1, 2, \ldots, N$, drawn from the truncated Gaussians $\alpha \sim N(\mu_\alpha, 0.1\mu_\alpha, 0.9\mu_\alpha, 1.1\mu_\alpha)$ and $\beta \sim N(\mu_\beta, 0.1\mu_\beta, 0.9\mu_\beta, 1.1\mu_\beta)$, respectively,

where $\mu_\alpha = \frac{1}{RC}\ h^{-1}$ and $\mu_\beta = \frac{1}{C}\ ^\circ C/kWh$, with $R = 2\ ^\circ C/kW$ and $C = 10\ kWh/^\circ C$ representing the thermal resistance and capacitance of the ACs, respectively. We assume that the comfort bands of the ACs, $\Delta_i$, are uniformly distributed in the range $[1, 3]\ ^\circ C$. As described in Sect. 5.3.3.1, we begin by using an LSTM to forecast the hourly price given historical price data and the ambient temperature profile, as shown in Fig. 5.22 (top). To obtain this forecast, we consider an input dataset comprising (i) real-time electricity price data for Houston, Texas (LZ-HOUSTON node), at 15-min intervals over a period of 7 years from 2013 to 2019, available from the Electric Reliability Council of Texas (ERCOT) at http://www.ercot.com/mktinfo/prices, and (ii) hourly historical weather data, available from the National Centers for Environmental Information at https://www.ncdc.noaa.gov/cdo-web/datatools. We begin by averaging the 15-min prices from the ERCOT dataset to obtain the average hourly historical prices. After suitably scaling the temperature and hourly price datasets, we separate them into training and test datasets, where the training dataset comprises all price and temperature information for the years 2013–2017 and the test dataset comprises the same information for 2018–2019. We then implement an LSTM network comprising one hidden layer with five LSTM neurons using Keras (https://keras.io). Based on the considerations described in Remark 5.2, we choose a forecast window of 3 h. The network was found to converge in ten epochs, with a mean absolute error (MAE) of 4.06%. We then compare the following two approaches:

• Private model-free control scheme: We compute the solution to the optimal control problem (5.47) using the approach in Sect. 5.3.3.1, compute the control commands broadcast by the LSE according to (5.50), and implement the corresponding switching actions (5.51).
• Model-based control scheme: We compute the solution to the optimal control problem (5.47) via the LP relaxation described in Sect. 5.3.2.3; compute the velocity control commands broadcast by the LSE using a PID controller with gains $k_p = 10^{-4}$, $k_i = 10^{-6}$, and $k_d = 10^{-4}$; and determine the control actions of the individual homes according to (5.48).

We simulate the response of the homes to each of these control schemes by solving (5.45) with the switching action (5.51) over a horizon of $T = 24$ h, discretized using the Euler method with a step size of 1 s, and compute the total power consumption at each time step. Figure 5.22 (bottom) shows the temperature profiles of the homes under the model-free control scheme, which clearly satisfy the consumer comfort constraints. We observe that the ACs are pre-cooled during $t \in [0, 16]$ h and are turned off during the period of peak pricing, $t \in [16, 17.5]$ h. It can be verified that this off cycle aligns with the average duty cycle of the ACs computed from $\alpha_i$ and $\beta_i$, $i \in \{1, 2, \ldots, N\}$. The total power consumptions under the model-free and model-based control schemes are compared in Fig. 5.22 (middle). We observe that the power consumption of the model-free control scheme approximately tracks the mean of the power


Fig. 5.22 Top: hourly price and ambient temperature. Middle: comparison—power consumption of the proposed model-free framework vs. model-based solution in Sect. 5.3.2.3. Bottom: temperature profiles of ACs


consumption trajectory generated by the model-based scheme. The average energy consumption $E_{avg}$ and energy cost savings $E_s$ over the day for each control scheme are as follows:

Uncontrolled: $E_{avg} = 25.68$ MWh, $E_s = \$0$
Model-based: $E_{avg} = 22.4$ MWh, $E_s = \$3787$
Model-free: $E_{avg} = 23.0$ MWh, $E_s = \$3597$

Strikingly, the proposed model-free approach suffers almost no loss of performance compared with the complex model-based scheme, indicating its potential.

5.3.5 Conclusion and Future Work

In this section, we proposed a model-free framework to minimize the cost of procuring electricity for a collection of residential thermal loads by pre-cooling them to avoid purchasing power during peak-pricing periods. The proposed approach is privacy-preserving in the sense that it requires neither knowledge of the thermal dynamics nor measurement of the states of the individual homes. Future work will involve improving the forecast of the time at which the peak price occurs in an online manner to dynamically shape the duration and frequency of the pre-cooling cycles.

Chapter 6

Forecast for the Future

6.1 Forecasting

This section investigates the fundamental coupling between loads and locational marginal prices (LMPs) in security-constrained economic dispatch (SCED). Theoretical analysis based on multi-parametric programming theory establishes a unique one-to-one mapping between load and LMP vectors. This one-to-one mapping is captured by the concept of the system pattern region (SPR), and identifying SPRs is the key to understanding the LMP-load coupling. Built upon the characteristics of SPRs, the SPR identification problem is modeled as a classification problem from a market participant's viewpoint, and a support vector machine-based data-driven approach is proposed. It is shown that, even without knowledge of system topology and parameters, the SPRs can be estimated by learning from historical load and price data. Visualization and illustration of the proposed data-driven approach are provided on a 3-bus system as well as the IEEE 118-bus system.

6.1.1 Introduction

A fundamental issue in electricity market operation is understanding the impact of operating conditions (e.g., load levels at each bus) on the locational marginal prices (LMPs). This section examines this key issue of the relationship between nodal load levels and LMPs, which becomes complicated as the levels of demand response and variable resources in the grid increase. In the power systems literature, reference [319] is among the pioneering works that use perturbation techniques to compute the sensitivities of the dual variables in SCED (e.g., LMPs) with respect to parameters (e.g., the nodal load levels). This sensitivity calculation method is widely used in subsequent research. However, the approach is valid only for small changes for which the marginal generator stays the same.


Reference [320] observed the "step changes" of LMPs with respect to increasing system load levels and discovered that new binding constraints (transmission or generation) are the cause of these step changes. This was followed by further analysis on identifying the critical load levels (CLLs) that trigger such step changes of LMPs [321-323]. This line of work assumes that the system load change is distributed to each bus in proportion to the base-case load, which, in many instances, does not represent real-world situations. Reference [324] analyzed this problem using quadratic-linear programming (QLP), where the concepts of system patterns and system pattern regions (SPRs) were first introduced. The SPRs depict the relationship between loads and LMPs over the whole load space, not confined to a small neighborhood of an operating point or constrained by a specific load distribution pattern. This section is inspired by Zhou et al. [324] but focuses on the case of piecewise linear generation costs instead of the quadratic costs in [324]. We study the piecewise linear cost case because piecewise linear cost curves are representative of real-world market practice. In addition, some new theoretical results based on piecewise linear cost curves are derived and are generalizable toward quadratic cost cases.

Characterizing the SPRs provides important insights to both system operators and market participants. Reference [325] advances the theory of SPRs from the system operator's perspective, where knowledge of the system topology and parameters is available. For market participants, such knowledge is not necessarily available. Our previous work [326] examines the issue from a market participant's viewpoint and applies the geometric features of SPRs to identify them. This section significantly advances our previous work by (1) completing the theoretical characterization of SPRs as a function of nodal load levels; (2) proposing a computational algorithm to identify SPRs using historical data; (3) introducing the posterior probabilities of SPRs in the presence of uncertain system parameters such as transmission limits; and (4) extending the algorithm to consider practical factors such as partial load information and the loss component of LMPs.

The remainder of the section is organized as follows. Section 6.1.2 analyzes the LMP-load coupling in the SCED problem from the viewpoint of MLP theory, with an illustrative example. Section 6.1.3 illustrates how SPRs change with system parameters such as transmission limits. Based on the theoretical analysis, a data-driven algorithm for market participants to identify SPRs is described in Sect. 6.1.4. Section 6.1.5 illustrates the performance of the algorithm on the IEEE 118-bus system. Section 6.1.6 explores the impact of nodal load information, and Sect. 6.1.7 provides a critical assessment of the proposed method. Concluding remarks and future work are presented in Sect. 6.1.8.


6.1.2 Theoretical Analysis

6.1.2.1 Notations

The notations of this section are summarized below. Mathematical symbols in hollowed-out (blackboard) font (e.g., ℝ) represent spaces, and symbols in calligraphic font (e.g., 𝒮_π) denote sets. The superscript "*" indicates that a variable is optimal, and "^" denotes an estimated value (e.g., λ̂). Variables with "¯" are expectations or average values (e.g., λ̄). "⊤" denotes the transpose of a vector or matrix (e.g., 1_n^⊤). The subscript "i" denotes the ith element of a vector (e.g., P_Gi), and the superscript "(i)" denotes the ith element in a set (e.g., P_D^(i)). The n × 1 vector of ones, the m × n zero matrix, and the n × n identity matrix are denoted by 1_n, 0_{m×n}, and I_n, respectively.

6.1.2.2 Security-Constrained Economic Dispatch

In real-time energy market operations, the LMPs are the results of the security-constrained economic dispatch (SCED), which is formulated as follows:

$$
\begin{aligned}
\min_{P_G^{(k)}}\quad & \sum_{i=1}^{n_b} c_i\big(P_{Gi}^{(k)}\big), && \text{(6.1a)}\\
\text{s.t.}\quad & \sum_{i=1}^{n_b} P_{Gi}^{(k)} = \sum_{j=1}^{n_b} P_{Dj}^{(k)} && : \lambda_1, && \text{(6.1b)}\\
& -F^+ \le H\big(P_G^{(k)} - P_D^{(k)}\big) \le F^+ && : \mu^+,\ \mu^-, && \text{(6.1c)}\\
& P_G^- \le P_G^{(k)} \le P_G^+ && : \eta^+,\ \eta^-, && \text{(6.1d)}
\end{aligned}
$$

where P_G^(k) is the generation vector at time k and P_D^(k) is the load vector at time k. We assume there are both generation and load at each bus. Let n_b denote the number of buses and n_l the number of transmission lines; then P_G^(k), P_D^(k) ∈ ℝ^{n_b}, and H ∈ ℝ^{n_l × n_b} is the shift factor matrix.

This formulation considers each snapshot independently; it is therefore called static SCED in this section. For simplicity, we write P_G^(k) and P_D^(k) as P_G and P_D when discussing the static SCED. The objective of SCED is to minimize the total generation cost while satisfying the transmission and generation capacity constraints and keeping the real-time balance between supply and demand. The generation cost function c_i(P_Gi^(k)) of generator i is increasing and convex, and it is usually modeled as a quadratic function or approximated by a piecewise linear function. To better reflect the current practice in electricity markets, this section studies the SCED problem


with piecewise linear generator bidding functions. For simplicity, the simplest form, Σ_{i=1}^{n_b} c_i(P_Gi) = c^⊤ P_G, is considered in this section.

A fundamental concept in electricity markets is the locational marginal price. The LMP λ_i at bus i is defined as the change in total system cost if the demand at node i is increased by 1 unit [327]. According to [328], the LMP vector λ can be calculated as

$$ \lambda = \lambda_1 1_{n_b} + H^\top(\mu^+ - \mu^-). \tag{6.2} $$

We start with the simplest case of static SCED; more elaborate SCED formulations are discussed in Sect. 6.1.5.4. Since line losses are not explicitly modeled in the SCED formulation, the LMPs in this section do not contain loss components. Further discussion of the loss component appears in Sect. 6.1.7.4.
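To make the SCED-to-LMP mechanics concrete, the following sketch solves a small static SCED of the form (6.1) with cvxpy and recovers nodal prices from the solver duals in the spirit of Eq. (6.2). All numbers (offer prices, limits, shift factors for a symmetric 3-bus triangle with bus 1 as reference) are hypothetical illustration values, not the system of Fig. 6.1, and the solver's dual-sign convention differs from the book's definition of μ± (see the comment).

```python
import numpy as np
import cvxpy as cp

# Hypothetical 3-bus static SCED; parameters chosen only for illustration.
c = np.array([20.0, 50.0, 80.0])            # linear offer prices ($/MWh)
Pg_max = np.array([100.0, 100.0, 100.0])    # generation upper bounds (MW)
Pd = np.array([30.0, 90.0, 40.0])           # nodal loads (MW)
F = np.array([60.0, 60.0, 80.0])            # line limits (MW)
# Shift factors for lines 1-2, 1-3, 2-3 of an equal-reactance triangle,
# with bus 1 as the reference bus (assumed topology).
H = np.array([[0.0, -2/3, -1/3],
              [0.0, -1/3, -2/3],
              [0.0,  1/3, -1/3]])

Pg = cp.Variable(3, nonneg=True)
balance = cp.sum(Pd) - cp.sum(Pg) <= 0      # binds at optimum; dual = lambda_1
flow_up = H @ (Pg - Pd) <= F                # dual -> mu+
flow_lo = -H @ (Pg - Pd) <= F               # dual -> mu-
prob = cp.Problem(cp.Minimize(c @ Pg),
                  [balance, flow_up, flow_lo, Pg <= Pg_max])
prob.solve()

lam1 = balance.dual_value                   # energy component
# With cvxpy's dual-sign convention for "h(x) <= 0" in a minimization,
# the nodal price sensitivity works out to lambda_1 - H^T (mu+ - mu-);
# in the book's sign convention for mu± this is exactly Eq. (6.2).
lmp = lam1 * np.ones(3) - H.T @ (flow_up.dual_value - flow_lo.dual_value)
print(prob.value, lmp)
```

Perturbing one nodal load by 1 MW and re-solving gives a numerical check of the recovered prices.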

6.1.2.3 SCED Analysis via MLP

In real-world market operations, the parameters associated with the SCED above are typically time-varying, so it is essential to understand the effects of the parameters on the optimal solution. A multi-parametric programming (MP) problem explores the characteristics of an optimization problem that depends on a vector of parameters [329]. Multi-parametric linear programming (MLP) theory, the foundation of this section, focuses on linear programming (LP) problems.

The objective of this section is to understand the impact of parameters (i.e., load levels, line capacities, etc.) on the outcome of SCED (namely, the prices). We pose the problem in view of MLP and analyze its theoretical properties. In reality, LMP vectors depend on a number of factors, including (1) the loads in the system, (2) line flow limits, (3) ramp constraints, (4) generation offer prices, (5) the topology of the system, and (6) unit commitment results. We first focus on the relationship between the loads and LMPs, assuming the other five factors remain unchanged; Sect. 6.1.3 then considers line flow limits and ramp constraints, while the influence of generation offer prices is explored in Sect. 6.1.7.3. Future work will investigate the impacts of unit commitment results and system topology changes on the prices. Consider the static SCED in the standard MLP form:¹

$$
\begin{aligned}
\text{Primal:}\quad & \min\{c^\top P_G : A P_G + s = b + W P_D,\ s \ge 0\}, && \text{(6.3a)}\\
\text{Dual:}\quad & \max\{-(b + W P_D)^\top y : A^\top y = -c,\ y \ge 0\}, && \text{(6.3b)}
\end{aligned}
$$

1 In other references (e.g., [330, 331]), the primal form of the MLP problem is different. For the convenience of analyzing the SCED problem, we follow the formulations in [329]. The two forms are interchangeable.


where

$$
A = \begin{bmatrix} 1_{n_b}^\top \\ -1_{n_b}^\top \\ H \\ -H \\ I_{n_b} \\ -I_{n_b} \end{bmatrix},\qquad
b = \begin{bmatrix} 0 \\ 0 \\ F^+ \\ F^+ \\ P_G^+ \\ -P_G^- \end{bmatrix},\qquad
W = \begin{bmatrix} 1_{n_b}^\top \\ -1_{n_b}^\top \\ H \\ -H \\ 0_{n_b \times n_b} \\ 0_{n_b \times n_b} \end{bmatrix}. \tag{6.4}
$$
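A minimal numpy sketch of how the MLP data of Eq. (6.4) can be assembled, assuming the shift factor matrix H and the limit vectors are available; the function name and signature are ours, not the book's.

```python
import numpy as np

def build_mlp_matrices(H, F_plus, Pg_min, Pg_max):
    """Assemble (A, b, W) of Eq. (6.4) from the SCED ingredients.
    H: (n_l, n_b) shift factor matrix; F_plus, Pg_min, Pg_max: limit vectors."""
    n_l, n_b = H.shape
    ones = np.ones((1, n_b))
    I = np.eye(n_b)
    A = np.vstack([ones, -ones, H, -H, I, -I])
    b = np.concatenate([np.zeros(2), F_plus, F_plus, Pg_max, -Pg_min])
    W = np.vstack([ones, -ones, H, -H,
                   np.zeros((n_b, n_b)), np.zeros((n_b, n_b))])
    return A, b, W
```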

The load vector P_D is the vector of parameters θ, and the load space 𝔻 is the parameter space Θ. Since not every P_D in the load space leads to a feasible SCED problem, 𝒟 ⊆ 𝔻 denotes the set of all feasible load vectors. Gal and Nedoma [331] show that 𝒟 is a convex polyhedron in 𝔻.

Definition 6.1 (Optimal Partition/System Pattern) For a load vector P_D ∈ 𝒟, we can find a finite optimal solution P_G* and s*. Let J = {1, 2, . . . , n_c} denote the index set of constraints, where n_c = 2 + 2n_l + 2n_g for Eq. (6.3). The optimal partition π = (B, N) of the set J is defined as follows:

$$
\begin{aligned}
B(P_D) &:= \{\, i : s_i^* = 0 \ \text{for}\ P_D \in \mathcal{D} \,\}, && \text{(6.5a)}\\
N(P_D) &:= \{\, j : s_j^* > 0 \ \text{for}\ P_D \in \mathcal{D} \,\}. && \text{(6.5b)}
\end{aligned}
$$

Obviously, B ∩ N = ∅ and B ∪ N = J. The optimal partition π = (B, N) divides the index set into two parts: the binding constraints B and the non-binding constraints N. In SCED, the optimal partition represents the status of the system (e.g., congested lines, marginal generators) and is called a system pattern.

Definition 6.2 (Critical Region/System Pattern Region) The critical region is the set of parameter vectors that lead to the same optimal partition (system pattern) π = (B_π, N_π):

$$ \mathcal{S}_\pi := \{\, P_D \in \mathcal{D} : B(P_D) = B_\pi \,\}. \tag{6.6} $$

For consistency, the critical region is called the system pattern region (SPR) in this section. By these definitions, each SPR maps one-to-one to a system pattern; the SPRs are therefore disjoint, and the union of all SPRs is the feasible set of load vectors: ∪_i S_{π_i} = 𝒟. Together, the SPRs represent a specific partition of the load space. The features of SPRs, inherited directly from critical regions in MLP theory, are summarized as follows.

Theorem 6.1 The load space can be decomposed into many SPRs. Each SPR is a convex polytope. The relative interiors of SPRs are disjoint convex sets, each corresponding to a unique system pattern [324]. A hyperplane exists between any two SPRs [326].
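Given an optimal dispatch, the optimal partition of Definition 6.1 can be read directly off the primal slacks of (6.3a); the sketch below is one way to do this, with a numerical tolerance standing in for exact zero (the helper name and tolerance are our assumptions).

```python
import numpy as np

def optimal_partition(A, b, W, Pg_star, Pd, tol=1e-7):
    """Binding/non-binding index sets (B, N) of Definition 6.1, computed
    from the slacks s* = b + W Pd - A Pg* of the primal form (6.3a)."""
    s = b + W @ Pd - A @ Pg_star
    B = np.flatnonzero(s <= tol)   # binding constraints (s_i* = 0)
    N = np.flatnonzero(s > tol)    # non-binding constraints (s_j* > 0)
    return B, N
```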


Lemma 6.1 (Complementary Slackness) By complementary slackness,

$$
\begin{aligned}
A_B P_G^* &= (b + W P_D)_B, && \text{(6.7a)}\\
A_N P_G^* &< (b + W P_D)_N, && \text{(6.7b)}\\
A_B^\top y_B &= -c,\quad y_B > 0, && \text{(6.7c)}\\
y_N &= 0, && \text{(6.7d)}
\end{aligned}
$$

where (·)_B is the sub-matrix or sub-vector whose row indices are in set B; the same applies to (·)_N.

Remark 6.1 The supply-demand balance equality constraint is rewritten as two inequalities in Eq. (6.3). These two inequalities are always binding and appear in the binding constraint set B at the same time. One of them is redundant and is therefore eliminated from set B. In the remainder of the section, B denotes the set after elimination. If the problem is not degenerate, the cardinality of the binding constraint set B equals the number of decision variables (i.e., the number of generators n_g),² and the matrix A_B is invertible.

Remark 6.2 SCED problems with different generation costs will have different SPRs. For a system pattern π = (B, N), its SPR remains the same as long as the generation cost vector c satisfies Eq. (6.7c).

Lemma 6.2 Within each SPR, the vector of LMPs is unique [325, 326].

The proof of this lemma follows from Eq. (6.7c) (the dual form of the system pattern definition). Since the system pattern π is unique within an SPR S_π, the solution y* is unique for any P_D ∈ S_π, and the vector of LMPs can be calculated using Eq. (6.2). This lemma also shows that LMP vectors are discrete by nature in the case of linear costs.

Theorem 6.2 If the SCED problem is not degenerate, then different SPRs have different LMP vectors.

The proof of Theorem 6.2 turns out to be non-trivial and is outlined as follows. If two SPRs have the same LMP, λ^(i) = λ^(j), their energy components are the same because of the entry-wise equality; then Eq. (6.2) implies that the congestion components should also be the same: H^⊤(μ^(i) − μ^(j)) = 0. Given that the null space of H^⊤ is always non-empty,³ a critical question arises: "Is it possible that μ^(i) − μ^(j) belongs to the null space of H^⊤?" Or equivalently, "Is it possible that different congestion patterns have the same LMP vector?"

2 This is consistent with the statement that the number of marginal generators equals the number of congested lines plus one.
3 dim(N(H^⊤)) = n_l − n_b + 1 ≥ 0. The equality holds if, and only if, the topology of the system is a tree, where n_l = n_b − 1.


Fig. 6.1 3-bus system

We show that the answer is "no." A complete proof of the theorem is provided in [332].

6.1.2.4 An Illustrative Example

The 3-bus system in Fig. 6.1 serves as an illustrative example in this section. It is first analyzed using the Multi-Parametric Toolbox 3.0 (MPT 3.0) [333]; results are shown in Fig. 6.2a. A Monte Carlo simulation is then conducted, with load vectors colored according to their LMPs (Fig. 6.2b). The theoretical results are verified by the Monte Carlo simulation. Notice that P_D2 and P_D3 can be negative. This accounts for renewable resources in the system, which are typically modeled as negative loads.

6.1.3 SPRs with Varying Parameters

Section 6.1.2.3 characterizes the structure of the load space for fixed system parameters (e.g., transmission constraints). In practice, however, these parameters may be time-varying, due to factors such as dynamic line ratings or active ramping constraints. This subsection examines the features of SPRs with respect to such varying factors.

Lemma 6.3 (Analytical Form of SPRs) Let I_B · (b + W P_D) represent the sub-vector (b + W P_D)_B, where I_B is the sub-matrix of the identity matrix whose row indices are in set B. Then the analytical form of the SPRs can be derived from Eqs. (6.7a) and (6.7b) as follows:

Fig. 6.2 SPRs of the 3-bus system (static SCED). (a) Theoretical results using MPT 3.0. (b) Monte Carlo simulation

$$ \big(I_N A (I_B A)^{-1} I_B - I_N\big)(b + W P_D) < 0. \tag{6.8} $$

Remark 6.3 Equation (6.8) can be rewritten as

$$ \big(I_N A (I_B A)^{-1} I_B - I_N\big) W P_D < \big(I_N - I_N A (I_B A)^{-1} I_B\big) b. \tag{6.9} $$

This indicates that the shape of the SPR S_π depends only on two factors: (1) the corresponding system pattern π = (B, N), and (2) the matrices A and W, namely, the shift


factor matrix H, according to Eq. (6.4). Small changes of the vector b only parallel-shift the SPRs' boundaries.
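Under the non-degeneracy assumption of Remark 6.1 (so that I_B A is square and invertible), Eq. (6.8) gives a direct membership test for an SPR. The sketch below is one way to implement it; the function name and tolerance are ours.

```python
import numpy as np

def in_spr(B, N, A, b, W, Pd, tol=1e-9):
    """Test whether the load vector Pd lies in the SPR of binding set B,
    via the analytical form (6.8).  I_B, I_N are the row selectors of
    Lemma 6.3; I_B @ A must be square and invertible (non-degeneracy)."""
    I = np.eye(A.shape[0])
    I_B, I_N = I[B], I[N]
    M = I_N @ A @ np.linalg.inv(I_B @ A) @ I_B - I_N
    # Strict inequality in (6.8), checked up to a small tolerance.
    return bool(np.all(M @ (b + W @ Pd) < tol))
```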

6.1.3.1 Dynamic Line Rating

Dynamic line rating (DLR), contrary to static line rating (SLR), refers to technology that optimizes the transmission capacity based on real-time conditions such as ambient temperature and wind speed [334]. It is considered more adaptive in maximizing line potential while keeping grid operation secure. From a dispatch point of view, DLR can be represented by changes of the transmission limits F^+ in Eq. (6.1c). It changes the vector b in Eq. (6.4) and thus translates the boundaries of SPRs. The 3-bus system in Fig. 6.1 with different transmission limits is analyzed via MPT 3.0. Compared with the standard transmission limits [60; 60; 80], when we increase the limits by 10% (Fig. 6.3a), SPR #3 expands while SPRs #1, #2, and #4 shrink; when we decrease the limits by 10% (Fig. 6.3b), SPR #3 shrinks while SPRs #1, #2, and #4 expand. This verifies the claim that dynamic line ratings only shift the boundaries without altering the shapes of the SPRs. The implication of having DLR is that the SPRs in Fig. 6.4 overlap, instead of being completely separable as in Fig. 6.2b. Details of the Monte Carlo simulation are provided in Sect. 6.1.5.3.

6.1.3.2 Ramping Constraints

The analysis of SPRs can also be generalized to dispatch models that include inter-temporal constraints such as ramping:

$$ P_G^{k-1} - R^- \Delta t \;\le\; P_G^k \;\le\; P_G^{k-1} + R^+ \Delta t. \tag{6.10} $$

In Eq. (6.10), R^+ and R^- represent the ramp-up and ramp-down limits of the generators. Adding ramp constraints to the static SCED problem is equivalent to replacing the generation capacity constraints in Eq. (6.1d) with

$$ \max\{P_G^-,\ P_G^{k-1} - R^- \Delta t\} \;\le\; P_G^k \;\le\; \min\{P_G^+,\ P_G^{k-1} + R^+ \Delta t\}. \tag{6.11} $$

When the ramp capacity is not binding, i.e., P_G^- > P_G^{k-1} − R^- Δt and P_G^+ < P_G^{k-1} + R^+ Δt, the SCED problem is the same as the case where no ramp constraints are considered, and the SPRs are the same as in Fig. 6.2a and b. However, active ramp constraints change the effective generation limits and therefore change the parameter b in Eq. (6.4). This leads to a parallel shift of the SPR boundaries. The impact of ramping constraints on SPRs is thus similar to that of dynamic line ratings.


Fig. 6.3 SPRs of the 3-bus system (static SCED with DLRs). (a) Line Limits: (66, 66, 88). (b) Line Limits: (54, 54, 72)

The 3-bus system is again analyzed via both MPT 3.0 and Monte Carlo simulation. Figure 6.5a and b demonstrates cases where the ramp constraints are active: the SPRs look similar, with parallel shifts of the boundaries. When analyzing the load and LMP data, we again observe overlapping SPRs (Fig. 6.6).


Fig. 6.4 Monte Carlo simulation (static SCED with DLRs)


6.1.4 A Data-Driven Approach to Identifying SPRs

The SPRs depict the fundamental coupling between loads and LMP vectors. Massive historical data can help market participants estimate SPRs, understand the load-LMP coupling, and then forecast LMPs. This section proposes a data-driven method to identify SPRs, which significantly improves the basic method in [326] by considering varying system parameters and the probabilistic nature of system parameters.

6.1.4.1 The SPR Identification Problem

SPR Identification as a Classification Problem

A classifier is an algorithm that assigns a label y to each feature vector x; feature vectors sharing the same label belong to the same class. The objective of the classification problem is to find the classifier that labels each feature vector most accurately. For parametric classifiers, there is always a training set, i.e., a group of feature vectors whose labels are known. A classification problem has two steps: training and classifying. Training solves an optimization problem over the training set to find the best parameters of the classifier; classifying then applies the trained classifier to new feature vectors. According to Sect. 6.1.2.3, the load vectors in an SPR share many common features (e.g., the vector of LMPs). Theorem 6.2 proves that the LMP vectors are distinct for different SPRs. Therefore, each SPR can be regarded as a class, and the LMP vector is the label of that class. Theorem 6.1 proves the existence of


Fig. 6.5 SPRs of the 3-bus system (SCED with ramp constraints). (a) Previous generation: (30; 30). (b) Previous generation: (100; 100)

the separating hyperplanes. Since each separating hyperplane distinguishes the two SPRs on its two sides, the hyperplanes act as classifiers, and the key to identifying SPRs is to find the optimal hyperplanes, which is exactly the objective of the support vector machine (SVM).


Fig. 6.6 Monte Carlo simulation (SCED with ramp constraints)


SPR Identification with SVM

Suppose there is a set of labeled load vectors for training, and these load vectors belong to only two distinct SPRs (labels y^(i) ∈ {1, −1}). The SPR identification problem with a binary SVM classifier (separable case) is then stated as:

$$
\begin{aligned}
\min_{w,b}\quad & \tfrac{1}{2}\, w^\top w, && \text{(6.12a)}\\
\text{s.t.}\quad & y^{(i)}\big(w^\top P_D^{(i)} - b\big) \ge 1,\quad y^{(i)} \in \{-1, 1\}. && \text{(6.12b)}
\end{aligned}
$$

The word "binary" specifies that only two classes (i.e., SPRs) are considered. Equation (6.12b) is feasible only when the two SPRs do not overlap and there exists at least one hyperplane thoroughly separating them (the separable case). For any load vector P_D in the load space, w^⊤ P_D − b = 0 represents the separating hyperplane, where w is the normal vector to the hyperplane. The two hyperplanes satisfying w^⊤ P_D − b = ±1 separate all the training data and delimit a region with no points inside. This empty region is called the margin, whose width 2/‖w‖ is the distance between the two hyperplanes. The optimal solution is the separating hyperplane that maximizes the margin width 2/‖w‖; the objective of the binary SVM classifier is therefore to minimize the norm of the vector w (Fig. 6.7).

Due to the existence of multiple SPRs, multi-class classifiers are needed. Since Theorem 6.1 guarantees the existence of separating hyperplanes between every pair of SPRs, the "one-vs-one" multi-class SVM classifier is incorporated into the data-driven approach to identifying SPRs. Detailed procedures are summarized in Sect. 6.1.4.2.
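As a concrete illustration, the hard-margin problem (6.12) can be approximated with an off-the-shelf linear SVM by using a very large penalty C. The sketch below runs scikit-learn on synthetic two-SPR load data (all data and labels are made up for illustration); note that scikit-learn parameterizes the hyperplane as w·x + b = 0, i.e., with the opposite sign convention for b than (6.12).

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic load vectors from two well-separated "SPRs".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([50, 20], 5, size=(100, 2)),   # SPR "A"
               rng.normal([80, 60], 5, size=(100, 2))])  # SPR "B"
y = np.hstack([-np.ones(100), np.ones(100)])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # very large C mimics (6.12)
w, b = clf.coef_[0], clf.intercept_[0]       # hyperplane w.x + b = 0
print(w, b, 2.0 / np.linalg.norm(w))         # normal vector and margin width
print(clf.predict([[60.0, 30.0]]))           # classify a new load vector
```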


Fig. 6.7 SPR identification problem with SVM (separable case)

Fig. 6.8 SPR identification problem with SVM (non-separable case)

6.1.4.2 A Data-Driven Approach

SPR Identification with Varying System Parameters

When the system parameters are varying (e.g., dynamic line ratings), two SPRs may overlap with each other, and the SPR identification problem is no longer a separable case as in Sect. 6.1.4.1. The SVM classifier then needs a soft margin to allow some tolerance of classification error. A slack variable s^(i) is added to the constraint (6.12b), and the penalty of violation C Σ_i s^(i) is added to the objective function; a large C indicates a low tolerance of misclassification (Fig. 6.8):

$$
\begin{aligned}
\min_{w,b,s}\quad & \tfrac{1}{2}\, w^\top w + C \sum_i s^{(i)}, && \text{(6.13a)}\\
\text{s.t.}\quad & y^{(i)}\big(w^\top P_D^{(i)} - b\big) \ge 1 - s^{(i)}, && \text{(6.13b)}\\
& s^{(i)} \ge 0,\quad y^{(i)} \in \{-1, 1\}.
\end{aligned}
$$

Fitting Posterior Probabilities

The posterior probability is the probability that a hypothesis is true given relevant data or observations. In the classification problem, the posterior probability can be stated as P(class | input).


Fig. 6.9 The data-driven approach

Estimating the posterior probability is very helpful in practical problems [335]. When identifying SPRs, knowing the posterior probability P(y = i | P_D, y ∈ {1, 2, . . . , n}) means not only knowing the classification result y = i (P_D belongs to SPR #i) but also understanding the confidence or possible risk. Market participants can accordingly adjust their bidding strategies and reduce possible losses. Although posterior probabilities are desired, the standard SVM algorithm outputs an uncalibrated value that is not a probability [335]. Modifications are needed to calculate the binary posterior probabilities P(y = i | P_D, y ∈ {i, j}). Common practice is to attach a link function to the binary SVM classifier and fit it to the training data. Typical link functions include sigmoid functions [335] and Gaussian approximations [336]. In this section, the sigmoid link function is selected for its generally better performance than other choices [335].

In general, there are more than two SPRs, so we want the multi-class posterior probabilities P(y = i | P_D, y ∈ {1, 2, . . . , n}), abbreviated as P(y = i | P_D). Hastie et al. [336] proposed a well-accepted algorithm to calculate multi-class posterior probabilities from the pairwise binary posterior probabilities. This algorithm is incorporated into our approach and is briefly summarized in our technical report [332].

The Data-Driven Approach

There are three steps in the proposed data-driven approach (Fig. 6.9):

Training

Suppose there are n different SPRs in the training dataset. Each time, two SPRs are selected and trained, yielding a binary SVM classifier. This pairwise training procedure is repeated C_n^2 = n(n − 1)/2 times, and we collect the n(n − 1)/2 binary classifiers, namely, the separating hyperplanes between every pair of the n SPRs.


Classifying/Predicting

Given a load forecast P_D, we use the max-vote-wins algorithm to obtain the classification result: each binary classifier provides a classification result (vote) for the load forecast P_D, and the SPR that collects the most votes is the final classification result. The load forecast P_D is thereby pinpointed to an SPR, and the LMP forecast is λ̂(P_D) = λ^(i*), where i* is the index of the SPR winning the most votes. This step is independent of the data post-processing procedure.

Data Post-processing

The posterior probabilities P(y = i | P_D) for i = 1, 2, . . . , n are calculated by applying Platt's algorithm and then Hastie and Tibshirani's algorithm.⁴ It is worth noting that the proposed approach generalizes to many other scenarios with overlapping SPRs in the data. Possible extensions are discussed in Sect. 6.1.7.1.
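The three steps above map naturally onto scikit-learn's SVC, which trains the C_n^2 pairwise classifiers internally ("one-vs-one"), applies max-vote classification, and, with probability=True, fits Platt-style sigmoids and couples them into multi-class posteriors, a close analogue of (though not identical to) the Platt and Hastie-Tibshirani steps cited above. The sketch below assumes training arrays X_train (load vectors), y_train (SPR indices), and a lookup table lmp_of_spr from SPR index to LMP vector; all names are ours, not the book's.

```python
import numpy as np
from sklearn.svm import SVC

def train_spr_classifier(X_train, y_train, C=10.0):
    """Step 1: train the pairwise ("one-vs-one") soft-margin SVMs (6.13)."""
    return SVC(kernel="linear", C=C, probability=True,
               decision_function_shape="ovo").fit(X_train, y_train)

def forecast_lmp(model, lmp_of_spr, Pd_forecast):
    """Steps 2-3: max-vote classification, then posterior probabilities."""
    x = np.asarray(Pd_forecast).reshape(1, -1)
    spr = model.predict(x)[0]                 # winning SPR index
    posterior = model.predict_proba(x)[0]     # P(y = i | P_D) for each SPR
    return lmp_of_spr[spr], posterior
```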

6.1.5 Case Study

In this section, we illustrate the proposed data-driven approach on two systems.

6.1.5.1 Performance Metrics

First, we introduce the performance metrics.

Fivefold Cross-Validation

To evaluate the generalization of the model to an independent dataset and to avoid overfitting, k-fold cross-validation is used. In k-fold cross-validation, the overall dataset is randomly and evenly partitioned into k subsets. Each time, one subset is chosen as the validation dataset, and the remaining k − 1 subsets are used for training. This process is repeated k times (k folds), with each subset serving as the validation dataset exactly once. Fivefold cross-validation is used in this section.
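A minimal sketch of the fivefold procedure with scikit-learn, using synthetic stand-in data in place of the load/SPR datasets described below:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in data: random "load vectors" with a linearly separable label rule.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X @ np.array([1.0, -0.5, 0.2]) > 0).astype(int)

# Per-fold classification accuracies alpha_1..alpha_5 and their average.
scores = cross_val_score(SVC(kernel="linear", C=10.0), X, y, cv=5)
print(scores, scores.mean())
```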

4 Details of these two algorithms are summarized in [332].


Classification Accuracy

Classification accuracy is the most common criterion for evaluating classifiers. The classification accuracy α is the fraction of correctly classified points in the validation dataset. With fivefold cross-validation, the classification accuracy of each fold (α_1, α_2, . . . , α_5) is calculated first; the overall performance of the method is then evaluated by the average classification accuracy ᾱ = (Σ_{i=1}^{5} α_i)/5.

LMP Forecast Accuracy

The proposed approach forecasts the LMP at every bus. The performance of the LMP forecast at bus i is evaluated by the nodal LMP forecast accuracy β_i, computed over all validation data points (j = 1, 2, . . . , n_v) as one minus the mean relative error:

$$ \beta_i = 1 - \frac{1}{n_v} \sum_{j=1}^{n_v} \frac{\big|\hat{\lambda}_i[j] - \lambda_i[j]\big|}{\lambda_i[j]}. \tag{6.14} $$

The overall LMP forecast accuracy β evaluates the LMP forecast performance for the whole system; it is the average of the nodal LMP forecast accuracies β_i (i = 1, 2, . . . , n_b):

$$ \beta = \frac{1}{n_b} \sum_{i=1}^{n_b} \beta_i. \tag{6.15} $$
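The two metrics translate directly into a few lines of numpy. This sketch assumes the forecast and realized LMPs are stacked into (n_v, n_b) arrays, and it guards the denominator with an absolute value since real-time prices can be negative (e.g., the −10 $/MWh region in Fig. 6.2b):

```python
import numpy as np

def nodal_accuracy(lmp_hat, lmp):
    """Per-bus LMP forecast accuracy beta_i of Eq. (6.14).
    lmp_hat, lmp: (n_v, n_b) arrays of forecast and realized LMPs."""
    return 1.0 - np.mean(np.abs(lmp_hat - lmp) / np.abs(lmp), axis=0)

def overall_accuracy(lmp_hat, lmp):
    """System-wide accuracy beta of Eq. (6.15)."""
    return nodal_accuracy(lmp_hat, lmp).mean()
```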

6.1.5.2 Static SCED with Static Line Ratings

This section explores the simplest case: static SCED with SLRs. Since [326] discusses the 3-bus system as well as the IEEE 24-bus system, we only examine the data-driven approach on the 118-bus system. The dataset generated in this section is also used in Sect. 6.1.6.1.

System Configuration

Most of the system settings follow the IEEE 118-bus, 54-unit, 24-hour system in [337], with the following changes: (1) the lower bounds of generation are set to zero, while the upper bounds remain the same as in [337]; and (2) the generation costs are linear. Details of the parameters are summarized in [338].


Load

Reference [337] also provides an hourly system load profile and a bus load distribution profile. With linear interpolation, the hourly system load profile is converted to a 5-minute resolution. To account for the variability of loads, we assume the load at each bus follows a normal distribution N(μ, σ). The expectation μ of each nodal load is calculated from the system load profile and the bus load distribution profile, and the standard deviation σ is set to 10% of the expectation. 1440 load vectors (5 days at 5-minute resolution) are generated; MATPOWER [339] then solves these 1440 SCED problems and records the 1440 LMP vectors. These 1440 load vectors and LMP vectors constitute the training and validation data.
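A sketch of the Monte Carlo load generation just described; mu_profile is a placeholder for the (1440, 118) array of nodal load expectations derived from the interpolated profiles, and the function name is ours:

```python
import numpy as np

def sample_loads(mu_profile, sigma_ratio=0.10, seed=0):
    """Draw load vectors with each nodal load ~ N(mu, (sigma_ratio*mu)^2).
    mu_profile: (n_samples, n_bus) array of nodal load expectations."""
    rng = np.random.default_rng(seed)
    return rng.normal(mu_profile, sigma_ratio * mu_profile)
```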

Simulation Results

Results are summarized in Table 6.1. The classification accuracy is around 67%, but the LMP forecast is satisfactory. When the classification result of a load vector is correct, the LMP forecast is correct for every bus, i.e., β = 100%. Even when the classification fails, the overall LMP forecast still has an accuracy of about 90%. This is because classification errors happen between an SPR and its neighbors, and the LMPs of adjacent SPRs are similar since only one active constraint differs.⁵ Therefore, the LMP forecast is much more accurate than the classification (Table 6.1).

6.1.5.3 Static SCED with Dynamic Line Ratings

3-bus System

We start with an illustrative 3-bus system example. This succinct example provides key insights and visualization of the proposed method.

Table 6.1 Results of the 118-bus system (static SCED with SLR)

Fold  Classification  LMP forecast
1     64.24%          96.82%
2     67.36%          96.71%
3     64.93%          96.95%
4     71.18%          97.34%
5     65.63%          96.84%
Avg   66.67%          96.93%

5 See the lemma "System Patterns of Adjacent SPRs" in [332].


Data

The parameters of the 3-bus system are presented in Fig. 6.1. The dataset is generated using MATPOWER with the following assumptions: (1) the load vectors are evenly distributed in the load space; and (2) the transmission limits F are time-varying. For simplicity, we use the following model to calculate the real-time transmission limits:

$$ F = (1 + \xi) F_0, \tag{6.16} $$

where F_0 = [60; 60; 80] is the "standard" transmission limit vector, the same as in the static line rating case, and ξ ~ N(0, 0.1) represents the major factor (e.g., ambient temperature or wind speed) that impacts the transmission capacities. All the generated data is visualized in Fig. 6.4.

Simulation Results

Table 6.2 summarizes the classification and LMP forecast accuracies. The accuracies are around 95% because of the overlapping SPRs.

Posterior Probabilities

The posterior probabilities are visualized in Figs. 6.10 and 6.11; those of a single SPR form a surface (Fig. 6.10a and b). When the five surfaces of the five SPRs are put together (Fig. 6.11), they intersect and form "mountains" and "valleys." The mountains correspond to the inner parts of SPRs, where overlapping is almost impossible, and the valleys always lie along the boundaries between SPRs.

Table 6.2 Results of the 3-bus system (fivefold validation)

Fold  Classification  LMP forecast
1     93.967%         96.218%
2     93.236%         96.054%
3     94.150%         95.767%
4     95.612%         96.700%
5     94.150%         96.405%
Avg   94.23%          96.23%

Fig. 6.10 Posterior probabilities of two SPRs. (a) SPR #3: LMP = (50, 50, 50). (b) SPR #4: LMP = (20, 50, 35)

Fig. 6.11 Posterior probability surfaces

118-bus System

A more comprehensive case study is conducted on the 118-bus system to evaluate the performance and computational burden of the data-driven approach on a complex system with realistic settings.

System Configuration

The only difference from the system configuration in Sect. 6.1.5.2 is the transmission limits. To consider DLR, we use the same model as Eq. (6.16); F_0 is the same as the transmission limits in [337], and ξ ~ N(0, 0.1).

Performance

The algorithm is implemented using the Statistics and Machine Learning Toolbox of MATLAB. Table 6.3 summarizes the computation time of each step of the data-driven approach on a PC with an Intel i7-2600 8-core CPU @ 3.40 GHz and 16 GB of RAM. There are 181 SPRs found in the 1152 training points; C_181^2 = 16,290 SVM classifiers are trained in 58.72 seconds. On average, one SVM classifier is trained within 0.004 seconds. This is because most pairs of SPRs are completely separable, and such cases are solved in an extremely short time; adjacent, overlapping SPRs are the major source of the computational burden (Table 6.4).

Table 6.3 Average computation time (in seconds)

Step                           Computation time (s)
(a) Training                   58.73
(b) Predicting (288 points)    26.8504
(c) Data post-processing       701.22

Table 6.4 Results of the 118-bus system (dynamic line rating)

Fold  Classification  LMP forecast
1     61.11%          95.11%
2     59.38%          94.53%
3     60.76%          95.24%
4     51.39%          93.34%
5     55.90%          94.22%
Avg   57.71%          94.49%

6.1.5.4 Case Studies with Ramp Constraints

Settings

The parameters of the 118-bus system are the same as in Sect. 6.1.5.2, and the ramp capacities of generators follow a simplified assumption: each generator can ramp from its lower to its upper generation limit in 15 minutes. For example, for a generator with G^+ = 200 MW and G^- = 125 MW, the ramp capacity is R^+ = R^- = (200 − 125)/15 = 5 MW/minute. This setting is called R_0 in Table 6.5. Due to the temporal coupling of SCED with ramp constraints, a daily load curve is necessary; the load settings are the same as in Sect. 6.1.5.2. 1440 SCED problems are solved consecutively with MATPOWER, and the 1440 load vectors and LMP vectors are recorded.

Simulation Results

The classification and LMP forecast accuracies are summarized in Table 6.5. With the ramp settings above, the classification and LMP forecasts are satisfactory. However, different ramp settings change the results dramatically. As shown in Table 6.5, when generators ramp up/down two times faster (R/R_0 = 2), the ramp constraints are rarely active, and the problem is the same as static SCED. When generators ramp up/down two times slower (R/R_0 = 0.5), the effective generation bounds are determined by the previous dispatch results and the ramp constraints; the generation limits become time-varying, and the SPRs overlap. This explains the unsatisfactory results when the system lacks ramp capacity. Furthermore, the resulting shifts of the SPRs may also explain the price spikes during the ramping-up hours in the morning and the ramping-down hours in the early evening.

6.1.6 The Impact of Nodal Load Information

One contribution of this section is to consider the LMP changes due to nodal load variations. This subsection provides a detailed discussion of the impact of nodal load information on the understanding of LMP changes. We first demonstrate the benefits of having nodal load information in Sect. 6.1.6.1; Sect. 6.1.6.2 then illustrates the effects of incomplete load information and approaches to address the issue.

Table 6.5 Results on the 118-bus system with different ramp settings

R/R0           0.5       1         2
LMP forecast   44.57%    85.10%    96.33%


To concentrate on the effects of incomplete load information, we make the following assumptions: (1) transmission limits are constant, with no dynamic line ratings considered; and (2) ramp constraints are not considered.

6.1.6.1 On Nodal Load Levels

Previous literature such as [320] studied the impact of the system load level on LMPs. An important concept, the "critical load level" (CLL), is defined as the system load level at which a step change of LMPs happens. Many LMP forecast methods have been proposed based on identifying CLLs. However, the definition of a CLL assumes that the nodal load levels at all buses change proportionally. This assumption constrains the load vectors to a straight line in the load space, and the CLLs are the intersection points of this line with the boundaries of the SPRs. We emphasize that one contribution of this section is to consider LMP changes due to nodal-level load variations. Contrary to CLL-based methods, which solve a one-dimensional problem, the proposed SVM-based method explores all dimensions of the load space and is indeed a generalization of the CLL-based method.

Consider the SPR identification problem with only one feature: the total demand of the system. Figure 6.12 illustrates the problem formulation. Since only the total demand P_D = P_D1 + P_D2 is available, the load vectors in the original SPRs are projected onto the axis of total demand. Because the problem is one-dimensional, the SVM classifier degenerates to the case with only one decision variable b; the direction of the separating hyperplane w is represented by the sign of b. The objective becomes finding the optimal value of b with the fewest overlapping points of different LMPs:

$$
\begin{aligned}
\min_{b,s}\quad & \sum_i s^{(i)}, && \text{(6.17a)}\\
\text{s.t.}\quad & y^{(i)}\big(P_D^{(i)} - b\big) \ge 1 - s^{(i)}, && \text{(6.17b)}\\
& s^{(i)} \ge 0,\quad y^{(i)} \in \{-1, 1\}.
\end{aligned}
$$

Fig. 6.12 Identifying critical load levels


Table 6.6 Comparison of CLL and SVM (118-bus system)

LMP forecast              CLL           SVM
Overall                   94.82%        95.95%
Price > 45 $/MWh          88.86%        96.32%
Worst forecast (bus no.)  73.92% (23)   88.17% (23)

Table 6.7 Results of the 3-bus system

LSE   LMP@Bus 1  LMP@Bus 2  LMP@Bus 3  Overall
1     86.08%     97.45%     88.53%     90.69%
2     70.69%     96.13%     89.31%     85.38%
3     87.91%     98.65%     93.53%     93.53%
CLL   69.48%     97.15%     89.24%     85.29%

Fig. 6.13 Nodal LMP forecast accuracy (load at each bus vs. system load only, bus index 1-118)

We compare this CLL-based method and the SVM-based method on the 3-bus system and the 118-bus system. Results are shown in Tables 6.6 and 6.7 and Fig. 6.13. The performance of the two methods is close for the nodal LMP forecasts at many buses, but the CLL-based method fails to provide a correct forecast at some specific buses (e.g., bus 23 in Fig. 6.13), where the SVM-based method provides much better results. The SVM-based method is also better at forecasting high prices.

6.1.6.2 Incomplete Load Information

In practice, LSEs and other market participants may not have complete information about the load levels at all buses. We investigate the performance of the algorithm by assuming an LSE has access only to (1) the total system-level load and (2) the nodal load levels in its own area.


Fig. 6.14 3-bus system with three loads

To better illustrate the problem formulation, we add a load P_D1 at bus 1 to the 3-bus system in Fig. 6.1.⁶ The modified system is shown in Fig. 6.14. Assume there are three LSEs in the system; LSE #i at bus i has access to the following information: (1) the load at bus i, P_Di, and (2) the system-level load, Σ_{i=1}^{3} P_Di.

With incomplete load information, the SPR identification problem becomes more difficult. For example, LSE 2 observes two SPRs that almost completely overlap with each other (blue and red in Fig. 6.15a). Since the one-to-one mapping between LMP vectors and SPRs is not affected by the incomplete load information, this is still a classification problem. The data-driven approach can still be applied, but the feature vectors are the system load and a subset of nodal load levels, instead of the load levels at every bus as in Sect. 6.1.4.

Simulation results are summarized in Table 6.7. They indicate that the classification accuracy decreases to around 50%, while the LMP forecast accuracy is still satisfactory. This can be explained by two observations: (1) Fig. 6.15a and b is obtained by projecting the 3-D SPRs onto a lower-dimensional space; since the projection is a linear transformation, the boundaries remain linear even though the SPRs overlap; (2) the LSEs may care more about their own LMPs. For example, Fig. 6.15a can be re-colored by the LMPs at bus 2 (Fig. 6.16a). Since there are only two possible LMPs at bus 2 (20 and 50), there are only two colored regions in Fig. 6.16a. Even with the relatively low accuracy of the overall classification, the forecast of LMPs at bus 2 is still accurate. When only a subset of nodal LMPs is of major concern, it might be more computationally efficient to formulate the problem as in Fig. 6.16a: the number of classes decreases significantly, and so does the

6 If there are only two loads in the system, knowing the system-level load P_D2 + P_D3 and P_D2 is equivalent to knowing P_D2 and P_D3.


Fig. 6.15 LSEs. (a) LSE 2. (b) LSE 3


100 150 Total Load

(b)

computational burden. However, the new colored regions might be unions of SPRs. Though the colored regions in Fig. 6.16a are convex, a union of convex sets is generally non-convex. Because of this, an SVM with a linear kernel may not be the best choice; choosing the best classifier depends on the shape of the regions and is left for future work. Similar to the case of DLRs or ramp constraints, overlapping SPRs imply uncertainties, and posterior probabilities are necessary; Fig. 6.17 visualizes the posterior probabilities for LSE #2 and LSE #3. Because of the relatively small resistances of transmission lines, the loss components of LMPs are usually small compared to the other two components.


Fig. 6.16 LMP at bus 2. (a) P_D2 and system load. (b) P_D3 and system load

Geometrically speaking, each LMP vector is a point in the LMP space, and the LMPs of the same SPR form a cluster. The center of a cluster contains the energy component, the congestion component, and the average loss component, while deviations from the center represent varying loss components due to different line flows. We can run a clustering algorithm (e.g., K-means) on the LMP data to find the centers of those clusters, and then regard the LMP vectors of the same cluster as the LMPs of the same SPR. By doing so, the SPR identification problem is again modeled as a classification problem, and the LMP forecast is a forecast of the energy component, the congestion component, and the average loss component.
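One way to implement the clustering step just sketched is with scikit-learn's K-means: the cluster labels then play the role of the SPR class labels, and the centers approximate the lossless price vectors. The helper below is our illustration, with lmp_data a placeholder array of historical LMP vectors:

```python
import numpy as np
from sklearn.cluster import KMeans

def label_by_cluster(lmp_data, n_spr):
    """Cluster historical LMP vectors into n_spr groups.
    lmp_data: (n_samples, n_bus) array.  Returns per-sample SPR labels and
    the cluster centers (approximate energy + congestion + avg. loss)."""
    km = KMeans(n_clusters=n_spr, n_init=10, random_state=0).fit(lmp_data)
    return km.labels_, km.cluster_centers_
```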


Fig. 6.17 Posterior probability surfaces. (a) LSE 2. (b) LSE 3

6.1.7 Discussions

6.1.7.1 On Posterior Probabilities

When dealing with uncertainties, it is natural to analyze the data in a probabilistic manner. The calculation of posterior probabilities is essential and provides a quantification of possible risks. We only propose the method to calculate posterior probabilities in this section, but quantifying them could enable many interesting applications. For example, LSEs could use demand response mechanisms to partially change the load vector and thus shift away from high-price SPRs. Market participants could also estimate the price volatility due


to renewables in a system. Further discussion of how to utilize the posterior probabilities for specific applications is left as future work.

6.1.7.2 On the Computational Cost

The theoretical analysis reveals that the load space can be partitioned into many SPRs. This overall structure of the load space could help solve the SCED problem by shifting part of the online computational burden offline [340], while the total number of SPRs indicates the computational burden to some extent. With MPT 3.0, the exact number of SPRs of some IEEE benchmark systems is calculated. Though the total number of SPRs is finite,⁷ it grows extremely fast with the scale of the system. However, in the Monte Carlo simulations we found far fewer SPRs than the theoretical counts. Zhou et al. [324] point out that, because of the regular patterns of loads, only subsets of the complete theoretical load space are reached in practice. Therefore, only a small subset of the SPRs is meaningful to analyze, which suggests great potential for reducing the computational burden. The proposed approach is also friendly to parallel computation, which could be very useful for large-scale simulations (Table 6.8).

6.1.7.3 On Generation Offer Prices

The marginal costs of generators fluctuate with many factors, such as oil prices. This leads to changes of the generation offer prices c in the SCED formulation. Intuitively, the SPRs would change under large offer price variations. Equation (6.7c) in Lemma 6.1 quantifies the admissible variation: for a system pattern π = (B, N), the corresponding SPR S_π remains the same as long as the generation cost vector c satisfies Eq. (6.7c). An illustrative example is provided below. Suppose a diesel turbine is added at bus 3 in Fig. 6.1, and the offer price of the diesel turbine varies due to the fluctuations

Table 6.8 Number of SPRs of some benchmark systems

System info               MPT 3.0   Simulation (8640 points)
3-bus system (Fig. 6.1)   5         4
IEEE 6-bus system         20        7
IEEE 9-bus system         15        7
IEEE 14-bus system        1470      50
IEEE 24-bus system        ~10^6     445
IEEE 118-bus system       -         971

7 A loose upper bound is 2^{n_g−1} × C_{n_g+n_l}^{n_g−1}.


Fig. 6.18 System pattern regions with different generation offer prices. (a) c = (20, 50, 65). (b) c = (20, 50, 100)

of oil prices. Figure 6.18a shows the SPRs when the offer price of the new generator is 65; when the offer price increases from 65 to 100, three SPRs change while the others remain the same.⁸ This shows that the SPRs are somewhat robust to varying generation offer prices.

8 More specifically, we can calculate the condition from Eq. (6.7c): if the offer price of the new generator satisfies c_3 < 2c_2 − c_1 = 80, then the SPRs in Fig. 6.18a remain the same.

6.1.7.4 LMPs with Loss Components

Since line losses are not explicitly modeled in the SCED formulation, all the theoretical analysis is conducted on lossless LMP vectors, and the LMP forecast discussed above is a forecast of the energy and congestion components. The proposed method can be applied directly to markets that do not consider line losses (e.g., ERCOT) and to markets that provide the energy, congestion, and loss components separately (e.g., MISO). There are many possible methods to forecast the loss components, but they are not addressed in this book. For economic dispatch models with line losses explicitly modeled (e.g., [341]), a similar analysis using MLP theory could be conducted, but it is beyond the scope of this section.

6.1.8 Conclusions

In this section, we examined the fundamental coupling between nodal load levels and LMPs in real-time SCED. It is shown that the load space can be partitioned into convex system pattern regions, which map one-to-one to distinct LMP vectors. Based on the theoretical results, we proposed a data-driven learning algorithm for market participants to identify SPRs. Identifying SPRs is modeled as a classification problem, and the proposed data-driven approach is built upon a "one-vs-one" multi-class SVM classifier. The proposed algorithm is shown to be capable of estimating SPRs solely from historical data, without knowing confidential system information such as the network topology and bidding curves. The approach is extensible toward considering dynamic line ratings, line losses, and partial load information. Simulation results on the IEEE 118-bus system demonstrate that the proposed algorithm is effective in understanding the past and predicting the future.

This section is a first step toward developing theoretically rigorous and computationally feasible algorithms to analyze market prices as a result of varying load levels. Future work should investigate (1) the system pattern regions under different unit commitment results and system topologies and (2) the impacts of multi-interval temporal constraints on the system pattern regions. Another important avenue of research is to develop an efficient learning algorithm to process a large amount of historical data in near-real-time market operations.

6.2 Price Prediction

The growing penetration of renewable energy into the electricity market has significantly changed electricity market prices over the years.


Building on the estimation of system pattern regions above, this section investigates price forecasting. The change in prices makes existing forecasting methods prone to error, decreasing their economic benefits; hence, more precise forecasting methods need to be developed. We start with a survey and benchmark of existing machine learning approaches for forecasting the real-time market (RTM) price. While these methods provide sufficient modeling capability via supervised learning, their accuracy is still limited by the single data source, e.g., historical price information only. In this section, a novel two-stage supervised learning approach is proposed that diversifies the data sources with, e.g., highly correlated power data. The idea is inspired by recent load forecasting methods that perform well. Specifically, the proposed two-stage method, namely, the rerouted method, learns two types of mapping rules. The first stage maps the historical wind power to the historical price. The second stage learns a forecasting rule for wind generation. Based on the two rules, we forecast the price via the forecasted generation and the first learned mapping between power and price. Additionally, we observe that more training data is not always better, which leads to validation steps that quantify the best training intervals for different datasets. We compare numerical results of the existing and proposed methods on datasets from the Electric Reliability Council of Texas (ERCOT). For each machine learning step, we examine different learning methods, such as polynomial regression, support vector regression, neural networks, and deep neural networks. The results show that the proposed method is significantly better than existing approaches when renewables are involved.

6.2.1 Introduction

Since the industrial revolution, energy has become a key factor in everyday life [342], with fossil fuels the primary means of energy production in the world [342]. However, with population growth and technological development, the world faces two vital problems: environmental pollution and energy resource shortages [343]. One way to address these problems is to improve efficiency and reduce emissions [344]; the other is to develop alternative energy resources [343]. Renewable resources have attracted attention for their environmental friendliness and sustainability. The most competitive renewables include water, wind, photovoltaic energy, and biofuel. Many of them have proved effective in addressing environmental and energy issues [345, 346], and some have been applied in the electricity market.

In the last few years, electricity market prices have decreased significantly due to the close-to-zero marginal costs of renewable energies [347]. Therefore, electricity market participants are seeking strategies to be more competitive in the market. Many companies have adopted new electricity price plans [348], for example, time-of-use plans, which charge higher rates when demand is high and lower rates when demand is low. This encourages customers to wisely decide


their electricity usage and reduce on-peak energy usage [349]. This situation makes not only producers but also customers pursue more precise forecasts of electricity market prices than ever. However, electricity prices usually have complex features, such as highly volatile behavior and nonlinearity, which make it difficult to build a precise forecasting model [350-352].

In general, electricity market price forecasting has two classes of computing techniques. One is the so-called hard computing techniques [353], which can accurately predict electricity prices if the exact model of the system is known; time series models [232] and autoregressive integrated moving average (ARIMA) models [233] are two typical examples. However, electricity prices are influenced by many factors, such as the volatile prices of generation resources, seasonal weather risks, and the uncertain behavior of competitors in the market [354]. These elements make it difficult to build an accurate system model. Besides, hard computing techniques compute their solutions from physical laws, which incurs high computation costs. In contrast, "soft computing techniques" do not require building a model of the system [353]: they learn the mapping between input and output data, which needs less information and is more computationally efficient [353].

Hence, we employ soft computing techniques, such as forecasting future real-time market (RTM) bus prices from historical bus prices with different machine learning methods. However, this direct price-to-price method has relatively poor performance, no matter which learning method we use or whether we apply a validation step for hyper-parameters such as the training or testing data size, because the model considers only a single data source. To improve on it, we add another important data type, namely, wind power generation, which directly impacts price variation. Additionally, we redesign the forecasting model by leveraging the fact that wind generation forecasting has high accuracy, e.g., a mean absolute percentage error of less than 5% [355-357]. Specifically, the proposed method learns two types of mapping rules: the first is the mapping between the historical wind power generation and the historical price; the second is the forecasting rule for wind power generation. Based on the two rules, we forecast the price via the forecasted generation and the first learned mapping rule between power generation and price. We name the proposed method the "rerouted" or "two-stage" method.

As a highlight, we examine the advantages and disadvantages of each machine learning method for both the direct (price-to-price) and the rerouted (two-stage) method, so that we can select the best method with the best hyper-parameters for the benchmark. Specifically, we choose machine learning methods that are widely used in real-world applications [358, 359], e.g., polynomial regression, support vector regression (SVR), neural network (NN), and deep neural network (DNN). For numerical validation, we use RTM bus price data and system-wide wind power generation data from the Electric Reliability Council of Texas (ERCOT). The RTM bus price is the simple average of the time-weighted hub bus prices for each settlement interval in real time, over the buses included in a hub. We preprocessed and removed some extreme data to keep all the data in the normal


range. We selected the wind power generation of the entire system. Simulation results show that direct forecasting (the price-to-price method) attains its best testing accuracy with polynomial regression, while the rerouted (two-stage) method attains its best testing accuracy with deep learning. In general, the results show that the proposed method is significantly better than direct forecasting when renewables are involved. Current results also indicate that we may obtain higher forecasting accuracy by considering additional highly correlated data sources such as solar energy and biofuels. The NN and DNN used in this work are basic networks; future research can explore the network structure further.

The rest of the section is organized as follows: Sect. 6.2.2 formulates the forecasting problem. Section 6.2.3 describes the machine learning methods used. Section 6.2.4 describes the simulation setup and the numerical results. Section 6.2.5 concludes.

6.2.2 Problem Formulation

In this section, we explain the direct method (price-to-price method) and the rerouted method (two-stage method) in detail using diagrams and mathematical formulas. To ensure an objective assessment of all the methods in this section, we use the same dataset to test the different approaches and models. The ideas are shown in Fig. 6.19.

6.2.2.1 Direct Method (Price-to-Price Method)

The problem is defined as forecasting the real-time market (RTM) bus price for the following month using the historical RTM bus price. Specifically, we first preprocess the data and remove some extreme values so that all the data is in the normal range. Then, we let M be the size of the input data and N be the size of the output data; M and N are adjusted to find the best pair that obtains the highest testing accuracy. The parameters are formulated as follows:

• Input: the RTM bus price from January 2016, $X : M \times 1$.
• Output: the predicted RTM bus price for February 2016, $X_{\text{future}} : N \times 1$, given by Eq. (6.18):

$$X_{\text{future}} = g(X), \qquad (6.18)$$

where $X_{\text{future}}$ is the prediction of the future RTM bus price and $g(\cdot)$ is the method chosen for forecasting.


Fig. 6.19 The blue pictures represent the direct method (price-to-price method). We use function $g$ to learn the mapping between $X$ (historical RTM bus price) and $X_{\text{future}}$ (forecasted RTM bus price). The red arrows and pictures form the rerouted method (two-stage method), which contains three steps. Step one uses function $f_1$ to learn the mapping between $X$ and $Y$ (historical wind power generation). Step two uses function $f_2$ to learn the mapping between $Y$ and $Y_{\text{future}}$ (forecasted wind power generation). Step three predicts $X_{\text{future}}$ using $Y_{\text{future}}$ and the function $f_1$ learned before

In order to obtain $g(\cdot)$, we use historical data $(X, X_{\text{future}})$ to learn the mapping. By adjusting the sizes of the historical data, we can determine the best mapping $\hat{g}(\cdot)$ for each of the methods presented in this section.

6.2.2.2 Rerouted Method (Two-Stage Method)

The problem is defined as forecasting the RTM bus price for the following month using the historical RTM bus price and the system-wide wind power generation. Specifically, the rerouted method contains three steps. The parameters are formulated as follows:

• Input 1: the RTM bus price from January 2016, $X$.
• Input 2: the system-wide wind power generation from January 2016, $Y$.

1. Step 1: We use historical data $(Y, X)$ to learn the mapping function $f_1(\cdot)$ between the historical system-wide wind power generation and the RTM bus price.


2. Step 2: Let $Y_{\text{future}}$ be the prediction of future system-wide wind power generation. We use historical data $(Y, Y_{\text{future}})$ to learn the mapping function $f_2(\cdot)$ between the historical and the future system-wide wind power generation.
3. Step 3 (output $X_{\text{future}}$): We use the predicted wind power generation $Y_{\text{future}}$ and the mapping function $f_1(\cdot)$ learned in Step 1 to predict the future price $X_{\text{future}}$. The output is given by Eq. (6.19):

$$X_{\text{future}} = f_1(Y_{\text{future}}), \qquad (6.19)$$

where $X_{\text{future}}$ is the prediction of the RTM bus price for February 2016 and $f_1(\cdot)$ is the method chosen for forecasting. Figure 6.20 shows the flow chart of the rerouted method, summarizing all the steps.
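To make the three steps concrete, here is a minimal Python sketch of the rerouted pipeline. The variable names, the synthetic data, and the linear regressors standing in for $f_1$ and $f_2$ are illustrative assumptions, not the chapter's exact models; any of the methods in Sect. 6.2.3 could be substituted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical hourly January data: wind generation (MW) and RTM bus price ($).
rng = np.random.default_rng(0)
wind_hist = rng.uniform(0, 14000, 744)                          # Y
price_hist = 40 - 0.002 * wind_hist + rng.standard_normal(744)  # X

# Step 1: learn f1, the mapping from wind generation to price.
f1 = LinearRegression().fit(wind_hist.reshape(-1, 1), price_hist)

# Step 2: learn f2, a one-step-ahead wind forecaster (toy persistence-style model).
f2 = LinearRegression().fit(wind_hist[:-1].reshape(-1, 1), wind_hist[1:])

# Step 3: forecast the wind, then reroute it through f1 to forecast the price.
wind_future = f2.predict(wind_hist[-24:].reshape(-1, 1))     # Y_future, next 24 hours
price_future = f1.predict(wind_future.reshape(-1, 1))        # X_future = f1(Y_future), Eq. (6.19)
print(price_future[:5])
```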

6.2.3 Machine Learning Methods

In this section, we explain existing and popular machine learning methods for the proposed learning process described in Sect. 6.2.2.

Fig. 6.20 The flow chart of the rerouted method

6.2.3.1 Overview of Methods

Polynomial Regression

In general, the polynomial regression model is given by Eq. (6.20):

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_m x_i^m + \epsilon_i, \quad i = 1, 2, \ldots, n. \qquad (6.20)$$

It can also be written in matrix form as Eq. (6.21):

$$y = X\beta + \epsilon, \qquad (6.21)$$

where $X$ is the design matrix, $y$ is the target vector, $\beta$ is the coefficient vector, and $\epsilon$ is a vector of random errors. The vector of estimated polynomial regression coefficients can be calculated using Eq. (6.22):

$$\hat{\beta} = (X^T X)^{-1} X^T y, \quad m < n. \qquad (6.22)$$
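As a quick illustration, Eq. (6.22) can be evaluated directly with NumPy. The synthetic data and the degree $m = 2$ below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(100)  # noisy quadratic

m = 2                                      # polynomial degree
X = np.vander(x, m + 1, increasing=True)   # design matrix: columns 1, x, x^2

# Least-squares coefficients, beta = (X^T X)^{-1} X^T y, as in Eq. (6.22).
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(beta, 2))                   # approximately [1, 2, -3]
```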

Support Vector Regression (SVR)

SVR applies the ideas of support vectors and Lagrange multipliers to regression analysis [360]. SVR constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space by minimizing the margin on all the training data [361]. The support vector regression problem is formulated as

$$\min_w \; F(w) = \frac{1}{2}\|w\|^2$$
$$\text{s.t.} \quad |y_i - (w^T x_i + b)| \le \epsilon, \quad i = 1, 2, \ldots, n,$$

where $x_i$ is a training sample with target value $y_i$, $w^T x_i + b$ is the prediction for that sample, and $\epsilon$ is a free parameter that serves as a threshold: all predictions have to be within an $\epsilon$ range of the true targets. Mapping SVR to higher dimensions raises several problems, including obtaining the correct mapping form and computing the coordinates of the data in that space. Hence, kernel methods are introduced. A kernel function can compute the dot product of the two mapped points in the feature space without knowing the mapping transform itself. Assume $X_i, X_j \in \mathbb{R}^n$, and a nonlinear function $\Phi$ implements the mapping from input space $X$ to feature space $F$, where $F \subseteq \mathbb{R}^m$, $n \ll m$. By the kernel method we have


$$K(X_i, X_j) = \Phi(X_i) \cdot \Phi(X_j), \qquad (6.23)$$

where $K(X_i, X_j)$ is the kernel function. Commonly used kernel functions include the linear kernel, the polynomial kernel, and the Gaussian kernel, also known as the radial basis function (RBF) kernel.
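The experiments later in this section run SVR through MATLAB's fitrsvm; as a rough, hedged equivalent, a scikit-learn sketch with the three kernels might look as follows (all parameter values and the synthetic data are illustrative, not the chapter's settings).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
X = rng.uniform(0, 14000, (200, 1))                    # wind generation, MW
y = 40 - 0.002 * X[:, 0] + rng.standard_normal(200)    # price, $

# One SVR per kernel; epsilon is the tube width from the formulation above.
for kernel in ("linear", "poly", "rbf"):               # "rbf" is the Gaussian kernel
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, epsilon=0.5))
    model.fit(X, y)
    mse = np.mean((model.predict(X) - y) ** 2)
    print(kernel, round(mse, 3))
```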

Neural Network (NN)

NNs are highly interconnected computing systems inspired by modern biology [362]. NNs are built from a number of processing units, also called neurons. Each neuron forms a weighted sum of its inputs through a linear function with a bias term [363]. The sum is then passed through a transfer function, also called an activation function, which is often a unit step, sigmoid, or Gaussian function [363]. Neurons can be grouped into layers. Typically, the first and last layers of a basic NN are called the input layer and the output layer, respectively. The layers between them are known as the hidden layers. NNs can be represented as

$$x_{i,j} = g(h_{i,j}), \quad h_{i,j} = w_{i,j}^{(0)} + \sum_{k} w_{i,j}^{(k)} x_{k,j-1},$$
$$i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, m, \quad k = 1, 2, \ldots, t,$$

where $x_{i,j}$ is the input for the current layer, $x_{k,j-1}$ is the input from the previous layer, $w_{i,j}^{(k)}$ is the weight of the $k$-th neuron, $w_{i,j}^{(0)}$ is the bias term, and $g$ is the transfer function. The transfer function is introduced to increase nonlinearity. Many experiments were conducted on different activation functions, and the sigmoid function achieved the highest accuracy. A diagram illustrating the structure of a basic NN is shown in Fig. 6.21.

Backpropagation (BP) is a method used to calculate the gradient of the loss function (which produces the cost associated with a given state) with respect to the weights in an artificial neural network (ANN) [364]. Backpropagation neural networks (BPNNs) can implement any complex nonlinear mapping from input to output, learn by themselves, and adapt to changes [365]. Furthermore, BPNNs have generalization ability and error tolerance. Their main shortcoming is the local minimization problem: with different initializations of the weights, a BPNN will converge to different local minima, so every training run can give a different result.
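A minimal NumPy sketch of the neuron computation above follows, with one hidden layer of 30 sigmoid units and a linear output; the random weights are placeholders for values that backpropagation would learn.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

rng = np.random.default_rng(3)
x = rng.standard_normal(24)          # input layer, e.g., 24 hourly prices

W1 = rng.standard_normal((30, 24))   # hidden-layer weights (30 neurons)
b1 = rng.standard_normal(30)         # hidden-layer biases, the w^(0) terms
W2 = rng.standard_normal((1, 30))    # output-layer weights
b2 = rng.standard_normal(1)          # output-layer bias

# Each neuron: weighted sum of the previous layer plus bias, then the transfer function.
hidden = sigmoid(W1 @ x + b1)
output = W2 @ hidden + b2            # linear output for regression
print(output)
```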

Fig. 6.21 Neural network diagram

Deep Learning

Deep learning is a class of machine learning algorithms that use multiple layers of nonlinear processing units for feature extraction and transformation [366]. Each successive layer uses the output of the previous layer as its input. Most deep learning models nowadays are based on ANNs [367]. Training such a model is time-consuming, and validation is complex and troublesome. However, a well-trained deep learning model can be applied to other problems with some simple refinements.

Methods Comparison

In polynomial regression, all the features are constructed by hand by our research group, so they may include useless features. NNs and DNNs, by contrast, do not require deciding how to construct the features: we directly input the raw data into the model, and if high accuracy is achieved, the model is useful. However, NNs and DNNs involve random initialization of the weights, so training on the same data may give different results. Moreover, many parameters concerning the architecture of the ANNs and the learning algorithms must be set, and optimizing these parameters can only be done through a trial-and-error process that consumes much time and resources [368]. Training an SVR, in contrast, is a convex quadratic optimization with a unique solution and does not involve random weight initialization like NNs and DNNs [369]. Any SVR with the same parameter settings trained on identical data will give the same results, which greatly reduces the number of training runs required to find the optimum.

6.2.3.2 Performance Evaluation Metric

The performance of all the methods is measured by the mean squared error (MSE). Let K be the size of the output. The computational formula is defined as follows:

$$\text{MSE} = \frac{1}{N} \sum_{t=1}^{N} \left(\bar{Y}_{t,1} - y_{t,1}\right)^2, \qquad (6.24)$$

$$\bar{Y}_{t,1} = \begin{cases} \frac{1}{t}\left(\hat{y}_{1,t} + \hat{y}_{2,t-1} + \cdots + \hat{y}_{t,1}\right), & \text{if } t \le K - 1,\\[4pt] \frac{1}{K}\left(\hat{y}_{t-K+1,K} + \hat{y}_{t-K+2,K-1} + \cdots + \hat{y}_{t,1}\right), & \text{if } t > K - 1, \end{cases}$$

where $\bar{Y}_{t,1}$ is the forecasted price at hour $t$, $y_{t,1}$ is the real price at hour $t$, and $N$ is the total number of hours.
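A small Python sketch of this metric follows. The array layout, with yhat[s, j] holding the forecast for hour s + j + 1 issued after hour s, is an assumption made for illustration.

```python
import numpy as np

def rolling_mse(yhat, y, K):
    """MSE of Eq. (6.24): average all overlapping forecasts that target each
    hour before scoring. yhat has shape (N, K); y has length N."""
    N = len(y)
    total = 0.0
    for t in range(1, N + 1):                       # hours 1..N
        # forecasts targeting hour t: lead j+1 issued at origin t-1-j
        hits = [yhat[t - 1 - j, j] for j in range(K) if t - 1 - j >= 0]
        total += (np.mean(hits) - y[t - 1]) ** 2    # 1/t averaging when t <= K-1
    return total / N
```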

6.2.4 Numerical Results

6.2.4.1 Data Preparation

The Electric Reliability Council of Texas (ERCOT) is an independent system operator managing about 90% of the state's electric load. ERCOT has made significant investments in the renewable energy sector, particularly in wind energy, and continues to be a national leader in wind production [370]. Ample market and grid information can be easily accessed and downloaded from the ERCOT website [371]. If a specific range of data is needed but is not available on the website, ERCOT can be contacted by submitting an information request form [372]; it responds quickly. The raw data we obtained from ERCOT were Excel files containing information for all districts. We extracted the data needed to build the vectors of RTM price and system-wide wind power generation, which are measured hourly. To ensure the RTM price data is in the normal range, let $\mu$ be the mean of the data and $T$ be the threshold; the normal range is defined as $\mu \pm T$ in our specific problem.
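As a sketch, this preprocessing step can be written in a few lines of Python; the threshold value is a user choice and is not specified in the text.

```python
import numpy as np

def remove_extremes(prices, T):
    """Keep only samples inside the 'normal range' mu +/- T used for preprocessing."""
    mu = prices.mean()
    keep = np.abs(prices - mu) <= T
    return prices[keep]

# Example: drop a price spike far from the mean (T = 100 is illustrative).
prices = np.array([22.0, 25.0, 19.0, 850.0, 21.0, 24.0, 23.0, 20.0, 26.0, 18.0])
print(remove_extremes(prices, T=100.0))   # the $850 spike is removed
```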

6.2.4.2 Benchmark

For the rerouted method (two-stage method), the following simulations were conducted using electricity price data and system-wide wind power generation data. We used a $744 \times 1$ vector of system-wide wind power generation as the input data and a $744 \times 1$ vector of real-time market (RTM) bus price as the target data for training. The data came from the same January 2016 time slot, which contains 31 (days) × 24 (hours) = 744 data points. The training data are visualized in Fig. 6.22: the x-axis is the system-wide wind power generation from January 2016, and the y-axis is the RTM bus price from January 2016.

Fig. 6.22 Initial data distribution (averaged RTM price in $ versus wind power generation in MW, January 2016)

For the direct method (price-to-price method), let M be the input data size and N be the output data size. By adjusting these two hyper-parameters, we can find the best M and N that make the price-to-price mapping reach the highest accuracy. Here, M is chosen from 2, 3, 6, 12, 24, ..., 384, and N is chosen from 1, 2, 3, 6. We start M at 2 because using only one historical data point to predict one or more future points carries so much uncertainty that high accuracy is hard to achieve. For the subsequent values, each is twice the previous one, so that we can study the trend of the testing accuracy. The general results show that the rerouted method achieves better accuracy than the direct method for all the machine learning methods we tested. This is confirmed by Table 6.9, where the results of both methods are compared. As noted in Table 6.9, the rerouted method attains its highest accuracy with the 14-layer DNN. For consistency, the direct method employs the same machine learning methods, and the results show that it attains its highest accuracy with polynomial regression. The detailed results and comparisons are listed in the following subsections.
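For the direct method, each (M, N) pair defines a sliding-window dataset. A hypothetical helper for building these windows and sweeping the grid is sketched below (the doubling rule 24, 48, 96, 192, 384 fills in the elided values of M).

```python
import numpy as np

def make_windows(series, M, N):
    """Map each run of M consecutive prices to the following N prices."""
    X, Y = [], []
    for i in range(len(series) - M - N + 1):
        X.append(series[i : i + M])
        Y.append(series[i + M : i + M + N])
    return np.array(X), np.array(Y)

# Sweep the hyper-parameter grid used in this section.
for M in (2, 3, 6, 12, 24, 48, 96, 192, 384):
    for N in (1, 2, 3, 6):
        pass  # fit a model on make_windows(train_prices, M, N); track the testing MSE
```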

Polynomial Regression

For the rerouted method (two-stage method), we use the system-wide wind power generation from January 2016 as the training data and that from February 2016 as the testing data. We vary the degree of the polynomial model from 1 to 4 and pick a degree of 3 as our example. Figure 6.23 shows the training and testing regression curves along with the error histograms. The curves and histograms for the other polynomial models are shown in [373].


Table 6.9 MSE comparison

(a) Rerouted method

Method                           Training   Testing
1-degree polynomial regression   36.9341    22.3945
2-degree polynomial regression   36.5456    23.0906
3-degree polynomial regression   36.0694    22.6684
4-degree polynomial regression   35.9630    22.2578
NN with 30 hidden neurons        36.2109    24.6695
NN with 60 hidden neurons        33.4186    26.23
Linear kernel SVR                38.7031    25.6771
Polynomial kernel SVR            37.9564    24.0645
Gaussian kernel SVR              38.1197    23.7623
7-layer DNN                      34.9439    22.7658
11-layer DNN                     36.8437    23.2444
14-layer DNN                     36.1379    20.5065

(b) Direct method

Method                      M     N   Training   Testing
Polynomial regression       12    3   40.6163    27.1534
NN with 30 hidden neurons   6     3   28.3315    27.3736
NN with 60 hidden neurons   96    3   23.4281    30.0632
Linear kernel SVR           192   1   25.5157    29.8577
Polynomial kernel SVR       192   1   25.5097    29.865
Gaussian kernel SVR         96    1   35.6567    37.1421
11-layer DNN                3     2   23.3813    32.3289
14-layer DNN                3     2   43.3681    32.1427

Best values are given in bold

For the direct method (price-to-price method), we fix the predicted data size N each time and adjust the historical data size M to obtain the best testing results. The process of determining the data size is illustrated in Fig. 6.24. As shown in Fig. 6.24, the training mean squared error (MSE) fluctuates around its minimum value as we increase the historical data size M, while the testing MSE becomes extremely large. This reveals an overfitting problem when training on the dataset. The highest testing accuracy in each figure is determined by the gap between the training MSE and the testing MSE. We select the results with the smallest gaps and merge them into Table 6.10b so that they can be compared with the rerouted method. The MSEs for both methods are shown in Table 6.10.

Fig. 6.23 Rerouted method: degree-3 polynomial regression. (a) Training result. (b) Testing result

As we can see from Table 6.10a, when the order of the polynomial regression increases, the testing accuracy does not change much, which suggests that the data is largely linear. Combined with Table 6.10b, the rerouted method has a smaller testing MSE than any of the direct methods, no matter how M and N are resized. In this case, good testing performance is obtained when the rerouted method is used with polynomial regression of degree 1 or 4. Note from Fig. 6.23 that polynomial regression with a reasonable degree handles outliers poorly. Hence, we next employ support vector regression (SVR) to determine whether mapping the data into higher dimensions can reach a higher accuracy.


Table 6.10 The MSEs of the polynomial regression

(a) Rerouted method

Degree     1         2         3         4
Training   36.9341   36.5456   36.0694   35.9630
Testing    22.3945   23.0906   22.6684   22.2578

(b) Direct method

N   M    Training   Testing
2   96   31.8074    28.5926
3   12   40.6163    27.1534
6   24   54.1       30.3579
1   48   24.9384    35.2281

Best values are given in bold

Fig. 6.24 Direct method: data size determination (training and testing MSE of support vector regression versus historical data size)

Support Vector Regression (SVR)

For both methods, we use the "fitrsvm" command in MATLAB and choose three different kernel functions (linear, polynomial, and Gaussian) for the SVR. For the direct method (price-to-price method), the predicted data size N is fixed to 1, and the historical data size M is adjustable; the procedure is displayed in Fig. 6.24. The determination of the best testing accuracy follows the same rule stated in Sect. 6.2.3. For better comparison with the rerouted method, the highest testing accuracy of each configuration is merged into Table 6.11b. We do not display the detailed regression curves here to save space; the detailed results can be found in [373]. The MSEs of the SVR with the three kernels for both methods are shown in Table 6.11. Table 6.11 clearly shows that when the rerouted method is adopted, the SVRs with different kernels have similar training and testing MSEs, and the testing MSE of the SVR with the Gaussian kernel is slightly better than all the others.

Table 6.11 The MSEs of SVR

(a) Rerouted method (two-stage method)

Kernel       Training   Testing
Linear       38.7031    25.6771
Polynomial   37.9564    24.0645
Gaussian     38.1197    23.7623

(b) Direct method (price-to-price method)

Kernel       M     Training MSE   Testing MSE
Linear       192   25.5157        29.8577
Polynomial   192   25.5097        29.8650
Gaussian     96    35.6567        37.1421

Best values are given in bold

The rerouted method has smaller testing MSEs than any of the direct methods. In this case, the best testing performance is obtained when the rerouted method is used with the Gaussian-kernel SVR. Compared to Table 6.10, the testing MSEs become worse, indicating that SVR ignores outliers. Hence, neural networks are employed next to see whether they can capture the outliers.

Neural Network (NN)

Past researchers have shown great success in forecasting electricity prices using NNs [350, 363, 368]. Drawing on their ideas, we compare the direct and rerouted methods based on the same NN. For both methods, we use the "nftool" command in MATLAB to build a simple NN with three layers (one input layer, one hidden layer, and one output layer). The hidden layer size is 30, and the NN is trained by the Levenberg-Marquardt algorithm. For the rerouted method (two-stage method), the training and testing regression curves, along with the error histograms, are shown in Fig. 6.25. The training MSE is 34.8300, and the testing MSE is 24.0220. For the direct method (price-to-price method), we fix the predicted data size N each time and adjust the historical data size M to obtain the best testing results. The best results for each case are shown in Table 6.12. As we can observe from Table 6.12, the rerouted method reaches a smaller testing MSE than any of the direct methods, no matter how M and N are resized. In this case, the best testing performance is obtained when the rerouted method is used with the NN with 30 hidden neurons. Compared to the results of the polynomial regression, this neural network has a better training MSE but a worse testing MSE. Therefore, we check whether a better testing result can be gained by increasing the hidden layer size from 30 to 60; the simulation result is shown in [373]. The MSE for training is 33.4186, and the MSE for testing is 26.3436. Thus, increasing the hidden layer size yields a better training MSE but a worse testing MSE: enlarging the hidden layer does not improve the testing MSE and instead results in overfitting for this specific problem.

Fig. 6.25 Rerouted method: NN with 30 hidden neurons. (a) Training result. (b) Testing result

Table 6.12 Direct method: the best testing results according to N

N   M    Training MSE   Testing MSE
1   2    32.8374        28.1420
2   24   22.9490        32.9945
3   6    28.3315        27.3736
6   12   34.7806        35.4926

Best values are given in bold

Deep Neural Network (DNN)

Owing to the poor performance of the simple neural networks, we employ DNNs to see if they can do better than the other models. DNNs are known for their powerful ability to learn the essential features of datasets from a small sample set. We find that few papers use DNNs to forecast electricity prices; therefore, we provide a detailed discussion here. For the rerouted method (two-stage method), we use "nntool" in MATLAB to build DNNs of three different depths. The first DNN has 14 layers: 1 input layer, 13 hidden layers, and 1 output layer. The second has 11 layers: 1 input layer, 10 hidden layers, and 1 output layer. The last has 7 layers: 1 input layer, 6 hidden layers, and 1 output layer. Each hidden layer has a size of 30. The transfer function for the last hidden layer is "purelin"; the transfer function for all the other hidden layers is "tansig." All the DNNs listed above are trained by the Levenberg-Marquardt algorithm. The structure of the DNN is shown in Fig. 6.26.

Fig. 6.26 The structure of the DNN

We take the deep neural network with 14 layers as an example to illustrate the results. The training and testing regression curves, along with the error histograms, are shown in Fig. 6.27. Simulation results for the deep neural networks with 7 and 11 layers can be found in [373]. The MSEs of the deep neural networks are shown in Table 6.13. As noted in Table 6.13, when we increase the number of hidden layers, the training MSE does not increase much, but the testing MSE becomes smaller. This indicates that we may achieve better testing MSEs by adding more hidden layers. However, the training time will surely grow as more hidden layers are added, so there is a trade-off between the number of layers and the training time, and the number of layers should be chosen carefully to get relatively good results.

Fig. 6.27 Rerouted method: DNN with 14 layers. (a) Training result. (b) Testing result

Table 6.13 The MSEs of the deep neural networks

Layers   Training   Testing
7        34.9439    22.7658
11       36.8437    23.2444
14       36.1379    20.5065

Best values are given in bold

Additional Discussion

To verify the effectiveness and the generalization ability of the model, real-time market (RTM) price and wind power generation from other months are chosen for additional verification. All the simulation results are shown in Table 6.14; the proposed two-stage method is clearly better than the direct method.

Table 6.14 The MSEs of other months

Month   Direct MSE   Rerouted MSE
April   81.5258      65.7889
May     49.5156      41.4192
June    120.1935     56.4694
July    159.2903     53.2651

In addition, we make use of Monte Carlo tools to test the model's stability against noise. As wind power generation is influenced by noise, we add different noise levels to test the model's endurance. The experimental results are shown in Table 6.15.

Table 6.15 Model endurance toward noise

Noise level   MSE (Feb)
5%            22.7429
10%           25.2443
12%           26.0906
15%           29.4196

Recalling previous results, the direct method achieves a best testing accuracy of 27.1534; comparing this with Table 6.15, we conclude that the proposed method can endure at most 12% noise from the system and environment.
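The chapter does not spell out the exact noise model; a common Monte Carlo choice, assumed here, is zero-mean Gaussian noise proportional to each wind sample.

```python
import numpy as np

def add_noise(wind, level, rng):
    """Perturb wind generation with zero-mean Gaussian noise whose standard
    deviation is `level` (e.g., 0.05 for 5%) times each sample's magnitude."""
    return wind + level * np.abs(wind) * rng.standard_normal(wind.shape)

rng = np.random.default_rng(4)
wind = rng.uniform(0, 14000, 672)          # hypothetical February wind data
for level in (0.05, 0.10, 0.12, 0.15):     # the noise levels of Table 6.15
    noisy = add_noise(wind, level, rng)
    # ...re-evaluate the two-stage model on `noisy` and record the MSE
```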

6.2.5 Conclusion

This section develops a novel two-stage method to forecast the real-time market (RTM) price. The new method, namely the rerouted (two-stage) method, predicts the future price using the historical RTM bus price along with system-wide wind power generation. The main contributions of this work are the diversified input data sources, such as highly correlated power data, and the validation step to quantify the best training interval for different datasets. By comparing the proposed method with the conventional direct method, we confirm our conjecture that higher accuracy can be obtained by diversifying the data sources. Furthermore, when we examine the relationship between the input and the output, we find that they maintain a causal relationship; this causal relationship, combined with physical models, can guarantee better results. To verify the effectiveness and the generalization ability of the model, we conduct simulations over another 4 months; the results show that the proposed method is more accurate than the direct method. To further explore the model's stability against noise, we create different noise levels in the wind power generation; the results show that the proposed model also has good stability against noise. Related subjects of interest for further research include improving the prediction accuracy by considering other renewable energies, solar energy being the most likely candidate; such features should have causal relationships with the electricity price. Most of the methods in this section are simple models without many parameters, which leaves room to develop more complex models that can achieve better results.


6.3 Residential Appliances

After understanding how to estimate the locational marginal prices and the renewable energy prices, we investigate the load side with a focus on appliances. This is motivated by the expansion of residential demand-side management programs and the increased deployment of controllable loads, which require accurate appliance-level load modeling and forecasting. This section proposes a conditional hidden semi-Markov model to describe the probabilistic nature of residential appliance demand. Model parameters are estimated directly from power consumption data using scalable statistical learning methods. We also propose an algorithm for short-term load forecasting as a key application of appliance-level load models. Case studies performed using granular sub-metered power measurements from various types of appliances demonstrate the effectiveness of the proposed load model for short-term prediction.

6.3.1 Introduction

The deployment of smart grid technologies and new types of loads, such as electric vehicles and smart appliances, is changing the way electricity is consumed in residential applications. Understanding the characteristics of these loads is essential to support efficient grid operation and to optimize home energy consumption for the consumer. Short-term or near real-time load forecasting is important for the provision of various services to the grid and the consumers, including demand-side management (DSM) and the integration of distributed renewable generation. Load modeling and forecasting at the appliance or device level are of particular interest for most DSM programs and home energy management systems (HEMS). For example, almost all utility demand response (DR) programs [374, 375] are designed for specific types of appliances. To achieve a desired load profile, utilities rely on load models and short-term forecasts to target customers and estimate the amount of load flexibility. In HEMS platforms, load monitoring, management, and control are mainly performed at the appliance level. Energy consumption schedules are optimized, by minimizing energy costs and maintaining thermal comfort, based on load models of individual household appliances; the more accurate the load models, the better the load control performance. However, modeling and forecasting demand at the appliance level is challenging due to the intrinsic variability and uncertainty of exogenous environmental variables and human behavior. Appliance characteristics and consumption patterns can also vary significantly from one household to another. Accurate modeling of the uncertainty and heterogeneity in these variables is required for DSM programs to reliably provide grid services. There is a growing demand for scalable methods for modeling and forecasting load at shorter time scales. The integration of information and communication


technologies and advanced metering infrastructure in power grids enables bidirectional, automated, and intelligent interaction among system components. Ubiquitous sensing and communication can enable real-time monitoring, and advanced, automated control devices can allow for online decision-making. The ability to learn load models online and adapt to changing conditions will become more important with growing interest in autonomous DSM solutions.

6.3.1.1 Related Work

Load modeling involves (1) selecting an appropriate mathematical structure and (2) estimating the parameters of the chosen model. Most previous work focuses on either optimizing the accuracy of the model or improving the scalability of the parameter estimation method. Most physics-based load models and human behavior models tend to address the first objective, while machine learning-based techniques generally address the second. In this section, we highlight some of the most relevant work, focusing primarily on appliance-level modeling approaches. For broader interest, see reviews [89, 180, 376–378] and the references therein. Numerous physics-based models have been derived from first principles for residential loads, particularly for thermostatically controlled loads such as air conditioners and water heaters [379, 380]. These studies utilize differential or difference equations to model the load as a simplified dynamic system, whose parameters are estimated by methods such as least squares regression [379], genetic algorithms [381], and particle swarm optimization [381]. A major application of such models is forecasting the system state and power consumption which are then used for load control [382, 383]. However, formulation of the model structure requires detailed prior knowledge of the physics of the system and hence is not generalizable across appliance types. In contrast, black-box machine learning techniques for point prediction of power consumption require no prior knowledge about appliance characteristics. Lachut et al. [384] investigated the use of four different models for power consumption prediction for individual appliances at different timescales: k-nearest neighbors, naive Bayes, support vector machines (SVM), and ARMA models. In [385], the authors analyzed the importance of different exogenous variables as features in multiple linear regression, SVMs, random forests, and gradient boosting machine models. In [386], a load forecasting system was developed to predict the device state, time of use, and state duration using model-free empirical statistics. While some of these approaches can predict future power consumption, they generally do not model the state dynamics of the load, which are usually the required inputs for predictive control in DSM applications. Markov-based state-space models, which are the most relevant to the proposed approach in this paper, provide a generic framework for statistically modeling the dynamics of residential appliances of different types. Markov and semi-Markov based approaches have been developed for various applications, among which non-intrusive load monitoring, a process for deducing what appliances are used


in the house and their individual energy consumption, has received considerable attention. Several variants of a factorial hidden Markov model (HMM) were considered in [387] for load disaggregation; a conditional factorial hidden semi-Markov model (HSMM) was shown to produce the best performance. In [388], the authors developed a second-order model called an explicit-duration HMM with differential observations. A hierarchical HMM was proposed in [389] to represent appliances with multiple operating states. Markov and semi-Markov models have also been developed for load modeling and prediction applications. Ullah et al. [390] used the Baum-Welch expectation maximization (EM) algorithm to train a HMM for predicting the power consumption of each floor of a building using hourly consumption data. Duan et al. [391] developed a HSMM for load prediction that incorporates exogenous variables, but only obtained accurate results for loads at a large scale (larger than a few hundred MW). Stephen et al. [392] proposed a probabilistic model based on Kalman filters for online learning of the power consumption of wet appliances given no prior information on appliance characteristics; however, the use of the model for short-term load forecasting was not considered. The authors of [393] used HSMMs to infer occupancy states and other household characteristics, exploring consumer segmentation from household power consumption data. These approaches use traditional inference algorithms to estimate parameters, which have two major drawbacks: (1) the difficulty of identifying the distribution of real-world data and (2) the high computational cost. Most approaches rely on the assumption that the dynamics of the load can be captured by certain distributions and then estimate the parameters of these distributions; however, real-world data in general cannot be accurately described by common distributions. In addition, numerous iterations are generally required to learn the models. The long training time is exacerbated by the incorporation of exogenous variables that impact the state evolution, larger dimensions of the state and duration spaces, and the increased complexity of the distributions used to describe the state transition and observation emission probabilities. Such high computational complexity makes these approaches impractical.

6.3.1.2 Summary of Contributions

In this section, we develop a statistical load model for residential appliances and a scalable learning method for short-term load forecasting. We propose a conditional hidden semi-Markov model (CHSMM) based on two characteristics observed in granular⁹ power data: (1) discrete operating states and (2) random time durations spent in each state. In this semi-Markov model, the duration distribution is explicitly defined to allow more accurate modeling of the stochastic dynamics of the load, as opposed to Markov models, where the duration distribution must be geometric. A key feature of the proposed model is that the state transition probabilities and the emission and duration distributions are conditioned on exogenous variables. This allows factors such as temperature and seasonal effects that drive or impact the physical dynamics of the load to be incorporated into the model. The generic load model can be tailored by conditioning on different variables, such that it can be applied to a wide range of load types. To address the long learning times of HSMMs with exogenous variables under traditional methods, we propose a scalable parameter estimation method using machine learning techniques. Unsupervised clustering analysis is used to abstract (hidden) states from (observable) power measurements instead of standard inference algorithms, because the discrete power levels observed in granular power consumption data reflect unobservable physical operating modes of the appliance. Regression techniques are then used to estimate the transition and emission distributions to account for the dependencies on exogenous variables. The complexity of parameter estimation is significantly reduced for regression models compared with direct estimation of the conditional distributions. This computational advantage is a significant benefit of the proposed model and allows for practical implementation in DSM applications.

⁹ We use "granular data" to refer to the resolution at which discrete power levels can be observed. The (minimum) granularity depends on the appliance type. For example, discrete power levels can be observed in hourly load data for EV charging, but higher-frequency data is required for water heaters since the duration of the on cycle is much shorter.

6.3.2 Appliance Load Characterization

On a granular level, the power consumption of a typical appliance exhibits two characteristics: discrete operating states and random state durations. We use a refrigerator¹⁰ from [394] to illustrate these features, where a sample trajectory of the 1-minute real power consumption is shown in Fig. 6.28a.

¹⁰ The refrigerator data can be found in [394] with Home ID 871 and appliance name "refrigerator1."

6.3.2.1 Discrete Operating States

Fig. 6.28 A refrigerator example: (a) real power trajectory from 7/1/2017 6:00 AM to 7/2/2017 6:00 AM with 1-minute resolution and (b) histogram of the real power consumption for 2017 in log scale

Most residential appliances are characterized by a finite set of discrete operating states [395], each associated with a different level of power consumption. In the refrigerator example, different power levels are characterized by the step changes in the power consumption trajectory in Fig. 6.28a. The discrete states are represented by the four peaks at approximately 5 W, 140 W, 325 W, and 495 W (highlighted by the vertical lines) in the histogram in Fig. 6.28b. These power levels likely correspond to the operating states of the appliance: compressor off, compressor on, ice making, and defrosting. Although these states are not directly observable, they can be inferred from observations of the power level in the granular data. To this end, we apply K-means clustering, an unsupervised learning method, to identify the hidden operating states of the appliance from the power measurements. The number of states can be selected manually or through heuristic techniques such as the elbow method [396].
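A hedged scikit-learn sketch of this state-identification step is shown below; the file name and the choice of four clusters (matching the refrigerator example) are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

power = np.loadtxt("refrigerator_power.csv")    # 1-minute real power readings (W)

km = KMeans(n_clusters=4, n_init=10).fit(power.reshape(-1, 1))
states = km.labels_                             # hidden operating state per minute
levels = np.sort(km.cluster_centers_.ravel())
print(levels)                                   # expect roughly 5, 140, 325, 495 W
```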

6.3.2.2 Duration Analysis

Fig. 6.29 Empirical duration distributions for a four-state refrigerator: (a) each histogram shows the marginal duration distribution for a different state, and (b) each histogram shows the duration distribution for state 2, conditioned on a different previous state

The duration the appliance stays in each state appears to depend on the current state and the previous state. The probability distribution of the length of time varies significantly by state, as shown in the histograms of the duration of each identified state of the selected refrigerator in Fig. 6.29a. This suggests that the duration cannot be accurately described by a single distribution. In addition, the duration is also dependent on the previous state. Figure 6.29b shows histograms of the duration in state 2, each conditioned on a different previous state. The distributions conditioned on states 3 and 4 are very different from the marginal distribution of state 2 shown in Fig. 6.29a. Using the marginal duration distribution alone in a load model hence may not accurately characterize load behavior. To account for the complexity of the duration distribution and the interdependence of duration and state, a HSMM that allows explicit modeling of the state duration appears to be more appropriate than a HMM.

6.3.3 Hidden Semi-Markov Model

A HSMM [397] extends the concept of a HMM [398] to the case where each state has a variable duration. The basic idea underlying the HSMM formalism is to augment the generative process of a standard HMM with a random state duration, drawn from a state-specific distribution when each state is entered. The state remains constant until the duration expires, at which point there is a Markov transition to a new state. This formulation eliminates the implicit geometric duration distribution assumption of the standard HMM and thus allows the state to transition in a non-Markovian way. A HSMM is characterized by the following components:

(1) The state space $S = \{S_1, S_2, \ldots, S_{N_S}\}$, where $N_S$ is the number of states. We introduce the concept of an epoch to index the transitions of states, denoting the state at time $t$ by $x_t \in S$ and the state at epoch $k$ by $\tilde{x}_k \in S$. We have $x_t = \tilde{x}_k$ for all $t \in [t_k^s, t_k^e]$, where $t_k^s$ and $t_k^e$ denote the starting and ending times of the $k$-th epoch, respectively. In contrast with the time-indexed state $x_t$, self-transition is not allowed for the epoch-indexed state $\tilde{x}_k$, i.e., $\tilde{x}_k \neq \tilde{x}_{k+1}$ for all epochs $k$.

(2) The duration space $D = \{D_1, D_2, \ldots, D_{N_D}\}$, where the number of possible durations $N_D$ is finite. The duration is an integer equal to the number of time intervals occupied by each state. We denote the duration at epoch $k$ by $\tilde{d}_k \in D$, whose value is equal to $t_k^e - t_k^s + 1$.

(3) The observation space $O$ can be continuous or discrete. For the continuous case, $O \subseteq \mathbb{R}$, and for the discrete case, $O = \{O_1, O_2, \ldots, O_{N_O}\}$, where $N_O$ is the number of distinct observations. We denote the observation at time $t$ by $y_t \in O$ and at epoch $k$ by $\tilde{y}_k$. Note that $\tilde{y}_k$ is a vector of length $\tilde{d}_k$, i.e., $\tilde{y}_k = (y_{t_k^s}, y_{t_k^s+1}, \ldots, y_{t_k^e})$.

(4) The transition probability tensor¹¹ $A$ of the generalized state, defined over the pairs $(\tilde{x}_k, \tilde{d}_k) \in S \times D$. Specifically, the $((i,l),(j,m))$-th element of $A$ is given by

$$a_{(i,l),(j,m)} = P\big[\tilde{x}_{k+1} = S_j, \tilde{d}_{k+1} = D_m \mid \tilde{x}_k = S_i, \tilde{d}_k = D_l\big], \quad \forall k,\ \forall S_i, S_j \in S,\ \forall D_l, D_m \in D. \qquad (6.25)$$

(5) The emission probability distribution $B$, whose probability mass (density) function $b(y|i)$ for state $S_i$ is defined as follows. For a discrete observation space,

$$b(y|i) = P[y_t = y \mid x_t = S_i], \quad \forall t,\ \forall S_i \in S,\ \forall y \in O. \qquad (6.26)$$

For a continuous observation space,

$$\int_{O_j}^{O_l} b(y|i)\,dy = P\big[O_j \le y_t \le O_l \mid x_t = S_i\big], \quad \forall t,\ \forall S_i \in S,\ \forall O_j, O_l \in O. \qquad (6.27)$$

Note that the emission probability is assumed to be state-dependent but time-independent.

(6) The initial distribution $\pi$, whose $(i,j)$-th element represents the probability of the initial state being $S_i$ and its duration being $D_j$, i.e.,

$$\pi_{(i,j)} = P\big[\tilde{x}_1 = S_i, \tilde{d}_1 = D_j\big], \quad \forall S_i \in S,\ \forall D_j \in D. \qquad (6.28)$$

¹¹ We use "tensor" to represent a four-dimensional array, where the dimensions correspond to the current state $\tilde{x}_k$, the duration $\tilde{d}_k$ of the current state, the next state $\tilde{x}_{k+1}$, and the duration $\tilde{d}_{k+1}$ of the next state, respectively. The space $S \times D$ of the generalized state can also be vectorized, in which case the transition probability can be represented by a matrix (two-dimensional array).

Fig. 6.30 Graphical representation of a standard HSMM: circles represent states, curved arrows transitions, and straight arrows emissions

An example of the HSMM described above is shown in Fig. 6.30. In the time horizon of $T$ periods, there are a total of $K$ epochs. The first state $\tilde{x}_1$ and its duration $\tilde{d}_1 = 3$ are selected according to the initial distribution $\pi_{(\tilde{x}_1, \tilde{d}_1)}$. The generalized state $(\tilde{x}_1, \tilde{d}_1)$ produces three observations $\tilde{y}_1 = (y_1, y_2, y_3)$ following the emission probability $b_{\tilde{x}_1}(\cdot)$. According to the transition probability $a_{(\tilde{x}_1,\tilde{d}_1),(\tilde{x}_2,\tilde{d}_2)}$, the state transits from $\tilde{x}_1$ to $\tilde{x}_2$, which is occupied for $\tilde{d}_2 = 6$ intervals. In the second epoch, six observations $\tilde{y}_2 = (y_4, y_5, \ldots, y_9)$ are generated based on $b_{\tilde{x}_2}(\cdot)$. Such transitions continue until the last observation $y_T$ is produced by the last state $\tilde{x}_K$ in epoch $K$. Note that this final state $\tilde{x}_K$ may last longer than $\tilde{d}_K$, but we impose a finite horizon $T$ for this example.

6.3.4 Appliance Load Model

In this section, we present a general probabilistic load model along with a scalable and robust statistical approach for model parameter estimation. Additionally, two variations are described to improve model accuracy for specific loads.

6.3.4.1 Conditional HSMM

As the power consumption of most appliances is highly dependent on external factors such as temperature and time of day, we propose a CHSMM, where the state transition probabilities and emission probabilities are conditioned on exogenous variables.

Fig. 6.31 A graphical diagram of the proposed appliance load model. The circles represent states and durations; the dashed ovals, the generalized states; and the shaded circles, the observations. Arrows represent dependencies

A graphical representation of the proposed CHSMM for appliance load modeling is given in Fig. 6.31, which shows a time segment with $K$ epochs corresponding to $T = t_K^e$ time periods. The notation for states, durations, and observations is the same as for the standard HSMM given in Sect. 6.3.3. The CHSMM includes two sets of exogenous variables which are not present in the standard HSMM description. The first set governs the state (and duration) transition between consecutive epochs: the transition from the generalized state $(\tilde{x}_{k-1}, \tilde{d}_{k-1})$ at epoch $k-1$ to $(\tilde{x}_k, \tilde{d}_k)$ at epoch $k$ depends on the exogenous variable $\tilde{z}_k$. For example, outdoor temperature can be one of the exogenous variables for an air conditioner model since the transition of the compressor between the on and off states depends on it. The second set affects the observation: the distribution of the observation $y_t$ at time $t$ is conditioned on the current state $x_t$ and the exogenous variable $w_t$ at time $t$. In the air conditioner example, the actual power consumption when the compressor is turned on may also depend on the outdoor temperature. It should be noted that both $\tilde{z}_k$ and $w_t$ can be scalars or vectors depending on the number of features. While the state and duration spaces of the CHSMM and HSMM are the same, the generalized state transition probabilities of the CHSMM are conditional probabilities rather than a marginal distribution. In the HSMM, the transition distribution is represented by a transition tensor $A$ with elements $a_{(\tilde{x}_k,\tilde{d}_k),(\tilde{x}_{k+1},\tilde{d}_{k+1})}$; the transition distribution of the CHSMM becomes a set of probability functions. In particular, the transition probability from $(\tilde{x}_k, \tilde{d}_k)$ to $(\tilde{x}_{k+1}, \tilde{d}_{k+1})$ of the CHSMM is a function of the exogenous variable $\tilde{z}_{k+1}$:

$$a_{(\tilde{x}_k,\tilde{d}_k),(\tilde{x}_{k+1},\tilde{d}_{k+1})}(\tilde{z}_{k+1}) = P\big[\tilde{x}_{k+1}, \tilde{d}_{k+1} \mid \tilde{x}_k, \tilde{d}_k, \tilde{z}_{k+1}\big]. \qquad (6.29)$$

Similarly, the observation $y_t$ is conditioned on both the state $x_t$ and the exogenous variable $w_t$:

$$b(y_t \mid x_t, w_t) = P[y_t \mid x_t, w_t]. \qquad (6.30)$$

We note that the proposed CHSMM does not assume any independence of state or duration. It is different from simplified models [397] such as an explicit-duration HMM, where a transition to the current state is independent of the duration of the previous state and the duration is only conditioned on the current state, or a residential time HMM [397], where a state transition is assumed to be independent of the duration of the previous state. These independence assumptions, however, do not hold, as shown in the example in Sect. 6.3.2.

6.3.4.2 Parameter Estimation

There are five parameters in the proposed CHSMM to be estimated: the number of states $N_S$, the duration space $D$, the initial distribution $\pi$, the generalized state transition distribution $A$, and the emission distribution $B$. As described in Sect. 6.3.2, the number of states can be estimated by the number of prominent peaks in the empirical histogram of real power consumption. Given the number of states, we apply K-means clustering and then determine the duration space from the clustering result. The initial distribution $\pi$ can be estimated by the empirical marginal distribution. To simplify the estimation of the transition probabilities, we separate the transition from the generalized state $(\tilde{x}, \tilde{d})$ to $(\tilde{x}', \tilde{d}')$ conditioned on $\tilde{z}'$ into two parts: (1) the state transition and (2) the duration transition. The generalized state transition is given by

$$a_{(\tilde{x},\tilde{d}),(\tilde{x}',\tilde{d}')}(\tilde{z}') = a^S_{\tilde{x}'}(\tilde{x}, \tilde{d}, \tilde{z}')\, a^D_{\tilde{d}'}(\tilde{x}, \tilde{d}, \tilde{x}', \tilde{z}'), \qquad (6.31)$$

where

$$a^S_{\tilde{x}'}(\tilde{x}, \tilde{d}, \tilde{z}') = P\big[\tilde{x}' \mid \tilde{x}, \tilde{d}, \tilde{z}'\big], \qquad (6.32)$$

$$a^D_{\tilde{d}'}(\tilde{x}, \tilde{d}, \tilde{x}', \tilde{z}') = P\big[\tilde{d}' \mid \tilde{x}, \tilde{d}, \tilde{x}', \tilde{z}'\big]. \qquad (6.33)$$

Note that this separation does not assume any independence of state or duration between two consecutive epochs. According to the duration analysis in Sect. 6.3.2, a single distribution is insufficient to describe the randomness and dependence of duration across different states. More appropriate state-specific conditional distributions are difficult to estimate using traditional inference algorithms due to their high computational cost. To address the computational issue, we propose a data-driven approach which learns the transition probabilities via multinomial logistic regression (MNLR). The use of MNLR models arises from the fact that the state and duration are discrete variables. For any state $\tilde{x}_{k+1} \in S$, a regression function is used to map the independent variables $(\tilde{x}_k, \tilde{d}_k, \tilde{z}_{k+1})$ to the state $\tilde{x}_{k+1}$ at epoch $k+1$. Similarly, a regression function associated with any duration $\tilde{d}_{k+1} \in D$ is used to map the independent variables $(\tilde{x}_k, \tilde{d}_k, \tilde{x}_{k+1}, \tilde{z}_{k+1})$ to the duration $\tilde{d}_{k+1}$ at epoch $k+1$. In particular, the state transition probability $a^S_{\tilde{x}'}(\tilde{x}, \tilde{d}, \tilde{z}')$ and the duration transition probability $a^D_{\tilde{d}'}(\tilde{x}, \tilde{d}, \tilde{x}', \tilde{z}')$ have the following form:

$$a^S_{\tilde{x}'}(\tilde{x}, \tilde{d}, \tilde{z}') = \frac{e^{\alpha^1_{\tilde{x}'}\tilde{x} + \alpha^2_{\tilde{x}'}\tilde{d} + \alpha^3_{\tilde{x}'}\tilde{z}'}}{\sum_{\tilde{x}'' \in S} e^{\alpha^1_{\tilde{x}''}\tilde{x} + \alpha^2_{\tilde{x}''}\tilde{d} + \alpha^3_{\tilde{x}''}\tilde{z}'}} \qquad (6.34)$$

and

$$a^D_{\tilde{d}'}(\tilde{x}, \tilde{d}, \tilde{x}', \tilde{z}') = \frac{e^{\beta^1_{\tilde{d}'}\tilde{x} + \beta^2_{\tilde{d}'}\tilde{d} + \beta^3_{\tilde{d}'}\tilde{x}' + \beta^4_{\tilde{d}'}\tilde{z}'}}{\sum_{\tilde{d}'' \in D} e^{\beta^1_{\tilde{d}''}\tilde{x} + \beta^2_{\tilde{d}''}\tilde{d} + \beta^3_{\tilde{d}''}\tilde{x}' + \beta^4_{\tilde{d}''}\tilde{z}'}}, \qquad (6.35)$$

where $\alpha_x = (\alpha_x^1, \alpha_x^2, \alpha_x^3)$, $\forall x \in S$, and $\beta_d = (\beta_d^1, \beta_d^2, \beta_d^3, \beta_d^4)$, $\forall d \in D$, are regression coefficients. Note that the coefficients $\alpha_x^1, \alpha_x^2, \beta_d^1, \beta_d^2, \beta_d^3$ are scalars, and $\alpha_x^3$ and $\beta_d^4$ are compatible in dimension with the exogenous variable $z$. The emission distribution $b(y|x,w)$ is assumed to be Gaussian. Specifically, we assume the observation $y_t$ is a linear function of the current state $x_t$ and exogenous variable $w_t$ at time $t$, and the residuals follow a Gaussian distribution, i.e.,

$$y \sim \mathcal{N}\big(\gamma_x + \phi w, \sigma^2\big), \qquad (6.36)$$

where $\gamma_x$ is the centroid associated with state $x$ from the K-means clustering algorithm. The values of the parameters $\alpha$, $\beta$, $\gamma$, $\phi$, and $\sigma$ are obtained using maximum likelihood estimation based on the states obtained from K-means clustering. Note that the Gaussian assumption is made for simplicity and may not be valid for the actual distribution of observations. For example, the histograms around the second and third peaks are skewed. In such cases, a nonlinear model or a generalized linear model that allows non-Gaussian residual distributions can be adopted.
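A compact Python sketch of this estimation pipeline follows, with synthetic epoch-level arrays standing in for data extracted from the clustered power trace; all names and toy relationships are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(5)
n = 500
prev_state = rng.integers(0, 2, n)     # x_k from K-means labels
prev_dur = rng.integers(1, 30, n)      # d_k in time steps
z = rng.uniform(20, 40, n)             # exogenous variable, e.g., temperature
next_state = rng.integers(0, 2, n)     # x_{k+1}
next_dur = rng.integers(1, 30, n)      # d_{k+1}

# State transition, Eq. (6.34): multinomial logistic regression on (x_k, d_k, z).
state_model = LogisticRegression(max_iter=1000)
state_model.fit(np.column_stack([prev_state, prev_dur, z]), next_state)

# Duration transition, Eq. (6.35): features additionally include the next state.
dur_model = LogisticRegression(max_iter=1000)
dur_model.fit(np.column_stack([prev_state, prev_dur, next_state, z]), next_dur)

# Gaussian emission, Eq. (6.36): observation linear in the state and w.
w = rng.uniform(20, 40, n)
y = 100.0 * next_state + 2.0 * w + rng.standard_normal(n)
emission_model = LinearRegression().fit(np.column_stack([next_state, w]), y)
```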

6.3.4.3 State-Specific Model

The states of a CHSMM generally represent the operation of one or more appliance components (e.g., motors, heating elements). The duration distributions associated with different states may depend on distinct physical processes and can be influenced by human behavior in various ways. To capture such variability, we propose a state-specific modeling approach for the duration transition distribution. Instead of using the current state as a feature in (6.35), we use a separate MNLR model per state. Specifically, the duration probability associated with state $\tilde{x}' = S_i \in S$ is given by $a^D_{\tilde{d}',S_i}(\tilde{x}, \tilde{d}, \tilde{z}') = P[\tilde{d}' \mid \tilde{x}, \tilde{d}, \tilde{x}' = S_i, \tilde{z}']$, where

$$a^D_{\tilde{d}',S_i}(\tilde{x}, \tilde{d}, \tilde{z}') = \frac{e^{\alpha^1_{\tilde{d}',S_i}\tilde{x} + \alpha^2_{\tilde{d}',S_i}\tilde{d} + \alpha^3_{\tilde{d}',S_i}\tilde{z}'}}{\sum_{\tilde{d}'' \in D} e^{\alpha^1_{\tilde{d}'',S_i}\tilde{x} + \alpha^2_{\tilde{d}'',S_i}\tilde{d} + \alpha^3_{\tilde{d}'',S_i}\tilde{z}'}}. \qquad (6.37)$$

6.3.4.4 Weighted Logistic Regression

The duration distribution of specific appliances, such as air conditioners (A/Cs), tends to be heavily skewed. For example, the compressor in an A/C may typically cycle on for approximately 15 minutes but occasionally run for several hours during peak thermal load conditions. These rare events can be of particular interest to utilities for peak load management. To model the load for such applications, we adjust the weights of samples when training the MNLR model for the duration distribution, giving a weighted MNLR model. Specifically, we assign a weight $w_i = 1 + \rho/N_{\tilde{d}_i}$ to the $i$-th sample in the training set, where $\rho > 0$ is a tuning parameter and $N_{\tilde{d}_i}$ is the frequency of the duration $\tilde{d}_i \in D$ appearing in the training set. This places a small additional weight on samples with duration classes that are underrepresented in the training set. The loss function for weighted MNLR with L2 regularization is thus given by

$$L = \frac{\alpha}{2}\|\beta_d\|_2^2 - \frac{1}{M}\sum_{i=1}^{M} w_i \log a^D_{d^{(i)}}\big(\tilde{x}^{(i)}, \tilde{d}^{(i)}, \tilde{x}'^{(i)}, \tilde{z}^{(i)}\big), \qquad (6.38)$$

where $M$ is the number of samples in the training set, $\alpha$ is the regularization coefficient, and $\beta_d$ is the vector of MNLR parameters.
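A sketch of the weighting scheme using scikit-learn's sample_weight hook (which optimizes the same weighted, L2-regularized MNLR objective) might look like this; rho and the synthetic data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def duration_weights(durations, rho):
    """w_i = 1 + rho / N_{d_i}: upweight samples whose duration class is rare."""
    vals, counts = np.unique(durations, return_counts=True)
    freq = dict(zip(vals, counts))
    return np.array([1.0 + rho / freq[d] for d in durations])

rng = np.random.default_rng(6)
X_dur = rng.standard_normal((300, 4))          # (x_k, d_k, x_{k+1}, z) features
durations = rng.integers(1, 10, 300)           # duration labels

model = LogisticRegression(max_iter=1000)      # L2 penalty is the default
model.fit(X_dur, durations, sample_weight=duration_weights(durations, rho=5.0))
```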

6.3.5 Short-Term Load Forecasting

Short-term or near real-time forecasting is an important use case for load models. In this section, we describe an online load forecasting algorithm using the proposed CHSMM. Note that the algorithm presented here serves as a framework that can be adapted to various application scenarios at different time scales. Given the learned model and all available information at time $t$, the goal is to predict the future power trajectory from $t+1$ to $t+H$, where $H$ is the prediction horizon. In particular, the following information is used for prediction: the results of K-means clustering, the parameters of the CHSMM, the trajectories of exogenous variables during the prediction horizon, and the observed power consumption prior to time $t$. To use the learned model for power consumption prediction, we abstract the states of the historical power trajectories. Based on the state sequence, the forecasting algorithm takes three steps. The first step is to predict how long the appliance remains in the current state. Let $k$ be the index of the epoch at time $t$, such that $\tilde{x}_k$ is the state and $\tilde{d}_k$ is the duration. Given the observation of the previous generalized state $(\tilde{x}_{k-1}, \tilde{d}_{k-1})$ and the exogenous variable $\tilde{z}_k$, the prediction $\hat{\tilde{d}}_k$ of the duration $\tilde{d}_k$ at epoch $k$ is given by

$$\hat{\tilde{d}}_k = \arg\max_{d \in D,\ d \ge t - t_k^s + 1} a_d^D\big(\tilde{x}_{k-1}, \tilde{d}_{k-1}, \tilde{x}_k, \tilde{z}_k\big), \qquad (6.39)$$

where $t_{k-1}^e$ is the end time of state $\tilde{x}_{k-1}$. Here we use the most likely value as the predicted duration of the current state by maximizing the conditional probability subject to the constraint $d \ge t - t_k^s + 1$, which restricts the duration $\tilde{d}_k$ of the current state $\tilde{x}_k$ to be at least $t - t_k^s + 1$, since the last state transition happened at time $t_k^s = t_{k-1}^e + 1$. It should be noted that the expected value could also be used as the predicted duration. Given $\hat{\tilde{d}}_k$, the second step is to predict the most likely state trajectory until time $t + H$ via an iterative procedure using the state and duration transition probabilities given in (6.34) and (6.35). Starting from time $t_k^e + 1$, we maximize the state transition probability to find the most likely state $\hat{\tilde{x}}_{k+1}$. Using this prediction, we find the most likely duration $\hat{\tilde{d}}_{k+1}$ and repeat this process until time $t + H$. Finally, the observation trajectory $(y_{t+1}, y_{t+2}, \ldots, y_{t+H})$ is obtained from the emission distribution given in (6.36) using the predicted state trajectory and the associated exogenous variables. This prediction procedure is summarized in Algorithm 5.

Algorithm 5 Short-term forecasting using CHSMM Require: previous generalized state (x˜k−1 , d˜k−1 ), current state x˜k , probability distribution a S (·), a D (·), b(·), predicted trajectory of exogenous variable z˜ t+1 , . . . , z˜ t+H , and w˜ t+1 , . . . , w˜ t+H . initialize n ← k, τ ← t, dˆ˜n = dˆ˜k given in (6.39), while τ ≤ t + H do n←n+1    xˆ˜n = arg maxx∈S axS xˆ˜n−1 , dˆ˜n−1 , zˆ˜ n    dˆ˜n = arg maxx∈S adD xˆ˜n−1 , dˆ˜n−1 , xˆ˜n , zˆ˜ n   ˆ˜ n τ ← min i=k di , t + H   yˆs = E[ys |xˆ˜n , wˆ s ], ∀s ∈ τ − dˆ˜n + 1, τ end while result predicted trajectory yˆt+1 , . . . , yˆt+H .

6.3 Residential Appliances

235

6.3.6 Case Studies In this section, we present numerical results on the prediction performance of the proposed load model. First, we analyze the performance for different appliance types in various scenarios. Then we show how refinements to the A/C model can improve prediction accuracy. Finally, we discuss the computational cost and performance scale with the size of the training set.

6.3.6.1

Data

The proposed load model was evaluated using real-world data from the Pecan Street database [394]. The data consists of real power measurements at 1-minute intervals for individual appliances from homes in Austin, Texas. Five types of residential appliances were considered: A/Cs, refrigerators, pool pumps, EVs, and water heaters. Hourly outdoor temperature data was linearly interpolated to a 1minute resolution and used as an exogenous variable in certain models.

6.3.6.2

Parameter Specification

The number of states .NS of each appliance used in the K-means algorithm was set by the number of prominent peaks in the histogram of real power consumption in the training set. The A/Cs, water heaters, and EVs were all modeled with two states. These states represent the on and off modes of the A/C compressor, water heating element, or EV charger. Results from the cluster analysis for EVs showed that most of the homes had either a .3.3 or a .6.6 kW charger. The pool pumps in the dataset had 2–9 operating states. Pumps with two states are likely single-speed pumps, while pumps with three or more states may have multiple speeds and/or secondary loads. Most of the refrigerators had three or f our states, which may represent the on and off modes of the compressor, a defrost cycle, and potentially an ice maker. Different exogenous variables were used to model different appliance types. For A/Cs, the state and duration transition probabilities were conditioned on the outdoor temperature and the hour of the day. For the other appliances, the state and duration transition probabilities were only conditioned by the hour of the day. For the emission distribution, the power consumption of A/Cs in the on state was estimated using linear regression with the outdoor temperature as a single feature. For the other appliances, the power consumption in each state is relatively constant over time. Therefore, the emission distribution was not conditioned on exogenous variables.

236

6.3.6.3

6 Forecast for the Future

Performance Metric

Two metrics were selected to evaluate the prediction performance: root mean squared error (RMSE) and normalized RMSE (NRMSE). The RMSE is a measure of the absolute error and has the same units as the predicted power. To compare performance between different appliances with different power consumption, the normalized RMSE (NRMSE) was adopted. The RMSE and NRMSE are defined respectively by   M 1   .RMSE = yt )2 (yt − M

(6.40)

t=1

and NRMSE =

.

RMSE ymax − ymin

(6.41)

where .yt and .yˆt are the actual and predicted power consumptions of the appliance at time t, M is the number of samples in the testing set, .ymax = max{y1 , . . . , yM } is the maximum power, and .ymin = min{y1 , . . . , yM } is the minimum power. Note that we normalize the RMSE by the range instead of the mean, since using the mean power can make the NRMSE inaccurate when the appliance has zero consumption during most of the testing period.

6.3.6.4

Load Forecasting for Individual Appliances

We randomly selected 20 homes for each type of appliance. Five months of data (5/1/2015–9/30/2015) was used to train the models for each individual appliance and 1 month of data (7/1/2016–7/31/2016) for testing. Predictions were performed every 15 minutes over a 6-hour horizon, and performance metrics were computed using a 60-minute averaging period. We compared the proposed CHSMM with a HSMM which uses the MNLR models for state and duration transition probabilities without exogenous variables. The prediction performance of the proposed CHSMM and HSMM for each appliance type (averaged over 20 homes) is shown in Table 6.16. For all studied appliances, Table 6.16 Average performance of 6-hour ahead predictions

Appliance type A/C Pool pump Water heater EV Refrigerator

NRMSE HSMM 0.297 0.258 0.56 0.284 0.181

CHSMM 0.266 0.219 0.418 0.283 0.172

RMSE HSMM 0.771 0.7 0.615 1.103 0.048

CHSMM 0.697 0.563 0.494 1.101 0.045

6.3 Residential Appliances

237

the proposed CHSMM outperforms the HSMM in both metrics. Generally, the CHSMM improved the prediction performance the most for thermal loads (A/Cs and refrigerators) and appliances that operate on a fixed schedule (pool pumps). For water heaters and EVs, the CHSMM and the HSMM have similar performance because their power consumption is much more stochastic and driven primarily by consumer use patterns. The hour of the day is insufficient to capture the dynamics. In terms of the RMSE, the prediction performance was best for refrigerators and worst for EVs. Refrigerators have relatively low power consumption and very consistent use patterns, resulting in low RMSE values for both the CHSMM and HSMM. In contrast, EVs have a much larger magnitude of power consumption and very stochastic use patterns. While the magnitude of the power consumption of pool pumps is similar to that of EVs, pool pumps have much more consistent use patterns. We used box and scatter plots to show the diversity in appliance characteristics and consumer behavior between households for the performance of 6-hour ahead predictions of individual appliances in Fig. 6.32. For A/Cs, we observed a significant improvement in the performance from including exogenous variables in the model. We also computed the mean of NRMSE across homes, and that of the CHSMM was approximately 10.4% lower than that of the HSMM. The power consumption of the compressor tends to vary as a linear function of the outdoor temperature, and the duration of both the on and off states are highly correlated with outdoor temperature. Therefore, conditioning the emission and duration distributions on the outdoor temperature improves the model substantially. For pool pumps, incorporating the hour of the day into the model also greatly improved model performance, as the mean NRMSE of the CHSMM was .15.1% lower than that of the HSMM. This improvement is due to the fact that residential pool pumps generally operate on a timed schedule and therefore exhibit a very consistent daily load profile. Homes with higher prediction error had greater variability in their load profile from day to day, which may occur if homeowners change their pool pump settings more frequently. The improvement from including hour of the day in the model was even larger for water heaters, as shown in Fig. 6.32 and Table 6.16. The mean NRMSE of the CHSMM was .25.4% lower than that of the HSMM. Water heater power consumption is primarily driven by hot water use, which is usually correlated with the time of day. Including the hour of the day in the model also reduced the range of NRMSE values, improving the robustness of the model. In contrast, incorporating hour of the day into the model had a negligible effect on performance for EVs and refrigerators. The effect of the prediction horizon on the performance was also investigated. As shown in Fig. 6.33, the prediction error tends to increase with the prediction horizon for all appliance types but approaches a constant value for large prediction horizons. This suggests there is no drastic degradation in performance as the prediction horizon is increased.

238

6 Forecast for the Future

Fig. 6.32 Performance of 6-hour ahead of load predictions of individual appliances. For each type of appliance, the box plots are associated with the HSMM (blue on the left) and CHSMM (green on the right) models, and dots represent the prediction error of individual homes using the respective model

Fig. 6.33 Impact of the prediction horizon on the mean NRMSE of the CHSMM for different appliance types

6.3.6.5

Load Aggregation and Model Refinements for A/Cs

The prediction accuracy of the proposed CHSMM tends to be worse for appliances with heavy-tailed duration distributions. Figure 6.34 shows the histograms of the on state for the A/Cs with best (dark blue) and worst (light blue) prediction performances. The duration distribution of on state relies on the cycling frequency of the compressor, which depends primarily on the thermal parameters of the home and the sizing of the compressor. Air conditioners with heavy-tailed duration distributions may be undersized or experience frequent changes in the thermostat setpoint. This observation was the motivation to place additional weight on long duration samples in the training set, i.e., the weighted MNLR. Four different models were implemented based on the two variants described in Sects. 6.3.4.3 and 6.3.4.4: (a) the basic model, (b) weighted MNLR, (c) statespecific MNLR, and (d) weighted state-specific MNLR. The weighting factor for

6.3 Residential Appliances

239

Fig. 6.34 Histogram of the duration of the on state for A/Cs with the lowest (dark blue) and highest (light blue) NRMSE values

Fig. 6.35 The aggregated NRMSE for 6-hour ahead load predictions for A/Cs using different refinements of the CHSMM

the weighted MNLR was set to .ρ = 0.02Nmax for underrepresented samples, where Nmax is the largest number of times any single duration class appears in the training set. Performance was evaluated at different levels of load aggregation. We predicted the power consumption of individual loads but calculated the error metric using different aggregation levels. The 6-hour ahead prediction results for each of the model variations at different aggregations are shown in Fig. 6.35. While both refinements reduce the error, implementing both methods together results in the most significant improvement. For example, the NRMSE for an aggregation of 50 A/Cs was reduced from .0.166 to .0.090 for the weighted statespecific duration model. The weighted MNLR improves the prediction accuracy of infrequent long duration events which occur when the compressor cycles for

.

240

6 Forecast for the Future

Table 6.17 Training set periods evaluated for each appliance type

Length 1 week 1 month 3 months 5 months 7 months 12 months

Date range 7/1/15–7/7/15 7/1/15–7/31/15 6/1/15–8/31/15 5/1/15–9/30/15 4/1/15–10/31/15 1/1/15–12/31/15

A/C x x x x x x

PP x x x x x x

WH x x x x x x

EV x x x x x x

R x x x

A/C = air conditioner, PP = pool pump, WH = water heater, EV = electric vehicle, R = refrigerator

extended periods of time during high thermal loads. The increase in model complexity from using state-specific models reduces the prediction error by achieving a better approximation of the duration distributions of the on and off states. The reduction in prediction error from aggregating loads is most significant when increasing the number of A/Cs from 10 to 20. Increasing the size of the aggregation further results in smaller improvements.

6.3.6.6

Scalability and Performance

To analyze how the size of the training set affects the load prediction performance and computational cost, we varied the length of the training period while keeping the testing period constant. Six training periods with lengths ranging from 1 week to 12 months were selected. For each appliance type, we trained the model on the training periods during which all the appliances in the dataset had at least one state transition. A/C models were only trained on data during the cooling season (up to 7 months). The training periods that we analyzed for each appliance type are listed in Table 6.17. One month of data (7/1/16–7/31/16) was used for testing in all cases. Results indicate that there is a trade-off between the length of the training period and prediction performance. A training period of 1–3 months achieves an optimal balance between these two factors. Figure 6.36 shows the mean NRMSE as a function of the length of the training period for a 6-hour ahead prediction horizon for the CHSMM. For all appliance types, the NRMSE is relatively large for training periods shorter than 1 month. However, there are marginal benefits from using a training period longer than 3 months. For some appliances such as pool pumps, we observed a slight increase in prediction error for large training sets. This was likely caused by seasonal factors contributing to a lack of generalizability between the training and testing set. Figure 6.37 shows how the computational cost of the parameter estimation varies with the size of the training set for the CHSMM. The computational cost includes three major components: (1) K-means clustering, (2) calculating the duration from the clustered time series data, and (3) training the duration, state transition, and emission models. For all load types except for A/Cs, the computational cost increases approximately linearly with the length of the training

6.3 Residential Appliances

241

Fig. 6.36 Impact of the training size on the prediction performance of CHSMM

Fig. 6.37 Impact of training size on the computational cost of CHSMM

period. The A/C models have a larger computational cost due to the greater relative number of state transitions per day (higher cycling frequency), the larger number of exogenous variables included in the model, and the more detailed model of the emission distribution. As noted above, only 1–3 months of training data is generally required to achieve good model performance. For this amount of training data, the computational costs are quite low ( 0},

.

(7.7)

where we recall that .P is the probability distribution over . according to which scenarios are sampled in an independent and identically distributed way. Clearly, the risk depends on the set of extracted scenarios, and therefore it has a stochastic variability. Nevertheless, in [426] it was proven that there are conditions under which the risk is distributed according to a beta distribution, irrespective of the distribution of the sampling probability .P. Generally, the results in [426], and in the following contributions, allow one to compute upper bounds to the risk that hold true with high confidence. The concept of support constraint played a crucial role in the theory of the scenario approach, which is defined as follows. Definition 7.2 (Support Constraint) The scenario-dependent constraint corresponding to sample .δs , .s ∈ {1, 2, ..., S}, is a support constraint for .SPS , if its removal improves the solution of .SPS , i.e., if it decreases the optimal cost Eq. (7.6a). We are now ready to state the main results of the theory of the scenario approach and explore them in the present context. Sections 7.1.3.1 and 7.1.3.2 focus on the a priori evaluation of the risk, where we use samples from the uncertainty set to guarantee a certain level of risk with high confidence. Based on the results in these two subsections and the analysis of the Sc-LAED problem in the absence of congestion, we propose a data-driven procedure that we call Algorithm 6. In Sect. 7.1.3.3, we consider the case when there is congestion and it shows that, despite the high number of scenarios that are required by the a priori approach, it is still possible to make useful and accurate claims on the risk after observing the complexity of the obtained solution (a posteriori evaluation). Conclusions are drawn, and a data-driven procedure that explores a posteriori evaluation is proposed (Algorithm 7).

7.1.3.1

The A Priori Scenario Approach Method

The main theorem in [426] is the following one. Theorem 7.1 With the assumption that Eq. (7.6) returns a unique solution, it holds that

252

7 Design New Markets

S

P

.

{V (xS )

> } ≤

d−1  S i=0

i

 i (1 − )S−i ,

(7.8)

where .PS is the probability distribution taken over .δ1 , . . . , δS , which is a product probability due to independence. The right-hand side of Eq. (7.8) is the tail of a beta distribution with parameters (d, S − d + 1). As .S grows, the tail goes exponentially to zero [426]. Fixing a small −6 , one can easily find the smallest number of samples .S such that .β, say .β = 10 d−1 S i S−i < β holds true, so that the right-hand side of Eq. (7.8) is . i=0 i  (1 − ) less than the specified .β. Then, one can claim that with high confidence .1 − β, the risk .V (xS ) of the scenario solution with .S scenarios is no larger than .. Note that the right-hand side of Eq. (7.8) does not depend on .P. This is remarkable and shows that, in order to guarantee that .V (xS ) ≤  with confidence .1 − β, we do not need to know .P. A graphical representation of the roles of the risk parameter . and the confidence parameter .β is shown in Fig. 7.1. The cube on the left is .S , the set of all the possible .S-tuples of scenarios. A point in this cube can be identified with an instance of .S , i.e., with a particular set of scenarios .{δ1 , δ2 , . . . , δS } that is obtained by randomly sampling .S scenarios from . according to the probability distribution .P. For this sample .S , there is a set of feasible solutions .χ that does not violate any of the constraints for any of the scenarios in .S . This is depicted in the middle of Fig. 7.1. An optimal solution .xS is then determined for this set .S of scenarios. The set of scenarios .δ belonging to . for which .f2 (xS , δ) > 0 (i.e., the constraint in Eq. (7.6) is violated) is called the violation region, and it is the region that is shaded black to the right in Fig. 7.1. This region has probability .V (xS ). We would like this probability to always be smaller than the risk parameter .. However, .V (xS ) has a variability as it depends on the sampled scenarios .S through .xS , and it will happen that .V (xS ) >  for certain samples .S that are in a bad set. Such a bad set is depicted as the black region in the cube on the left. Theorem 7.1 guarantees that if the right-hand side of Eq. (7.8) is smaller than .β ∈ (0, 1), the bad set has a probability that is smaller than .β (with respect to the product measure .PS ). .

Fig. 7.1 Illustration of the scenario approach

7.1 Scenario-Based Stochastic Dispatch

253

An explicit formula to find .S, which returns a slightly more conservative number of samples, is given below in Eq. (7.9), which is taken from [447]. As can be seen, the number of samples needed grows linearly with the dimension the optimization being performed and . 1 , but it is not as sensitive to .β. Lemma 7.1 Under the same conditions as Theorem 7.1, if

2 1 .S ≥ ln + d  β

(7.9)

then .PS {V (xS ) > } ≤ β. We now consider the structure of the Sc-LAED problem more explicitly. Note that in Eq. (7.5), only Eqs. (7.5c) and (7.5d) consist of scenario-dependent constraints defined by the net load forecast error at each bus. Eliminating Eq. (7.5d) (for now), one can observe that most .T − 1 constraints can be active and indeed be support constraints. This is due to the fact that for each .t = 2, . . . , T , the constraints in Eq. (7.5c) are half-spaces with the same slope but different displacement, so that no more than one can be active at the same time. Therefore, the number of support constraints for Eq. (7.5) is no more than .T − 1 with a probability of one. In view of this fact, the same formula in Eq. (7.8) can be applied by replacing d with .T − 1; see, e.g., [448, 449]. This prevents the number of samples from growing to very large numbers when congestion is not in the picture. The reduction in the number of required samples in this special case helps the scalability of the problem and shows that the number of samples can be independent of the number of generators and the number of buses in the system, and it only depends on the number of look-ahead intervals .T − 1, . and .β. For a general case and for bulk power systems applications, satisfying Equations (7.8) or (7.9) will require many samples. This is a well-known issue found in scenario approach literature. Several solutions are available that range from multiple steps or iterative procedures; see [450] and references therein, to regularization schemes, [451]. Among them, the recently proposed “wait and judge approach,” [434], is of particular interest in the case of Sc-LAED, because it allows one to compute the upper bound on the risk of the solution as a function of the complexity of the obtained solution. In this way, useful upper bounds can also be obtained when a small number of scenarios are available. This approach will be discussed in Sect. 7.1.3.3. It is also important to remark that, in general, among the sampled scenarios, there might be some extreme scenarios that can lead to excessively conservative results in terms of a cost function. In the following Sect. 7.1.3.2, we show how to eliminate such scenarios while taking in the risk bounds.

254

7.1.3.2

7 Design New Markets

Sampling and Discarding Approach in Sc-LAED

The sampling and discarding approach [427] is one technique in the scenario approach theory that trades risk for performance. Essentially the cost of Sc-LAED is reduced by eliminating scenarios of choice, but the price paid is an increase in the guaranteed risk. Let .A be the discarded scenarios among those in .S , and let .|A| be the cardinality of .A. If the following relation is satisfied:

.

|A| + d − 1 |A|

|A|+d−1  S  i (1 − )S−i ≤ β, i

(7.10)

i=0

 then the solution .xS−|A| that is obtained by removing the scenarios in .A from .S has a risk no larger than ., with high confidence .1 − β. Usually, the support constraints with the highest improvement in the cost of Sc-LAED are removed sequentially by selecting the scenarios with the highest Lagrange multipliers. However, any other elimination rule is valid. For the stated result to hold true, the number of scenarios to be discarded (.|A|) should be defined a priori, while choosing .|A| a posteriori is possible at the price of a (usually minor) degradation in the overall confidence (typically, the confidence becomes .1 − Kβ instead of .1 − β, where K is the total number of values of .|A| that one is willing to accept; for a detailed discussion on this point, see the discussion before Equation (4) in [427]). Combining the results of Theorem 7.1 and Eq. (7.10), a procedure (Algorithm 6) is here proposed for the case when no congestion is expected. The user inputs a desired risk parameter .0 . As explained above, exploring alternative solutions through scenario removal comes at the cost of degrading the guaranteed risk. Hence, the user also sets a modified risk parameter, .˜ ≥ 0 , which is still acceptable for practical purposes and that should be preferred to .0 only if the gain in terms of the cost function is significant. Similarly, a desired confidence parameter .β0 is specified together with a degraded confidence parameter .β˜ ≥ β0 that is still acceptable for practical purposes. These parameters together determine how many scenarios can be safely removed before a solution is returned by the algorithm, that is, they allow the system operator to trade risk for performance in a safe way.

7.1.3.3

The A Posteriori Scenario Approach Method

Convex optimization in dimension d has, at most, d support constraints [436, 452]. For the class of fully supported problems (when a problem in dimension d has exactly d support constraints with probability one), strict equality holds instead of inequality in Eq. (7.8). However, in many engineering applications, the problem being solved is not a fully supported problem. For instance, as discussed in ScLAED, when the system is not congested, the number of support constraints is always far less than the number of decision variables. In this subsection, we study

7.1 Scenario-Based Stochastic Dispatch

255

Algorithm 6 For Sc-LAED in the absence of congestion ˜ T 1. INPUT: 0 , ˜ , β0 , β, 2. Compute S that satisfies Equation (7.9) when , β, d in Eq. (7.9) are replaced by 0 , β0 , T −1, respectively. 3. for i = 1, 2, . . . do a. Find a valid i that satisfies inequality Equation (7.10) where d and |A| in Eq. (7.10) are replaced by T − 1 and i, respectively. ˜ then go to Step 4. b. if (i > ˜ or (i + 1)β > β), end for 4. Sample S scenarios and compute xS by solving Equation (7.6). 5. if (cT xS is satisfactory or i = 1) then OUTPUT: xS , its guaranteed risk 0 and the confidence (1 − iβ0 ); else 6. for k = 1, . . . , i − 1 do  a. Remove the worst k scenarios from δ1 , . . . , δS in Eq. (7.6) and compute the solution xS−k with S − k scenarios.   , its guaranteed risk is satisfactory or k is equal to i − 1), then OUTPUT: xS−k b. If (cT xS−k k and the confidence (1 − iβ0 ). end for

V (xS ) jointly with the complexity of the solution, defined below as .νS for the general case where transmission constraints are considered.

.

Definition 7.3 (Complexity) .νS , the complexity of the solution .xS∗ to .SPS , is the number of the support constraints for .SPS . Complexity in Sc-LAED consists of the (at most .T − 1) support constraints corresponding to the generation adequacy constraint in Eq. (7.5c) plus possibly some support constraints for Eq. (7.5d), which cannot be predicted before solving Equation (7.5). The relation between risk and complexity was first studied in [434]. The results of [434] provide an upper bound on the risk after computing the solution. See Theorem 7.2. Theorem 7.2 For program Equation (7.6) with .S > d, for any .τ = 0, 1, 2, ..., d, the polynomial Equation (7.11), with t as a variable, has one and only one solution in .(0, 1):

.

S β  i i−τ S S−τ − t = 0. t τ S+1 τ

(7.11)

i=τ

We denote this solution by .t (τ ). Defining .(τ ) = 1 − t (τ ) under the assumption of non-degeneracy and uniqueness of the solution [434], it holds that PS {V (xS ) ≤ (νS )} ≥ 1 − β.

.

(7.12)

256

7 Design New Markets

Fig. 7.2 Upper bound on the risk for .S = 2000, .d = 1088. The vertical axis denotes values of  and the horizontal axis denotes values of .ν  . The distance between the black dotted line S and the red curve is the improvement on the risk bounds provided by Theorem 7.2

.V (xS ),

The results after observing .νS support constraints, compared to the original bound from [426] for the synthetic Texas system [453] with .T = 2 in Eq. (7.5), are showed in Fig. 7.2. When .νS d, the results improve significantly. This allows one to make significant claims on the risk even when the number of sampled scenarios is relatively small. For example, for the setting described in Fig. 7.2, an upper bound of . = 0.5967 is obtained by using Theorem 7.1 with .S = 2000. On the other hand, with the same number of scenarios, observing .νS = 18 allows one to claim  .(ν ) = 0.0262 as an upper bound thanks to Theorem 7.2. S Algorithm 7 explores Theorem 7.2 to compute upper bounds on the risk of the scenario solution when congestion is expected so that d cannot be replaced by .T − 1 in Theorem 7.1 and the number of scenarios .S cannot be increased to the values required by Theorem 7.1. In this algorithm, .S is supposed to be given, and typically it accounts for existing computational/data collection limitations. Algorithm 7 For Sc-LAED when congestion is expected 1. INPUT: S, β 2. Compute ¯ (τ ), τ = 0, . . . , d according to Theorem 7.2. 3. Sample S scenarios and solve Eq. (7.6); obtain xS and count the number of support constraints νS . 4. OUTPUT: xS and the upper bound on the risk ¯ (νS ).

7.1 Scenario-Based Stochastic Dispatch

257

In conclusion, Algorithm 6 is the choice when the system operator does not expect congestion in the next T intervals. On the other hand, when congestion is in the picture, the a posteriori approach (Algorithm 7) should be employed. Considering that, in real life, the LAED problem is solved several times along a time horizon, one can try to guess .νS for a new instance of Sc-LAED based on the past solutions so as to adjust .S accordingly. For example, .νS [t − 1], i.e., the number of support constraints at the previous time step, can be used as a starting estimate for the number of support constraints at time t. When .νS [t − 1] νS [t] and .S samples are not sufficient to guarantee the desired risk level, one might sample new scenarios according to an iterative algorithm. Iterative schemes in this line of thought are the subject to ongoing research.

7.1.4 Case Study In this subsection, we test the proposed approach on a 2000-bus synthetic grid on a footprint of Texas [453]. This system consists of 544 generation units, with a portfolio of 367 gas, 39 coal, 4 nuclear, 25 hydro, 87 wind, and 22 utility-scale solar power plants. Uncertainty exists where the nodes with wind/solar resources are located. This can be generalized to DER aggregation and participation in the wholesale electricity market. Four hundred and thirty-two of these units are active during the study period (default setting in [453]). Its transmission network consists of 3206 transmission lines. Installed wind capacity is about 13% of the peak load, and installed solar capacity is less than 1% of the net load. MATPOWER [454] is used to obtain PTDF of the synthetic grid and confirm the accuracy of the base case modelings. Where data was not given (such as the ramping capabilities of the units), the modifications were performed according to [408, 437]. In addition, load and wind profiles were adapted from these references. The optimization is performed for a 24-hour period (96 intervals). T in Eq. (7.5) is two, meaning that there is one deterministic and binding and one uncertain, nonbinding interval. For efficient illustration, in each of the following subsections, different windows of the 96 intervals during a day will be the focus. It is assumed that generators bind linearly into the real-time market. The uncertainty on each uncertain resource is distributed according to Gaussian distribution with mean .μ equal to the nominal forecast and with standard deviation .σ defined as the normalized standard deviation of the wind/solar forecast. A scenario is obtained by sampling the uncertainty instances from these distributions in an independent fashion. Information on the scenario generation mechanism was provided here for the sake of comparison only, and it must be remarked that the adopted method does not require that the underlying probability distribution be known. Deviations from forecasted values enter the net load scenarios as negative load. The confidence parameter .β = 10−6 is used throughout the case study. The decision of each dispatch method is tested using .10,000 independent scenarios extracted from the same uncertainty set.

258

7 Design New Markets

This case study is divided into two parts. The focus of the first part is on the ramping events due to renewable integration in the system, illustrating the algorithm suggested in Sects. 7.1.3.1 and 7.1.3.2 with .d = 1 in the absence of congestion in the system. The second part extends the original scenario theory to the results shown in Sect. 7.1.3.3 in the presence of line constraints. It is shown that by using the results in Eq. (7.12), it is possible to start with a sample size with almost no guarantee on the results and can reach a high confidence level in the results by analyzing the complexity of the solution.

7.1.4.1

Extreme Ramping Test: Scenario vs. Deterministic and Robust LAED

To simulate how different methods respond to the possibility of an extreme ramping event, we increased the wind/solar penetration threefold while increasing the load in the system by 18%. .σ for each uncertain resource is .0.07μ, where we recall that .μ is the forecast of wind and solar resources. A full Gaussian distribution is used to generate the scenarios for the scenario approach. Following the robust methodology in [415], we truncated the Gaussian distribution at .μ ± 3σ for the robust method. The simulation is performed for two different sizes of scenarios and compared to the deterministic and robust methods. The scenario sizes are 2000 and .10,000, which correspond to . = .0.0083 and .0.0017, respectively, using Eq. (7.8). As discussed in Sect. 7.1.2, the decision for the first interval is binding, and the future interval is advisory. Therefore in Fig. 7.3, we compare the dispatch cost of the binding interval (where there is no uncertainty) using different approaches. We show peak hours in Fig. 7.3 because the system is more vulnerable to ramping events during these hours. As can be seen, the robust method has a clear offset in terms of the binding dispatch cost, while the deterministic method carries the least cost of dispatch. However, the increment in the dispatch cost using the scenario method is small compared to the robust method. It should be noted that the generated sets of 2000 and .10,000 scenarios are generated independently. Therefore, there can be a few cases where the dispatch cost is higher with 2000 scenarios than with .10,000. Violation probabilities in the scenario approach are as expected and shown in Fig. 7.4. The robust method maintained the zero violation probability, while scenario LAED allowed some violations but kept this violation below the corresponding  .. The .V (x ) for the deterministic LAED is .0.5029 for hours shown in Fig. 7.4. S Therefore, the scenario method successfully confines .V (xS ) ≤  with a cost much smaller than the robust method. Some extreme scenarios that can lead to conservative results might be included when samples are being collected randomly. We used .10,000 scenarios in the previous section, and approximately 100 of them dropped. As mentioned in Sect. 7.1.3.2, the discarding strategy can be using any arbitrary rule. In this case, we discard the constraints whose removal maximizes the reduction of dispatch cost. As shown in Fig. 7.5(right), when scenarios are being dropped, the performance, which in this case is the cost of the binding interval, is improved. The performance improvement

7.1 Scenario-Based Stochastic Dispatch

Dispatch cost ($) (only binding interval)

8.6

259

×105 Deterministic Scenario = 0.0083, N = 2000 Scenario = 0.0017, N = 10000 Robust

8.4 8.2 8 7.8 7.6 7.4 7.2

14

14.5

15

15.5

16

16.5

17

17.5

18

18.5

19

Hours of the day Fig. 7.3 Comparison of the dispatch cost during the peak hours of the day using different methods

9

×10−3

Observed violation probability

8 7 6

Scenario Scenario

= 0.0083, N = 2000 = 0.0017, N = 10000

5 4 3 2 1 0

14

14.5

15

15.5

16

16.5

17

Hours of the day Fig. 7.4 .V (xS ) for two different scenario settings

17.5

18

18.5

19

260

7 Design New Markets

Fig. 7.5 Sampling and discarding results: trading risk for performance. Left: Violation probability (Monte Carlo estimate with 10,000 samples) located below .k . Right: Binding interval cost reduction in one interval after elimination of k scenarios  ) and . after dropping .k ∈ [1, 100] is traded for risk. Figure 7.5(left) shows .V (xS−k k scenarios. The values of .k extracted from Eq. (7.10) are the values of the transparent plane depicted above the observed violation probabilities. Trading risk for performance can be particularly helpful if dropping the first few scenarios significantly reduces the costs, as in the case of the first few scenarios in Fig. 7.5(right).

7.1.4.2

Risk and Complexity: Considering All Constraints in the Sc-LAED

In this subsection, both network and ramping constraints are considered. Therefore, it is no longer possible to know the exact number of support constraints prior to solving the problem. To use the original line constraints in [453], we do not change wind and solar penetration in this section. However, to cause congestion, we changed the load by 5% at all nodes. The argument is that by making a guess that the number of support constraints is low, we can start with a very large ., solve the problem, and, by observing the results, update our knowledge of .. In this case, we solved the problem with 870 scenarios, which is slightly more than the number of decision variables (which is 864). This leads to . = 0.9996. This means that .V (xS ) can vary from 0 to .0.9996, so that Theorem 1 provides almost no information about   .V (x ). However, an a posteriori upper bound for .V (x ) can be found by Theorem S S 7.2. For instance, when three constraints of support are observed in Sc-LAED, meaning that their removal changes the solution, the claim “.0 ≤ V (xS ) ≤ 0.0282" can be delivered. For the test case, a posteriori results for the first 50 intervals of a day are summarized in Fig. 7.6. As can be seen, the observed number of support constraints (blue .) is small, although congestion exists. The number of support constraints

7.1 Scenario-Based Stochastic Dispatch

261

Fig. 7.6 ., number of observed support constraints; .♦, violation probability (Monte Carlo estimate with .10,000 samples); and ., the upper bound on the violation probability based upon the complexity

for this study varies between one, two, and, for some intervals, three, which is much smaller than .d = 864 (while the a priori results in [426] are for a fully supported problem, i.e., .νS = 864 with probability one). Using Theorem 7.2, one can rigorously define an upper bound on the risk of dispatch for these intervals. Our knowledge about the upper level of .V (x) gets much sharper, as shown by the black stars in Fig. 7.6 (compare .V (x) ≤ 0.9996 with the results). Ten thousand samples for each interval were used to estimate the violation probability: the resulting estimates are all within the theoretical bounds and are represented by the red .♦ in Fig. 7.6.

7.1.5 Conclusion In this section, the scenario approach for solving uncertain economic dispatch is introduced. It is shown that this approach does not require any knowledge of the underlying uncertainty distribution yet yields a quantifiable level of risk in real-time economic dispatch. It is shown how the risk can be evaluated according to a priori and a posteriori mathematical results. The scalability of the problem is considered in both the a priori and a posteriori stages. In the a priori stage, it is shown that disregarding congestion, the number of samples needed does not increase with the size of the system. This fact bears several benefits: first, it makes the process of collecting i.i.d. samples practical; second, it avoids both an overly conservative solution and a high computational burden.

262

7 Design New Markets

Moreover, pessimistic scenarios can be neglected with controllable degradation of the violation probability. In the a posteriori stage, the risk of constraint violation can turn out to be much smaller than general a priori, promising future scalability of the Sc-LAED for a congested case. The case study on a realistic power system suggests that the scenario based LAED could provide a reliable solution with a quantifiable bound on the conservativeness of the results. There is a need for more rigorous investigations of the correlation between the number of constraints of support and design parameters in Sc-LAED. Therefore, our future work will be focused on the a posteriori stage, where a procedure to start from a few scenarios and progressively aim toward the desired . based on the observed number of support constraints will be developed. Practically speaking, the scenario approach strikes a good trade-off between deterministic and robust optimization-based dispatch. The ISO could potentially adopt the scenario approach as a natural step to manage uncertain DERs while keeping a tunable risk level at the ex post stage. It could have direct benefits to both real-time and intra-day decision-making process.

7.2 ISO Dispatch In the last section, we introduced our thoughts on data-driven market design. In this section, we introduce our ideas on how to limit risks from both the generator and load sides for economic dispatch in a day-ahead market with the presence of demand response providers (DRPs). DRPs are intermediates between ISOs and endusers. On one hand, they collect demand response (DR) reduction from end-users by incentives such as discounted prices or gifts; on the other hand, they participate in the market as “virtual generators” providing their products that the DR reduces in order to obtain rewards from the ISO. By comparing several economic dispatch methods, we explain the impact when the DR amount committed by the DRP has not been achieved (either higher or lower), which may cause high uncertainty to the grid. The scenario approach method is also introduced as an alternative to making robust economic dispatch by selecting a certain group of samples in the uncertainty set. In this way, trade-off between expected generation costs and violation probability can be realized.

7.2.1 Introduction In this section, we investigate the uncertainty caused by the presence of demand response providers (DRPs) in a day-ahead market and how different methods of economic dispatch help to clear the market. The need for demand response (DR) has increased for the past few years since the integration of renewables such as

7.2 ISO Dispatch

263

solar and wind requires more flexibility from the system. At the same time, DR has become an important tool used by independent system operators (ISOs) to increase flexibility and reliability of the system [206]. As an intermediary between the ISO and end-users, DRPs are permitted to collect demand response commitments from end-users by various incentives and are treated and got paid as “virtual generators” in the clearing of wholesale market [455–458]. However, unlike conventional generators whose outputs can be precisely controlled by the generation companies (GenCos), demand reduction committed by the DRPs has high uncertainty in real-time [459, 460]. This is mainly because DRPs’ bid of reduction is a bottom-up aggregation from individual end-users, whose behavior is highly uncertain, incentive-sensitive, and hard to be predicted. Although there is already some research focusing on the near-term economic dispatch using stochastic [461–463] and robust [464, 465] models considering uncertainty of energy resources such as wind and electric vehicles (EVs), some research has discussed the uncertainty of DRP, its own characteristic, and impact on the market. Moreover, both stochastic and robust models have their disadvantages in nearterm dispatch for DR. Stochastic models operate well in normal conditions but are usually less optimal in extreme cases, while robust models might be too conservative to take DR bids. The introduction of a scenario approach would be an alternative solution to this dilemma [427, 466]. By removing several “bad cases” in the uncertainty set, this method enables the trade-off between performance (in the worst case) and feasibility (violation probability). Before formulating the problem mathematically, we define some notations as follows. .PG is the power generation for a certain generator; .PDR is the commitment of power reduction by DRP and .πDR bidding price by DRP; .δ is the realized demand response ratio for DRP; .α is the friction factor for re-dispatch cost; . is the constraint violation probability; .β is the confidence level of .; N is the total number of generators in the system; M is the total number of DRPs; .Nk is the number of scenarios; p is the number of samples removed; and d is the number of decision variables. This section is organized as follows. In Sect. 7.2.2, we formulate the model to describe the uncertainty with the presence of DRPs in day-ahead markets. Several economic dispatch methods, including scenario approach, are introduced to clear the market in Sect. 7.2.3. A numerical example is given in Sect. 7.2.4 to show the impact of DR uncertainty, how scenario approach works, as well as the preference among DRPs when congestion exists. We conclude our findings in Sect. 7.2.5.

7.2.2 Problem Formulation In this section, the process of clearing a day-ahead market with DRPs is illustrated. We describe briefly how a DRP decides its bidding strategy, including the price and the maximum amount of DR offered. In addition, we delve into the problem of

264

7 Design New Markets

Fig. 7.7 Three-layer financial structure of DAM with DRP

uncertainty caused by DR and use ERCOT’s data of a typical end-user to estimate its response behavior.

7.2.2.1

DRP as a Supplier in Day-Ahead Market

Nowadays, in some organized wholesale markets in the United States, DR takes on the roles as an energy resource, capacity resource, and ancillary resource [206, 455– 458]. There are two types of DR: price-based and incentive-based. The main difference between them lies in how the end-users are encouraged for demand reduction: the former uses direct dynamic price signals, and the latter uses more indirect and hybrid ways of incentives [467]. In this section, we concentrate on incentive-based DRP and its performance as an energy resource in the day-ahead market. As a new form of market participants, DRPs usually sign contracts with end-users and use direct load control or incentives to obtain commitments of DR during intervals of peak prices. Then, the utilities bid for energy based on the market and get payment according to the committed amount of DR multiplied by market clearing prices (treated as “virtual generators”). This is called “virtual” for two reasons: DRPs do not provide physical power generation, nor do they have an obligation to secure an energy supply for their customers. Figure 7.7 visualizes the three-layer financial structure in a day-ahead market.

7.2.2.2

Decision Curve for DRP

In this subsection, we briefly discuss how a typical DRP chooses its bidding price and the amount in the wholesale market, according to the behavior of its end-users. Unlike conventional generators, which usually have fixed cost curves depending on physical principles, DRPs have more flexibility in choosing both the price and DR amount.

7.2 ISO Dispatch

265

Fig. 7.8 DRP’s bidding strategy: (a) Inherent demand curve for end-users (b) DRP’s decision curve (c) Market supply curve with the DRP

An example of one typical DRP as a proxy of a group of end-users is provided below. Figure 7.8a shows the downward shape of the aggregated demand curve for those end-users. This demand curve indicates the maximum amount of power .PL the end-users are willing and able to purchase using certain electricity price .π. Since we assume that all the end-users are voluntary participants, they are exposed to a fixed retail price .πRR and their baseline .PL0 when not called by the DRP. It has been proven in [468] that by claiming the reward of .πs per unit of power reduction, the DRP sends a price signal of the same amount to end-users, which is added to the retail price .πRR , and encourages the power consumption to move from .PL0 to 1 .P (Fig. 7.8a). Thus, the maximum power reduction that the DRP can collect under L incentive .πs is max PDR1 = PL0 − PL1 .

.

(7.13)

We need to mention that the signal price .πs is arbitrarily chosen by the DRP. In addition, incentives to end-users are assumed to be the only variable cost for the DRP. Thus Fig. 7.8a can be redrawn to Fig. 7.8b, which shows the maximum DR max the DRP can provide underprice .π amount .PDR DR .

266

7 Design New Markets

Another significant assumption for the DRP is that it will bid its marginal cost as price into the market: 1 πDR = πs1 .

.

(7.14)

Comparing with Gencos, which usually submit their bidding price based on max . physical cost, a DRP is able to select its desired price .πDR and amount .PDR However, these two variables are internally correlated by the DRP’s decision curve (Fig. 7.8b). Moreover, the DRP may not get its bidding amount fully accepted: max submitted commitment in day-ahead market .PDR needs to be no higher than .PDR by the DRP. Figure 7.8c further shows a switch of supply curves with the presence of the DRP. As shown in the figure, according to economic principles, after the DRP has 1 ,.P max ) from its decision curve, the total supply curve over chosen a strategy (.πDR DR1 max . In 1 the price .πDR shifts horizontally rightward with the distance that equals .PDR1 Sect. 7.2.4, we will show how different bidding strategies affect the DR acceptance, together with realization cost and violation probabilities.

7.2.2.3

Uncertainty of DR

As mentioned before, there is research revealing the uncertainty in DR as energy resources [459, 460]. It is essential to know that either overreaction or underreaction in realization can cause an extra cost to the grid since the energy imbalance requires ramp up/down of conventional generators or even the use of expensive units to provide the shortage. Therefore, ways of estimating the distribution of demand response can be significant for the DRP, as well as for the ISO. Readers can skip this subsection without any interruption to the logic flow. In this section, we take the price-based DR data in [469] as an example to roughly estimate the level of uncertainty. As discussed in [468], there is essentially not much difference between incentive-based and price-based DR: the signal price of the former takes the same role as the dynamic price of the latter. The data in [469] contains the 9-month real-time power consumption of an anonymous consumer in the ERCOT area. This end-user can adjust its electricity usage according to real-time wholesale market prices. The author in this section has the following conclusions: • Moderate price (.144.4 :=

.

k PDR k ] Ek [PDR

,

(7.17)

so that the distribution of .δ has an average of 1. .δ k = 1 and indicates a certain realization k of DR equals the expected value. If .δ k > 1, the realized DR is higher than the expected DR, and there is an underestimation by the DRP; otherwise,

268

7 Design New Markets

there exists an overestimation. The test result (Fig. 7.9b) indicates that .δ is close to Gaussian distribution. Besides, we are interested in whether extremely high prices (.>1000$/MWh) have different magnitude impacts on DR during this time interval. As shown in Fig. 7.9c, there seems to be no clear relationship between extremely high prices and DR ratios. This can be proved by the correlation coefficient matrix

R(π, PDR ) =

.

1 0.034 , 0.034 1

(7.18)

which indicates a low correlation between these two parameters. So the classification of high prices is not necessary, and thus we can distinguish the power consumption in two groups: normal conditions (used to create the baseline) and DR (in high prices, represented by .δ). Through the fitting process, we can get an estimation of parameters in the normal distribution as .δ, .μˆ = 1, and .σˆ = 0.67. However, there are some key differences between the company used in [469] and DR provided by the DRPs such as: • The DRPs inform the end-users about forecasted high prices (or incentives) in advance to eliminate time lagging of demand response. • Unlike real-time price-based DR, DR programs provided by the DRPs usually have fewer DR events, which helps end-users focus on existing DR events and decrease the uncertainty in their response. Different from self-guided end-users, the DRP has much less uncertainty in their commitment because of its advertisement and interaction with end-users. However, the estimation value of uncertainty will still be used in Sect. 7.2.4 as a severe case of "highly uncertain commitment" and show how the level of uncertainty affects the decision of the ISO.

7.2.3 Economic Dispatch Methods in Day-Ahead Market In this section, we focus on the approach of one-step economic dispatch in dayahead market standing from the perspective of the independent system operator (ISO). Decision variable .ξ is defined as ξ := [PG PDR ],

.

(7.19)

where .PG represents the vector containing all conventional generations and .PDR for accepted DR amount for each DRP. In addition, as we have discussed in Sect. 7.2.2, demand response ratio .δ is introduced as a measurement of uncertainty in DR (Eq. (7.20)):

7.2 ISO Dispatch

269

δjk :=

.

k PDR,j

ˆ PDR,j

(7.20)

.

k PDR,j indicates a certain scenario k of realization of a certain DRP j when its DR ˆ . Thus .δ k represents the normalized value of commitment taken by the ISO is .PDR,j

.

j

k PDR,j in scenario k. If .δjk = 1, ∀k, there is no uncertainty in DRP k (deterministic). If .δjk < 1, the DR realization in scenario k is less than what the ISO expected; otherwise if .δjk > 1, the realization shows an overreaction of end-users since they reduce more than expected. In the rest of this section, sometimes we use vector .δ k to represent a scenario k, and .δjk is the j th element in .δ k .

.

7.2.3.1

Deterministic Model

The deterministic model assumes that there is no uncertainty in the DRP (.δjk = 1, ∀j, k), and the electricity supply and demand balance should be strictly satisfied: min l(ξ ), ξ

.

s.t. f (ξ ) ≤ 0,

(7.21)

ξ ≤ ξ ≤ ξ. l(ξ ) is the total cost function including the quadratic cost of conventional generators, as well as the cost of DRP (Eq. (7.22)):

.

l(ξ ) :=

.

N M   2 (ai PGi + bi PGi ) + PDR,j πDR,j .

(7.22)

j =1

i=1

In this definition, the cost of the DRP is expressed as the multiplication of the accepted amount of DR .PDR and its bidding price .πDR . This is consistent with the treatment to DRPs as “virtual generators" in the wholesale market. Inequality constraint .f (ξ ) consists of two parts: energy balance equation f1 (ξ ) :=

N 

.

i=1

PGi +

M 

PDR,j − PL = 0.

(7.23)

j =1

and power flow constraint f2 (ξ ) := H ∗ Pinj (ξ ) − F max ≤ 0

.

(7.24)

270

7 Design New Markets

7.2.3.2

Stochastic Model

The stochastic model focuses on minimizing the expected total generation cost considering the uncertainty of DR (represented by .δ). We use .δPDR instead of .PDR to represent the realization demand reduction. Assuming there are M number of possible scenarios of the reaction for end-users, and for each scenario k in DRP j , the cost in terms of that DRP can be expressed as Cjk (PDR,j ) = δjk PDR,j πDR,j

(7.25)

.

So the framework of the stochastic model is as follows:   M k . min Ek l(ξ, δ ) ξ

(7.26)

j =1

The objective function aims to minimize the expected value of cost function .l(ξ, δ k ), which has the form of l(ξ, δ k ) :=

.

N M   2 (ai PGi + bi PGi ) + δjk PDR,j πDR,j

(7.27)

j =1

i=1

Comparing Eq. (7.21) with Eq. (7.22), the expression of cost for DRP is changed slightly due to uncertainty of DR. In addition, this uncertainty might cause the energy imbalance in realization. As a result, in the stochastic model, we require that the energy adequacy should be no less than a certain value .γ : ⎛ ⎞ N M   .Prob ⎝ PGi + δjk PDR,j > PL ⎠ ≥ γ .

(7.28)

j =1

i=1

Equation (7.28) can be rewritten as Eq. (7.29) ⎛ f1 (ξ, δ) := − ⎝

N 

.

i=1

PGi +

M 

⎞ δj PDR,j ⎠ + PL ≤ 0, γ

(7.29)

j =1

γ

γ

where .δj refers to the cumulative distribution function .φ(x) where .φ(δj ) = γ for certain distribution of .δj . In addition, in all scenarios, the power flow constraints should be strictly satisfied: f2 (ξ, δ) := H ∗ Pinj (ξ, δ) − F max ≤ 0, ∀δ ∈ U.

.

(7.30)

7.2 ISO Dispatch

7.2.3.3

271

Robust Model

The robust model is widely used when the ISO pays more attention to minimizing the cost of the worst case of .δ. This min-max problem can be expressed as min max l(ξ, δ k ), δ k ∈U

ξ

.

s.t. f (ξ, δ) ≤ 0, ∀δ, ξ ≤ξ ≤ξ

l(ξ, δ) :=

.

(7.32)

j =1

i=1

f1 (ξ, δ) := −

.

N M   2 (ai PGi + bi PGi ) + δjk PDR,j πDR,j .



(7.31)

N 

.

PGi +

M 

i=1

 δjk PDR,j

+ PL ≤ 0, ∀δ ∈ U,

(7.33)

i=1

and constraints .f2 (ξ, δ) have the same form as Eq. (7.30). This min-max problem shown in Eq. (7.31) can be written as Eq. (7.34) [466] in order to be solved: min ξ,h

.

s.t.

h,

l(ξ, δ) ≤ h, ∀δ ∈ U, f (ξ, δ) ≤ 0, ∀δ ∈ U,

(7.34)

ξ ≤ ξ ≤ ξ. The decision variable h is added in Eq. (7.34), representing the upper bound of the cost function .l(ξ, δ k ) when .ξ is chosen from its feasible region and .δ k selected from its uncertainty set U . It can be proved mathematically that Eqs. (7.31) and (7.34) are equivalent. The number of constraints in Eq. (7.34) equals two times the j scenarios .δk in the uncertainty set. 7.2.3.4

Scenario Approach Model

The scenario approach introduced by Campi and Garatti [427, 466] is used initially as a refinement of the convex robust model to secure its solvability. In this model, only a finite number of constraints are selected so that this problem is solvable. This scenario approach has a general form of Eq. (7.35)


$$\begin{aligned} \min_{\xi} \;& f(\xi), \\ \text{s.t.} \;& g(\xi, \delta^k) \le 0, \quad k = 1, 2, 3, \ldots, N_k, \\ & \underline{\xi} \le \xi \le \overline{\xi}, \end{aligned} \tag{7.35}$$

where N_k scenarios are randomly selected from the uncertainty set, δ^k ∈ U. Campi et al. [466] prove that if N_k satisfies the requirement

$$N_k \ge \frac{2}{\epsilon} \left( \ln \frac{1}{\beta} + d \right), \tag{7.36}$$

where ε is the "violation parameter" and β the "confidence parameter," then, with probability at least 1 − β, the solution of the scenario approach satisfies all the constraints in U except for at most an ε fraction of violations (g(ξ*, δ^k) ≥ 0). The strength of this theorem lies in its generality: Eq. (7.36) holds regardless of the interpretation of the objective function and constraints; the only requirement is convexity. Further, in [427] the authors extended the theorem by removing a certain number of samples p from the original finite uncertainty set of N_k samples. The violation probability ε and confidence parameter β then satisfy

$$\binom{p+d-1}{p} \sum_{i=0}^{p+d-1} \binom{N_k}{i} \epsilon^i (1-\epsilon)^{N_k - i} \le \beta. \tag{7.37}$$

Equation (7.37) enables a trade-off between feasibility and performance: by removing more "bad cases" from the original uncertainty set, one can expect better performance of the objective function, at the cost of an increased violation probability. A highlight of this result is that the samples can be removed arbitrarily, which makes the performance improvement more efficient. Since the scenario approach is a general case of the robust problem, our formulation derives from Eq. (7.34), and the scenario approach is formulated as

$$\begin{aligned} \min_{\xi, h} \;& h, \\ \text{s.t.} \;& l(\xi, \delta^k) \le h, \quad k = 1, 2, 3, \ldots, (N_k - p), \\ & f(\xi, \delta^k) \le 0, \quad k = 1, 2, 3, \ldots, (N_k - p), \\ & \underline{\xi} \le \xi \le \overline{\xi}. \end{aligned} \tag{7.38}$$

In the remainder of this section, the theorem in Eq. (7.37) is used in our scenario approach model, since it enables an arbitrary selection of scenario removal and is usually more efficient in improving the performance of the objective function. By removing p scenarios from the uncertainty set (historical data), 2p constraints in total are removed, since each scenario δ^k contributes both l(ξ, δ^k) ≤ h and f(ξ, δ^k) ≤ 0. Meanwhile, β and ε are pre-defined values satisfying Eq. (7.37). Compared to the robust model in Eq. (7.34), intentionally dropping "bad scenarios" improves the performance of the objective function, so lower realization costs can be achieved. However, this may cause constraint violations; there are three possible outcomes according to Eq. (7.38):

• violation of f_1(ξ*, δ^q) ≤ 0 indicates energy inadequacy in scenario δ^q;
• violation of f_2(ξ*, δ^q) ≤ 0 means that the optimal solution ξ* causes an overflow in scenario δ^q;
• violation of l(ξ*, δ^q) ≤ h means that ξ* is not the min-max optimal solution, since in the worst scenario δ^q another solution ξ' achieves an even lower cost than ξ*.

The theorem in Eq. (7.37) guarantees, with confidence 1 − β, that any violation happens with probability less than ε.
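To make the formulations above concrete, the following minimal Python sketch solves a scenario-approach dispatch in the spirit of Eq. (7.38) with numpy and cvxpy. It is a simplification under stated assumptions: a single bus (the network constraint f_2 is omitted), a single DRP, illustrative cost coefficients in the style of Fig. 7.10, and the "min" removal rule described in Sect. 7.2.4.1; it is not the chapter's original code.

import math
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)

# Sample-size check from the bound (7.36), for assumed eps, beta, d.
eps, beta, d = 0.10, 1e-3, 4
Nk_min = math.ceil((2 / eps) * (math.log(1 / beta) + d))  # minimum scenario count

# Two quadratic-cost generators and one DRP (all parameter values assumed).
a, b = np.array([0.1, 2.0]), np.array([20.0, 120.0])
P_max = np.array([120.0, 200.0])
P_L, pi_DR, P_DR_max = 110.0, 90.0, 20.0

# Nk = 800 >= Nk_min cut-off Gaussian DR ratios; "min" drops the p lowest deltas.
Nk, p = 800, 80
delta = np.clip(rng.normal(1.0, 0.1, Nk), 0.5, 1.5)
kept = np.sort(delta)[p:]

P_G = cp.Variable(2, nonneg=True)    # conventional generator outputs
P_DR = cp.Variable(nonneg=True)      # accepted DR amount
h = cp.Variable()                    # epigraph variable h of Eqs. (7.34)/(7.38)

gen_cost = cp.sum(cp.multiply(a, P_G ** 2) + cp.multiply(b, P_G))
cons = [P_G <= P_max, P_DR <= P_DR_max]
for dk in kept:
    cons += [gen_cost + dk * P_DR * pi_DR <= h,   # l(xi, delta^k) <= h
             cp.sum(P_G) + dk * P_DR >= P_L]      # energy adequacy (f1 <= 0)
print(cp.Problem(cp.Minimize(h), cons).solve())

Because the epigraph constraint is imposed once per kept scenario, removing p scenarios removes exactly the 2p constraints discussed above.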

7.2.3.5 Realization Cost

In this subsection, the calculation of the realization cost is briefly discussed. Compared to the dispatch cost, which is computed ex ante (before real time), the realization cost is generally a more important measure of performance, since it includes the extra cost of re-dispatch when a scenario materializes and a constraint is violated. The realization cost in scenario δ^k is formulated as follows:

$$\begin{aligned} \min_{P_G} \;& l(P_G, P_{DR}^*, \delta^k) + \omega(P_G, P_G^*), \\ \text{s.t.} \;& f_1(P_G, P_{DR}^*, \delta^k) = 0, \\ & f_2(P_G, P_{DR}^*, \delta^k) \le 0, \\ & \underline{P_G} \le P_G \le \overline{P_G}. \end{aligned} \tag{7.39}$$

We assume that in the re-dispatch process, the DR amount P_DR* is not free to adjust; the only way to balance supply and demand while respecting the power flow limits is to change the output of the conventional generators in real time. However, we assume a "friction" ω as a penalty on this adjustment:

$$\omega(P_G, P_G^*) := \alpha \sum_{i=1}^{N} \left| \left( a_i P_{Gi}^2 + b_i P_{Gi} \right) - \left( a_i P_{Gi}^{*2} + b_i P_{Gi}^* \right) \right|. \tag{7.40}$$

Here α is the friction factor, 0 ≤ α ≤ 1. The term in Eq. (7.40) describes the extra cost of moving from the original optimal dispatch P_G* to the new realization output P_G. This friction has little direct physical meaning; rather, it measures the distance between the ex ante and ex post dispatch results.
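As a direct transcription of Eq. (7.40), the following short Python sketch evaluates the friction penalty for given dispatch vectors; the array arguments and the default α = 0.2 (the value later used in Sect. 7.2.4.1) are illustrative assumptions.

import numpy as np

def friction(P_G, P_G_star, a, b, alpha=0.2):
    """omega(P_G, P_G*) of Eq. (7.40): alpha times the summed absolute change
    of each generator's quadratic cost between realization and dispatch."""
    P_G, P_G_star = np.asarray(P_G), np.asarray(P_G_star)
    cost_new = a * P_G ** 2 + b * P_G
    cost_old = a * P_G_star ** 2 + b * P_G_star
    return alpha * np.sum(np.abs(cost_new - cost_old))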

7.2.4 Numerical Examples

This section consists of two parts. In Sects. 7.2.4.1–7.2.4.4, the impact of the uncertainty in DR on the different economic dispatch methods, as well as the trade-off between performance and feasibility in the scenario approach model, is discussed in the framework of a 3-bus system with a single DRP. Section 7.2.4.5 describes a more general case using the IEEE 14-bus system, with two DRPs having different bidding strategies in the market.

7.2.4.1 3-Bus System with One DRP

The structure and parameters of the 3-bus system are shown in Fig. 7.10. The system contains two conventional generators with quadratic cost functions, as well as one DRP on bus 2 controlling the load on that bus. Recalling Fig. 7.8a, we assume a linear inherent demand curve for the aggregated end-users, with initial load P_L^0 = P_L2 = 60, retail price π_RR = 100, and demand-curve slope dπ/dP_L = −5. Thus, the DRP is free to choose any nonnegative incentive price π_s and find the corresponding P_DR^max from the demand curve. The incentive price π_s equals the bidding price π_DR in the day-ahead market.

Fig. 7.10 3-bus system with the DRP (G1: P_G1^max = 120, C(P_G1) = 0.1 P_G1² + 20 P_G1; G2: P_G2^max = 200, C(P_G2) = 2 P_G2² + 120 P_G2; DRP controlling P_DR2 on bus 2; line limit L12^max = 50; reactances X12 = 0.1, X13 = 0.2, X23 = 0.2; loads P_L2 = 60, P_L3 = 50)


Fig. 7.11 Cut-off Gaussian distribution for the demand response ratio

Fig. 7.12 Robust realization cost in all scenarios. (a) π_DR = 30 (b) π_DR = 90

In terms of the uncertainty in DR, the demand response ratio δ is assumed to follow a cut-off Gaussian distribution with μ = 1, σ = 0.1, δ^max = 1.5, and δ^min = 0.5 (Fig. 7.11). For realization, the friction factor is α = 0.2. In total, our simulation uses N_k = 800 samples of δ, randomly generated from the cut-off Gaussian distribution (Fig. 7.11), to form the historical data U. In the robust model, 100 samples are randomly chosen following the 3σ rule to form the uncertainty set U_r. In the scenario approach model, a portion of the "bad cases" is removed from U using a greedy algorithm in order to improve the solution. Since the scenario approach enables arbitrary selection of which scenarios to remove, multiple algorithms are available. One algorithm targets the highest-cost scenarios (as shown in Fig. 7.12) and therefore tends to remove the lowest δ in U. Another option is to remove the extreme cases of δ, i.e., the highest values of |δ − μ| are removed first. For brevity, we call the first algorithm "min" and the second "center," and "20% min" denotes removing 20% of the scenarios using the "min" method.
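The two removal rules can be sketched in a few lines of Python (a sketch only; realizing the cut-off Gaussian by clipping is an assumption, and names are illustrative):

import numpy as np

rng = np.random.default_rng(1)
U = np.clip(rng.normal(1.0, 0.1, 800), 0.5, 1.5)   # Nk = 800 samples of delta

def remove_min(U, frac):
    """The "min" rule: drop the lowest deltas, i.e., the highest-cost scenarios."""
    k = int(frac * len(U))
    return np.sort(U)[k:]

def remove_center(U, frac, mu=1.0):
    """The "center" rule: drop the deltas with the largest |delta - mu| first."""
    k = int(frac * len(U))
    order = np.argsort(np.abs(U - mu))              # closest to the mean first
    return U[order[:len(U) - k]]

U_20min = remove_min(U, 0.20)                       # the "20% min" setting
U_20center = remove_center(U, 0.20)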


Fig. 7.13 Dispatch result under different models. (a) Realization cost (b) DR bids accepted (c) Power adequacy probability (d) Power flow violation probability (e) Min-max violation probability

7.2.4.2 Simulation Results for Economic Dispatch

Simulation results of the different economic dispatch models are shown in Fig. 7.13. In all subgraphs of Fig. 7.13, the x-axis shows the different choices of bidding price π_DR by the DRP. As mentioned in Sect. 7.2.2, the DRP is free to choose any π_DR it desires; a higher bidding price yields more profit per unit but has a higher probability of being rejected by the ISO once π_DR exceeds the marginal generation cost. The robust model is usually the most conservative: it takes the first inflection of DR acceptance in (b) and has the highest realization cost in (a). However, this conservatism leads to the lowest violation of all constraints (energy adequacy (c), power flow limits (d), and min-max status (e)). In contrast, the deterministic model has the lowest realization cost as well as the highest DR amount accepted, but its ignorance of uncertainty results in a huge violation probability for the energy balance and power flow constraints. The stochastic model is relatively neutral in both performance and violation. Unlike the robust model, the deterministic and stochastic models do not measure the min-max status since they concentrate on average cost rather than worst-case cost. Building on the robust model, the scenario approach enables the trade-off between performance (lower dispatch/realization cost) and feasibility (higher violation probability). Moreover, the algorithm for removing scenarios is essential and influences both sides (performance and risk). According to (a) and (b), "center" is more aggressive: it takes more DR and has a lower cost.


Table 7.1 Dispatch result under π_DR = 100

Model              DR     Real. cost   Balance Vio.   PF Vio.   Min-max Vio.
Deterministic      20.0   8320.2       0.50           0.49      N/A
Stochastic         20.0   8501.6       0.23           0         N/A
Robust             5.3    8910.5       0              0         0
Scenario(min)      17.5   8379.0       0.09           0.09      0.08
Scenario(center)   11.8   8530.9       0.20           0.19      0.01

Fig. 7.14 Trade-off between cost and violation in the scenario approach. (a) Realization cost vs. violation probability (b) Realization cost and violation probability vs. number of removed samples

In addition, "center" performs better than the "min" algorithm on both energy balance and power flow limits. However, the core advantage of "min" lies in worst-case performance: according to (e), the scenario approach with the "min" algorithm works even better than the robust model, whereas "center" acts poorly in this respect (8%). Table 7.1 further supports this conclusion under π_DR = 100.

7.2.4.3 Trade-Off Between Feasibility and Performance

In this subsection, we concentrate on the trade-off between feasibility and performance for the scenario approach. The "min" algorithm is used here, and we take a snapshot at the bidding price π_DR = 100. Figure 7.14a is consistent with the intuition that to obtain a lower realization cost, the feasibility of the solution must be sacrificed, yielding a higher violation probability. Figure 7.14b illustrates the same finding from another angle, plotting the two dependent variables, realization cost and violation probability, as functions of the number of removed scenarios p (given N_k = 800). As p increases, the realization cost drops quickly but flattens when p > 600, indicating decreasing efficiency in improving the cost. The violation probability, on the other hand, grows almost linearly.

Fig. 7.15 Dispatch result under different models (σ = 0.67). (a) Realization cost (b) DR amount accepted

This observation can guide decision-makers in choosing an optimal p in the scenario approach model. The upper bound (red line) is plotted according to the theorem in Eq. (7.37), and our simulation results do not conflict with the theory of [427].

7.2.4.4 Influence of δ on DR Acceptance

In this subsection, we briefly discuss how the distribution of δ impacts the acceptance of DR across the different models. Compared to our previous assumption in Sects. 7.2.4.1–7.2.4.3, the distribution (μ̂ = 1, σ̂ = 0.67, δ^max = 1.8, and δ^min = 0.2) derived from Sect. 7.2.2 implies much higher uncertainty in DR. The simulation result is shown in Fig. 7.15. Compared to Fig. 7.13a, b, Fig. 7.15a, b shows a shrinkage of DR acceptance and an increase in realization cost. This phenomenon results from the higher uncertainty in δ: a higher chance of energy imbalance means that the ISO schedules more conventional generation instead. We conclude that, for a DRP, higher participation and more predictable behavior from its end-users are essential for more DR to be accepted in the wholesale market.

7.2.4.5 IEEE 14-Bus System with Two DRPs

In this subsection, a more complicated IEEE 14-bus system with two DRPs is analyzed (Fig. 7.16). We assume that the two largest loads (buses 3 and 4) have DRPs that can help reduce the demand, with π_RR = 100 and demand-curve slope dπ/dP_L = −5. Both DRPs have the same distribution of δ, as discussed in Sect. 7.2.4.1, but they are independent of each other. The line constraint is L24 = 30, and the friction is α = 0. Before conducting a scenario approach simulation, the selection of "bad scenarios" needs to be discussed, as this is now a 2-D case with two independent distributions.

Fig. 7.16 IEEE 14-bus test system (G: generators; C: synchronous compensators)

Here we share a similar idea of "center" by removing the scenarios whose total DRP output

$$P_{DR}^r = \delta_1 P_{DR1} + \delta_2 P_{DR2} \tag{7.41}$$

is far from the expected value

$$\hat{P}_{DR} = P_{DR1} + P_{DR2}. \tag{7.42}$$

In other words, the "worst scenarios" are those with the largest value of |P_DR^r − P̂_DR|. This algorithm is certainly not the only way, nor necessarily the best way, to select scenarios, but it does help improve the performance of the cost function. The simulation results in Fig. 7.17 again show that the robust model is conservative: it exhibits the earliest inflection of DR acceptance and the highest cost. The two scenario approach models (removing 20% and 50% of the scenarios) lie in the middle in both realization cost and DR acceptance.
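A minimal Python sketch of this 2-D removal rule, assuming a delta sample array of shape (N_k, 2) and fixed accepted DR amounts for the two DRPs (names are illustrative):

import numpy as np

def remove_2d_center(delta, P_DR, frac):
    """Drop the scenarios whose realized total DR (7.41) lies farthest from
    the expected total (7.42)."""
    realized = delta @ np.asarray(P_DR)     # P_DR^r for every scenario
    expected = np.sum(P_DR)                 # P_DR_hat
    k = int(frac * len(delta))
    order = np.argsort(np.abs(realized - expected))
    return delta[order[:len(delta) - k]]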

Fig. 7.17 IEEE 14-bus system with two DRPs. (a) Realization cost (b) DR bids accepted

Fig. 7.18 DR acceptance ratio

To discuss how the location of DRPs affects the economic dispatch decision, we keep the bidding prices equal, π_DR1 = π_DR2, in order to see how the ISO chooses between two DRPs with the same price when one helps ease congestion and the other does not. Figure 7.18 shows a clearly different treatment of the two DRPs. The figure plots the percentage of the DR bidding amount accepted by the ISO; an acceptance ratio of 1 means 100% of the bid is accepted. As the bidding price increases for both DRPs, using DR stops being cost-effective once the price exceeds the marginal cost of the conventional generators. However, the turning points for the two DRPs are quite different. DRP2, which helps ease the congestion of Line 24, is still preferred by the ISO at prices π_DR2 between 56 and 70, where DRP1 is already rejected at the same bidding price. Our analysis shows that the location of a DRP is quite important when line congestion exists: DRP bids that help ease line congestion are preferred by the market and have a higher chance of being accepted.


7.2.5 Conclusion

This section introduces the formulation of economic dispatch in a day-ahead market with the presence of demand response providers (DRPs). Unlike conventional generators, DRPs provide demand response through bottom-up aggregation, which causes high uncertainty in DR because of end-user behavior. The scenario approach model is introduced in order to achieve better performance than the robust model in most cases while giving up only slightly on optimality in the worst scenario. In addition, the scenario approach is more flexible, since it enables the trade-off between performance and feasibility, compared with the deterministic, robust, and stochastic methods. Simulation results show that uncertainty in DR causes fewer DRP bids to be accepted in all dispatch methods (stochastic, robust, and scenario approach). Depending on the algorithm for selecting removed scenarios, the scenario approach improves the expected realization cost. Moreover, the decision-maker is free to choose the number of "bad scenarios" to remove in order to obtain a lower realization cost while sacrificing the violation probability of the energy balance, power flow, and min-max constraints. Finally, the ISO prefers to accept DR bids that help ease congestion in the system. Future work will concentrate on a deeper study of the choice of "worst scenarios" in the scenario approach model; DRP bidding strategies under the current framework will also be considered.

Chapter 8

Streaming Monitoring and Control for Real-Time Grid Operation

8.1 Learning the Network

Distributed energy resources (DERs) such as photovoltaic (PV), wind, and gas generators are connected to the grid more than ever before, introducing tremendous changes in the distribution grid. Due to these changes, it is important to understand where these DERs are connected in order to sustainably monitor and control the distribution grid. But the exact distribution system topology is difficult to obtain due to frequent distribution grid reconfigurations and insufficient knowledge about new components. In this section, we propose a methodology that utilizes new data from sensor-equipped DER devices to obtain the distribution grid topology. Specifically, a graphical model is presented to describe the probabilistic relationship among different voltage measurements. With power flow analysis, a mutual information-based identification algorithm is proposed to deal with tree and partially meshed networks. Simulation results show highly accurate connectivity identification on IEEE standard distribution test systems and Electric Power Research Institute (EPRI) test systems.

8.1.1 Introduction

The electric industry is undergoing structural changes as distributed energy resources (DERs) are integrated into the distribution grid. DERs are small power sources such as photovoltaic (PV) and wind generators (renewable generation), energy storage devices (consumption flexibility), and electric vehicles (vehicle-to-grid services). As they have the potential to offer end-consumers more choices, cleaner power, and more control over energy bills, the deployment of DERs is gaining momentum on a worldwide scale [470]. For example, the SunShot Vision Study



estimates that solar electric systems will scale rapidly and potentially generate up to 14% of the nation's total electricity demand by 2030 and 27% by 2050 [471, 472].

While adding new capabilities, the proliferation of DERs raises great concern about the resilience of the power grid. For example, power that used to flow in one direction on the distribution system, from central power plants to customers, now also flows back from customers to the distribution grid. Dynamic variation of voltage profiles, voltage stability, islanding, line-work hazards, and distribution systems operating at stability boundaries are also troubling distribution grid operation [473]. Such problems are forcing power system engineers to rethink the architecture of the power grid and transition from the traditional top-down approach to a bottom-up design, since most changes happen at the customer level [474]. As a result, the concept of distribution automation was created to provide intelligent control over the distribution grid and customer levels for sustainable grid operation [475].

To achieve distribution automation, reliable and continuous monitoring of DERs is needed, as it allows operators to know where reverse power flow may happen. For example, when strategically located, DERs could potentially defer or substitute for conventional infrastructure [470]. Understanding the existing connectivity is therefore key to locational benefit analysis and local grid capacity analysis for future planning purposes [470]. Further, contingency analysis needs DER connectivity, as it helps answer "what if" questions, e.g., what if solar capacity is increased by 20% [476]? Finally, rooftop solar panels may change the revenue of utilities, so some utilities want to know where they are located [477].

A major challenge of DER monitoring is that the topology of the distribution grid is difficult to obtain. For example, many substations rely on a physical topology map, which may be outdated. This is because, unlike the transmission power grid [478, 479], a distribution grid can have frequent topology changes [480–483]. Such a topology change can occur once a season [484] or once every 4 weeks for MV grids [485]; if one needs to coordinate PV, the frequency can be once per 8 h [486]. Some known changes are the results of routine reconfigurations, e.g., deliberate and dynamic changes to the network to obtain the best radial topology from the potentially meshed distribution topology in city networks [487, 488]. Many other changes, caused by outages or manual maintenance, may be unknown. For example, a field engineer may not report immediately about topology changes after repairing part of a network [23]. Additionally, even if a topology change is known or detected in the local area, the exact location may be hard to obtain [489]. Finally, although topology sensors are being used, they are placed only at special locations due to budget constraints [490].

Furthermore, there will be massive ad hoc connections of plug-and-play DER components. For some well-maintained feeders, the connectivity of DERs can best be identified from switch or breaker information. However, for many feeders, switch and breaker information near these DERs may not be available, because many of the DERs, e.g., distributed generators, do not belong to the utility. In such a case, while the utilities may not own the DER devices,


they can try to obtain the sensor data from manufacturers such as Solar City and detect the connectivity themselves.

Mathematically, topology identification can be achieved via voltage estimation [491]. But such an approach will fail in a distribution grid with limited sensor data or relatively frequent topology changes [492–495]. There are also methods dedicated to the distribution grid, based on different assumptions. For example, [496–498] assume the availability of all switch locations and search for the right combination. State estimation-based methods [499, 500] and power flow-based methods [501] assume the availability of an admittance matrix and infrequent topology change. Unfortunately, these assumptions are improper in newly added or reconfigured distribution networks, because neither the knowledge of circuit breakers nor the information of the admittance matrix may be available.

Fortunately, smart sensors are continuously being deployed across distribution systems [17, 502], thanks to recent advances in communications, sensing, and targeted government investments. Examples include advanced metering infrastructure (AMI) and load-side micro-PMUs (μ-PMUs) [503]. Based on these advances, statistical methods can be used to correct a city's network topology [487, 488]. Additionally, private industry is integrating sensing capabilities into DERs for monitoring purposes, e.g., Solar City's photovoltaic systems [504], commercial and residential charging systems, and in-home appliances such as thermostats. The problem is that such data streams are not integrated or utilized by distribution system operators to improve system performance, e.g., for DER topology identification.

In this section, we propose to utilize such data streams to reconstruct DERs' connectivity for distribution automation [505, 506]. The work considers two scenarios. One is the chronic lack of topology state observation: a partial network topology may be unknown or incorrect, so one needs to confirm the existing network structure or correct topology errors. The other is to understand the unknown connectivity of newly installed DERs to the existing grid; for example, while the utilities may not own the DER devices, they can try to obtain the sensor data from manufacturers such as Solar City for connectivity identification.

Specifically, we build a probabilistic graphical model of the power grid to model the dependency between neighboring buses. Subsequently, we formulate the topology identification problem as a probability distance minimization problem via the Kullback-Leibler (KL) divergence metric. We prove that the resulting mutual information-based algorithm finds the optimal topology connection [507] if current injections are assumed to be approximately independent. The algorithm relies on sensor data only, is computationally inexpensive (there is no need to consider circuit breaker conditions or the system admittance matrix [493]), and correctly identifies many plug-and-play devices. Finally, we generalize the method from tree networks to networks with loops by using conditional mutual information.

The performance of the data-driven method is verified by simulations on the standard IEEE 8- and 123-bus distribution test cases [508–510] and the Electric Power Research Institute (EPRI) 13-, 34-, 37-, and 2998-bus systems. Two datasets of 123,000 residential households (PG&E) in Northern California and 30 houses


in upper Austria are used [511]. Simulations are conducted via the MATLAB Power System Simulation Package (MATPOWER) [509, 510] and OpenDSS [512]. Simulation results show that, when provided with enough historical data, the data-driven topology estimate outperforms the estimates from traditional approaches.

The rest of the section is organized as follows: Sect. 8.1.2 introduces the modeling and the problem of data-driven topology identification. Section 8.1.3 provides proofs justifying the applicability of the mutual information-based algorithm for distribution system topology reconfiguration and presents the detailed, illustrated algorithm. Section 8.1.4 evaluates the performance of the new method, and Sect. 8.1.5 concludes.

8.1.2 Probabilistic Modeling of Network Voltages via Graphical Modeling

In this section, we first describe sensor measurements as random variables. With this definition, we model the distribution grid via a probabilistic graphical model. Based on this modeling, we formally define the problem of data-driven topology identification.

For the modeling, a distribution network is defined as a physical graph G(V, E) with vertices V = {1, ..., n} representing the buses (PV generators, storage devices, and loads) and edges E representing their interconnections. It can be visualized as the physical layer in Fig. 8.1. To utilize the time series data generated by smart meters, we construct a cyber layer for grid topology reconstruction.

Fig. 8.1 Cyber-physical networks


Each node in the cyber layer is associated with a random variable in V_cyber, and a voltage measurement at bus i and time t is represented as v_i(t). We use a joint probability distribution to represent the interdependency (edges E_cyber) between the different voltage random variables (V_cyber):

$$p(v) = p(v_2, v_3, \ldots, v_n) = p(v_2)\, p(v_3 | v_2) \cdots p(v_n | v_2, \ldots, v_{n-1}), \tag{8.1}$$

where v_i (i > 1) represents the voltage measurement at bus i. Bus 1 is the reference bus, so it is fixed as a constant with unit magnitude and zero phase angle. If the quantization of a bus voltage has m levels, one needs to store and manipulate a discrete probability distribution with m^(n−1) values. This is computationally expensive for calculating marginal distributions or the maximal joint distribution in large systems. Therefore, we would like to approximate the distribution using a reasonable number of specifying values that capture the main dependence [513] (edges E) in the physical layer. For example, we can approximate the true distribution p(v) with a simplified distribution p_a(v). In the next section, we will show that such an approximation is exact, without approximation error, when the current injections are independent. To approximate a probability distribution p(v) with another probability distribution p_a(v), one can minimize the Kullback-Leibler (KL) divergence, which measures the difference between two probability distributions [514]:

$$D(p \| p_a) = \mathbb{E}_{p(v)} \log \frac{p(v)}{p_a(v)}, \tag{8.2}$$

where p_a is constrained to have a tree structure or a loopy structure in this work.

Remark 8.1 We will show later that minimizing the KL divergence between p(v) and p_a(v) is equivalent to maximizing the data likelihood.
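To make the storage argument above concrete, the following two-line Python illustration compares the full joint distribution with the tree-structured approximation introduced in the next section (m and n are assumed values):

m, n = 10, 123                 # quantization levels and number of buses (assumed)
print(m ** (n - 1))            # full joint distribution: a 123-digit entry count
print(m * (n - 1))             # tree-structured approximation: 1220 specifiers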

8.1.2.1 Problem Definition

The problem of distribution grid topology reconstruction is defined as follows:

• Problem: data-driven topology reconstruction.
• Given: a sequence of historical voltage measurements v_i(t), t = 1, ..., T, and an unknown or partially known grid topology, as shown in Fig. 8.3.
• Find: the local grid topology E in the dashed box (e.g., Fig. 8.3) based on D(p‖p_a).


8.1.3 Mutual Information-Based Algorithm for Distribution Grids

While a complete description of a probabilistic graphical model requires a maximum complexity of m^(n−1), a tree-dependent probabilistic graphical model only requires m(n−1) specifiers, a significant saving due to the Markov property. This is because, under the assumption that any two non-descendant nodes are independent conditioned on their parents, a Bayesian network suffices to determine a probability distribution. Therefore, we can describe the cyber-layer relationship as a product of pairwise conditional probability distributions:

$$p_a(v) = \prod_{i=2}^{n} p(v_i | v_{pa(i)}), \tag{8.3}$$

where v_pa(i) is the (random) variable designated as the direct predecessor, or parent node, of v_i in some orientation of the tree. In Fig. 8.2, we show the mutual information of pairwise current variables. The relatively small mutual information suggests that the currents can be approximated as independent with only a small approximation error (Fig. 8.3). Beyond this intuitive plot, we will show why p(v) can be approximated by (8.3). Then, we will present an optimal topology identification method based on this assumption. Specifically, we show that if current injections are approximated as independent, historical data can be used to find the optimal approximating distribution p_a(v) of the true distribution p(v). Further, such an approximation leads to a computationally feasible solution for finding the topology of the physical layer.

Fig. 8.2 Pairwise mutual information of current phasors (histogram)


Fig. 8.3 Cyber layer of the 13-bus system.

Lemma 8.1 If current injections are approximated as independent, then voltages are conditionally independent in a tree network, given their parent nodes' voltage information.

Proof In a distribution system, voltages are usually within the nominal range. Therefore, current injections play a major role in adapting the power injections to balance loads. If users' loads can be approximated as mutually independent, we approximate the current injections as independent for the following derivations. Let the current injections I_i ∈ C, i ∈ {1, 2, ..., n}, be modeled as independent random variables, and let y_ij denote the line admittance between bus i and bus j. Given the reference bus value V_1 = v_1, we can find the relationship between voltages and currents for each bus i in Fig. 8.4 except node m:

$$V_i = \frac{I_i + v_1 y_{1i}}{y_{ii}}, \quad i = 2, \ldots, n. \tag{8.4}$$

Under the assumption that the current injections I_i are independent of one another, V_i|V_1 and V_j|V_1 are independent for i, j ∈ {2, ..., n} and i ≠ j. Now, we analyze a more general tree network with an additional node m connected to bus n, shown within the dashed circle in Fig. 8.4. The currents and voltages now have the following relationship:


Fig. 8.4 The (n+1)-bus system



$$\begin{bmatrix} I_1 \\ I_2 \\ \vdots \\ I_n \\ I_m \end{bmatrix} = \begin{bmatrix} y_{11} & -y_{12} & \cdots & -y_{1n} & 0 \\ -y_{12} & y_{22} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ -y_{1n} & 0 & \cdots & y_{nn} & -y_{nm} \\ 0 & 0 & \cdots & -y_{nm} & y_{mm} \end{bmatrix} \begin{bmatrix} V_1 \\ V_2 \\ \vdots \\ V_n \\ V_m \end{bmatrix}$$

Given V_1 = v_1, we have

$$\begin{bmatrix} I_2 + v_1 y_{12} \\ I_3 + v_1 y_{13} \\ \vdots \\ I_n + v_1 y_{1n} \\ I_m \end{bmatrix} = \begin{bmatrix} y_{22} & 0 & \cdots & 0 & 0 \\ 0 & y_{33} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & y_{nn} & -y_{nm} \\ 0 & 0 & \cdots & -y_{nm} & y_{mm} \end{bmatrix} \begin{bmatrix} V_2 \\ V_3 \\ \vdots \\ V_n \\ V_m \end{bmatrix}$$

For bus 2 to bus n−1, the conditional independence proof based on (8.4) still holds. But bus n is now connected to bus m. To explore the conditional dependence between bus n and bus i ∈ {2, ..., n−1}, we use the following derivation:

$$I_n + v_1 y_{1n} = y_{nn} V_n - y_{nm} V_m, \tag{8.5}$$
$$I_m = -y_{nm} V_n + y_{mm} V_m. \tag{8.6}$$

Since bus m only connects with bus n, y_mm = y_nm. Further, y_nn = y_nm + y_1n by the definition of the admittance matrix. Hence, we can combine (8.5) and (8.6):

$$I_n + v_1 y_{1n} + I_m = y_{nn} V_n - y_{nm} V_m - y_{nm} V_n + y_{mm} V_m = y_{1n} V_n,$$

$$V_n = \frac{I_n + I_m + v_1 y_{1n}}{y_{1n}}. \tag{8.7}$$

Since the current injections are independent, i.e., I_i ⊥ I_j for 2 ≤ i ≤ n−1 and j ∈ {n, m}, I_n + I_m and I_i are independent as well. Therefore, V_n|V_1 is independent of V_i|V_1 due to (8.4) and (8.7).


This proof can easily be extended to the case where bus i ∈ {2, ..., n−1} is connected to one more bus besides bus 1. In more general cases, where each bus is connected to more than two buses, we can aggregate these buses into a single bus and use the proof above to show the conditional independence of voltages. Therefore, we conclude that V_i ⊥ V_j | V_k if V_i and V_j share a common parent V_k in the tree network. □

Having shown that V_i ⊥ V_j | V_k, Lemma 8.2 shows that a mutual information-based maximum weight spanning tree algorithm minimizes the approximation error by finding the best-fitted topology in a short time.

Lemma 8.2 Recall the KL divergence [514] below:

$$D(p \| p_a) = \mathbb{E}_{p(v)} \log \frac{p(v)}{p_a(v)}. \tag{8.8}$$

If the voltages are conditionally independent, a mutual information-based maximum weight spanning tree algorithm can find the best-fitted topology, such that the approximation error (KL divergence) is minimized.

Proof See [515] for the proof. □

With Lemma 8.1 and Lemma 8.2, we present our main result in the following theorem.

Theorem 8.1 In a radial distribution power grid, mutual information-based maximum spanning tree algorithms find the optimal approximation of p(v) and its associated topology connection if current injections are approximated as independent.

Proof By Lemma 8.1, if the current injections are approximated as independent, the voltages are conditionally independent in a tree network. When the voltages are conditionally independent in a tree structure, by Lemma 8.2, a mutual information-based maximum weight spanning tree algorithm finds the best-fitted topology such that the approximation error (KL divergence) is minimized. Therefore, we can use such a maximum spanning tree algorithm to find the power grid topology. □

In Sect. 8.1.4, we will use numerical examples to demonstrate Theorem 8.1. Specifically, by using the mutual information as the weight, a maximum weight spanning tree finds a highly accurate topology of a distribution grid, making the proposed algorithm suitable for the task of smart grid topology identification with DERs.
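Since Sect. 8.1.4 models each V_i as a two-dimensional real Gaussian random vector, the pairwise mutual information has a closed form in the covariance matrices, I(X; Y) = (1/2) log(det Σ_X det Σ_Y / det Σ_XY). The following Python sketch estimates it from T samples; it is one possible implementation, not the original code of this work.

import numpy as np

def gaussian_mi(x, y):
    """Estimate I(X; Y) for jointly Gaussian vectors from samples of shape
    (T, dx) and (T, dy), e.g., columns (|v_i(t)|, theta_i(t))."""
    Sxy = np.cov(np.hstack([x, y]), rowvar=False)   # joint covariance
    dx = x.shape[1]
    Sx, Sy = Sxy[:dx, :dx], Sxy[dx:, dx:]
    return 0.5 * np.log(np.linalg.det(Sx) * np.linalg.det(Sy)
                        / np.linalg.det(Sxy))

With dx = 1, the same estimator applies to the magnitude-only setting of Sect. 8.1.3.3.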

8.1.3.1 Why Does the Mutual Information-Based Algorithm Work?

One key step in finding the topology is to compare the mutual information. We use the following lemma to illustrate why this concept finds the correct topology.

Lemma 8.3 In a distribution network with a tree structure and the conditional independence assumption in Theorem 8.1, I(V_j, V_i) ≥ I(V_j, V_k) given j, k ∈ pa(i), k ∉ r(j), and j ∉ pa(k).

Proof By the definition of mutual information, it has the following chain rule property [514]:

$$I(V_i, V_j, V_k) = I(V_i, V_j) - I(V_i, V_j | V_k) = I(V_j, V_k) - I(V_j, V_k | V_i).$$

Since V_j|V_i is independent of V_k|V_i, the conditional mutual information I(V_j, V_k | V_i) is zero. Then,

$$I(V_i, V_j) = I(V_j, V_k) + I(V_i, V_j | V_k).$$

Since mutual information is always nonnegative,

$$I(V_j, V_i) \ge I(V_j, V_k). \qquad \square$$

This means that the mutual information between a node and its neighbor is always larger than the mutual information between that node and a node farther away.

Remark 8.2 Notice that there is a special case in which the tree-structure topology is hard to detect by our proposed algorithm: a symmetric topology with symmetric loads. For example, let bus 1 have two children, bus 2 and bus 3. If the two branches 1-2 and 1-3 have the same impedance, and the loads on bus 2 and bus 3 are identical at all times, then our algorithm will perform poorly. This is because the voltages of bus 2 and bus 3 are always equal, which maximizes their mutual information even though they are not connected. However, such an instance requires the same impedance, the same loads, and a symmetric topology, which rarely happens in practice.

Next, we present Algorithm 8. Steps 6–16 build a maximum spanning tree using pairwise mutual information as the weight. The algorithm is modified from the well-known Kruskal minimum weight spanning tree algorithm [516, 517] and has a running time of O((n − 2) log(n − 1)) for a radial distribution network with n buses. Therefore, the proposed algorithm can efficiently reconstruct the topology with low computational complexity.


Algorithm 8 Tree structure topology reconstruction
Require: v_i(t) for i = 2, ..., n, t = 1, ..., T
1: for i, j = 2, ..., n do
2:   Compute the mutual information I(V_i, V_j) based on v_i(t).
3: end for
4: Sort all possible bus pairs (i, j) into non-increasing order by I(V_i, V_j). Let Ẽ denote the sorted set.
5: Let Ê be the set of nodal pairs comprising the maximum weight spanning tree. Set Ê = ∅.
6: for (i, j) ∈ Ẽ do
7:   if a cycle is detected in Ê ∪ (i, j) then
8:     continue
9:   else
10:    Ê ← Ê ∪ (i, j)
11:  end if
12:  if |Ê| == n − 2 then
13:    break
14:  end if
15: end for
16: return Ê
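The following runnable Python sketch implements Algorithm 8, assuming a precomputed symmetric mutual-information matrix MI over the n non-reference buses (e.g., from the gaussian_mi sketch above); a union-find structure implements the cycle test of step 7.

def reconstruct_tree(MI):
    """Maximum weight spanning tree over buses indexed 0..n-1 (reference bus
    excluded), with pairwise mutual information MI[i][j] as edge weights."""
    n = len(MI)
    pairs = sorted(((MI[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)                    # non-increasing weights

    parent = list(range(n))                         # union-find roots

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]           # path halving
            u = parent[u]
        return u

    E_hat = []
    for _, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                                # (i, j) would close a cycle
        parent[ri] = rj
        E_hat.append((i, j))
        if len(E_hat) == n - 1:                     # spanning tree complete
            break
    return E_hat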

Fig. 8.5 Loop network of eight buses

8.1.3.2 Adaptation for Distribution Grids with a Loop

In the previous subsection, we presented a mutual information-based algorithm that can identify tree-structured topologies. In practice, a ring structure may exist for robustness, e.g., Fig. 8.5 [487, 488]. In a graphical model, a ring means that one child has two parents. Similar to the previous section, the probability distribution p_a can be written as

$$p_a(v) = p(v_n | v_{pa(n),1}, v_{pa(n),2}) \prod_{i=2}^{n-1} p(v_i | v_{pa(i)}),$$

where {pa(n), 1} and {pa(n), 2} represent the two parent nodes of V_n. Then, we have

$$\begin{aligned} D(p \| p_a) &= \int p(v) \log \frac{p(v)}{p_a(v)} \\ &= \int p(v) \log p(v) - \int p(v) \sum_{i=2}^{n-1} \log \frac{p(v_i, v_{pa(i)})}{p(v_i)\, p(v_{pa(i)})} \\ &\quad - \int p(v) \log \frac{p(v_n, v_{pa(n),1}, v_{pa(n),2})}{p(v_n)\, p(v_{pa(n),1}, v_{pa(n),2})} - \int p(v) \log \prod_{i=2}^{n} p(v_i). \end{aligned}$$

Therefore,

$$D(p \| p_a) = -\sum_{i=2}^{n-1} I(V_i; V_{pa(i)}) - I(V_n; V_{pa(n),1}, V_{pa(n),2}) + \sum_{i=2}^{n} H(V_i) - H(V_2, \cdots, V_n). \tag{8.9}$$

Algorithm 9 Topology reconstruction with a loop
Require: v_i(t) for i = 2, ..., n, t = 1, ..., T
if there is a loop in the network then
  Compute I(V_n; V_pa(n),1, V_pa(n),2) for all possible values of n.
  Remove the branch with the maximum value so that the remaining network forms a tree.
end if
for i, j = 2, ..., n − 1 do
  Compute the mutual information I(V_i, V_j) based on v_i(t).
end for
Sort all possible bus pairs (i, j) into non-increasing order by I(V_i, V_j). Let Ẽ denote the sorted set.
Let Ê be the set of nodal pairs comprising the maximum weight spanning tree. Set Ê = ∅.
for (i, j) ∈ Ẽ do
  if a cycle is detected in Ê ∪ (i, j) then
    continue
  else
    Ê ← Ê ∪ (i, j)
  end if
  if |Ê| == n − 3 then
    break
  end if
end for
return Ê together with the detected loopy branch, if any

Thus, for each extended mutual information I(V_n; V_pa(n),1, V_pa(n),2), we search for a maximum weight spanning tree and compute the total mutual information Σ_{i=2}^{n−1} I(V_i; V_pa(i)) + I(V_n; V_pa(n),1, V_pa(n),2). The totals are then compared, and the largest one is chosen. One can further generalize the proof to include more than one loop. To summarize our algorithm for the simulation, we use the flow chart in Fig. 8.7 and the algorithm description in Algorithm 9.
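Under the same Gaussian model, the extended mutual information used by Algorithm 9 can be estimated by applying the earlier gaussian_mi sketch to the stacked parent measurements (again a sketch, not the original implementation):

import numpy as np

def extended_mi(v_n, v_p1, v_p2):
    """I(V_n; V_{pa(n),1}, V_{pa(n),2}): treat the two parents' samples as one
    stacked random vector and reuse the Gaussian pairwise estimator."""
    return gaussian_mi(v_n, np.hstack([v_p1, v_p2]))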


Due to the loop structure, the computational cost involved in Eq. (8.9) is O(n(n − 1) log(n − 1)).

8.1.3.3 Adaptation for Smart Meters with Voltage Magnitude Data

As smart meters are only loosely synchronized, accurate phase angle information is hard to obtain for mutual information computation. Fortunately, the angle variance is small in a distribution grid. Therefore, we can use the chain rule of mutual information below to approximate I(V_i; V_j) when only voltage magnitude measurements are available. To analyze such an approximation, we decompose the mutual information into four terms and compare their contributions to I(V_i; V_j):

$$\begin{aligned} I(V_i; V_j) &= I(|V_i|, \angle V_i;\, |V_j|, \angle V_j) \\ &= I(|V_i|;\, |V_j|, \angle V_j) + I(\angle V_i;\, |V_j|, \angle V_j \,\big|\, |V_i|) \\ &= I(|V_i|; |V_j|) + \underbrace{I(|V_i|; \angle V_j \,\big|\, |V_j|)}_{A} + \underbrace{I(\angle V_i; |V_j| \,\big|\, |V_i|)}_{B} + \underbrace{I(\angle V_i; \angle V_j \,\big|\, |V_i|, |V_j|)}_{C}. \end{aligned} \tag{8.10}$$

Figure 8.6 shows a numerical comparison of all possible pairwise mutual information values and their sub-components in an IEEE 123-bus simulation. In Fig. 8.6, terms A, B, and C are relatively small compared to I(|V_i|; |V_j|). This is because voltage phase angles change very little in the distribution grid and thus carry less information than the voltage magnitudes. Consequently, one can use I(|V_i|; |V_j|) for comparison in the topology identification process instead of I(V_i; V_j). In the next section, we verify this idea by testing (a) when the phase angle is available and (b) when such information is unavailable. In Fig. 8.7, we present the flow chart for the subsequent simulations.

8.1.3.4 Limitations of the Method

There are inherent limitations to a data-driven approach. For example, the proposed method may not work properly when (1) there is no historical data, (2) the power network is highly meshed with many loops (computationally expensive), (3) the recorded data is of low quality (e.g., quantized data for billing purposes), or (4) the loads are identical at all nodes. While (4) is rare, it happens when the nodes in a network are connected to the same solar panels but have no power consumption. This causes the currents to be similar to each other, violating our assumption in Lemma 8.1.


Fig. 8.6 Pairwise mutual information I(V_i; V_j) of (8.10) and its sub-components (terms A, B, and C), in sorted order

Fig. 8.7 Flow chart of the proposed algorithm

8.1.4 Simulations

The simulations are implemented on the IEEE PES distribution networks for the IEEE 8-bus and 123-bus systems [508, 518] and the Electric Power Research Institute (EPRI) 13-, 34-, 37-, and 2998-bus systems. In each network, the feeder bus is selected as the slack bus. The historical data have been preprocessed by the MATLAB Power System Simulation Package (MATPOWER) [509, 510] and OpenDSS [512]. To simulate power system behavior in a more realistic pattern, the load profiles from Pacific Gas and Electric Company (PG&E) and from the "ADRES-Concept" Project, a venture of the Vienna University of Technology [511], are adopted as the real-power profiles in the subsequent simulations.

Fig. 8.8 1-min real-power profile

The PG&E load profile contains hourly real power consumption of 123,000 residential loads in Northern California, USA. The "ADRES-Concept" Project load profile contains real and reactive power profiles of 30 houses in upper Austria, sampled every second over 14 days [511]. For example, Fig. 8.8 shows the dataset from the "ADRES-Concept" Project. For the PG&E dataset, reactive powers were not available; for the reactive power q_i at bus i, we simulate three different scenarios:

• Random Reactive Power: q_i(t) ∼ Unif(0.5μ_i^q, 1.5μ_i^q), t = 1, ..., T, where the mean μ_i^q ∈ (60 W, 180 W) is given in the IEEE PES distribution network;
• Random Power Factor: q_i(t) = p_i(t)√(1 − pf_i(t)²)/pf_i(t), where pf_i(t) ∼ Unif(0.85, 0.95);
• Fixed Power Factor: q_i(t) = p_i(t)√(1 − pf_i²)/pf_i, where the fixed power factor pf_i ∼ Unif(0.85, 0.95), ∀i ∈ V.
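The three scenarios translate directly into Python. In this sketch, p is a length-T real-power array for one bus and mu_q its given mean; function and variable names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)

def q_random_reactive(mu_q, T):
    """Random Reactive Power: q(t) ~ Unif(0.5*mu_q, 1.5*mu_q)."""
    return rng.uniform(0.5 * mu_q, 1.5 * mu_q, T)

def q_random_pf(p):
    """Random Power Factor: a fresh pf(t) ~ Unif(0.85, 0.95) at every t."""
    pf = rng.uniform(0.85, 0.95, len(p))
    return p * np.sqrt(1.0 - pf ** 2) / pf

def q_fixed_pf(p):
    """Fixed Power Factor: one pf ~ Unif(0.85, 0.95) drawn per bus."""
    pf = rng.uniform(0.85, 0.95)
    return p * np.sqrt(1.0 - pf ** 2) / pf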

8.1.4.1 Tree Networks without DERs

PG&E Systems

To obtain voltage measurements at time t, i.e., |v_i(t)| and θ_i(t), for tree networks without DERs, we run a power flow based on the power profiles above. Time series data are obtained by repeatedly running the power flow to generate hourly data over a year; in total, T = 8760 measurements are obtained at each bus for the PG&E dataset. Finally, these voltage measurements are fed to the mutual information-based algorithm. To simplify the analysis, we model V_i at bus i as a two-dimensional real Gaussian random vector instead of a complex random variable.

For simulations over tree networks (IEEE 123-bus in Fig. 8.9) without DERs, the first step of our proposed algorithm is to compare the mutual information between different pairs. Figure 8.10 displays the pairwise mutual information of bus 26 and bus 109. We can see that the mutual information of a connected branch is much larger than that of other pairs, confirming Lemma 8.3. To demonstrate the detection result, a heat map of the mutual information matrix is shown in Fig. 8.11, where buses 90, ..., 115 are included. In this figure, a circle represents a true connection, while a crossing indicates a detected connection between the row and column indexes; if a circle is superposed by a crossing, a correct topology identification is claimed. We observe that:

1. the diagonal element has the largest mutual information in each row because it is the self-information [514];
2. the coordinate associated with the true branch has the largest mutual information in each row (excluding the diagonal element).

This illustrates that using the pairwise mutual information as the weight for topology detection is consistent with the physical connectivity.

We repeat the simulation above in different setups and summarize the average performance in Table 8.1, where the detection error rate is defined as

$$\left(1 - \frac{\sum_{(i,j) \in \hat{E}} \mathbb{1}\{(i,j) \in E\}}{|E|}\right) \times 100\%.$$

Here, Ê denotes the estimated set of branches and |E| denotes the size of the set E. Table 8.1 summarizes the detection error rate on the IEEE 8-bus and 123-bus systems: the proposed algorithm recovers both systems without error in repeated simulations under all three setups. In addition to the detection error rate for a fixed number of unknown topologies, we also simulate cases where the number of unknown topologies changes, e.g., Fig. 8.12. We compare the proposed algorithm with a traditional algorithm (requiring the admittance matrix) and with the algorithm in [493], referred to as the "Bolognani-Schenato Algorithm." The x-coordinate represents the number of edges to be identified; the y-coordinate represents the detection error rate. As the number of unknown edges increases, our approach consistently has a zero error rate, while the other methods' detection ability decreases.


Fig. 8.9 IEEE 123-bus system: In the dashed box, the topology is unknown



Fig. 8.10 Pairwise mutual information for bus 26 and bus 109 (the connected bus stands out)

Fig. 8.11 Heat map of the mutual information matrix (buses 90–115). Black circle: true branches in the 123-bus network. Black crossing: detected branches

Table 8.1 Detection error rate

                         8-bus network   123-bus network
Random reactive power    0%              0%
Random power factor      0%              0%
Fixed power factor       0%              0%


Fig. 8.12 Error comparison (detection error rate vs. number of unknown edges for the Bolognani-Schenato algorithm, the traditional algorithm, and the proposed method using |V| and (|V|, θ))

The failure of both methods is likely due to their assumptions, namely a known admittance matrix, a fixed inductance/resistance ratio, or a sufficiently large nominal voltage.

EPRI Systems

To explore the performance of our algorithm on a large-scale network, we use a radial distribution grid provided by the Electric Power Research Institute (EPRI). This grid contains one feeder with 2998 buses. The detected topology is shown in Fig. 8.13, and the detection error is reported in Table 8.2.

8.1.4.2 Tree Networks with DERs

In the simulation of tree networks with DERs, solar panels are selected as the source of renewable energy. The hourly power generation profile is computed using the PVWatts Calculator, an online application developed by the National Renewable Energy Laboratory (NREL) that estimates the power generation of a photovoltaic system based on weather and physical parameters [519].


Fig. 8.13 Detected EPRI 2998-bus topology

Table 8.2 EPRI system detection error rate

Bus number   13   34   37   2998
Error rate   0%   0%   0%   0%

The hourly data are computed based on the weather history of Mountain View, CA, USA, and the physical parameters of a solar panel with a capacity of 5 kW. We randomly choose 12 load buses in the 123-bus system to have solar panels. The hourly generation profile at bus i is modeled as a Gaussian distribution, p_i^r(t) ∼ N(P^r(t), 0.05), where t = 1, ..., T and P^r(t) denotes the power generation computed by PVWatts in kW. The renewable power generation is then modeled as a negative load. After simulations, we obtain mutual information plots similar to Figs. 8.10 and 8.11. Table 8.3 summarizes the detection error rate on the IEEE 123-bus system, demonstrating the robustness of our algorithm to renewables.
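A minimal Python sketch of the negative-load model just described, assuming P_r is the hourly PVWatts generation profile in kW and load the original load profile of the same length:

import numpy as np

rng = np.random.default_rng(3)

def net_load(load, P_r, var=0.05):
    """Model solar as a negative load: draw p_i^r(t) ~ N(P^r(t), 0.05) and
    subtract it from the original consumption."""
    p_r = rng.normal(P_r, np.sqrt(var))
    return np.asarray(load) - p_r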

Table 8.3 Detection error rate with renewable energy generators on the 123-bus system

                         V        |V|
Random reactive power    0%       2.44%
Random power factor      0%       0.81%
Fixed power factor       0.81%    0.81%

Fig. 8.14 Heat map of the mutual information matrix. Black circle: true branches in the 8-bus network. Black crossing: the detected branches

8.1.4.3 Networks with a Loop

When adding a loop to the network, we apply the adapted mutual information-based algorithm of Algorithm 9 and achieve 100% accuracy in the network without renewables and near 100% accuracy in the network with renewables. For example, Fig. 8.14 shows the heat map of the mutual information matrix for the loopy eight-bus system of Fig. 8.5. All topology connections were detected correctly.

8.1.4.4 Algorithm Sensitivities

Sensitivity to the Historical Data Length

To explore the sensitivity of the proposed algorithm to the number of samples, we run Monte Carlo simulations using data from 10 to 300 days. The results are summarized in Fig. 8.15. We observe that when more than 30 days of observations are available, our algorithm stably reconstructs the topology from historical data. This shows that our algorithm provides robust reconstruction with a short period of historical data.

Fig. 8.15 Error and its confidence interval versus the number of observations

Fig. 8.16 Reconstruction error rates with different data resolutions (1-, 30-, and 60-min intervals)


Sensitivity to the Data Resolution

In this simulation, we aggregate data points from [511] every 1, 30, and 60 min. We generate the voltage profiles using the IEEE 123-bus test case and assume the topology between bus 78 and bus 102 is unknown. Figure 8.16 shows the simulation results. When data are sampled at 1-min intervals, we need about a 3-h voltage profile to reconstruct the topology perfectly. As discussed in [492], the distribution grid usually reconfigures every 3 h, so the proposed algorithm can reconstruct the topology in real time. If the sampling period is 30 or 60 min, about 30 h of data are required to reconstruct the topology [520]. Therefore, the minimum computational time can be shorter when data with a higher sampling rate are available.

Remark 8.3 In addition to data resolution, one also needs to consider the case where the topology changes while the historical data are acquired. If the acquired historical data include data from two different topologies, our proposed algorithm will perform poorly. However, our algorithm can be executed repeatedly at low cost; thus, we suggest trusting the result only when several detections at consecutive time slots return the same topology.

Sensitivity to the Data Coming from Different Seasons

To understand our algorithm's sensitivity to different days, we compare data streams obtained during two different seasons of the year. The simulation result is shown in Fig. 8.17. As one can see, data coming from different seasons of the year do not affect the algorithm.

Sensitivity to the Smart Meter Accuracy

The proposed method depends on the sensor measurements, so it is important to know the accuracy of existing meters and whether such accuracy is sufficient for the proposed method. We therefore simulate the sensor error rate of 0.01% reported in [521]; for comparison purposes, we also simulate an error of 0.05%. Figure 8.18 shows the reconstruction error rates with the different noise levels. The proposed algorithm provides a good topology identification rate at the current sensor noise level. Notice that if the smart meters' data accuracy is reduced, e.g., if values are rounded, the statistical relationship may be lost and our algorithm may perform poorly.


Fig. 8.17 Reconstruction error rates with data coming from different seasons (panels: 1-min interval data for winter and summer; error rate versus time, with curves for (|V|, θ) and |V| only)

Sensitivity to Flat Loads One may expect diversity in the historical data to be important; if the load profiles are flat, the proposed approach could fail. To check for such failure, we simulate a flat-load scenario, where each load is fixed and perturbed by Gaussian noise with a variance of 1% of the load value. Figure 8.19 shows the resulting simulated load for bus 3 in the 123-bus system. Table 8.4 shows the simulation results. We observe that the proposed approach still works even when the load profiles are flat. This is because the differences between loads now come entirely from the noise, and these noises are independent of each other, making the current injections independent as well. When the current injections are independent, the assumption of our theorem is fully satisfied, and the algorithm achieves error-free results.
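A minimal sketch of this flat-load construction, assuming per-load base values in kW; the 1%-of-load variance follows the description above, and the function name is illustrative.

```python
import numpy as np

def simulate_flat_load(base_load_kw, n_samples, rng=None):
    """Flat load profile: a fixed load plus Gaussian noise whose variance
    equals 1% of the load value, as in the sensitivity study above."""
    rng = rng or np.random.default_rng(0)
    std = np.sqrt(0.01 * base_load_kw)   # variance = 0.01 * load
    return base_load_kw + rng.normal(0.0, std, size=n_samples)

# Example: a 5 kW load sampled hourly for one year.
flat_profile = simulate_flat_load(5.0, 8760)
```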

Sensitivity to the Redundancy of Available Information Nowadays, smart meters collect real-time measurements of voltage magnitude. With μ-PMUs not yet widely adopted, we want to explore how the proposed algorithm performs when only voltage magnitude measurements are available.


Fig. 8.18 Reconstruction error rates with different noise levels (error rate (%) versus number of unknown edges E; legend: 0.05%, 0.1%)

Fig. 8.19 Flat load (power (kW) versus time (hour); legend: true load, flat load)


Table 8.4 Detection error rate with renewable energy generators on the 123-bus system with flat loads

                            V      |V|
Random reactive power       0%     0%
Random power factor         0%     0%
Fixed power factor          0%     0%

Table 8.5 Detection error rate with only voltage magnitude

                            8-bus network    123-bus network
Random reactive power       0%               0.81%
Random power factor         0%               1.63%
Fixed power factor          0%               2.44%

Therefore, we model the voltage magnitude $|V_i|$ at bus $i$ as a real Gaussian random variable, $|V_i| \sim \mathcal{N}(\mu_{|V_i|}, \sigma^2_{|V_i|})$, where $\mu_{|V_i|}$ denotes the mean and $\sigma^2_{|V_i|}$ the variance; both parameters are learned from historical observations. Table 8.5 shows the detection error rates on the 8-bus and 123-bus systems when only voltage magnitudes are utilized. For the 8-bus system, the proposed algorithm still achieves zero error. For the 123-bus system, the detection error rate remains low, at about 2.44%. Notice that this setup is an extreme case in which no branch connectivity is known; in practice, at least a partial network structure is known. For example, if 57 buses (selected randomly) out of the 123 buses are unknown, the simulation shows that the proposed algorithm can always reconstruct the topology correctly using voltage magnitudes alone.

8.1.5 Conclusion

To identify distributed energy resources' connectivity, we propose a data-driven approach based on sensor measurements. Unlike existing approaches, which require knowledge of circuit breakers or the admittance matrix, our proposed algorithm relies on smart metering data only. We formulate the topology reconstruction problem as a joint distribution (of voltage phasors) approximation problem under the graphical model framework. A mutual information-based maximum weight spanning tree algorithm is proposed, based on a proof of how to minimize the Kullback-Leibler divergence in a distribution system. Moreover, we extend the algorithm from tree structures to loopy structures to generalize its usage. Finally, we simulate the proposed algorithm on the IEEE 8- and 123-bus systems and the EPRI 13-, 34-, 37-, and 2998-bus systems, compare it with past methods, and observe highly accurate detection results in cases with and without distributed energy resources. The remainder of this chapter covers state estimation of the grid's steady state and dynamics based on PMU data.


8.2 State Estimation of the Steady-State

In the last section, we showed how to recover the topology of the distribution grid. One of the most significant benefits of having the topology is state estimation, which tells us whether electric components are working properly under penetrations of renewable energy. This matters because power system operations need to capture unprecedented uncertainties within a short period. Fast probabilistic state estimation (SE), which creates probabilistic load flow estimates, represents one such planning tool. This section describes a graphical model for probabilistic SE that captures both the uncertainties and the power grid by embedding physical laws, i.e., KCL and KVL. With such modeling, the resulting maximum a posteriori (MAP) SE problem is formulated over the measured state variables and their interactions. To resolve the computational difficulty of calculating the marginal distribution for the quantities of interest, a distributed message passing method is proposed to compute MAP estimates using increasingly available cyber resources, i.e., computational and communication intelligence. A modified message passing algorithm is then introduced to improve convergence and optimality. Simulation results illustrate the probabilistic SE and demonstrate improved performance over traditional deterministic approaches via (1) a more accurate mean estimate, (2) a confidence interval covering the true state, and (3) reduced computational time.

8.2.1 Introduction

The generation side of the energy industry is currently experiencing rapid and dramatic changes. For example, the Energy Report predicts that the percentage of renewable generation in the global energy structure will increase from 10% in 2014 to 95% by 2050 [522]. With such a high penetration of distributed energy resources (DERs), like intermittent photovoltaic generation, states shift faster than in the past. Increasing uncertainty also occurs on the load side, where demand response programs and the ongoing deregulation of the power industry turn passive loads into active loads [523]. These changes on both the generation and load sides call for well-designed planning mechanisms for sustainable system operation. Before the proliferation of renewables, probabilistic tools such as probabilistic power flow [524–526] and probabilistic optimal power flow [526, 527] had already been developed, but these tools are intended for long-term planning [523, 527, 528]. For example, system demand is taken as a random vector of correlated variables, which allows us to consider the dependence among generation, load, and locations [529].


While long-term planning tools are essential, the increasing penetration of distributed energy resources calls for operational planning in the short term. For example, the recent US Department of Energy Grid Modernization Initiative (GMI) aims at solving the challenges of integrating conventional and renewable sources to ensure network resilience. One practical question is how to schedule a capacitor bank under uncertainties. To provide real-time probabilistic data for services such as operational planning, we propose a probabilistic state estimator that utilizes all sensor measurements in the network. For example, we can use the probabilistic joint state to conduct operational planning by lowering the need for operating reserves. Figure 8.20 shows a 95% confidence zone obtained with probabilistic state estimation. Once this confidence zone is determined, we can plan reserves, such as standby gas generators, capacitor banks, and storage devices, to accommodate changes in the current operating point of the power grid. In other words, operational planners can use a probabilistic analysis to calculate required reserve margins [530, 531], so that a sufficient reserve in energy production can be arranged immediately and called upon if necessary [529, 532]. If needed, a system operator can also change the 95% used in Fig. 8.20 to a larger number, such as 99% or 99.9%, depending on the preference for reserve planning. We can further reduce the needed reserves by using probabilistic joint state estimation: Fig. 8.20a and b show the 95% confidence zone without and with a joint state estimate, respectively, for planning standby resources. By using the probabilistic joint state estimation, the needed standby resources can be reduced significantly. Finally, joint probabilistic state estimation can also enable other services, such as (1) quantifying joint voltage variation for voltage regulation with intermittent renewable energy generation [533] and (2) creating probabilistic electricity market products [527]. To conduct probabilistic modeling for probabilistic SE, one can use the Bayesian framework. For instance, [534] uses a Bayesian linear state estimator and provides trade-offs among the number of PMUs, PMU placement, and measurement uncertainty.

Fig. 8.20 95% confidence zone for standby reserve. The x-coordinate reflects the first voltage estimate v1, and the y-coordinate represents the second voltage estimate v2. (a) The confidence zone without considering probabilistic dependence between v1 and v2. (b) The confidence zone when considering probabilistic dependence


Dobbe et al. [535] use load forecasts for distribution grid state estimation with limited measurements. Different from these methods, our focus is on a graphical representation of the electric power system, so that the fast and scalable computation available in graphical models can be used for operational planning. This focus is motivated by the fact that, for making probabilistic inferences, the joint distribution needs to be marginalized over the whole parameter space for the states of interest [536]. However, similar to many probabilistic power flow methods [83], direct marginalization requires significant computational time, with computational complexity $O(|X|^N)$, making marginalization non-scalable. (Tables are available only for the bivariate case [537].) Such non-scalability prevents real-time probabilistic state estimation for operational planning. Fortunately, the graphical model provides both the sum-product algorithm for marginalization and the max-product algorithm for the maximum a posteriori estimate, which reduce the computational complexity to $O(N|X|^2)$ in a dynamic programming fashion. To construct the graphical model, we formulate power grid states (bus voltages) as random variables on graph vertices. The edges of the graph determine the interactions of the state variables according to physical laws, i.e., Kirchhoff's laws. Viewed together, the graphical model is specified by the joint density of the random variables in the network for SE, subject to the constraints imposed by the physical laws. Then, we choose the objective of maximum a posteriori (MAP) estimation [538, 539]. Under such a MAP formulation, distributed message passing (sum-product or max-product algorithms) is subsequently employed in a belief propagation (BP) fashion [540–543]. Although the BP algorithm is optimal for tree networks in distribution grids, it needs adaptation for meshed networks in transmission grids. Therefore, we adopt a tree-reweighted message-passing algorithm based on a variational belief propagation (VBP) algorithm in the Bayesian framework [194, 544–546]. To overcome the convergence problem and reduce the needed computer memory, we modify the VBP algorithm into a sequential VBP algorithm; a tree agreement condition is used to check the optimality of our algorithm. Notably, [542, 543] show how to use a factor graph and linearization techniques to conduct belief propagation for mean state estimates in tree-structured distribution grids. Different from them, we show how to build a graphical model that avoids linearizing the nonlinear power flow equations, thereby preserving physical laws. We also consider mesh structures in transmission systems and some urban distribution systems, and we show how to adapt the belief propagation algorithms for these systems. Finally, we show how to use mean estimates as well as variance estimates for operational planning. We use simulations of IEEE test systems on both transmission grids (e.g., 14 buses) and distribution grids (e.g., 123 buses). The numerical results show that the probabilistic state estimation provides highly accurate mean estimates and confidence intervals that adequately capture the true state. The computational time is short: in relatively large systems, the new approach needs only one-tenth of the time of the current industrial approach, enabling real-time application.


When compared with existing SE methods, the novelty of the proposed approach lies in (1) motivating probabilistic state estimation for operational planning, (2) embedding physical laws in the probabilistic modeling, (3) reducing the computational burden associated with the probabilistic estimate, and (4) empirically demonstrating the idea and the performance of the proposed probabilistic state estimate. Finally, static state estimation is the basis for static security analysis, voltage stability analysis, and optimal generator dispatch. Therefore, our proposed probabilistic state estimate can also be used for services such as probabilistic security analysis [547], probabilistic voltage stability analysis [548], and probabilistic optimal generator dispatch [549].

8.2.2 Graphical Modeling

In a mathematical description of a power grid, the key state variables $\mathbf{v}$ are usually assumed to be deterministic. They are related to the observed measurements via

$$z_i = h_i(\mathbf{v}) + u_i, \tag{8.11}$$

where $z_i$ is the $i$th telemetered measurement, such as a power flow or voltage magnitude, and $\mathbf{v} = [|v_1|e^{j\delta_1}, |v_2|e^{j\delta_2}, \cdots, |v_n|e^{j\delta_n}]^T$. $h_i(\cdot)$ is the nonlinear function associated with the $i$th measurement. Finally, $u_i$ is the $i$th additive measurement noise, assumed to be an independent Gaussian random variable with zero mean, i.e., $\mathbf{u} \sim \mathcal{N}(0, \Sigma)$, where $\Sigma$ is a diagonal matrix with $i$th diagonal element $\sigma_i^2$. While $u_i$ is modeled as a random variable, $z_i$ is a deterministic number: when $z_i$ is measured, the noise is sampled from a distribution, and after sampling it is deterministic [23]. To estimate the deterministic $\mathbf{v}$, a weighted least squares (WLS)-based approach is usually used for the point estimate $\hat{\mathbf{v}}$:

$$\hat{\mathbf{v}}_{\text{WLS}} = \arg\min_{\mathbf{v}} \sum_{i=1}^{m} \left( \frac{z_i - h_i(\mathbf{v})}{\sigma_i} \right)^2. \tag{8.12}$$
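As a concrete reference point, a minimal sketch of the WLS point estimate in (8.12) is given below using SciPy's generic least-squares solver; the callable h stands in for the nonlinear measurement functions and is supplied by the user.

```python
import numpy as np
from scipy.optimize import least_squares

def wls_state_estimate(z, h, sigma, v0):
    """Weighted least squares point estimate, Eq. (8.12).

    z:     measurement vector, shape (m,)
    h:     callable mapping a state vector v to model measurements h(v)
    sigma: measurement noise standard deviations, shape (m,)
    v0:    initial state guess, e.g., a flat start
    """
    residuals = lambda v: (z - h(v)) / sigma   # weighted residuals
    solution = least_squares(residuals, v0)     # Gauss-Newton-type solver
    return solution.x
```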

The WLS objective provides a deterministic state estimate. For a probabilistic state estimate, we propose a probabilistic description of a power grid, which enables a dynamic programming type of computation that reduces the probabilistic SE's computational time in large-scale smart grids. This matters because the grid of monitoring interest is becoming larger and larger, with new components such as renewable generators, electric vehicles, and sensing devices; scalability is therefore essential. Such a probabilistic description is illustrated by the cyber-physical graph representation in Fig. 8.21, motivated by the increasing penetration of intelligence with communication and computational functionalities. This representation is based on an abstract model of the existing US electric power grid, which creates an undirected cyber graph G(V, E).


Fig. 8.21 The physical network and the corresponding cyber network for the IEEE 14-bus system

Vertices in V represent the buses' voltages (generators and loads), and edges in E represent sensors placed on branches and transformers. In a graphical model, the state $v_i$ represents the probabilistic state at the $i$th bus instead of the deterministic state in (8.11). Other auxiliary variables $z_i$ represent measured physical quantities, such as power flows. Known physical laws, such as Kirchhoff's laws, determine the interactions of these variables, which are represented by the edges E of the graph. Therefore, the cyber layer may have a different topology than the physical network: an edge in the cyber network, e.g., in blue, represents a sensor, while an edge in the physical network, e.g., in black, represents a physical branch. The vertices in both the physical layer and the cyber layer represent the voltages. Similar to static state estimation, topology verification and bad data processing for the physical layer need to be performed before estimation on the cyber layer; traditional methods, e.g., the Chi-square test on the residuals [23], can be used for such purposes. In the future, we plan to extend the current work to robust Bayesian modeling [550]. In the following, we assume that the topology is verified and bad data are removed.

For graphical models, probabilistic inferences such as marginalization for the states of interest usually require large computational time. For example, state estimation via marginalization or maximum a posteriori estimation has a computational complexity of $O(|X|^N)$, making it hard to conduct real-time operational planning. Fortunately, scalable computation can be achieved via belief propagation [540, 541, 551] in


a graphical model, reducing the complexity to $O(N|X|^2)$. The procedure of belief propagation can be visualized by the different colors in Fig. 8.21. For example, the probabilistic distributions of the voltage states in the pink area pass messages around to update the states locally according to the nearby measurements. At the same time, the probabilistic distributions of the states in the red, blue, and green areas are updated locally as well. Then, messages are exchanged among the different areas, as shown in the bar on the right-hand side. Such distributed computation reduces the computational time via dynamic programming while providing a probabilistic state estimate. The algorithm can be used for distributions within the exponential family, such as the gamma and Gaussian distributions. For example, one simple modeling technique is to use an arbitrary discriminative model in the exponential family for $p(\mathbf{v}|\mathbf{z})$. Here, a discriminative method represents a class of methods that aim at learning a direct mapping from measurements to states without considering the physical relationships. However, a discriminative method omits useful power system knowledge, e.g., the power flow equations [552]. Therefore, we use a generative model that embeds this physical understanding while staying within the chosen distribution, e.g., the Gaussian distribution; a generative algorithm models how the data were generated in order to estimate the original state [541]. Since the measurement noise in (8.11) is usually assumed to be Gaussian [23], we can use the criterion of maximum a posteriori (MAP) probability in (8.13):

$$\hat{\mathbf{v}}_{\text{MAP}} = \arg\max_{\mathbf{v}}\; p(\mathbf{v}|\mathbf{z}), \tag{8.13}$$

where $p(\cdot)$ represents the probability density function. By Bayes' theorem,

$$\max_{\mathbf{v}}\; p(\mathbf{v}|\mathbf{z}) = \max_{\mathbf{v}}\; \frac{p(\mathbf{v})\, p(\mathbf{z}|\mathbf{v})}{p(\mathbf{z})}. \tag{8.14}$$

When computing the posterior distribution $p(\mathbf{v}|\mathbf{z})$ in (8.14), $p(\mathbf{z})$ is needed. In practice, $\mathbf{z}$ represents the measurements: once the measurements are used to estimate $\mathbf{v}$, $\mathbf{z}$ is a known number without uncertainty, and $p(\mathbf{z})$ is fixed at the observed values. In other words, $p(\mathbf{z})$ acts as a constant in the maximization. Therefore, the MAP objective $p(\mathbf{v}|\mathbf{z})$ has a Gaussian form if $p(\mathbf{v})$ is uniformly distributed.


8.2.3 Distributed Joint State Estimation

In the Bayesian framework, Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability of a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, especially in mathematical statistics; for example, it can enable efficient distributed joint state estimation. In the next subsection, we discuss the prior probability distribution used in the Bayesian framework.

8.2.3.1 An Objective Prior Probability

For the prior $p(\mathbf{v})$ in (8.14), we can utilize historical data to provide a prior distribution; for example, we can fit the prior state distribution to historical data [553–555]. However, we propose that one can also use a non-informative, or objective, prior to reduce the possible bias produced by the historical data. This is because MAP estimation can be viewed as a regularization of the currently used maximum likelihood estimation (MLE) [552]; relaxing the regularization via a non-informative prior probability distribution reduces bias in the Bayesian framework. The simplest rule for determining a non-informative prior is the principle of indifference, which assigns equal probabilities to all possibilities [541]. By plugging a constant value into (8.14), the posterior function becomes directly related to the physical equations; the prior information then has no impact on the result, leading to significantly reduced bias. Specifically, we use uniform prior probability distributions for both the voltage magnitude, $|v_i| \in [0, 10]$, and the phase angle, $\delta_i \in [0, 2\pi]$, instead of historical data. The voltage magnitude is expressed in per unit, where a per-unit system expresses system quantities as fractions of a defined base quantity; calculations are simplified because per-unit quantities do not change when referred from one side of a transformer to the other. The upper bound of 10 on the voltage magnitude can be changed according to the system operator's experience.

8.2.3.2 Embedding Physical Laws in the Conditional Probability

For the conditional probability $p(\mathbf{z}|\mathbf{v})$, the additive Gaussian noise in (8.11) is used to integrate the physical laws, e.g., $h_i(\mathbf{v})$. Essentially, we can work with the logarithm of the distribution:

$$p(\mathbf{z}|\mathbf{v}) \sim \exp\left\{ -\sum_i \big(z_i - h_i(\mathbf{v})\big)^2 / \sigma_i^2 \right\}. \tag{8.15}$$


This is because, from Eq. (8.11), the measurement $z_i$ equals the sum of $h_i(\mathbf{v})$ and the noise $u_i$. If we treat $z_i$ as a random variable, its randomness comes from both $h_i(\mathbf{v})$ and $u_i$. If $\mathbf{v}$ is known, then $h_i(\mathbf{v})$ is known, since the measurement function $h_i(\cdot)$ is known. For example, if $h_i(\mathbf{v})$ represents the power flow equation that maps $\mathbf{v}$ into a power injection, $h_i(\mathbf{v})$ is a quadratic, deterministic function of $\mathbf{v}$. Therefore, conditioned on the value of $\mathbf{v}$, the distribution of $z_i|\mathbf{v}$ belongs to the same family as the noise $u_i$. As $u_i$ is Gaussian, $z_i|\mathbf{v}$ has mean $h_i(\mathbf{v})$ and the variance $\sigma_i^2$ of $u_i$. Writing out the density of $z_i|\mathbf{v}$ yields (8.15); since the coefficient in front of the exponential is a constant, the density is proportional to the right-hand side of (8.15). Without loss of generality, we omit the variance $\sigma_i^2$ in the rest of the section for simplicity. In the following, we specify $p(z_i|\mathbf{v})$ for each measurement type.

• The complex-valued power flow (pf) measurement on the branch (edge) $s$–$t$ near bus $s$: if we let $Y$ be the admittance matrix and $Y_{st}$ be the matrix element in row $s$ and column $t$,

$$p(z_i^{\text{pf}}|\mathbf{v}) \sim \exp\left\{ -\sum_i \left| z_i - (v_s - v_t) Y_{st}^* v_s^* \right|^2 \right\}. \tag{8.16}$$

This is because $(v_s - v_t) Y_{st}^* v_s^*$ represents the noiseless power flow from bus $s$ to bus $t$; plugging it into (8.15) leads to the form of $p(z_i^{\text{pf}}|\mathbf{v})$. This form can be easily extended to real power measurements and reactive power measurements.

• The voltage magnitude (vm) measurement on bus $s$:

$$p(z_i^{\text{vm}}|\mathbf{v}) \sim \exp\left\{ -\sum_i \left( z_i - (v_s v_s^*)^{\frac{1}{2}} \right)^2 \right\}. \tag{8.17}$$

When phasor measurement units (PMUs) appear in the network, we can also utilize PMU measurements. Note that the sampling rate of PMU measurements is much faster than that of SCADA measurements. One idea is to use the PMU measurements together with the SCADA measurements whenever the latter are available; interested readers can refer to such hybrid methods in [556, 557].

• The voltage phase angle (va) measurement on bus $s$:

$$p(z_i^{\text{va}}|\mathbf{v}) \sim \exp\left\{ -\sum_i \left( z_i - \tan^{-1}\frac{\operatorname{Im}(v_s)}{\operatorname{Re}(v_s)} \right)^2 \right\}. \tag{8.18}$$
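To make these densities concrete, the sketch below evaluates the corresponding (unnormalized) negative log-likelihood terms for given complex bus voltages; packing measurements into simple (kind, buses, value) tuples is our own illustrative choice, not the chapter's data structure.

```python
import numpy as np

def neg_log_likelihood(v, Y, measurements):
    """Exponent of (8.15) with unit noise variances, built from the
    measurement models in (8.16)-(8.18).

    v:            complex bus voltages, shape (n,)
    Y:            complex admittance matrix, shape (n, n)
    measurements: list of (kind, buses, z) tuples with kind in
                  {'pf', 'vm', 'va'}
    """
    total = 0.0
    for kind, buses, z in measurements:
        if kind == "pf":                     # branch power flow, Eq. (8.16)
            s, t = buses
            h = (v[s] - v[t]) * np.conj(Y[s, t]) * np.conj(v[s])
            total += abs(z - h) ** 2
        elif kind == "vm":                   # voltage magnitude, Eq. (8.17)
            (s,) = buses
            total += (z - abs(v[s])) ** 2    # |v_s| = (v_s v_s*)^(1/2)
        elif kind == "va":                   # voltage phase angle, Eq. (8.18)
            (s,) = buses
            total += (z - np.arctan2(v[s].imag, v[s].real)) ** 2
    return total
```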

The probability distribution functions associated with the measurement types above satisfy the pairwise Markov random field representation required by BP (more


details in Sect. 8.2.3.3). However, the probability distribution function associated with the power injection measurements below violates the pairwise requirement.

• The power injection (pinj) measurement into bus $s$:

$$p(z_i^{\text{pinj}}|\mathbf{v}) \sim \exp\left\{ -\sum_i \Big| z_i - \sum_{t\in N(s)} (v_s - v_t) Y_{st}^* v_s^* \Big|^2 \right\} = \exp\{-T\}, \tag{8.19}$$

where $T$ equals

$$\begin{aligned} T &= \sum_i \Big| z_i - \sum_{t\in N(s)} (v_s - v_t) Y_{st}^* v_s^* \Big|^2 \\ &= \sum_i \Big( z_i - \sum_{t\in N(s)} (v_s - v_t) Y_{st}^* v_s^* \Big) \Big( z_i - \sum_{k\in N(s)} (v_s - v_k) Y_{sk}^* v_s^* \Big)^* \\ &= \sum_i \Big( |z_i|^2 - z_i \sum_{k\in N(s)} (v_s^* - v_k^*) Y_{sk} v_s - z_i^* \sum_{t\in N(s)} (v_s - v_t) Y_{st}^* v_s^* \\ &\qquad\quad + \sum_{t\in N(s)} \sum_{k\in N(s)} |v_s|^2\, Y_{st}^* Y_{sk}\, (v_s - v_t)(v_s^* - v_k^*) \Big), \end{aligned} \tag{8.20}$$

where the multiplication of three different state variables violates the pairwise Markov random field assumption required by the distributed BP algorithm. The reason is that the message passing algorithm passes a message from one node to another at a time, based on the function correlating the two state variables and the updated distribution of the "from" node; this is why a pairwise relationship is needed in a pairwise Markov random field. If a term multiplies three state variables, the message passing method cannot guarantee exact convergence to the true marginals [544, 545]. To solve this problem, we first abstract (8.20) into

$$\sum_s \theta_s(v_s) + \sum_{t\in N(s)} \theta_{st}(v_s, v_t) \tag{8.21}$$

$$+ \sum_{t\in N(s)} \sum_{k\in N(s)} |v_s|^2\, Y_{st}^* Y_{sk}\, v_t v_k^*, \tag{8.22}$$


where $|v_s|^2 Y_{st}^* Y_{sk} v_t v_k^*$ represents the multiplication of three different state variables, $\theta$ is a compatibility function, and $\theta_s$ is the compatibility function associated with node $s$. Then, we define the dummy variable vector $w_{stk} \triangleq [w_{stk}^{(1)}, w_{stk}^{(2)}, w_{stk}^{(3)}]^T$ and the regularization variable $\lambda(w_{stk}) \triangleq |w_s|^2 Y_{st}^* Y_{sk} w_t w_k^*$. The regularization variable $\lambda$ still represents a multiplication of three components, but with non-state dummy variables $w_{stk}$. Terms such as $a\big|w_{stk}^{(1)} - v_s\big|^2$ in the equations below serve a regularization purpose, so that the three elements of $w_{stk}$ stay close to the values of $v_s$, $v_t$, and $v_k$. With these definitions, the original problem of maximizing (8.19) can be regarded as minimizing the following regularized problem:

$$\sum_s \theta_s(v_s) + \sum_{t\in N(s)} \theta_{st}(v_s, v_t) + \sum_{t\in N(s),\, k\in N(s)} \Big[ \lambda(w_{stk}) + a\big|w_{stk}^{(1)} - v_s\big|^2 + a\big|w_{stk}^{(2)} - v_t\big|^2 + a\big|w_{stk}^{(3)} - v_k\big|^2 \Big], \tag{8.23}$$

where $a$ is a penalty coefficient that can be obtained by cross-validation [558].

Note that $\big|w_{stk}^{(1)} - v_s\big|^2 = \big|w_{stk}^{(1)}\big|^2 - w_{stk}^{(1)} v_s^* - w_{stk}^{(1)*} v_s + |v_s|^2$ results in a pairwise expression, with a similar extension for the regularization terms on $v_t$ and $v_k$ in the problem above.

Remark 8.4 One may be concerned that biased prior information could make the states look good even though bad data are present. However, when using a Bayesian approach, it is assumed that the statistical model is correct, i.e., that bad data detection and filtering are conducted before the proposed inference is run.

8.2.3.3 Marginalization for the States of Interest in Tree-Structured Networks

After obtaining proper formulations for $p(\mathbf{v})$, $p(\mathbf{z}|\mathbf{v})$, and $p(\mathbf{z})$, we can find the expression for $p(\mathbf{v}|\mathbf{z})$ using (8.14). The task now is to infer the most likely joint states and their confidence intervals, which requires efficiently marginalizing over $p(\mathbf{v}|\mathbf{z})$. However, marginalization for the states of interest is computationally expensive, because all combinations of the remaining variables must be summed over; this computation grows exponentially with the number of state variables in the power grid. Therefore, we introduce a belief propagation (BP) algorithm, namely the sum-product algorithm, to reduce the computational cost. This algorithm can be directly applied to tree-structured power systems, such as feeder systems in distribution grids.

Remark 8.5 Similar to other efficient algorithms in electric power system analysis, the BP algorithm exploits network sparsity to compute marginal probabilities in a timeframe that grows only linearly with the number of nodes in the system. The underlying principle is divide and conquer, as in typical serial dynamic programming (DP): we solve a large problem by breaking it down into a sequence of simpler problems. However, as a form of non-serial


dynamic programming, BP generalizes the serial form of deterministic dynamic programming to arbitrary tree-structured graphs, where each subgraph is again a tree disjoint from the other subgraphs (trees).

Therefore, as a generalization of DP, BP can be conducted on an arbitrary tree-structured graph $G(V, E)$ with a pairwise Markov random field factorization [541]

$$p(v_1, v_2, \cdots, v_n) = \alpha \prod_{s\in V} \phi_s(v_s) \prod_{(s,t)\in E} \phi_{st}(v_s, v_t),$$

where $\phi_s$ and $\phi_{st}$ are compatibility functions for the joint distribution $p(v_1, v_2, \cdots, v_n)$ from the MAP form in (8.14) [540]; the compatibility functions $\phi_s$ and $\phi_{st}$ can be found in the components of the joint distributions in (8.16), (8.17), etc. For example, $(v_s - v_t) Y_{st}^* v_s^*$ in (8.16) equals $Y_{st}^* v_s v_s^* - Y_{st}^* v_t v_s^*$, forming the basis for $\phi_s$ and $\phi_{st}$. $\alpha$ denotes a positive constant chosen to ensure the normalization of the distribution. Finally, BP conducts message updates via

$$M_{s\to t}(v_t) \leftarrow \sum_{v_s} \phi_s(v_s)\, \phi_{st}(v_s, v_t) \prod_{\substack{k\in N(s) \\ k\neq t}} M_{k\to s}(v_s), \tag{8.24}$$

where $N(s)$ is the set of neighboring buses of bus $s$. In such a message-passing calculation, the product is taken over all messages going into node $s$ except for the one coming from node $t$. In practice, we can start with the nodes at the edge of the graph and compute a message only when all necessary incoming messages have been received; therefore, each message needs to be computed only once for a tree-structured graph. We use Fig. 8.22 to illustrate the message passing steps. Assume that messages are passed from left to right and that $v_1$ and $v_2$ have been updated. There is a power flow equation in (8.16) that couples $v_1$ and $v_3$ in $\phi_{13}(v_1, v_3)$; similarly, another power flow equation in (8.16) couples $v_2$ and $v_3$ in $\phi_{23}(v_2, v_3)$. There is also a voltage measurement that can be plugged into (8.17). By plugging the distributions of $v_1$ and $v_2$, the two power flow measurements, and the voltage magnitude measurement into $\phi_{13}(v_1, v_3)$, $\phi_{23}(v_2, v_3)$, and $\phi_3(v_3)$, we can marginalize over $v_1$ and $v_2$ and obtain the distribution of $v_3$. Once the distribution of $v_3$ is known, we can pass this message, or belief, to $v_4$ via the compatibility function $\phi_{34}(v_3, v_4)$.
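For intuition, a compact sketch of the sum-product updates in (8.24) for discrete state variables on a tree is given below; node and edge potentials are passed in as plain arrays, and the scheduling simply postpones a directed edge until all of its inbound messages exist. This is a generic illustration, not the chapter's implementation.

```python
import numpy as np

def sum_product_tree(nodes, edges, node_pot, edge_pot):
    """Sum-product belief propagation on a tree MRF, following Eq. (8.24).

    nodes:    list of node ids
    edges:    list of (s, t) pairs forming a tree
    node_pot: dict node -> array (K,), compatibility phi_s
    edge_pot: dict (s, t) -> array (K, K), compatibility phi_st
    Returns unnormalized marginal beliefs for every node.
    """
    nbrs = {u: set() for u in nodes}
    for s, t in edges:
        nbrs[s].add(t); nbrs[t].add(s)

    def pot(s, t):                # phi_st with rows indexed by v_s
        return edge_pot[(s, t)] if (s, t) in edge_pot else edge_pot[(t, s)].T

    msgs, pending = {}, [(s, t) for s in nodes for t in nbrs[s]]
    while pending:
        s, t = pending.pop(0)
        needed = nbrs[s] - {t}
        if any((k, s) not in msgs for k in needed):
            pending.append((s, t))    # postpone until inputs are ready
            continue
        prod = node_pot[s].copy()
        for k in needed:
            prod = prod * msgs[(k, s)]
        msgs[(s, t)] = pot(s, t).T @ prod    # sum over v_s

    beliefs = {}
    for t in nodes:
        b = node_pot[t].copy()
        for k in nbrs[t]:
            b = b * msgs[(k, t)]
        beliefs[t] = b
    return beliefs
```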

8.2.3.4 From Tree Structure for Distribution Grids to Mesh Structure for Transmission Grids

As observed above, the key assumptions of the BP algorithm are that (1) each subgraph remains a tree after graph division and (2) the subgraphs are disjoint. Such assumptions are valid for the majority of electric power distribution grids. However, they do not hold for many transmission networks, such as the test case of the IEEE 14-bus system, which contains loops.


Fig. 8.22 Message passing

To overcome this problem, a variational BP (VBP) approach is used, based on randomly generating spanning trees of the meshed network structure in the cyber layer [544]. The key is to assign probabilities to the edges based on each edge's appearance probability $\rho_{st}$ in the spanning trees, according to (8.25):

$$\rho_{st} = \frac{\text{No. of spanning trees with the edge } (s,t)}{\text{No. of all spanning trees}}. \tag{8.25}$$

Mathematically, VBP starts by randomly generating spanning trees in the mesh network structure of the cyber layer in Fig. 8.21 [559], leading to the probability assignment of edge appearance $\rho_{st}$. Algorithm 10 illustrates how to randomly generate a spanning tree.

Algorithm 10 Randomly generating a spanning tree
Require: Let Ẽ denote an empty set. Let num = 0.
1: Randomly generate i, j ∈ {1, · · · , n}, where i ≠ j and n is the bus number.
2: Assign the branch (i, j) to the set Ẽ.
3: while num ≠ n do
4:   num = num + 1.
5:   Randomly generate a new pair i′, j′ ∈ {1, · · · , n}, where i′ ≠ j′.
6:   Assign the branch (i′, j′) to the set Ẽ.
7: end while
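As printed, Algorithm 10 does not guard against cycles or disconnection, so a practical reading must keep only branches that join two previously unconnected components. The sketch below does this with a shuffled Kruskal-style construction (a union-find cycle check) restricted to the network's actual branches, and then estimates ρst in (8.25) by Monte Carlo sampling; both function names are illustrative.

```python
import random

def random_spanning_tree(n, candidate_edges, rng):
    """Sample a spanning tree over buses 1..n from the allowed branches,
    keeping only branches that do not close a cycle (union-find check)."""
    parent = list(range(n + 1))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]; x = parent[x]
        return x
    edges = list(candidate_edges)
    rng.shuffle(edges)
    tree = set()
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # joins two components: keep it
            parent[ri] = rj
            tree.add((min(i, j), max(i, j)))
            if len(tree) == n - 1:         # spanning tree is complete
                break
    return tree

def edge_appearance_probability(n, candidate_edges, n_samples=1000):
    """Monte Carlo estimate of rho_st in (8.25)."""
    rng = random.Random(0)
    counts = {tuple(sorted(e)): 0 for e in candidate_edges}
    for _ in range(n_samples):
        for e in random_spanning_tree(n, candidate_edges, rng):
            counts[e] += 1
    return {e: c / n_samples for e, c in counts.items()}
```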

Subsequently, convex combination methods are adopted to approximate the inference on the meshed network with the BP algorithm run on these artificial trees [545]. Mathematically, the new message-passing algorithm below is run with $\rho_{st}$ until the states converge:

$$M_{t\to s}^{n+1}(v_s) = \alpha \sum_{v_t} \exp\left\{ \frac{\theta_{st}(v_s, v_t)}{\rho_{st}} + \theta_t(v_t) \right\} \frac{\prod_{k\in N(t)\setminus s} \big[M_{k\to t}^{n}(v_t)\big]^{\rho_{kt}}}{\big[M_{s\to t}^{n}(v_t)\big]^{(1-\rho_{ts})}}, \tag{8.26}$$

where $\theta_{st}$ and $\theta_t$ are the exponential parameters associated with the compatibility functions $\phi_{st}$ and $\phi_s$ in (8.24).

Remark 8.6 Note that if $\rho_{st} = 1$ for all $(s, t) \in E$, the VBP update in (8.26) reduces to the BP form in (8.24), since $\rho_{st} = 1$ implies a tree structure.

Finally, the VBP approach relies on the exponential family, which includes a wide range of common distributions, such as the Gaussian, exponential, gamma, and Chi-square distributions. Therefore, the use of Gaussian noise to represent $p(\mathbf{z}|\mathbf{v})$ in Sect. 8.2.3.2 is proper.

8.2.3.5 Improvement over Convergence, Optimality, and Memory Requirement

Improve Convergence Note that sending a message from node $s$ to node $t$ is equivalent to reparameterizing the vectors $\theta_t$ and $\theta_{st}$ for node $t$ and edge $(s, t)$, respectively [546]. Further, the VBP algorithm in (8.26) is formed by minimizing an upper bound based on a convex combination of reparameterized parameters, e.g., trees [544, 545]. However, there is no guarantee that this bound decreases; it may go up. This is because the tree-reweighted algorithm does not maintain the convex combination constraints $\sum_i \rho^{\{i\}} \theta_s^{\{i\}} = \theta_s$ and $\sum_i \rho^{\{i\}} \theta_{st}^{\{i\}} = \theta_{st}$, where $\theta^{\{i\}}$ and $\theta$ are the parameters associated with the $i$th spanning tree and the mesh network, respectively [546]. Without such constraints, reparameterizations of the original parameter vector may violate the equality constraints and never converge. To improve convergence, we propose a modified VBP, the sequential tree-reweighted message passing algorithm from [546], given below.

Remark 8.7 Similar to the VBP approach, this sequential method works by message passing. Specifically, for each directed edge $(t \to s) \in E$, a message $M_{t\to s}$ is computed and passed.

Check Optimality: Tree Agreement Condition for Mesh Networks For tree networks, the VBP algorithm (8.26) and the sequential VBP (sequential tree-reweighted) algorithm reduce to the sum-product algorithm, so they are exact. For mesh networks, however, convergence alone no longer guarantees that the correct MAP assignment is output, because the algorithmic derivation is based on approximating the distribution of mesh networks via upper bounds, and it is straightforward to construct problems for which an incorrect MAP estimate is specified.


Algorithm 11 Sequential tree-reweighted algorithm
1: Generate spanning trees that satisfy $\sum_i \rho^{\{i\}} \theta_s^{\{i\}} = \theta_s$, $\sum_i \rho^{\{i\}} \theta_{st}^{\{i\}} = \theta_{st}$, and $\sum_i \rho^{\{i\}} = 1$, where $\rho^{\{i\}}$ is the $i$th spanning tree probability and the summation over $i$ runs over all generated spanning trees.
2: Select an order for the nodes and edges in $V \cup E$. For each element $w \in V \cup E$, find all trees containing $w$. If there is more than one such tree, reparameterize $\theta$ such that $\theta$ gives the correct min-marginals for all trees [546].
3: "Averaging" operation:
   • If $w = s$ is a node in $V$, compute $\theta_s^{n+1} = \frac{1}{\rho_s} \sum_i \rho_i\, \theta_s^{n,\{i\}}$, where $\rho_s$ is the node appearance probability.
   • If $w = (s, t)$ is an edge in $E$, compute $M_{t\to s}^{n+1} = \frac{1}{\rho_{st}} \min_{v_t} \rho_{st}\big( \theta_s^{n+1}(v_s) + \theta_{st}^{n}(v_s, v_t) + \theta_t^{n+1}(v_t) \big)$; set $\theta_{st}^{n+1}(v_s, v_t)$ such that $\theta_s^{n+1}(v_s) + \theta_{st}^{n+1}(v_s, v_t) + \theta_t^{n+1}(v_t) = M_{t\to s}^{n+1}$.
4: Check whether the message $M_{t\to s}$ converges, and check that each edge is covered at least once. If yes, terminate; otherwise, go to step 1.

To obtain the correct MAP, we use the following condition to validate the tightness of the bounds after running the VBP or sequential VBP algorithm. This condition is called the tree agreement condition [546].

Definition 8.1 We say that the VBP decomposition satisfies tree agreement if the different spanning trees, which form the upper bound for the mesh network, share a common optimal result when the BP algorithm is run on each of them.

Reduce Memory Requirement Importantly, the sequential belief propagation algorithm in Algorithm 11 requires only half as much memory as VBP. This is because VBP requires storing messages in both directions, i.e., $M_{st}$ and $M_{ts}$, whereas the new approach needs to store only one of $M_{st}$ or $M_{ts}$ thanks to the node pre-ordering. For example, we can store only the messages oriented toward the current node; the reverse messages are not needed, since we update them before they are used.

8.2.3.6 Algorithm Summary

Finally, a summary is provided in Algorithm 12 and Fig. 8.23. In Algorithm 12, the measurement values $z_i$ are first plugged into Eq. (8.15) to obtain the expression to which the conditional probability $p(\mathbf{z}|\mathbf{v})$ is proportional. If the grid is a tree, only one spanning tree can be generated. If the grid has a mesh structure, we randomly generate spanning trees many times. Then, by using the combination of the results on the generated spanning trees, we can approximate the result on a mesh network.


Fig. 8.23 The flow chart of the proposed approach

Algorithm 12 Enhancing algorithm
1: Use the measurement values of z in (8.19) to obtain the joint probability function (8.15) over the state v.
2: Generate spanning trees based on the power system cyber layer topology.
3: Initialize the state variables, i.e., ones for the voltage magnitudes and zeros for the voltage phase angles.
4: Apply Algorithm 11 for the state variable updates with the regularization of power injection measurements.
5: Repeat until the state variables converge or the iteration reaches the maximal allowed time.

Such a combination is achieved by using Algorithm 11, which conducts belief propagation on a mesh network. Finally, we repeat this process until the state variables converge or the iteration reaches the maximal allowed time.

8.2.4 Illustration Using an Example

Figure 8.24a represents a 3-bus system to which the proposed algorithm is applied. In this example, we assume that we have a voltage magnitude measurement $z_1^{\text{vm}}$ on bus 1, a voltage phase angle measurement $z_2^{\text{va}}$ on bus 2, a complex power flow measurement $z_3^{\text{pf}}$ on branch 2–3 near bus 2, and a complex power injection measurement $z_4^{\text{pinj}}$ on bus 3.

.

  1 2 − z1vm − (v1 v1∗ ) 2

324

8 Streaming Monitoring and Control for Real-Time Grid Operation

Bus 1

Bus 1

Bus 2

Bus 2

Bus 3 Bus 3

(a) Bus 1

(b) Bus 2

Bus 2

Bus 1

Bus 3

Bus 3

(c)

(d)

Fig. 8.24 Generation of spanning trees to obtain the value of .ρij . (a) 3 Bus system (b) 1st spanning tree. (c) 2nd spanning tree. (d) 3rd spanning tree

2  I m(x2 ) 2

pf

− z2va − tan−1 − z3 − (v2 − v3 )Yij∗ v2∗ Re(v2 ) 



2

pinj  − z4 − (v3 − v1 )Y31 + (v3 − v2 )Y32 v3∗

(8.27)

and use the measurement value .zi and the admittance value .Yij , leading to  p(v1 , v2 , v3 ) ∼ exp θv1 (v1 ) + θv2 (v2 ) + θv1 ,v2 (v1 , v2 )

.

+ θv2 ,v3 (v2 , v3 ) + θv1 ,v3 (v1 , v3 )  + φv1 ,v2 ,v3 (v1 , v2 , v3 ) . • Step 3: Initialize the voltage belief on each bus with magnitude one and phase angle zero. • Step 4: With regularization on .φv1 ,v2 ,v3 (v1 , v2 , v3 ), apply the result above to the sequential VBP algorithm. Pass the message .Mij . • Step 5: Repeat Step 4 until state variables converge. • Step 6: Check tree agreement condition in Definition 8.1.
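The edge probabilities in Step 1 can be verified by brute-force enumeration. The short sketch below lists every spanning tree of the 3-bus loop and counts edge appearances; it uses a union-find cycle test and is illustrative only.

```python
from itertools import combinations

def spanning_trees(nodes, edges):
    """All spanning trees of a small graph: every (n-1)-edge subset
    that contains no cycle (checked with union-find)."""
    trees = []
    for subset in combinations(edges, len(nodes) - 1):
        parent = {u: u for u in nodes}
        def find(x):
            while parent[x] != x:
                x = parent[x]
            return x
        ok = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:        # edge would close a cycle
                ok = False
                break
            parent[ru] = rv
        if ok:
            trees.append(subset)
    return trees

edges = [(1, 2), (2, 3), (1, 3)]          # the loop of Fig. 8.24a
trees = spanning_trees([1, 2, 3], edges)
rho = {e: sum(e in t for t in trees) / len(trees) for e in edges}
print(len(trees), rho)                     # 3 trees; each edge has rho = 2/3
```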


8.2.5 Numerical Results

We simulate and evaluate the performance of the proposed graphical model-based probabilistic SE using the MATLAB Power System Simulation Package (MATPOWER) [509, 510]. To mimic the penetration of distributed energy resources (DERs), we add solar power generation using the PVWatts Calculator from the National Renewable Energy Laboratory (NREL) [519]. For extensive runs over different cases, we implement simulations on both IEEE transmission grids, including the 9-, 14-, 30-, 39-, 57-, 118-, and 300-bus systems, and distribution grids, including the 8- and 123-bus systems. Specifically, we first run a power flow to generate the true state of the power system based on the online load profile from the New York ISO [560] (for the transmission grids), load profiles from PG&E (for the distribution grids), and the NREL dataset (for adding renewables). Between 10% and 20% of buses, randomly chosen from the IEEE networks, are integrated with solar panels. The hourly power generation profile is computed by the PVWatts Calculator [519], an online application developed by NREL that estimates the power generation of a photovoltaic system based on weather and physical parameters [15]. The data are computed based on the weather history in Northern California and the physical parameters of a 5 kW solar panel. The renewable power generation is modeled as a negative load. After obtaining the true state, we generate measurements based on the system equations and Gaussian measurement noises. The noise standard deviations are (1) power injection, 0.015; (2) power flow, 0.02; (3) voltage magnitude, 0.01; and (4) voltage phase angle, 0.003. Our database for measurement selection includes (1) the power injection at each bus, (2) the transmission line power flow from or to each connected bus, (3) the direct voltage magnitude of each bus, and (4) the voltage phase angle of each bus. To avoid the ideal case of using all possible measurements, the measurements are randomly chosen: after generating all possible measurements, we choose the types and the number of measurements that are available in different grids while maintaining system observability. We also use different measurement numbers according to our working experience with Southern California Edison's 7 feeders: the distribution grids, especially the secondary distribution grids, have far fewer sensors than the transmission grids, so we place fewer sensors in the distribution grids to reflect reality. We also choose 10% to 20% of the buses to have PMU measurements for our transmission grid evaluations but far fewer PMU measurements for our distribution grid evaluations. At the end of this section, we evaluate the performance of the probabilistic state estimates with different PMU numbers; for that last part, we choose from 0% to 100% PMU coverage to evaluate the impact of PMU measurements. Finally, system observability is checked, and the convergence criterion to stop the calculation is defined as $10^{-5}$ in per unit value.
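A minimal sketch of the measurement-corruption step described above, assuming the noiseless values from a power flow solution are already available as an array; the standard deviations follow the setup in the text, and the helper name is illustrative.

```python
import numpy as np

# Noise standard deviations from the simulation setup above.
NOISE_STD = {"pinj": 0.015, "pf": 0.02, "vm": 0.01, "va": 0.003}

def make_measurements(true_values, kinds, seed=0):
    """Corrupt noiseless quantities h_i(v) with Gaussian noise.

    true_values: array of noiseless measurement values, shape (m,)
    kinds:       list of m measurement types, keys of NOISE_STD
    Returns the noisy measurements and their standard deviations.
    """
    rng = np.random.default_rng(seed)
    sigma = np.array([NOISE_STD[k] for k in kinds])
    return true_values + rng.normal(0.0, sigma), sigma
```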


Remark 8.8 (The Number of Spanning Trees) If the grid is a tree, there is only one spanning tree that we can generate, so one is enough. For a complete graph with three nodes, there are three spanning trees; for complete graphs with $n > 3$ nodes, there are $n^{n-2}$ spanning trees [544, 561]. Fortunately, the power system has a sparse structure, which greatly reduces the number of possible spanning trees. As the exponent is highly related to the network node degree, we generate about $n^2$ trees, since the node degree of a power system is usually around 2 or lower. In networks with higher node degrees, one can increase the exponent.

For accuracy, we use the 14-bus system to demonstrate error domain improvements. We use the 30-bus transmission system and the 123-bus distribution system to demonstrate state domain comparisons. For computational time, we use the 9-bus to 300-bus systems for demonstration.

8.2.6 Error Domain Comparison Based on Mean Estimate

Although this section proposes a probabilistic estimate for SE, the mean estimate is still necessary. Therefore, we compare our mean estimate with the deterministic estimate obtained from Newton's method. For comparison in the error domain, the sum square error is defined as $\sum_{i=1}^{m} \big( (z_i - h_i(\hat{\mathbf{v}}))/\sigma_i \big)^2$, where $m$ is the total measurement number. Figure 8.25a shows 30 simulations on the 14-bus system. The x-axis represents the simulation test number, and the y-axis displays the sum square error for each run. The red star represents the proposed belief propagation method; the blue rectangle represents Newton's method applied to the weighted least squares objective in (8.12).

Fig. 8.25 IEEE 14-bus. (a) Sum squared error. (b) Normalized sum squared error (both versus testing number; legend: Newton's method (flat start), VBP estimates)


For comparing relative performance, we normalize both errors by the error of the blue rectangle, leading to Fig. 8.25b. Figure 8.25b shows that the mean estimate based on the graphical model reduces the error by 20% on average compared with Newton's method with a flat start; improvements of more than 50% are achieved on the 8th, 12th, 20th, 24th, and 29th simulations. The performance difference between the two methods arises from the renewable data added to the simulation, which makes the testing case file different from the standard case file in MATPOWER. While Newton's method is sensitive to the initial guess, the new BP-based method does not have this drawback; therefore, the chance of the BP-based approach coming closer to the global optimum is substantially increased.

8.2.7 Variance Estimate

While a mean estimate is available, it is hard to know how confident that estimate is: two confidence intervals at the same bus for the same confidence level may be dramatically different due to measurement locations, meter quality, etc. Therefore, we plot the probabilistic state estimates with a 95% confidence level in Figs. 8.26 and 8.27. For transmission grids, Fig. 8.26 shows the simulation results for the IEEE 30-bus system: Fig. 8.26a shows the probabilistic estimates of voltage magnitudes of the proposed approach, and Fig. 8.26b shows the probabilistic estimates of phase angles. The two plots show that the red-star region of the BP-based approach, formed by the confidence intervals, captures the true states. For distribution grids, Fig. 8.27 shows the simulation results for the IEEE 123-bus system: Fig. 8.27a shows the probabilistic estimates of voltage magnitudes of the proposed approach, and Fig. 8.27b shows the probabilistic estimates of phase angles.

Fig. 8.26 Results obtained from the IEEE 30-bus system (transmission grid). (a) Voltage magnitudes. (b) Voltage phase angles (versus bus number; legend: true states, Newton's method, VBP estimates)

Fig. 8.27 Results obtained from the IEEE 123-bus system (distribution grid). (a) Voltage magnitudes. (b) Voltage phase angles (versus bus number; legend: true states, Newton's method, VBP estimates)

Similar to the results in the transmission grid, the mean estimate is closer to the true state than Newton's method. While the confidence interval is larger than in the transmission grid scenario, the confidence region still captures the true state completely. Finally, as renewables are simulated in our test cases, Figs. 8.26 and 8.27 highlight the strength of the probabilistic estimate, which can serve as an input to operational planning, e.g., probabilistic power flow.

8.2.8 Computational Cost

Because one of the major contributions of this work is the scalability of the probabilistic estimate, we compare computational times in Fig. 8.28, based on the CPU times needed for test cases with different bus numbers. For comparison purposes, the computational times for centralized WLS and distributed WLS are added to the same figure. All simulations are obtained using MATLAB on an Intel Core i5 CPU with 4 GB RAM. The x-axis in Fig. 8.28 represents the bus number, and the y-axis represents the needed CPU time. We observe that the computational time needed by the BP-based method grows linearly and is much lower than the time needed by the centralized and distributed WLS methods. Therefore, the computational cost is low enough for fast probabilistic state estimation. This confirms the scalability of the BP-based method, which is key to the design of future wide-area monitoring, protection, and control (WAMPAC) systems.


Fig. 8.28 CPU time comparison (CPU time (sec) versus IEEE testbed case bus number; legend: centralized WLS, distributed WLS, tree-reweighted BP)

Table 8.6 Convergence

                             VBP    S-VBP
Convergence probability      93%    100%
Probability of optimality    86%    97%

8.2.8.1 Improvement over Convergence, Optimality, and Memory

In Sect. 8.2.3.5, we showed that the modified sequential tree-reweighted algorithm (S-VBP) has attractive properties over the ordinary tree-reweighted algorithm (VBP). As such, we simulated the test cases to compare the two algorithms; the probability of optimality was checked using the tree agreement condition. We summarize the results in Table 8.6. From the table, the modified S-VBP algorithm solves the convergence problem of the VBP algorithm, and the probability of optimality is high, at 97%. Such good performance arises because the power system mainly has a tree structure in the distribution grids and a sparse mesh structure in the transmission networks. Finally, it is worth noting that the S-VBP algorithm needs about half the memory of the ordinary VBP approach; this property is particularly important for computations over large networks.


Fig. 8.29 The impact of PMU measurements. (a) Voltage magnitudes. (b) Voltage phase angles (mean square error versus PMU percentage, from 0% to 100%)

8.2.8.2 The Impact of PMU Measurements

Although it is unlikely that all the nodes in a distribution system will have PMU measurements, this subsection evaluates the impact of the number of PMUs on the results of the probabilistic state estimate. We vary the share of buses with PMU measurements from 0% to 100%. Figure 8.29 shows that the mean square error (MSE) and the associated estimation variance decrease as the percentage of PMU measurements in the network increases. Figure 8.29a is for voltage magnitudes, and Fig. 8.29b is for voltage phase angles; both have the lowest MSE and the lowest variance when the network is fully equipped with PMU measurements. For comparison in the error domain, the mean square error $\frac{1}{m}\sum_{i=1}^{m}\big((z_i - h_i(\hat{\mathbf{v}}))/\sigma_i\big)^2$ is used, where $m$ is the total measurement number. Similar to classic WLS with PMU measurements, the more PMU measurements, the better the performance.

Remark 8.9 Even if all bus voltages were directly measured, the method would still be needed. In the distribution grid (especially the secondary distribution grid), where residential customers are increasingly installing renewable generation, e.g., solar panel-based generators, it is hard to equip all buses with PMU measurements due to the tremendous number of buses. Existing sensors, such as smart meters, however, can provide measurements such as power injections and voltage magnitudes. In addition, PMU measurements can contain bad data, e.g., calibration errors. Having more measurements than PMU measurements alone provides more redundancy for recovering a better system state estimate.


8.2.9 Conclusion and Future Research

The increasing penetration of renewables calls for operational planning in real time. To provide a real-time probabilistic estimate for operational planning, e.g., probabilistic load flow, we propose a probabilistic state estimation method based on graphical modeling of smart grids. Specifically, such modeling combines cyber intelligence with physical laws in a Bayesian framework. To reduce the computational burden usually associated with probabilistic state estimation, the sum-product algorithm is employed to find the marginal probabilistic state estimates of interest. To enhance the algorithm's robustness, a convergent tree-reweighted sum-product algorithm is designed to resolve convergence issues, check optimality, and reduce the computer memory required for large-scale computation. We demonstrate that the proposed approach provides an accurate mean estimate together with a confidence zone that fully captures the state uncertainty.

The most common assumption about noise in the power system state estimation formulation (8.11) is that different noises are independent [23], and the formulation in this section is based on that assumption. However, recent research on noise modeling shows that measurement readings from the same device are correlated. For example, active and reactive power measurements are computed from the same current transformer (CT) and potential transformer (PT), leading to correlation between the active and reactive power at the same location [562]. While measurements from different locations still have independent noises, future work is needed on how to remove the correlation locally in a Bayesian framework.

8.3 Voltage Regulation Based on RL

In the last section, we showed how to conduct state estimation under the uncertainties introduced by renewable generation. After monitoring the system with state estimation, we close the loop with system control. For this purpose, we introduce in this section a new framework to address the problem of voltage regulation in unbalanced distribution grids with deep photovoltaic penetration. In this framework, both real and reactive power setpoints are explicitly controlled at each solar panel's smart inverter, and the objective is to simultaneously minimize system-wide voltage deviation and maximize solar power output. We formulate the problem as a Markov decision process with continuous action spaces and solve it using proximal policy optimization, a reinforcement learning-based approach, without the need for any forecast or explicit knowledge of the network topology or line parameters. By representing the system in a quasi-steady-state manner and by carefully formulating the Markov decision process, we reduce the complexity of the problem and allow for fully decentralized (communication-free) policies, all of which make the trained policies much more practical and interpretable.

332

8 Streaming Monitoring and Control for Real-Time Grid Operation

node unbalanced distribution grid, based on a real network in Midwest United States, are used to validate the proposed framework and reinforcement learning approach.

8.3.1 Introduction

Photovoltaic (PV) smart inverter technology introduced in recent years enables solar panels to act as distributed energy resources (DERs) that can provide bidirectional reactive power support to electric power grid operations [563–565]. This support can be used to regulate local and system-wide voltages in distribution grids, and the IEEE Standard 1547-2018 [566] provides requirements on the use of such support. Voltage regulation is critical for network safety, at both the transmission and distribution levels. In the distribution grid, voltage regulation is usually achieved either through discrete switching (e.g., tap transformers, capacitor banks) or through devices with continuous setpoints (e.g., PV inverters).

In general, there are two categories of control and information structures to address voltage regulation in distribution grids. The first category of solutions assumes complete or partial knowledge of system parameters and topology (samples of references include [567–573]). The second category of solutions primarily relies on observation data and does not explicitly use knowledge of a physical model of the distribution network (such as a subset of references [574–580]).

In the first set of solutions, control schemes are adopted based on assumed system models and on knowledge of line parameters. Given the broad set of references, we provide representative ones to illustrate some key directions of research. In [567], a multi-objective OPF problem is solved over a given unbalanced distribution grid model using sequential quadratic programming (SQP), where the set of objectives includes minimizing line losses, voltage deviation from nominal, voltage phase unbalance, and power generation and curtailment costs. In [568], a model-based voltage regulation problem is shown to be solvable by an equivalent semi-definite programming (SDP) problem, and the sufficient conditions under which it can be solved are a result of convexifying the problem as outlined. In [569–571], the distribution grid is modeled using the widely adopted linearized flow model, known as LinDistFlow, which assumes a tree (radial) grid structure and negligible line losses. In these papers, the same voltage regulation problem is solved; extensions of these works are [572, 573], in which limited or no communication between buses is needed and the same LinDistFlow model is adopted to provide theoretical guarantees on convergence and stability of the proposed control schemes.

In the second set of solutions, reinforcement learning (RL) approaches are used to bypass the need to explicitly model the system. Nonetheless, some form of a system model and a power flow solver is still needed to simulate the effects of actions on states, as it may not be feasible to learn by directly interacting with a live power grid. However, knowledge of such a model or its parameters need not be explicitly known by the solver of the RL problem, since optimal policies can be inferred solely by interacting with the simulation environment and receiving reward signals. This flexibility with respect to system models is in part what makes RL approaches attractive, in contrast to the aforementioned approaches that rely heavily on a specific class of system models. Furthermore, the objective functions commonly used in conventional control methods tend to be restrictive (e.g., quadratic), whereas in RL-based approaches, reward functions can be arbitrary and are usually designed to directly reflect the user's underlying objective.

In [574], for example, batch RL is adopted to solve the optimal setting of voltage regulation transformers, where a virtual transitions generator is used to allow the RL agent to collect close-to-real samples for learning, without jeopardizing real-time operation. In [575], the optimal reactive power dispatch (ORPD) problem is solved using tabular Q-learning, where the objective is to minimize line losses, with actions that include various forms of discrete reactive power re-dispatch, such as switching control of tap transformers and capacitor banks. Tabular Q-learning works well when there is a relatively small number of discrete states and actions but suffers from the curse of dimensionality and does not scale well to larger systems. In [576], deep RL is used to optimize reactive power support over two timescales: one for discrete capacitor configuration and the other for continuous inverter setpoints. Here, deep refers to the representation of Q-value functions (used in Q-learning) in deep neural network form, as opposed to tabular form; this approach is well known as DQN (deep Q-learning). It is also applied in [577] to control voltages at the transmission level. While DQN uses neural networks to approximate value functions over continuous state spaces, it still relies on the assumption that the action space is discrete. Policy gradients, which we review later in this section, are an alternative set of RL approaches that enable continuous action spaces, and in [578–580], policy gradients are used to solve Volt-VAR control problems where reactive power support is selected from a continuous set using a deep neural network that maps states directly to actions.

Such methods are inherently limited by physical constraints on reactive power support, which are conventionally assumed to be uncontrollable. Here, we discuss the flexibility of such constraints and the value of relaxing them. For example, maximum power-point tracking (MPPT) has been conventional state-of-the-art practice, wherein each PV inverter is designed to extract the maximum real/active power from the solar panel. However, with a growing number of PV panels in the distribution grid, it becomes important to fully investigate the benefits and costs of always absorbing the maximum real power from the sun into the grid in real time. By absorbing less real power, for instance, there is more room for reactive power support. In practice, real power curtailment at generators is performed either by solving model-based OPF problems, as in [567], or by using a so-called Volt-Watt (VW) droop controller, which requires carefully tuning the droop curves at each generator. In this section, an RL approach is proposed for directly learning decentralized joint active and reactive power support.
We illustrate a set of scenarios where, instead of injecting all the solar power into the network, it might be better to save or store the power and inject it at a later time. Even in the absence of a storage system, under deep enough photovoltaic penetration, we discover that the RL agent learns by itself that it might be better to draw only part of the available power in order to avoid over-voltage, especially if there is an insufficient amount of reactive power resources available. This is accomplished by designing a reward function, used in the RL approach, that strikes a balance between voltage regulation and real power absorption.

Despite modern advancements in deep RL, artificial intelligence, and machine learning more broadly, the adoption rate of such tools in the power sector is still substantially lower than in other sectors. This is primarily due to the difficulty of interpreting deep neural networks and the complexity of setting up the processes required to use such tools in practice. These issues make it quite risky for a power grid operator to trust those tools and to prefer them over conventional ones for operating critical infrastructure. Thus, for RL to be adopted by grid operators, the proposed approaches must be heavily simplified and made much more interpretable, which is what we aim to contribute with this study. The key contributions of this section are as follows:

1. Joint optimization of real and reactive power injection from the PV generation is formulated with a parameterized reward function tailored for a multi-agent RL approach. Such parameterization is shown to facilitate more interpretable and easier-to-train deep RL policies for a more user-friendly experience.
2. A variant of PPO (proximal policy optimization), a popular RL approach for handling continuous action spaces, is proposed for decentralized settings to dramatically simplify the search for optimal policies for relatively large systems. Not only can it be shown to train as well as or better than a centralized policy architecture, but it is also much easier to implement than RL-based decentralized approaches in the existing body of literature.

In our framework, an RL agent at each bus observes voltages locally and incrementally updates real and reactive power setpoints at photovoltaic inverters, similar to an integral droop controller (e.g., [581]). However, it does not rely on any explicit knowledge of network topology or line parameters and is fully decentralized, requiring minimal communication infrastructure for practical implementation. This is illustrated in Fig. 8.30, wherein agents are shown to be able to communicate with other agents or a central controller. The word fully in the title of this section refers to the fact that, under our proposed framework, each agent executes its policy locally in real time (online) without communicating with any other agent, not even its neighbors. During training (while exploring and updating policies), though, observations are aggregated centrally so that all agents can adapt to one another. Since this is done on a much longer-term basis, the communication technologies used for this purpose do not need to be as sophisticated as they would be for an alternative online communication-based approach. Finally, the term "load" in Fig. 8.30 is not restrictive. For example, it can include individual residential units or larger, more aggregate neighborhoods.

8.3 Voltage Regulation Based on RL

335

Fig. 8.30 Every agent in the distribution grid controls a local photovoltaic system, with possible communication between agents. Note: each colored dot in the "Distribution Grid" box refers to one agent. (Figure components: loads and buses in the distribution grid, power electronics, and agents.)

In the former case, examples include houses with rooftop solar, where the agent controls a few panels. In the latter case, one bus could correspond to a solar farm, where the agent controls many units simultaneously. The remainder of the section is organized as follows. In Sect. 8.3.2, the voltage regulation objective with joint real and reactive power compensation is formulated. In Sect. 8.3.3, we provide a general review of Markov decision processes, and in Sect. 8.3.4 we specialize one to our problem, with modifications to simplify the task. In Sect. 8.3.5, centralized and decentralized policy architectures are proposed, and they are evaluated in Sect. 8.3.6 with numerical simulations.

8.3.2 Preliminaries

We consider a three-phase balanced distribution network which consists of a set $\mathcal{N} = \{0, 1, \ldots, N\}$ of buses and a set $\mathcal{L} \subset \mathcal{N} \times \mathcal{N}$ of distribution lines connecting the buses. Bus 0 represents the substation that acts as the single point of connection to a bulk power grid. For each distribution line $(i, j) \in \mathcal{L}$, $y_{i,j}$ denotes its admittance. The set of algebraic power flow equations that govern this three-phase balanced (single-phase equivalent) network is

$$P_i - \mathrm{i} Q_i = V_i^* \sum_{j \in \mathcal{N}} y_{ij} V_j, \quad \forall i \in \mathcal{N}, \tag{8.28}$$

where $P_i$ and $Q_i$ are the net injections of real and reactive power, and $V_i$ is the complex phasor voltage at bus $i$. $V_i^*$ denotes the complex conjugate of $V_i$, and $\mathrm{i} := \sqrt{-1}$. To model a distribution grid that is not three-phase balanced (unbalanced for short), one may simply replace bus indices with phase indices in Eq. (8.28) and $\mathcal{N}$ with the set of all phases. This generalizes to two-phase and single-phase buses, which are common in real distribution grids.
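As a quick numerical illustration of Eq. (8.28), the net complex injections implied by a voltage profile can be computed directly from the bus admittance matrix. The sketch below is our own minimal example in Python; the 2-bus admittance values are hypothetical and chosen only for illustration.

    import numpy as np

    # Hypothetical 2-bus example: Y is the bus admittance matrix (p.u.).
    Y = np.array([[ 10-30j, -10+30j],
                  [-10+30j,  10-30j]])
    V = np.array([1.0+0.0j, 0.98-0.02j])  # complex bus voltages (p.u.)

    # Eq. (8.28): P_i - i*Q_i = V_i^* * sum_j y_ij V_j
    S_conj = np.conj(V) * (Y @ V)   # elementwise; equals P - iQ at each bus
    P, Q = S_conj.real, -S_conj.imag
    print(P, Q)                     # net real and reactive injections per bus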


Let $V_i$ denote the positive sequence voltage magnitude at bus $i$. At the substation bus, $V_0$ is fixed at 1.0 p.u. as it is modeled as an ideal voltage source. Let $\mathcal{C} \subset \mathcal{N}$ be the set of buses that are equipped with solar panels and controllable smart inverters, and $n := |\mathcal{C}|$. Let $P_i^c$ and $Q_i^c$ be the total real and reactive power, respectively, injected by the PV inverter at bus $i \in \mathcal{C}$. Each inverter has an apparent power capacity $S_i$, which limits $P_i^c$ and $Q_i^c$ as

$$(P_i^c)^2 + (Q_i^c)^2 \le S_i^2, \qquad P_i^c \le p_i^{env} \le 0.9\, S_i, \tag{8.29}$$

where $p_i^{env}$ is the maximum amount of real power that can be drawn from the solar panel at a given moment in time. It depends on exogenous environmental factors (irradiance, temperature, etc.), hence the superscript. The upper bound on this quantity is $0.9\, S_i$ since each inverter in the network is assumed to obey the IEEE 1547-2018 standard [566]. We let this injected power be evenly distributed across all phases per bus.

Strictly speaking, the actual real power injected by the inverter at time $t$ and the setpoint assigned at time $t$ cannot be equal at the same instant: there is a small time delay ($\sim$10 ms, or less than one 60 Hz cycle) between when the setpoint is assigned and when the actual quantity tracks it. We let both the discrete-time step and the tracking time be 10 ms. This allows us to treat the system as a quasi-steady-state system, as illustrated in Fig. 8.31. More precisely, net real and reactive power injections at bus $i$ at time $t$ can be expressed as $P_i(t) = P_i^c(t) - P_i^l(t)$ and $Q_i(t) = Q_i^c(t) - Q_i^l(t)$, where $P_i^l(t)$ and $Q_i^l(t)$ are the (uncontrollable) load consumption at bus $i$ at time $t$. Due to the one-time-step delay, the injection realized at time $t+1$ equals the setpoint assigned at time $t$. Then, due to the quasi-steady-state nature, $V(t+1) = f(P^c(t), Q^c(t))$, where $f(\cdot)$ represents the solution to the algebraic power flow equation given in (8.28).
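For intuition, constraint (8.29) can be enforced by clipping a requested setpoint onto the feasible region. The following minimal sketch is our own illustration, not from the source: it first caps the real power at min(p_env, 0.9·S), then shrinks the reactive power to respect the apparent power capacity.

    import numpy as np

    def project_setpoint(p_req, q_req, p_env, S):
        """Clip a requested (P^c, Q^c) onto the feasible set of Eq. (8.29)."""
        p = np.clip(p_req, 0.0, min(p_env, 0.9 * S))  # real power cap
        q_cap = np.sqrt(max(S**2 - p**2, 0.0))        # remaining VAR headroom
        q = np.clip(q_req, -q_cap, q_cap)
        return p, q

Note how drawing less real power p enlarges the reactive headroom q_cap; this is exactly the trade-off the RL agent later exploits.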

Fig. 8.31 Illustration of quasi-steady-state behavior


Fig. 8.32 Voltage deviation reward function $R_V$ for different $\delta$ (see Eq. (8.30f))

We address the problem of voltage regulation in the distribution grid through joint real and reactive power control of PV inverter setpoints. The control objective is to track desired voltage levels while not wasting solar power in the process. The voltage regulation problem can be formulated as follows:

$$\max_{P^c,\, Q^c}\ \mathbb{E}\left[\sum_{t=0}^{T}\sum_{i \in \mathcal{C}}\left(R_{V_i}(t) + \mu_i R_{P_i^c}(t)\right)\right], \tag{8.30a}$$

$$\text{s.t.}\quad P_i(t) = P_i^c(t) - P_i^l(t), \tag{8.30b}$$

$$Q_i(t) = Q_i^c(t) - Q_i^l(t), \tag{8.30c}$$

$$\text{Equation (8.28), i.e., power flow,}\ \forall t, \tag{8.30d}$$

$$\text{Equation (8.29), i.e., inverter constraints,}\ \forall t, \tag{8.30e}$$

$$R_{V_i}(t) = \frac{1}{0.05}\min\left\{\delta - \left|1 - V_i(t)\right|,\ 0\right\}, \tag{8.30f}$$

$$R_{P_i^c}(t) = \frac{P_i^c(t)}{0.9\, S_i}, \tag{8.30g}$$

where the $\mu_i$'s and $\delta$ are positive scalars. Unlike $P^c$ and $Q^c$ (controllable inverter injections), $P^l$ and $Q^l$ (uncontrollable loads) are generally not evenly distributed across the individual phases. For clarity, subscript $i$ refers to the $i$th bus index, and terms where the subscript is dropped refer to all buses collectively. Voltage deviation (from the nominal 1.0 p.u.) at each bus is considered acceptable if it is kept within some user-defined $\delta$. Deviations greater than this are assigned negative rewards, as depicted in Fig. 8.32, to signify an undesirable voltage profile. The reward term $R_V$ in (8.30f) quantifies this criterion.
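To make the reward design concrete, the two terms (8.30f) and (8.30g) can be computed per bus as below. This is a minimal sketch of our own; the variable names are illustrative.

    import numpy as np

    def voltage_reward(V, delta):
        """R_V of Eq. (8.30f): zero inside the 1 +/- delta band, negative outside."""
        return np.minimum(delta - np.abs(1.0 - V), 0.0) / 0.05

    def power_reward(P_c, S):
        """R_P of Eq. (8.30g): fraction of the maximum allowed real injection."""
        return P_c / (0.9 * S)

    # Per-bus reward term in the objective (8.30a): R_V + mu * R_P
    V = np.array([1.02, 0.97]); P_c = np.array([4.5, 3.0]); S = np.array([5.0, 5.0])
    r = voltage_reward(V, delta=0.03) + 0.1 * power_reward(P_c, S)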


In conventional voltage regulation schemes, reactive power injection or consumption is used to alleviate voltage deviation problems. Reactive power support is limited by physical constraints. For example, in the case of PV inverters, as expressed in Eq. (8.29), real power injection by the solar panel directly limits the available reactive power. In our proposed objective, we treat real power provision, in addition to reactive power provision, as a decision variable, adding a degree of freedom that can relax the constraints on reactive power support when needed. Since we seek to extract as much real power as possible from the solar panel, which is physically bounded by $0.9\, S_i$ as expressed in Eq. (8.29), we assign a positive reward to more power drawn from the solar panels, as quantified in (8.30g) and (8.30a).

Under this framework with the joint provision of real and reactive power, the user (e.g., a utility) selects the parameter $\mu$. This parameter acts as a balancing term between voltage deviation minimization and solar production maximization, considering the fact that over-injection of power leads directly both to over-voltage and to tighter constraints on reactive power. A value of $\mu$ chosen too high yields a control scheme equivalent to conventional maximum power point tracking (MPPT) control with limited reactive power support. On the other hand, as $\mu$ approaches zero, much less real power is likely to be drawn from the solar panels by the optimal controller. In Sect. 8.3.6, we demonstrate a balanced choice of $\mu$.

The following assumptions are made about variables that are not explicitly controlled:

• While provided to the simulator, neither network topology nor line parameters are used by the controller at any time during training or execution. That is, no a priori knowledge of such values is needed.
• No load or solar forecasting is made available to the controller, either during training or during execution.
• Net load at a control bus is measured by the controller before supplying a setpoint to the solar panel inverter.

The voltage regulation problem formulated in this section is posed as a Markov decision process (MDP) in Sect. 8.3.4; first, however, in Sect. 8.3.3, MDP terminology and formalism are introduced for a more general class of control problems.

8.3.3 Markov Decision Process and Reinforcement Learning

In this section, we give a brief review of Markov decision process (MDP) terminology. We also review a specific RL algorithm, called proximal policy optimization (PPO) [582], which is used to solve the voltage regulation problem.

The MDP is a standard formalism for modeling and solving control problems. The goal is to solve sequential decision-making (control) problems where the control actions can influence the evolution of the state of the system. An MDP can be defined as a four-tuple $(\mathcal{S}, \mathcal{A}, \bar{P}, R)$, where $\mathcal{S}$ is the state space and $\mathcal{A}$ is the action space. $\bar{P}(s'|s, a)$ is the probability of transitioning from state $s$ to $s'$ upon taking action $a$, and $R(s)$ is the reward collected at this transition. We consider a finite-horizon MDP setting with horizon (episode) length $T$.


We assume that the rewards depend only on the state, not on the actions. A control policy $\pi : \mathcal{S} \to \mathcal{A}$ specifies the control action to take in each possible state. The performance of a policy is measured using the value of the policy, $\bar{V}_\pi$, defined as

$$\bar{V}_\pi(s) = \mathbb{E}_\pi\left[\sum_{t=0}^{T-1} R(s(t)) \,\Big|\, s(0) = s\right], \tag{8.31}$$

where $s(t+1) \sim \bar{P}(\cdot|s(t), a(t))$, $a(t) = \pi(s(t))$, and $s(t)$ and $a(t)$ are the state of the system and the action taken at time $t$, respectively. The goal is to find the optimal policy $\pi^*$ that achieves the maximum value, i.e., $\pi^* = \arg\max_\pi \bar{V}_\pi$. The corresponding value function, $\bar{V}^* = \bar{V}_{\pi^*}$, is called the optimal value function. $\pi^*$ and $\bar{V}^*$ satisfy the Bellman equation,

$$\pi^*(s) = \operatorname*{arg\,max}_{a \in \mathcal{A}}\left[R(s) + \sum_{s' \in \mathcal{S}} \bar{P}(s'|s, a)\, \bar{V}^*(s')\right]. \tag{8.32}$$

When the system model $\bar{P}$ is known, the optimal policy and value function can be computed using dynamic programming [583]. However, in most real-world applications, the system model is either unknown or extremely difficult to model. In the voltage regulation problem, the line parameters and/or the topology of the network may not be known a priori. Even if they were known, it would still be difficult to model the effect of actions on states in a feedforward fashion due to the algebraic nature of their relationship. In such scenarios, the optimal policy needs to be learned from sequential state/reward observations by interacting with an environment, which in this section is a simulation environment as described in Sect. 8.3.4. Reinforcement learning is the approach for computing the optimal policy of an MDP when the model is unknown.

Policy gradient algorithms are a popular class of RL algorithms. In a policy gradient algorithm, we represent the policy as $\pi_\theta$, where $\theta$ denotes the parameters of the neural network used to represent the policy. Let $J(\theta) = \mathbb{E}_s[\bar{V}_{\pi_\theta}(s)]$, where the expectation is w.r.t. a given initial state distribution. The goal is to find the optimal parameter $\theta^* = \arg\max_\theta J(\theta)$. This is achieved by implementing a gradient ascent update, $\theta_{k+1} = \theta_k + \alpha_k \nabla J(\theta_k)$, where $\alpha_k$ is the learning rate. The gradient $\nabla J(\theta)$ is given by the celebrated policy gradient theorem [583] as $\nabla J(\theta) = \mathbb{E}_{\pi_\theta}\left[\bar{Q}_{\pi_\theta}(s, a)\, \nabla \log \pi_\theta(s, a)\right]$, where the expectation is w.r.t. the state and action distribution realized by following the policy $\pi_\theta$. Here $\bar{Q}_{\pi_\theta}$ is the Q-value function corresponding to the policy $\pi_\theta$. Often, the Q-value function is represented using a neural network of its own (different from the one used for policy representation). The neural network which represents the policy is called the actor network, and the one which represents the value function is called the critic network. The terminology reflects that the policy network determines actions given observations, and the value network provides the "critic" feedback used to update the policy parameters, as is clear from the expression for $\nabla J(\theta)$. This class of algorithms is also called actor-critic algorithms.
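As a concrete, minimal illustration of the policy gradient estimator above, the score-function form can be implemented in a few lines. This sketch is our own and assumes a PyTorch policy returning a probability distribution; it is not code from the source.

    import torch

    def policy_gradient_loss(policy, states, actions, q_values):
        """Surrogate loss whose gradient is E[Q(s,a) * grad log pi(s,a)]."""
        dist = policy(states)                     # e.g., torch.distributions.Normal
        log_prob = dist.log_prob(actions).sum(-1)
        # Negative sign: optimizers minimize, but we want gradient ascent on J.
        return -(q_values.detach() * log_prob).mean()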


The goal is to incrementally update the parameters of both networks in such a way that they converge to the parameters corresponding to the optimal policy and Q-value function.

Trust region policy optimization (TRPO) [584] is a recent variant of policy gradient algorithms. To improve sample efficiency and ensure reliable convergence, TRPO modifies the policy update as:

$$\theta_{k+1} = \operatorname*{arg\,max}_{\theta}\ \mathbb{E}_{\theta_k}\left[\frac{\pi_\theta(s, a)}{\pi_{\theta_k}(s, a)}\, \bar{Q}_{\pi_{\theta_k}}(s, a)\right] \tag{8.33}$$

$$\text{s.t.}\quad \mathbb{E}\left[D_{KL}\left(\pi_{\theta_k}(s, \cdot),\, \pi_\theta(s, \cdot)\right)\right] \le d, \tag{8.34}$$

where $D_{KL}(\cdot, \cdot)$ is the Kullback-Leibler divergence between two policies, $\pi_\theta(s, a)$ denotes the probability of selecting action $a$ given state $s$, and the constant $d$ is a user-defined threshold. The proximal policy optimization (PPO) algorithm [582] builds upon the TRPO framework by modifying the objective function and the optimization update, which enables improved data efficiency and easier implementation, since the KL divergence constraint is dropped and the objective is clipped (or saturated) as described in [582]. Clipping of the objective discourages the optimizer from over-updating $\theta$. We adapt this state-of-the-art algorithm to a decentralized setting to solve the voltage regulation problem we consider.
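For reference, the clipped surrogate objective that PPO substitutes for (8.33)-(8.34) can be written compactly as below. This is a generic sketch of the published PPO loss [582] in PyTorch, assuming advantage estimates are available; it is not the exact implementation used in this section.

    import torch

    def ppo_clip_loss(log_prob_new, log_prob_old, advantages, eps=0.2):
        """PPO clipped surrogate: -min(r*A, clip(r, 1-eps, 1+eps)*A)."""
        ratio = torch.exp(log_prob_new - log_prob_old)   # pi_theta / pi_theta_k
        clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
        return -torch.min(ratio * advantages, clipped * advantages).mean()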

8.3.4 Voltage Regulation as an RL Problem

We use the MDP formalism to model the voltage regulation problem presented in Sect. 8.3.2. As a reminder, $n := |\mathcal{C}|$.

8.3.4.1 State Space

The state space $\mathcal{S} \subset \mathbb{R}^{2n}$ is the set of real power injections and voltage measurements at all controllable buses. For convenience, each state $s \in \mathcal{S}$ is defined as an affine transformation of those measurements. More precisely, the state of the system at time $t$, $s(t)$, is given by

$$s(t) := \left(s_1^P(t), s_1^V(t), \cdots, s_n^P(t), s_n^V(t)\right), \tag{8.35}$$

where

$$s_i^P(t) \leftarrow \frac{P_i^c(t)}{0.9\, S_i} - 1, \qquad s_i^V(t) \leftarrow \frac{1 - V_i(t)}{0.05}.$$

So, when $P_i^c(t)$ is at its maximum allowable value, $s_i^P(t)$ is zero. Also, when the voltage is equal to the nominal value, $s_i^V(t)$ is zero. Thus, ideal scenarios correspond to the state value zero, and critical scenarios correspond to magnitudes of order one or less, assuming critical voltages exceed $1 \pm 0.05$. This scaling helps to initialize and train the RL algorithm.
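A minimal sketch of this state observer, in Python with illustrative variable names of our own:

    import numpy as np

    def observe_state(P_c, V, S):
        """Build the scaled state of Eq. (8.35): zero means ideal operation."""
        s_P = P_c / (0.9 * S) - 1.0   # 0 when drawing maximum allowed power
        s_V = (1.0 - V) / 0.05        # 0 at nominal voltage, ~±1 at 1∓0.05 p.u.
        return np.stack([s_P, s_V], axis=-1).ravel()  # (s1P, s1V, ..., snP, snV)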

8.3.4.2 Action Space

Given voltage measurements at every bus in $\mathcal{C}$, we can choose between two approaches to determine $P^c$ and $Q^c$: (1) directly determine the optimal real and reactive power setpoints, i.e., $(P^c, Q^c)$, by algebraically tying them to voltage, or (2) change the setpoints incrementally, i.e., determine $(\Delta P^c, \Delta Q^c)$, similar to an integral controller. The first approach requires the design and memorization of a highly nonlinear function that is likely dependent on system operating conditions; due to the lack of tracking in this approach, forecasting would be required to respond to different operating conditions. The second approach, on the other hand, enables tracking the desired state in a simple and incremental way [573]. We use the second approach in this section.

The action space $\mathcal{A} = [-1, 1]^{2n}$ is the set of possible scaled increments in real and reactive power setpoints. The action at time $t$, $a(t)$, is given by $a(t) = (a_1^P, a_1^Q, \cdots, a_n^P, a_n^Q)$, and the increments to those setpoints are defined, respectively, as $\Delta P_i^c = a_i^P \cdot \Delta_{\max}^P$ and $\Delta Q_i^c = a_i^Q \cdot \Delta_{\max}^Q$. Here $\Delta_{\max}$ explicitly limits the size of the actual (as opposed to scaled) increments $(\Delta P^c, \Delta Q^c)$.

8.3.4.3 Transition Model

We assume that the next states are obtained by interaction either with a real-world distribution grid or with a simulator, such as OpenDSS [585]. In the case of a simulator, given the changes in loads $(\Delta P^l, \Delta Q^l)$ along with the current state and action based on $\mathcal{S}$ and $\mathcal{A}$, the next states can be computed directly. For example, actions are mapped to states using OpenDSS as follows:

$$V(t+1) = \text{OpenDSS}\left(P^c(t), Q^c(t)\right). \tag{8.36}$$

We choose the OpenDSS simulator for two main reasons: (a) it can solve power flow for unbalanced distribution grids, and (b) one can directly interact with it from Python, where RL methods are easier to implement.
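As an illustration of this workflow, the sketch below steps an OpenDSS model from Python via the opendssdirect package. It is a minimal sketch under our own assumptions: the feeder file name is hypothetical, and the exact circuit element type and names depend on the model in use.

    import opendssdirect as dss

    dss.Text.Command('Redirect feeder_240node.dss')  # hypothetical feeder file

    def step(p_setpoints_kw, q_setpoints_kvar, pv_names):
        """Apply inverter setpoints, solve power flow, return bus voltages (p.u.)."""
        for name, p, q in zip(pv_names, p_setpoints_kw, q_setpoints_kvar):
            # Element type/names below are assumptions about the model at hand.
            dss.Text.Command(f'Edit Generator.{name} kW={p} kvar={q}')
        dss.Solution.Solve()                         # realizes Eq. (8.36): V(t+1)
        return dss.Circuit.AllBusMagPu()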

8.3.4.4 Reward Function

The system-wide reward at every time step is obtained as follows:

$$R_t = \frac{1}{n}\sum_{i \in \mathcal{C}}\left(R_{V_i}(t) + \mu_i R_{P_i^c}(t)\right), \tag{8.37}$$


where $R_{V_i}(t)$ and $R_{P_i^c}(t)$ are defined in Eqs. (8.30f) and (8.30g). Note that the state and action spaces have been defined in such a way that each element ranges from $-1$ to 1, with the exception that the voltage-related state may exceed $\pm 1$ if the p.u. voltage exceeds $1 \pm 0.05$ under abnormal conditions. This is a suitable choice for training an RL policy, as it allows for initialization and adjustment of the policy parameters $\theta$ in a standard way by exploiting state-of-the-art algorithms (most of which require that the state and action spaces be a box inside $\pm 1$ along all dimensions).

Based on the definition of the action space, the RL agent seeks to learn the magnitude and direction that incrementally change the setpoints for every starting state. This raises the question: what information does the agent need to guide this action? The state defined in Eq. (8.35) has the following advantage: if both the voltage term and the power term are zero (i.e., maximum power drawn and nominal voltage), then the scenario is ideal, and no extra injection is needed. If the load in the system changes, though, a simple amendment to the RL controller is needed:

$$\Delta P^c \leftarrow \Delta_{\max}^P \cdot a^P + \Delta P^l, \qquad \Delta Q^c \leftarrow \Delta_{\max}^Q \cdot a^Q + \Delta Q^l, \tag{8.38}$$

where $a^P$ and $a^Q$ (both in $[-1, +1]$) are determined by the RL agent's zero-centered policy $\pi$, and $(\Delta P^l, \Delta Q^l)$ is the observed change in load at the controllable buses. The strategy adopted in Eq. (8.38) is termed an integral controller, since the setpoint $(P^c, Q^c)$ behaves as a discrete-time integrator of changes in operating conditions. Moreover, this controller tracks the state to zero in steady state within resource limits, since all terms in Eq. (8.38) go to zero if $s = 0$. Under scarcity of resources, one or more of the state terms in Eq. (8.35) will be non-zero, which calls for a balance between maximum power point tracking and voltage regulation.

Note that the state-tracking incremental setpoint changes are bounded by $\Delta_{\max}^P$ and $\Delta_{\max}^Q$ to limit fluctuations. These values are chosen heuristically as $0.09\, S_i$ and $0.2\, S_i$, respectively, since those are one-tenth of the maximum possible jumps in the setpoints $P^c$ and $Q^c$.
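Putting the action scaling and Eq. (8.38) together, one control step can be sketched as follows. This is our own minimal illustration; clipping onto the feasible set of Eq. (8.29) is assumed to be handled separately (e.g., by a helper like the project_setpoint sketch above).

    import numpy as np

    def control_step(policy, s, P_c, Q_c, dP_l, dQ_l, S):
        """One integral-controller update of inverter setpoints, Eq. (8.38)."""
        a = policy(s)                        # in [-1, 1]^{2n}, zero-centered
        aP, aQ = a[0::2], a[1::2]            # interleaved per the action space
        dP_max, dQ_max = 0.09 * S, 0.2 * S   # heuristic increment bounds
        P_c = P_c + dP_max * aP + dP_l       # Delta P^c of Eq. (8.38)
        Q_c = Q_c + dQ_max * aQ + dQ_l       # Delta Q^c of Eq. (8.38)
        return P_c, Q_c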

8.3.5 Control Policy Architecture and Optimization

In this section, we present the design and architecture of our RL algorithm for voltage regulation. We build on the PPO algorithm [582] and extend it to a decentralized setting. Figure 8.33 summarizes the RL-based control policy architecture we propose. "State Observer" refers to Eq. (8.35), and "Integral Controller" refers to Eq. (8.38). $\Delta\theta_\pi$ refers to the changes in the policy $\pi$ determined by the PPO algorithm. The policy and critic network architectures used to implement PPO are detailed in the remainder of this section.


Fig. 8.33 RL-based control architecture

The standard PPO algorithm assumes that there is a single centralized agent, which fully observes the state of the system and can take any control action. However, such a centralized control policy may not be feasible for the voltage regulation problem we consider. First, the RL algorithm may not be able to scale to a large network with many nodes. Second, even if centralized training of an RL algorithm is feasible, implementing such a centralized control policy in a real-world system may not be possible due to the communication infrastructure needed. We propose a decentralized RL algorithm that overcomes these challenges.

Recall that $s_i$ and $a_i$ refer to the local state and action (at bus $i$), respectively. As defined in Eq. (8.35), each state $s_i$ contains two terms per bus, relating to real power and voltage measurements. Similarly, action $a_i$ was defined in such a way that it also contains two terms per bus, relating to changes in real and reactive power setpoints. Our goal is to find the optimal policy parameters $\theta_i^*$ for each bus $i$ that map the local state $s_i(t)$ to the local control action $a_i(t)$, i.e., $a_i(t) = \pi_{\theta_i^*}(s_i(t))$, in an optimal manner. The objective is to maximize the cumulative global (system-wide) reward.

To replace the centralized policy used in standard PPO with a decentralized policy, we propose a neural network architecture for $\pi$ that connects the input to the output only at the same bus, rendering it equivalent to a decentralized controller that competes with conventional methods. That is, there are $n$ neural networks in parallel, each with only two inputs and two outputs. Each of those smaller networks is parameterized by a group of weights and biases, denoted collectively as $\theta_i$, and the notation $\pi_{\theta_i}$ is shortened to $\pi_i$. This time, the PPO algorithm optimizes over $\theta_1, \ldots, \theta_n$, in search of the optimal policies $\pi_1, \ldots, \pi_n$, where

$$a_i \leftarrow \pi_i(s_i) \qquad \forall i \in \mathcal{C}. \tag{8.39}$$


Note that the only difference between this case and the centralized case (optimizing over $\theta$) is that here we enforce the strict rule that all neural network weights connecting states at bus $i$ to actions at bus $j$ are fixed at zero iff $i \neq j$. One can also modify this architecture by replacing the condition $i \neq j$ with $(i, j) \notin \mathcal{L}$, if the desired setup involves neighboring buses communicating with one another. For training and implementation purposes, we perform orthogonal initialization of the neural network weights for all policies and assign very small initial values to those in the last layer to prevent instability in the feedback controller.

In PPO, the value function $\bar{V}$ (see Eq. (8.31)) is also updated at every iteration when the policy is trained. Even though we desire a decentralized control setting, the actor policies are not trained to maximize local rewards. Rather, they are trained to maximize the global (system-wide) reward. For this reason, instead of $n$ value functions ($\bar{V}_i$ for each $i \in \mathcal{C}$), there is a single value function (critic) for all actors combined, and this function's argument is the system-wide state. In contrast, Multi-Agent Deep Deterministic Policy Gradients (MADDPG) [586], a method that also employs multiple policy networks in a policy gradient approach, expects each "agent" to observe and act locally (with its own policy network), to have its own critic network (value function estimate), and to estimate the policy networks of others. In that work, each agent is assumed to have its own reward function or objective. Here, however, since there is a single global objective (one reward signal), one critic function suffices, and the policy networks train simultaneously to adapt to each other.

Due to how the policy architecture was chosen, there is still a single policy function $\pi$ from the standard PPO algorithm's perspective, with $2n$ inputs and $2n$ outputs in this formulation. Since there is a single value function, the PPO agent is trained as usual, without any explicit changes to the algorithm, other than the aforementioned restriction that the policy network connects the input to the output only at the same bus. To implement this, the optimizer (e.g., the Adam optimizer in PyTorch) is set to ignore the weights (initialized and left at zero) in the policy network that link actions at one bus to states at another.
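One simple way to realize this restriction, sketched below under our own assumptions, is to instantiate n independent two-input/two-output sub-networks and concatenate their outputs; this is mathematically equivalent to a single 2n-by-2n policy whose cross-bus weights are fixed at zero. In practice, the last layer would output the mean (and log-variance) of a Gaussian action distribution rather than the action itself.

    import torch
    import torch.nn as nn

    class DecentralizedPolicy(nn.Module):
        """n parallel 2-in/2-out sub-policies; no cross-bus connections."""
        def __init__(self, n, hidden=4):
            super().__init__()
            self.subs = nn.ModuleList(
                nn.Sequential(nn.Linear(2, hidden), nn.Tanh(),
                              nn.Linear(hidden, hidden), nn.Tanh(),
                              nn.Linear(hidden, 2), nn.Tanh())
                for _ in range(n))

        def forward(self, s):                  # s: (batch, 2n)
            outs = [sub(s[:, 2*i:2*i+2]) for i, sub in enumerate(self.subs)]
            return torch.cat(outs, dim=-1)     # actions in [-1, 1]^{2n}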

8.3.6 Numerical Simulation

In this section, we apply the proposed policy architecture and use PPO to solve the MDP. Numerical simulations are conducted on a 240-node distribution grid (see Fig. 8.35) using OpenDSS to solve the unbalanced power flow, as described in Sect. 8.3.4. All parameters associated with this network are obtained from real line parameters and real load data, based on an anonymized distribution grid in the Midwest United States [587]. Experiment details (e.g., software and hardware) can be found in [588].


Algorithm 13 One episode of interacting with the distribution grids
1: function RunEpisode(in_training, D)  {in_training is True or False; D is the experience buffer}
2:   Specify the load & solar scenario (random & time of day).
3:   if in_training then
4:     Set agent policies to sample actions stochastically
5:   else
6:     Set agent policies to sample actions deterministically
7:   end if
8:   while the episode is not done do
9:     for each controllable bus i in the grid do
10:      Observe local voltage and active power measurements.
11:      Use policy πi to sample changes to active & reactive power setpoints at the local PV panels.
12:    end for
13:    Collect rewards and append actions, observations, and rewards to D. {Only if in_training}
14:  end while
15: end function

Algorithm 14 Training decentralized PPO for distribution grids
1: Initialize the following:
   • policy neural network πi for each bus i
   • value neural network V̄
   • PPO hyper-parameters
   • experience buffer D {memory of actions, observations, and rewards used in PPO}
2: in_training ← True
3: while the number of episodes is less than the threshold do
4:   for each episode do
5:     RunEpisode(in_training, D) {Defined in Algorithm 13}
6:   end for
7:   every pre-specified number of episodes do
8:     Use D to update the policy and value networks according to PPO.
9:   end every
10: end while

Based on numerical simulations, as shown in Fig. 8.36, we found that the decentralized agent is more sample efficient and trains with fewer fluctuations and less variance in episodic rewards over the learning process. On the other hand, the centralized agent takes somewhat less computation time (about 20% less) per iteration yet takes more iterations to converge.

8.3.6.1 Simulation Setup

The RL agent interacts with the distribution grid every 10 ms (the time step), and each episode contains 100 time steps, for a total of one second per episode.


Fig. 8.34 An example of real load data from a grid located in the Midwest United States [587]. (Top) about 200 load profiles in the first week of January 2017. (Bottom) load distribution over the entire year of 2017

$\mu$ is set to 0.1 to favor voltage regulation over solar production maximization. We use the distribution grid shown in Fig. 8.35, where $N = 240$, and we select $n = 16$ and $n = 194$ for the case studies that follow; 194 is the number of controllable nodes provided originally with the OpenDSS model of this grid. For each of these 194 nodes, we have one year (2017) of real historical load data $(P^l, Q^l)$, which we use to generate random samples for the simulation at the beginning of every episode. A short summary of the load data is shown in Fig. 8.34.

Since each episode is 1 s, it is fair to assume that fluctuations in $p^{env}$ and $(P^l, Q^l)$ are negligible within one episode. For this reason, at the beginning of each episode, as indicated in Algorithm 13, we randomly generate and fix $p^{env}$ and $(P^l, Q^l)$ for the remainder of the episode. The maximum solar power output $p^{env}$, for training purposes, is randomly selected in each episode as a multiple of the net load at the same bus. For example, at a given bus, if the net load is $x$ kW, then $p^{env}$ at the same bus is chosen uniformly between 0 and $2x$ kW. This helps generate a diverse set of scenarios, ranging from net overconsumption to net overproduction at each bus.

The implementation of each episode is detailed in Algorithm 13. If the episode is run for training, experiences are collected and stored in the experience buffer (also known as the replay buffer). During online interaction with the environment, each agent acts deterministically; during training, however, actions are sampled from a Gaussian whose mean and variance are determined by the policy's neural network. The in_training Boolean (true or false) is used in the algorithm to distinguish between online execution and training. Finally, Algorithm 14 describes how interaction with the environment is used to train the policy and value networks ($\pi_i\ \forall i$ and $\bar{V}$).
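The episode-level scenario sampling described above can be sketched in a few lines. This is our own illustration; random seeding and data loading are omitted, and load_history is an assumed container of (P^l, Q^l) snapshots.

    import numpy as np

    def sample_scenario(load_history, rng):
        """Draw a fixed (P^l, Q^l, p^env) tuple for one episode."""
        P_l, Q_l = load_history[rng.integers(len(load_history))]  # random snapshot
        # Per-bus solar cap chosen uniformly in [0, 2x], x being the net load.
        p_env = rng.uniform(0.0, 2.0, size=np.shape(P_l)) * P_l
        return P_l, Q_l, p_env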


Fig. 8.35 Real distribution grid located in the Midwest United States [587], with 240 nodes excluding the substation node

8.3.6.2 Case Study on a Smaller (16-bus) Subsystem

In this case study, we compare the use of a centralized policy to that of the decentralized policy presented in Sect. 8.3.5. Since $n = 16$, the neural networks of both centralized and decentralized policies have 32 inputs and 32 outputs. The standard choice of 2 hidden layers with 64 neurons per layer is made for the centralized policy, with $\tanh(\cdot)$ activation functions, whereas the decentralized policy splits into 16 sub-policies, each with 2 inputs, 2 outputs, and 2 hidden layers of 4 neurons each. This gives both the centralized and decentralized policies a "height" of 64 neurons per hidden layer ($4 \times 16 = 64$), but a total of 8352 parameters to tune for the former and 672 for the latter. In fact, in the decentralized case, we assign 16 different Adam optimizers, one to tune each sub-policy, so it is not so much 672 parameters to optimize per PPO iteration as 42 per optimizer, compared to 8352 for the (single) optimizer in the centralized setting.
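These parameter counts can be verified directly; the short check below (our own) counts weights and biases for the two architectures:

    def mlp_params(sizes):
        """Total weights + biases of a fully connected net with given layer sizes."""
        return sum(sizes[k] * sizes[k+1] + sizes[k+1] for k in range(len(sizes) - 1))

    print(mlp_params([32, 64, 64, 32]))    # centralized: 8352
    print(16 * mlp_params([2, 4, 4, 2]))   # decentralized: 16 * 42 = 672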


Fig. 8.36 The RL training curve for centralized vs. decentralized policies. A decentralized agent trains more monotonically and with less variance in episodic rewards. For both, rewards below 0 indicate voltages are not within the user-defined safety boundary

(Axes: scaled episodic rewards, $-200$ to $50$, vs. PPO iterations, 0 to 10; curves: decentralized and centralized; horizontal line at 0: necessary for voltage safety.)

In classic RL benchmarks, a threshold on the average cumulative reward is typically chosen to determine when the learning problem is solved. This gives whoever monitors and debugs the learning process a sense of progress, saving time and effort. In our context, the threshold is set to 0, as shown in Fig. 8.36, for the following reasons. We know that $R_V \le 0$ and $R_P \ge 0$ from Eqs. (8.30f) and (8.30g). Both reward terms have been designed so that the magnitudes of the rewards are of order 1 or less during normal operating conditions. Moreover, if the user wants to keep voltages within $1 \pm \delta$, then $(R_V + \mu R_P) \ge 0$ at every bus occurs only if the voltages are kept within the desired region at all buses. It logically follows that if the inequality does not hold, then the voltage at at least one bus must be outside the desired region. Thus, by simply monitoring the training curve, one may conclude that not all voltages are inside $1 \pm \delta$ whenever the curve is still below the threshold, and hence that more time is needed before ending the training process. This necessary condition on voltage serves as a useful tool for users who seek to implement this approach. Note: in Fig. 8.36, the term "voltage safety" merely refers to voltages being inside $1 \pm \delta$.

Figure 8.36 demonstrates that the decentralized agent permanently crosses this threshold after two iterations, while the centralized one takes four iterations to do so. From these results, we claim that one can obtain results for a decentralized agent that are similar to, or even better than, those for a centralized agent, simply by manipulating the neural network's architecture.

8.3.6.3 Case Study on a Larger (194-bus) Subsystem

In the previous subsection, we compared centralized and decentralized policy architectures. In this subsection, we dig deeper and examine the proposed framework purely from a power systems perspective. We ask the following question: what is the impact of joint real and reactive power control (as opposed to reactive power control alone) on the system-wide voltage profile amid deep photovoltaic penetration?


(Two panels vs. bus index 0–200: top, $P^c/p^{env}$ from 0.0 to 1.0 with curves "$\Delta Q^c$ only, $\Delta P^c = 0$" and "joint $\Delta P^c, \Delta Q^c$"; bottom, $V$ (p.u.) from 0.95 to 1.05 with the $1 \pm \delta$ band marked.)

Fig. 8.37 A comparison between control policies under a scenario with deep photovoltaic penetration. The joint provision of real and reactive power yields an improved voltage profile with minor power sacrifice

Consider $n = 194$ buses, with controllable real and reactive power inverter setpoints. As shown in Fig. 8.37, when maximum real power is drawn from the solar panels, leaving less reactive power support, deep photovoltaic penetration causes over-voltage. With the joint provision of real and reactive power, the RL agent manages to keep voltages within the user-defined desired region ($1 \pm \delta$). Remarkably, only a small reduction in real power injection was needed to achieve this effect. Figure 8.38 shows the steady-state distribution of real power consumption per bus as a ratio to the maximum possible injection ($p^{env}$). It is worth noting how much the voltage improved system-wide, even though most solar panels produced near-maximum output (note the 0.85 on the axes of Figs. 8.37 and 8.38). This justifies the value of considering the joint provision of real and reactive power support.

Indeed, there exist plenty of studies in the literature that implement some form of power curtailment (to provide reactive power support) for the purpose of voltage regulation. The novelty proposed here is twofold. First, by using RL to automatically learn how to balance voltage regulation against maximum power utilization, the user does not need to design heuristics to reach similar results or to rely on any explicit understanding of how the system works. Second, as shown in Fig. 8.37, very little power curtailment is required to significantly improve the voltage profile and keep it well within the desired thresholds. This is made possible by the structure of the proposed decentralized PPO policy, which allows all the agents to train simultaneously to adapt to one another during exploration, despite the lack of communication between them during real-time operation.

Fig. 8.38 Histogram of $P^c/p^{env}$ under the RL policy, using the results of Fig. 8.37. Most buses inject near-maximum real power. (Axes: percentage of buses, 0–40%, vs. $P^c/p^{env}$, 0.85–1.00.)

8.3.7 Conclusion

This section introduces a reinforcement learning-based voltage control strategy with the joint provision of real and reactive power for distribution grids with deep photovoltaic penetration. The joint real power and voltage support problem is formulated as a Markov decision process with rewards parameterized to balance voltage deviation minimization against solar production maximization. In contrast with conventional multi-agent RL algorithms, we develop a tailor-designed decentralized PPO (proximal policy optimization) algorithm that both works well with large continuous action spaces and simplifies the optimization process by updating multiple policies simultaneously. We demonstrate the idea by reducing the search space for the simpler 16-bus subsystem (from over 8000 parameters in the centralized setting to under 700 parameters in the decentralized setting) while still achieving similar and, in some cases, better average rewards. This size reduction is on the order of 10 for the 16-bus subsystem and on the order of 100 for the 194-bus subsystem.

Numerical simulations on a 240-node distribution grid based on real parameters show that it is not always the best strategy to absorb all the available solar power. This observation implies that it would benefit the distribution grid if some fraction of the real-time power produced by the PV panels could be locally absorbed. This section also proposes and verifies a fully decentralized (communication-free) approach for this type of control, which can be implemented on the existing physical infrastructure, thus helping alleviate problems related to communication failure or cyber-attacks. In future work, competition between agents will be considered, whereby the inverter at each bus seeks to maximize local, not system-wide, rewards. Further research could also investigate the optimal combination of local energy storage with PV panels for real-time operation.

Chapter 9

Using PMU Data for Anomaly Detection and Localization

9.1 Dynamics from PMU

This section studies the fundamental dimensionality of synchrophasor data for identifying events. Based on such analysis, we propose an online application for early event detection using reduced dimensionality. First, the dimensionality of phasor measurement unit (PMU) data under both normal and abnormal conditions is analyzed. This suggests an extremely low underlying dimensionality despite the large number of raw measurements. An early event detection algorithm based on the change of core subspaces of the PMU data at the occurrence of an event is then proposed. Theoretical justification for the algorithm is provided using linear dynamical system theory. Numerical simulations using both synthetic and realistic PMU data are conducted to validate the proposed algorithm.

9.1.1 Introduction

This section is motivated by the need for real-time analytics to make better use of the streaming data collected from the increasing deployment of phasor measurement units (PMUs). Given the strong capability of synchrophasor measurements for security assessment [589–591], a large number of other intelligent electronic devices (IEDs) with PMU functionality, such as the frequency monitoring network (FNET) [592] and frequency disturbance recorders (FDRs) [593], are rapidly being brought online. As an illustration, China had coverage from about 1717 PMUs as of 2013 [594]; in the United States, about 500 PMUs had been installed by July 2012, and another 800 were installed by the end of 2014 [595, 596].

There have been numerous discussions about utilizing PMUs to improve wide-area monitoring, protection, and control (WAMPAC) [597–599]. For example, the Lyapunov exponents of the voltage phasors are utilized to monitor short-term voltage stability [600]. A PMU-based adaptive technique for transmission line fault detection and location is proposed using the discrete Fourier transform [601, 602]. The phasor angle measurements are employed together with the system topology to detect line outages [603]. ABB produces a monitoring system that can monitor phase angle stability, line thermal limits, voltage stability, and power system oscillations using PMU measurements [604].

Given the increasing amount of PMU data, it has become a challenge to determine how to best manage and leverage the growing volume of synchrophasor data for real-time operational benefits. Just one phasor data concentrator (PDC) collecting data from 100 PMUs with 20 measurements each at a 30 Hz sampling rate generates over 50 GB of data per day [605]. From a research perspective, the large deployment of synchrophasors raises several open questions: (1) What is the underlying dimensionality of the massive PMU data in wide-area power systems? (2) Does the underlying dimensionality change as the system operating conditions change? (3) Can such a change of dimensionality indicate the occurrence of an event in power system real-time operations? (4) Is there any fundamental connection between PMU data-driven analytics and model-based analysis of power systems? These are new questions that conventional model-based approaches alone cannot address.

In this section, by exploring the underlying dimensionality of PMU data, we provide theoretical justification for an early event detection algorithm, which lends itself to early anomaly detection. Based on principal component analysis (PCA), the dimensionality reduction analysis provides a significantly lower-dimensional "signature" of the states in the overall power system [606]. At the occurrence of a system event, the early event detection algorithm issues an alert whenever a large value of the proposed event indicator, induced by the change of the core subspaces of the PMU data, is detected. The key features of the proposed algorithm are: (a) it is an online data-driven approach requiring no knowledge of the system model/topology; (b) it implements dimensionality reduction at the adaptive training stage to extract the key features of the embedded high-dimensional PMU data; (c) it performs event detection using a much-reduced number of PMUs as "pilots," which is computationally desirable in real-time operations; (d) it is theoretically justified using linear dynamical system theory; (e) for online event detection, it does not require lengthy buffering of data, which is required in alternative approaches based on frequency-domain analysis; and (f) it is capable of detecting system events at an earlier stage than would be possible by monitoring the raw PMU data.

This section is organized as follows. Section 9.1.2 presents a linear PCA-based approach to analyze the dimensionality reduction of synchrophasor data. Based on the dimensionality analysis of PMU data, an online early event detection algorithm is proposed in Sect. 9.1.3, with a theoretical justification using linear dynamical system theory. In Sect. 9.1.4, numerical examples utilizing synthetic PMU data from a power system simulator for engineering (PSS/E) and realistic PMU data from the Texas and Eastern Interconnections are provided to validate the proposed algorithm. Conclusions and possible future research directions are presented in Sect. 9.1.5.

9.1.2 Linear Analysis of Synchrophasor Dimensionality

Dimensionality analysis and reduction of PMU data have been studied in the recent literature due to the increasing size of PMU data [606, 607]. As one of the most widely used linear dimensionality reduction methods, PCA reduces the dimensionality while preserving the variance of the original data [608, 609]. Its fast computation alone makes it attractive in the areas of coherency identification [610], extraction of fault features [611], and fault location [612], aside from its considerable benefits for visualization. In this section, we propose a PCA-based approach to reducing the dimensionality of streaming PMU data.

Let $p$ denote the number of available PMUs across the whole power network, each providing $\ell$ measurements. It is anticipated that there could be up to thousands of PMUs in interconnected power systems, with each PMU providing up to 20 different measurements1 at each sample [595, 596, 605]. At each time sample, a total of $N_T := p \times \ell$ measurements are collected, indicating the difficulty of online data analytics. For each PMU, the $\ell$ measurements could include a variety of variables, such as frequency, voltage magnitude, etc. In this section, we conduct the dimensionality analysis for each category of measurements independently. In other words, we assume that at each round of analysis, $N := p$. We then define the measurement matrix $Y_{n \times N} := [y^{(1)}, \ldots, y^{(N)}]$ containing the $N$ measurements. Each measurement has $n$ past samples, i.e., $y^{(i)} := [y_1^{(i)}, \ldots, y_n^{(i)}]^T$, $i = 1, \ldots, N$.

The PCA-based dimensionality analysis is described as follows, with the flowchart shown in Fig. 9.1:

(1) Calculate the covariance matrix of $Y$: $C_Y := Y_{n \times N}^T Y_{n \times N}$.
(2) Calculate the $N$ eigenvalues and eigenvectors of $C_Y$.
(3) Rearrange the $N$ eigenvalues in decreasing order, with the eigenvectors being the principal components (PCs).
(4) Out of the $N$ PCs, select the top $m$, which preserve a cumulative variance satisfying $\sum_{i=1}^{m} var_i \ge \tau$. Here $\tau$ is a pre-defined variance threshold, and $m \ll N$.
(5) Form a new $m$-dimensional subspace from the top $m$ PCs.
(6) Project the original $N$ variables onto the $m$-dimensional PC-based space. Among the $\frac{N \times (N-1)}{2}$ pairs of projected vectors, select $m' \le m$ vector-based variables to form the basis matrix $Y_B := [y_b^{(1)}, \ldots, y_b^{(m')}] \in \mathbb{R}^{n \times m'}$, such that

1 Considering the different numbers of measurements provided by different types of IEDs, this section assumes industry-grade PMU devices, which provide 20 measurements at each sample [607].


Fig. 9.1 Implementation of the early event detection algorithm

$$\cos\theta = \frac{y_b^{(i)} \cdot y_b^{(j)}}{\left|y_b^{(i)}\right| \cdot \left|y_b^{(j)}\right|} \approx 0, \quad i, j = 1, \ldots, m',$$

i.e., the $m'$ variables should be as orthogonal to each other as possible. The $m'$ variables are henceforth denoted as the "pilot PMUs,"2 which will be utilized during the online detection described in Sect. 9.1.3. The remaining $(N - m')$ PMUs are denoted as "non-pilot PMUs."3 $Y_B$, containing the $m'$ pilot PMUs, forms a linear basis for each of the original $N$ measurements.

(7) Represent the non-pilot PMUs $y^{(i)}$ in terms of $Y_B$, where $y^{(i)} \subseteq Y$ and $y^{(i)} \not\subseteq Y_B$, $i = 1, \ldots, N - m'$. Let $v^{(i)} := [v_1^{(i)}, \ldots, v_{m'}^{(i)}]^T$ be the vector of linear regression coefficients for the approximation, i.e.,

$$y^{(i)} \approx \sum_{j=1}^{m'} v_j^{(i)} \cdot y_b^{(j)} = Y_B \cdot v^{(i)}. \tag{9.1}$$

2 Pilot PMUs correspond to the $m'$ PCs that are preserved after the PCA-based dimensionality reduction. In reality, some practical concerns could also be included in the determination of the pilot PMUs. For example, for some topologically and physically significant buses, their installed PMUs can be enforced to be pilot PMUs.
3 Excluding the pilot PMUs from the total $N$ available PMUs, the rest are denoted as non-pilot PMUs. In reality, some PMUs can also be enforced to be non-pilot PMUs if they are historically eventful.


Considering the large amount of training PMU data in the dimensionality analysis, it follows that n ≫ m′. Therefore, using y^(i) and Y_B from the training data, v^(i) can be calculated by solving the overdetermined problem (9.1) as

    v^(i) := (Y_B^T Y_B)^{-1} Y_B^T y^(i),    (9.2)

in which the squared approximation error ‖y^(i) − Y_B · v^(i)‖^2 is minimized [613]. Using (9.2), each non-pilot PMU measurement vector can be represented in terms of the pilot PMUs. The dimensionality of the PMUs across the whole network can therefore be reduced from N to m′, where m′ ≪ N. In such a case, the independent system operators (ISOs) or the vendors can utilize the pilot PMUs to approximate some selected non-pilot PMUs and detect changes in the system operating conditions in real-time operations.
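To make steps (1)–(7) concrete, the following sketch implements the training stage in plain NumPy. It is a minimal illustration of the procedure described above rather than a reference implementation: the variance threshold tau, the orthogonality tolerance cos_tol, the greedy pilot-selection rule, and the toy data are all assumptions introduced here for demonstration.

```python
import numpy as np

def train_pilot_basis(Y, tau=0.999, cos_tol=0.3):
    """PCA-based dimensionality analysis of an n-by-N PMU matrix Y,
    following steps (1)-(7) and Eqs. (9.1)-(9.2)."""
    n, N = Y.shape
    # Steps (1)-(3): covariance matrix, eigen-decomposition,
    # eigenvalues rearranged in decreasing order.
    C = Y.T @ Y
    eigvals, eigvecs = np.linalg.eigh(C)           # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step (4): smallest m whose cumulative variance reaches tau.
    cumvar = np.cumsum(eigvals) / np.sum(eigvals)
    m = int(np.searchsorted(cumvar, tau)) + 1
    # Steps (5)-(6): each variable's projection onto the top-m PC space
    # is its loading vector; greedily keep variables whose loadings are
    # mutually near-orthogonal (cos(theta) ~ 0) as pilot PMUs.
    load = eigvecs[:, :m]                          # N x m loadings
    pilots = []
    for i in np.argsort(-np.linalg.norm(load, axis=1)):
        v = load[i] / np.linalg.norm(load[i])
        if all(abs(v @ load[j]) / np.linalg.norm(load[j]) < cos_tol
               for j in pilots):
            pilots.append(int(i))
        if len(pilots) == m:
            break
    # Step (7): least-squares regression coefficients of Eq. (9.2).
    Yb = Y[:, pilots]
    V, *_ = np.linalg.lstsq(Yb, Y, rcond=None)
    return pilots, V

# Toy usage: a rank-2 signal shared by 23 synthetic "PMUs" plus noise.
rng = np.random.default_rng(0)
Y = rng.standard_normal((600, 2)) @ rng.standard_normal((2, 23))
Y += 1e-3 * rng.standard_normal((600, 23))
pilots, V = train_pilot_basis(Y)
print("pilot PMUs:", pilots)
```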

9.1.3 Online Event Detection Using PMU Data

If the massive PMU data essentially lie in a much-reduced dimensional space, ISOs or vendors can leverage the change in the underlying subspaces of the PMU data to visualize and detect system events at an early stage. A system event is defined in this section as a change of system topology, operating conditions, or control inputs. In this section, we propose such an algorithm with the following features: (a) only a reduced number of PMUs are needed; (b) it is online implementable; (c) it is theoretically justified using linear dynamical system theory; (d) the algorithm can be implemented without knowledge of any underlying physical model of the system; and (e) it can detect a system event at a very early stage (within 100 ms in our study). The proposed early event detection algorithm consists of two parts, as shown in the flowchart in Fig. 9.1. Figure 9.2 provides an overview of the proposed early event detection algorithm as implemented in power systems.

9.1.3.1 Adaptive Training

The adaptive training utilizes the PCA-based dimensionality reduction proposed in Sect. 9.1.2 to compute the linear coefficients v^(i)'s in (9.2). The training period is denoted by T_trn and is presumed to cover normal operating conditions. At the current time t_0, the PMU data in T_trn, under normal operating conditions, are employed to form the measurement matrix Y_{n×N}(t_0) for training. Denote the updating period by T_up; it is a system-dependent variable and can usually be chosen as 3–5 min. The adaptive training mechanism is designed as follows (a minimal sketch of this update logic appears after the two rules):


Fig. 9.2 Overview of the early event detection algorithm

(1) If no event occurs during a period of T_up, the training procedure is adaptively updated every T_up time units.
(2) If an event is detected within T_up, the training procedure is updated immediately after the system recovers from the event.
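The sketch below illustrates the two rules as a simple trigger function; the function and flag names are illustrative assumptions, not part of the original method.

```python
def should_retrain(t, t_last_train, T_up, event_active, just_recovered):
    """Adaptive-training trigger implementing rules (1) and (2):
    periodic updates every T_up when quiet, and an immediate update
    once the system has recovered from a detected event."""
    if event_active:
        return False                   # never retrain on eventful data
    if just_recovered:
        return True                    # rule (2)
    return (t - t_last_train) >= T_up  # rule (1)
```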

9.1.3.2 Robust Online Monitoring

The robust online monitoring utilizes the pilot PMU measurements at the current time t, together with the coefficients v^(i) calculated from the adaptive training, to approximate the measurements of some selected non-pilot PMUs at the same time. Under normal operating conditions, the coefficients v^(i) provide accurate approximations of the non-pilot y^(i) because normal-operating-condition data are used in the training procedure. Whenever an event occurs, the spatial dependencies inside the power system change, causing the approximations to deteriorate and the approximation errors to grow. Whenever a significant approximation error is noticed, an event alert is declared for the purpose of corrective control.


Assume that the real-time approximation of the i-th non-pilot PMU is

    ŷ(t)^(i) := Y_B^meas(t) · v^(i),    (9.3)

where Y_B^meas(t) is the measured Y_B at time t and v^(i) is from the adaptive training in Sect. 9.1.3.1. Define the relative approximation error of the i-th non-pilot PMU as

    e(t)^(i) := ỹ(t)^(i) / y(t)^(i),meas × 100%,    (9.4)

where y(t)^(i),meas represents the real-time measurement of the i-th non-pilot PMU at time t and ỹ(t)^(i) := ŷ(t)^(i) − y(t)^(i),meas is the absolute approximation error. The occurrence of events can be monitored by using e(t) of some selected non-pilot PMUs. Numerically, because of the per-unit scale of power system variables, the values of e(t)^(i) are too small to be accurately identified at the occurrence of events. Therefore, for the purpose of early event detection, we propose a real-time event indicator η(t) for the i-th non-pilot PMU:

    η(t)^(i) := e(t)^(i) / e_normal^(i),    (9.5)

where e_normal^(i) is the mean value of e(t)^(i) calculated under normal operating conditions. Whenever η(t) becomes larger than a pre-specified threshold γ, an event alert is issued. Given that PMUs sample at a rate of 30 Hz or higher, an alert can be issued as soon as two samples after the occurrence of an event. Such a swift alert is capable of quickly identifying system events in real-time situations.

Proposition 9.1 Using the proposed event indicator (9.5), a system event can be detected within 2–3 PMU samples, i.e., within 100 ms: for some selected non-pilot PMU i, the event indicator satisfies

    |η(t)^(i)| ≥ γ,    (9.6)

where γ is a system-dependent threshold that can be calculated using historical event PMU data.

Proof Large-scale power systems can be described by a coupled set of nonlinear differential and algebraic equations (DAEs) [614]:

    ẋ(t) = f(x(t), u_o(t), h(t), q),    (9.7)
    0 = g(x(t), u_o(t), h(t), q),    (9.8)

where x(t) and u_o(t) represent the power system dynamic state and input vectors, respectively, and h(t) collects the algebraic variables, i.e., the real and reactive power injections.


Here, q denotes the time-invariant system parameters. The differential equation (9.7) captures all the system dynamics, including generators, wind turbines, loads, etc. The algebraic equation (9.8) represents the real and reactive power balance equations. We linearize the nonlinear DAEs (9.7) and (9.8) around one system equilibrium point (one operating condition) and eliminate the algebraic equations by Kron reduction [615]. The resulting continuous linear time-invariant (LTI) state-space model is

    ẋ(t) = A x(t) + B u(t) + α(t),    (9.9)
    y(t) = C x(t) + D u(t) + ε(t),    (9.10)

where x(t) and y(t) are the state and measurement vectors, respectively, with corresponding system matrices A, B, C, and D, where usually D ≈ 0 in power systems; u(t) is the augmented input vector comprising the original system inputs u_o(t) and the net injections h(t) of real and reactive power [615]; α(t) ∼ N(0, Q) and ε(t) ∼ N(0, R) are assumed to be uncorrelated white noises representing the modeling and measurement errors, respectively.

Assume (1) a zero-order hold of u(t), (2) a continuous integration of ε(t), and (3) D ≈ 0. The discretization of (9.9) and (9.10) with sampling time T yields [616]

    x[k + 1] = e^{AT} x[k] + A^{-1}(e^{AT} − I) B u[k] + α[k],    (9.11)
    y[k] = C x[k] + ε[k],    (9.12)

where

    α[k] ∼ N(0, Q_d),  Q_d = ∫_{τ=0}^{T} e^{Aτ} Q e^{A^T τ} dτ,    (9.13)
    ε[k] ∼ N(0, R_d),  R_d = R.

Recursively substituting (9.11) into (9.12), the general expression for the measurement column vector at time k can be represented as

    y[k] = C (e^{AT})^{k−1} x[1] + Σ_{l=1}^{k−1} C (e^{AT})^{l−1} A^{-1}(e^{AT} − I) B u[k − l] + ε[k]
         = y_x[k] + y_u[k] + y_ε[k],    (9.14)

where x[1] stands for the first system state in the training data and u[·] represents the inputs at each time step before time k.

To generalize the proof, we further assume that (a) each measurement represents one PMU and (b) a total of N measurements are analyzed, each having n samples for training, i.e., Y(t_0) ∈ R^{n×N}. Therefore, the k-th sample/row of Y can be represented as Y(k) := [y^(1)[k], …, y^(N)[k]] ∈ R^{1×N}. Denote the observation matrix by C := [c^(1), …, c^(N)]^T, where c^(i) := [c_1^(i), …, c_M^(i)]. M is the total number of system states, usually huge and unknown in reality. Correspondingly, y^(i)[k] = c^(i) x[k] + ε^(i)[k], i = 1, …, N.

In order to prove the early detection capability of the algorithm, assume the following: (i) all the PMU data for adaptive training are under normal operating conditions; equivalently, (i.1) u[k] = u_0 is a constant input vector for k = 0, …, N − 1, (i.2) the initial condition x[1] stays the same, and (i.3) the system matrices A, B, and C stay the same; (ii) only one system event occurs, at a time t > N + 1.4

Using (9.14), the general form of the i-th measurement/column in Y can be represented as (rows separated by semicolons)

    y^(i) = [ y^(i)[1]; y^(i)[2]; …; y^(i)[n] ]
          = [ c^(i); c^(i) e^{AT}; …; c^(i) (e^{AT})^{n−1} ] x[1]
            + [ ε^(i)[1]; ε^(i)[2]; …; ε^(i)[n] ]
            + [ 0, 0, …, 0;
                c_{u,1}^(i), 0, …, 0;
                c_{u,1}^(i), c_{u,2}^(i), …, 0;
                ⋮
                c_{u,1}^(i), c_{u,2}^(i), …, c_{u,n−1}^(i) ] [ u_0; u_0; …; u_0 ]
          = c_x^(i) x[1] + y_ε^(i) + c_u^(i) U_0,    (9.15)

where c_{u,j}^(i) = c^(i) (e^{AT})^{j−1} A^{-1}(e^{AT} − I) B.

Without loss of generality, assume the m′ basis vectors in Y_B are the first m′ columns in Y(t_0). Therefore, using (9.2) and (9.3), y^(i) can be represented as

    y^(i) ≈ Σ_{j=1}^{m′} v_j^(i) · y_b^(j) = Σ_{j=1}^{m′} v_j^(i) [c_x^(j) x[1] + y_ε^(j) + c_u^(j) U_0],
    y^(i) = c_x^(i) x[1] + y_ε^(i) + c_u^(i) U_0,    (9.16)

4 In this section, we only consider the detection of a single event using the early event detection algorithm. The analysis and detection of multiple events or cascading events is a future avenue of research.


where i = m′ + 1, …, N. Equally,

    [c_x^(i) − Σ_{j=1}^{m′} v_j^(i) c_x^(j)] x[1] + [y_ε^(i) − Σ_{j=1}^{m′} v_j^(i) y_ε^(j)] + [c_u^(i) − Σ_{j=1}^{m′} v_j^(i) c_u^(j)] U_0
    = Δc_x x[1] + Δy_ε + Δc_u U_0 ≈ 0.    (9.17)

As stated in (9.2), the v^(i)'s are calculated by minimizing the squared error. Assume the calculation of the v^(i)'s is of absolute accuracy. Therefore, (I) the errors in the calculations of the v^(i)'s are zero, and (II) e_normal^(i) in (9.5) is almost zero. Consequently, for high-dimensional training data, the three terms in (9.17) can each be assumed to be zero, i.e.,

    Δc_x x[1] ≈ 0,  Δy_ε ≈ 0,  Δc_u U_0 ≈ 0.    (9.18)

Now using (9.17) and (9.18), we will prove the capability of the early event detection algorithm to detect the following three types of system events:

Control Input Changes For the control inputs U_0 in (9.17), there are n × M linear equalities. These equalities, which are not necessarily linearly independent, form an overdetermined condition. Under this overdetermined condition, the initial input vector u_0 can be theoretically calculated by minimizing the squared error. Under normal operating conditions, Δc_u U_0 ≈ 0 holds from (9.18). When one of the control inputs changes, the new input vector U_new will not lie in the null space of Δc_u. Consequently, a large nonzero term Δc_u U_new will violate the zero approximation of (9.17) and thus impact the approximation error (9.4).

Initial Condition Changes Consider the term related to x[1] in (9.17). There are n linear equalities to solve for x[1] in an overdetermined manner, since n ≫ M. Under normal operating conditions, Δc_x x[1] ≈ 0 holds, as assumed in (9.18). A change of the initial condition will make the new condition x[1]_new lie outside the null space of Δc_x. This will result in a large nonzero term Δc_x x[1]_new, which violates the zero approximation of (9.17) and results in a large approximation error in (9.4).


System Topology Changes During normal operating conditions, x[1] and U_0 can be theoretically calculated from the overdetermined (n + n × M) equalities. In other words, they lie in the null spaces of Δc_x and Δc_u, respectively. A change of topology from A to A_new will change these terms to Δc_{x,new} and Δc_{u,new}, as shown in (9.15). These changes will in turn alter the corresponding null spaces, in which x[1] and U_0 will consequently no longer lie. As a result, a large nonzero term (Δc_{x,new} x[1] + Δc_{u,new} U_0) will violate the zero approximation of (9.17), and the approximation error (9.4) will be large.

For the above three types of system events, the occurrence of any one event will result in a nonzero approximation error (9.4), which serves as the numerator of the event indicator η(t) in (9.5). With an almost-zero denominator e_normal^(i) calculated from normal operating conditions, η(t) will become large at the occurrence of any one of the system events. For some selected non-pilot PMUs, historical data with known system events can be utilized to calculate the system-dependent threshold γ. Whenever |η(t)^(i)| ≥ γ in (9.6), a system event is declared, and an alert is issued for the purpose of further corrective control.

Remark 9.1 As shown in Fig. 9.1, starting from time t_0, if no event is detected during the updating period T_up, i.e., the time interval T_{t−t_0} between the current time t and t_0 satisfies T_{t−t_0} ≥ T_up, then the adaptive training procedure is conducted by updating the measurement matrix Y with the latest T_trn seconds of data.
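Putting (9.3)–(9.6) together, one monitoring step can be sketched as follows; the arrays, the threshold gamma, and the set of monitored non-pilot PMUs are placeholders that would in practice come from the adaptive training and from historical event data.

```python
import numpy as np

def monitor_step(y_pilot_t, y_meas_t, V, e_normal, gamma):
    """One robust-online-monitoring step for selected non-pilot PMUs.

    y_pilot_t: current pilot-PMU sample, shape (m',)
    y_meas_t : current measurements of the monitored non-pilot PMUs, (k,)
    V        : trained coefficients mapping pilots to those PMUs, (m', k)
    e_normal : mean relative error under normal conditions, (k,)
    """
    y_hat = y_pilot_t @ V                         # approximation, Eq. (9.3)
    e = (y_hat - y_meas_t) / y_meas_t * 100.0     # relative error, Eq. (9.4)
    eta = e / e_normal                            # event indicator, Eq. (9.5)
    alert = bool(np.any(np.abs(eta) >= gamma))    # alert rule, Eq. (9.6)
    return eta, alert
```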

9.1.4 Numerical Examples

In this section, we illustrate the efficacy of the early event detection algorithm, including the dimensionality reduction, the adaptive training, and the early anomaly detection. Both synthetic and realistic examples are utilized. Siemens PSS/E [617] is used to generate the synthetic PMU data, and the realistic data are provided by the Texas and Eastern Interconnections.

9.1.4.1 Dimensionality Reduction of Synchrophasor Data

In this section, the efficacy of dimensionality reduction for synchrophasor data will be illustrated. Synchrophasor data from normal operating conditions are utilized in the adaptive training procedure. Assume that the length of the training data is T_trn = 250 s.5

Fig. 9.3 Topology of PSS/E 23-bus system [617]

Dimensionality Reduction of Synthetic PSS/E Data
A 23-bus, 6-generator system in PSS/E is utilized to generate the PMU data. Figure 9.3 shows the system topology, which is not needed in either part of the early event detection algorithm. Table 9.1 lists the dynamic models [617] for the six generators employed in the PSS/E system.6

Table 9.1 Dynamic models in the PSS/E system

Bus no. | Generator model | Exciter model | Turbine/governor model
101     | GENROU          | IEEET1        | TGOV1
102     | GENROU          | IEEET1        | TGOV1
206     | GENROU          | IEEET1        | TGOV1
211     | GENSAL          | SCRX          | HYGOV
3011    | GENROU          | SEXS          | N/A
3018    | GENROU          | SEXS          | N/A

Assume each bus has one PMU installed and that the sampling rate is 30 Hz. To mimic industry-grade PMUs, noise is added to the synthetic training data so that the signal-to-noise ratio (SNR) is 92 dB.

For bus frequency ω, the cumulative variance calculated from PCA is shown in Fig. 9.4a. The first two PCs alone preserve over 99.99% of the variance. Similarly, in Fig. 9.4b, two PCs preserve 80% of the cumulative variance for voltage magnitude V_m. If τ_PSS/E^ω = 99.99% and τ_PSS/E^Vm = 80% are assumed, then m′_PSS/E^ω = m′_PSS/E^Vm = 2 can be selected for both ω and V_m. The corresponding basis matrices are Y_B,PSS/E^ω = [ω_206, ω_102] and Y_B,PSS/E^Vm = [V_m153, V_m201].

In order to illustrate the robustness of the proposed training procedure, assume the resulting Y_B,PSS/E^ω and Y_B,PSS/E^Vm are utilized for all the online synthetic cases before the first adaptive training takes place.

9.1.4.2 Dimensionality Reduction of Realistic Texas Data

Seven PMU datasets from the Texas Interconnection are trained in this section to serve the online early event detection algorithm. No system topology could be provided, due to the confidentiality of the interconnected areas and the modeling complexity of the system components. The cumulative variances for ω and V_m are shown in Fig. 9.5. Assume τ_TX^ω = 99.9% and τ_TX^Vm = 70%. Then m′_TX^ω = 2 and m′_TX^Vm = 3 can be chosen. The basis matrices are Y_B,TX^ω = [ω_5, ω_3] and Y_B,TX^Vm = [V_2, V_1, V_7].

5 Realistically, more synchrophasor data under normal operating conditions could be utilized in the adaptive training to obtain more accurate and robust training models.
6 GENROU: round rotor generator model; GENSAL: salient pole generator model; IEEET1: 1968 IEEE type 1 excitation system model; SCRX: bus- or solid-fed SCR bridge excitation system model; SEXS: simplified excitation system model; TGOV1: steam turbine-governor model; HYGOV: hydro turbine-governor model; N/A: no model for the component.

Fig. 9.4 Cumulative variance preserved by PCs for PSS/E data. (a) Cumulative variance for bus frequency ω in PSS/E data. (b) Cumulative variance for voltage magnitude V_m in PSS/E data

It can be observed that the variance thresholds in the Texas case are lower than those in the synthetic case, i.e., τ_TX^ω < τ_PSS/E^ω and τ_TX^Vm < τ_PSS/E^Vm. One reason is that, even when noise with an SNR of 92 dB is added to the synthetic data, it is still difficult to accurately mimic the changes of realistic system operating conditions.

Dimensionality Reduction of Realistic Eastern Data
Fourteen PMU datasets are provided by the Eastern Interconnection for bus frequency analysis, and eight for voltage magnitude. The cumulative variances are shown in Fig. 9.6. For bus frequency ω, assuming τ_East^ω = 99.6% with m′_East^ω = 2, the basis matrix is Y_B,East^ω = [ω_11, ω_6]. Assuming τ_East^Vm = 80%, m′_East^Vm = 3 is selected for voltage magnitude, with basis matrix Y_B,East^Vm = [V_8, V_5, V_2].

As can be noticed from Figs. 9.4, 9.5 and 9.6, for both ω and V_m, dimensionality reduction can be achieved by the PCA-based method proposed in Sect. 9.1.2. Because of the localized nature of voltage magnitude, the numbers of basis vectors for ω and V_m usually satisfy m′_(·)^ω ≤ m′_(·)^Vm.

Fig. 9.5 Cumulative variance preserved by PCs for Texas data. (a) Cumulative variance for bus frequency ω in Texas data. (b) Cumulative variance for voltage magnitude V_m in Texas data

Note that we separately analyze the dimensionality reduction of ω and V_m using the PCA-based method. It is known that ω is a global variable, with a similar profile throughout the whole power grid, while V_m is a local variable because of voltage-level differences. This property of ω and V_m is illustrated in the loading plots shown in Fig. 9.7 for the PSS/E data. In each loading plot, the lines represent the projections of the original measurement vectors onto the PC-based space, and the separation of the lines indicates how strongly the original vectors are correlated with each other. As shown in Fig. 9.7, ω has a concentrated characteristic, while V_m is much more dispersed. Combining ω and V_m in a single PCA would result in combined dispersed characteristics. Consequently, the concentrated properties of ω would be concealed by the dispersed properties of V_m, leading to further inaccuracy in the dimensionality analysis.

Fig. 9.6 Cumulative variance preserved by PCs for Eastern data. (a) Cumulative variance for bus frequency ω in Eastern data. (b) Cumulative variance for voltage magnitude V_m in Eastern data

9.1.4.3 Online Event Detection Using the Early Event Detection Algorithm

In this section, both PSS/E data and Texas data are utilized to validate online event detection using the proposed algorithm. The Eastern data are not employed because the data provided do not contain any events.

Online Event Detection of Synthetic PSS/E Data
Three types of system events, line tripping, unit tripping, and control input change, are simulated in the PSS/E 23-bus system, with the event details shown in Fig. 9.8. The control input change event corresponds to the control input changes of Sect. 9.1.3.2, and the line tripping and unit tripping events correspond to the system topology changes of Sect. 9.1.3.2.

Fig. 9.7 2-D loading plots of PSS/E data. (a) 2-D loading plot for bus frequency. (b) 2-D loading plot for voltage magnitude

The initial condition changes are not simulated in this section due to the difficulty of mimicking realistic system operating condition changes in PSS/E.

To demonstrate the efficacy of the adaptive training, assume the following: (1) the training procedure conducted in Sect. 9.1.4.1 works for all three types of system events before the first adaptive training takes place; (2) it takes 10 s for the system to return to normal operating conditions after an event; (3) the updating period is T_up = 40 s; and (4) if an update is needed, the retraining period is the same as the original training period, i.e., T_retrn = T_trn = 250 s.


Fig. 9.8 Timeline of three simulated system events in PSS/E. (a) Line tripping event. (b) Unit tripping event. (c) Input change event

Line Tripping Event
As shown in Fig. 9.8a, assume the transmission line connecting buses 152 and 202 (Line 152–202) is tripped at t = 10 s, followed by the closure of Line 152–202 at t = 70 s. The total data length is 100 s. The bus frequency profile of bus 153 during the events is shown in Fig. 9.9. As can be observed, it takes about 10 s for the system to recover from either event, i.e., the system recovers to normal operating conditions at t = 20 s and t = 80 s, respectively. With T_up = 40 s, the training model is updated at t = 60 s with the latest 250 s of data. The updated basis matrices are Y_B,PSS/E^ω = [ω_3011, ω_101] and Y_B,PSS/E^Vm = [V_m152, V_m202], correspondingly. The Line 152–202 closure event is detected using the updated basis matrices.

Figure 9.10 illustrates the event indicator η_153^ω of bus 153, which can detect both events. A zoomed-in view of the early detection of the tripping events is presented in Fig. 9.10b and c, showing the capability to detect the events almost instantly, within 40 ms. The Line 152–202 tripping at t = 10 s results in a huge value of η_153^ω at the next sample, t = 10.033 s. This indicates a large approximation error of the linear representation.

Fig. 9.9 ω_153 profile during the Line 152–202 tripping and closure events in PSS/E. (a) ω_153 profile during the Line 152–202 tripping and closure events. (b) Zoomed-in ω_153 profile during the Line 152–202 tripping event. (c) Zoomed-in ω_153 profile during the Line 152–202 closure event

Comparatively, as can be seen from the frequency profile in Fig. 9.9b, at t = 10.033 s the bus frequency deviation is Δω_153 ≈ 0.00005 p.u., which is too small to be identified as an event. By the time a relatively large deviation Δω_153 ≈ 0.0004 p.u. is detected, it is already 250 ms after the occurrence of the event. Similar results can be observed for the line closure event at t = 70 s. In this sense, the advantage of the proposed algorithm is illustrated.

Another observation comes from comparing Fig. 9.10b and c: the maximum deviation of the event indicator in Fig. 9.10c is much smaller than that in Fig. 9.10b. The reason lies in the adaptive training, i.e., the retraining takes the eventful data into consideration and therefore improves the accuracy of the training model. However, as Fig. 9.10c shows, the capability to detect system events early is not affected by this improvement. In practice, two system events will rarely occur as close together as in this case, so the retraining data will not always contain eventful data. In addition, by choosing an appropriate length of training data, this effect can be avoided, and the training model remains accurate and robust enough to detect events at an early stage, as shown in Fig. 9.10c.

Fig. 9.10 Event indicator η_153^ω during the Line 152–202 tripping and closure events in PSS/E. (a) η_153^ω during the Line 152–202 tripping and closure events. (b) Zoomed-in η_153^ω during the Line 152–202 tripping event. (c) Zoomed-in η_153^ω during the Line 152–202 closure event

Unit Tripping Event
As shown in Fig. 9.8b, a unit at bus 3011 is tripped at t = 40 s. Figure 9.11 shows the bus frequency profile of bus 3002, with the event indicator η_3002^ω shown in Fig. 9.12. In Fig. 9.11b, from the bus frequency deviation Δω_3002 ≈ 0.0005 p.u. at t = 40.033 s, it is difficult to directly detect the unit tripping event. However, from Fig. 9.12b, the unit 3011 tripping yields a huge value of the event indicator η_3002^ω provided by bus 3002 at t = 40.033 s. This clearly demonstrates that the proposed event indicator is capable of early event detection by magnifying the difference between the quantities for the normal condition and the contingency.

Control Input Change Event
As shown in Fig. 9.8c, the voltage regulator setpoint of bus 211 is changed by 0.01 p.u. and −0.01 p.u. at t = 10 s and t = 70 s, respectively.

Fig. 9.11 ω_3002 profile during the unit 3011 tripping event in PSS/E. (a) ω_3002 profile during the unit 3011 tripping event. (b) Zoomed-in ω_3002 profile during the unit 3011 tripping event

In this case, the training models are updated at t = 60 s with the latest T_retrn = 250 s of data. The updated basis matrices are Y_B,PSS/E^ω = [ω_211, ω_3011] and Y_B,PSS/E^Vm = [V_m101, V_m102], correspondingly. The second event, with ΔV_ref^211 = −0.01 p.u., is detected using the updated basis matrices.

In Fig. 9.14a, the event indicator η_201^ω provided by bus 201 is capable of indicating both events. From Fig. 9.14b, the control input change of ΔV_ref^211 = 0.01 p.u. is detected around t = 10.1 s with η_201^ω ≈ 10^8, while from the bus frequency profile in Fig. 9.13b, the bus frequency deviation Δω_201 < 0.0001 p.u. at t = 10.1 s cannot be detected efficiently. Similar results can be observed from Figs. 9.13c and 9.14c. In this case, because of the adaptive training, the maximum deviation of the event indicator in Fig. 9.14c is smaller than that in Fig. 9.14b. However, this does not impact the efficacy and capability of the event indicator for early event detection, as shown in Fig. 9.14c.

Fig. 9.12 Event indicator η_3002^ω during the unit 3011 tripping event in PSS/E. (a) η_3002^ω during the unit 3011 tripping event. (b) Zoomed-in η_3002^ω during the unit 3011 tripping event

Online Event Detection of Realistic Texas Data
In this case, we utilize the Texas data to demonstrate the purely data-driven capability of the early event detection algorithm, i.e., without any knowledge of the system topology or model. Both bus frequency and voltage magnitude are analyzed. As can be observed from Figs. 9.15 and 9.17, two unit tripping events occur around t = 104 s and t = 863 s. After the first event, it takes about 300 s for the system to recover to normal operating conditions. In this case, assume the updating period is T_up = 100 s and the retraining period is T_retrn = 250 s. Therefore, according to the early event detection algorithm, the adaptive training results in Sect. 9.1.4.2 apply only to the first event. The latest training model before the second event is updated at t = 800 s with the latest 250 s of data, and the detection of the second event is achieved using this latest training model. In this case, the retraining data do not contain any events and therefore can better demonstrate the efficacy of the adaptive training.

Fig. 9.13 ω_201 profile during the bus 211 input change events in PSS/E. (a) ω_201 profile during the bus 211 input change events. (b) Zoomed-in ω_201 profile during the input change ΔV_ref^211 = 0.01 p.u. event. (c) Zoomed-in ω_201 profile during the input change ΔV_ref^211 = −0.01 p.u. event

The event indicator of the bus 4 frequency, η_4^ω, is shown in Fig. 9.16. In the zoomed-in Fig. 9.16b and c, the changes of system operating conditions can be detected at t = 104 s and t = 863.25 s, respectively. However, from the bus frequency profile in Fig. 9.15b, the bus frequency deviation Δω_4 ≈ 0.0005 p.u. at t = 104 s is too small to be detected early or accurately. Similar conclusions can be drawn for Δω_4 ≈ 0.0005 p.u. at t = 863.5 s in Fig. 9.15c.

For the same two events, the event indicator of the bus 4 voltage magnitude, η_4^Vm, is shown in Fig. 9.18. The changes of system operating conditions are detected at t = 104.25 s and t = 863.3 s, respectively, as shown in Fig. 9.18b and c. From the voltage magnitude profile in Fig. 9.17b, the voltage magnitude deviation ΔV_m4 ≈ 0.005 p.u. at t = 104.25 s is not as noticeable, and neither is ΔV_m4 ≈ 0.004 p.u. at t = 863.3 s. Note that there is a drop in value in Fig. 9.18a around t = 800 s. This drop is caused by the updated training model and should not be evaluated as an event.

Fig. 9.14 Event indicator η_201^ω during the bus 211 input change events in PSS/E. (a) η_201^ω during the bus 211 input change events. (b) Zoomed-in η_201^ω during the input change ΔV_ref^211 = 0.01 p.u. event. (c) Zoomed-in η_201^ω during the input change ΔV_ref^211 = −0.01 p.u. event

For both ω and V_m, the advantages of the proposed event indicator η for early event detection are demonstrated through Figs. 9.9, 9.10, 9.11, 9.12, 9.13, 9.14, 9.15, 9.16, 9.17 and 9.18. The detection times for the simulated bus frequency cases are compared in Table 9.2. As can be observed, the proposed algorithm is capable of detecting each of the simulated events earlier than is possible by using the bus frequency profiles alone. This will benefit the ISOs and the vendors in real-time operations and in efficiently applying corrective controls.

Fig. 9.15 ω_4 profile during the unit tripping events of the Texas data. (a) ω_4 profile during the unit tripping events. (b) Zoomed-in ω_4 profile during the first unit tripping event. (c) Zoomed-in ω_4 profile during the second unit tripping event

9.1.5 Conclusion

In this section, by exploring the dimensionality reduction of PMU data, we have proposed an early event detection algorithm, along with theoretical justifications, to detect power system events at an early stage. A PCA-based dimensionality reduction method for the PMU data is implemented with an adaptive training procedure. A basis matrix, consisting only of the pilot PMUs, can be used to linearly approximate the non-pilot PMUs. The approximation error is utilized to form an event indicator η, which is designed for robust data-driven early event detection in online monitoring. Both synthetic and realistic PMU data suggest the efficacy of the early event detection algorithm in detecting events in an online setting. Such detection is much faster than detection techniques that are based on direct frequency or voltage magnitude measurements.

Fig. 9.16 Event indicator η_4^ω during the unit tripping events of the Texas data. (a) η_4^ω during the unit tripping events. (b) Zoomed-in η_4^ω during the first unit tripping event. (c) Zoomed-in η_4^ω during the second unit tripping event

The presented work is only a first step toward understanding and utilizing dimensionality reduction of online PMU data for real-time monitoring, and much more research can be done in this direction. First, with the accumulation of more realistic event data, we plan to continue investigating the efficacy and robustness of the proposed algorithm. Second, given the fundamental nonlinearity arising in power systems, nonlinear methods will be further investigated for dimensionality analysis [618, 619]. Finally, online classification of specific categories of events, such as inter-area oscillations, deserves significant attention.

9.2 Asset Management

This section proposes a data-driven algorithm for locating the source of forced oscillations and suggests a physical interpretation for the method. By leveraging the sparsity of forced oscillations along with the low-rank nature of synchrophasor data, the problem of source localization under resonance conditions is cast as computing the sparse and low-rank components of the measurement matrix using robust principal component analysis (RPCA).

Fig. 9.17 V_m4 profile during the unit tripping events of the Texas data. (a) V_m4 profile during the unit tripping events. (b) Zoomed-in V_m4 profile during the first unit tripping event. (c) Zoomed-in V_m4 profile during the second unit tripping event

The RPCA problem can be efficiently solved by the exact augmented Lagrange multiplier method. Based on this problem formulation, an efficient and practically implementable algorithm is proposed to pinpoint the forced oscillation source during real-time operation. Furthermore, theoretical insights into the efficacy of the proposed approach are provided via physical model-based analysis, specifically by highlighting the low-rank nature of the resonance component matrix. Without system topology information, the proposed method achieves high localization accuracy in synthetic cases based on benchmark systems and in real-world forced oscillations from the power grid of Texas.

Fig. 9.18 Event indicator η_4^Vm during the unit tripping events of the Texas data. (a) η_4^Vm during the unit tripping events. (b) Zoomed-in η_4^Vm during the first unit tripping event. (c) Zoomed-in η_4^Vm during the second unit tripping event

Table 9.2 Comparison of detection times for the bus frequency cases

Case                           | Direct profile (s) | Proposed algorithm (s)
PSS/E Line 152–202 tripping    | 10.25              | 10.033
PSS/E unit 3011 tripping       | 40.5               | 40.033
V_ref^211 input change         | 11                 | 10.1
Texas unit tripping            | 104.5              | 104


9.2.1 Introduction

In the last section, we introduced data-driven methods to identify existing events. Now we consider the capability of using historical data to forecast future events for situational awareness. To this end, we investigate one type of system dynamical behavior exposed by PMUs, namely, forced oscillations (FOs).

Forced oscillations have attracted significant attention within the power community. They are driven by exogenous periodic disturbances that are typically injected by malfunctioning power apparatuses, such as wind turbines, steam extractor valves of generators, or poorly tuned control systems [620–622]. Cyclic loads, such as cement mills and steel plants, constitute another category of oscillation sources [620]. The impact of such an injected periodic perturbation propagates through transmission lines and results in FOs throughout the grid; real-world FO events since 1966 are reported in [620].

The presence of FOs compromises the security and reliability of power systems. For example, FOs may trigger protection relays to trip transmission lines or generators, potentially causing uncontrollable cascading failures and unexpected load shedding [623]. Moreover, sustained FOs reduce device lifespans by introducing undesirable vibrations and additional wear and tear on power system components; consequently, failure rates and maintenance costs of compromised power apparatuses might increase [623]. Therefore, timely suppression of FOs is important to system operators.

One effective way of suppressing a forced oscillation is to locate its source, a canonical problem that we call forced oscillation localization, and then disconnect it from the power grid. A natural attempt at forced oscillation localization could be tracking the largest oscillation over the power grid, under the engineering intuition that measurements near the oscillatory source are expected to exhibit the most severe oscillations. However, counterintuitive cases may occur when the frequency of the periodic perturbation lies in the vicinity of one of the natural modes of the power system, whence a resonance phenomenon is triggered [624]. In such cases, PMU measurements exhibiting the most severe oscillations may be geographically far from where the periodic perturbation is injected, posing a significant challenge to system operators in pinpointing the forced oscillation source. Such counterintuitive cases are more than a mere theoretical concern: one example occurred in the Western Electricity Coordinating Council (WECC) system on November 29, 2005, when a 20-MW forced oscillation initiated by a generation plant in Alberta, Canada, incurred a tenfold larger oscillation at the California–Oregon Intertie line 1100 miles away from the source [622]. Such severe oscillation amplification significantly compromises the security and reliability of the power grid. Hence, it is imperative to develop a forced oscillation localization method that is effective even in the challenging but highly hazardous cases of resonance [625].

To pinpoint the source of FOs, several localization techniques have been developed. In [626], forced oscillation localization is achieved based on the following


observation: the measurements near the source manifest distinct signatures in their magnitude or phase responses, in comparison to faraway measurements. Such an observation is interpretable based on classic generator models, but whether it remains valid in a power system with complex generator dynamics is an open question [626]. In [621], the authors leverage the oscillation energy flows in power networks to locate the source of sustained oscillations. In this energy-based method, the energy flows can be computed using preprocessed PMU data, and the power system components generating the oscillation energy are identified as the oscillation sources. Despite the promising performance of the energy-based methods [621], the rather stringent assumptions pertaining to knowledge of load characteristics and grid topology may restrict their use to specific scenarios [625, 627]. Reference [627] provides a comprehensive summary of FO localization methods. More recent research on FO localization is reported in [628] and [629]. In [628], the oscillation source is located by comparing the measured current spectrum of system components with the one predicted by the effective admittance matrix. However, the construction of the effective admittance matrix requires accurate knowledge of system parameters that may be unavailable in practice. In [629], generator parameters are learned from measurements based on prior knowledge of generator model structures, and, subsequently, the admittance matrix is constructed and used for FO localization. Nevertheless, the model structures of generators might not be known beforehand, owing to the unpredictable switching states of power system stabilizers [630]. Thus, it is highly desirable to design a FO localization method that does not heavily depend upon the availability of a first-principle model and topology information of the power grid.

In this section, we propose an entirely data-driven yet physically interpretable approach to pinpoint the source of FOs in the challenging resonance case. By leveraging the sparsity of the FO sources and the low-rank nature of high-dimensional synchrophasor data, the problem of forced oscillation localization is formulated as computing the sparse and low-rank components of the measurement matrix using robust principal component analysis (RPCA) [631]. Based on this problem formulation, an algorithm for real-time operation is designed to pinpoint the source of FOs. The main merits of the proposed approach include the following: (1) it does not require any information on dynamical system model parameters or topology, thus providing an efficient and easily deployable practical implementation; (2) it can locate the source of FOs with high accuracy, even when resonance phenomena occur; and (3) its efficacy can be interpreted by physical model-based analysis.

The rest of this section is organized as follows: Sect. 9.2.2 elaborates on the forced oscillation localization problem and its main challenges; in Sect. 9.2.3, the FO localization is formulated as a matrix decomposition problem, and a FO localization algorithm is designed; Sect. 9.2.4 provides theoretical justification of the efficacy of the algorithm; Sect. 9.2.5 validates the effectiveness of the proposed method in synthetic cases based on benchmark systems and on real-world forced oscillations in the power grid of Texas; Sect. 9.2.6 summarizes the section and poses future research questions.


9.2.2 Localization of Forced Oscillations and Challenges

9.2.2.1 Mathematical Interpretation

The dynamic behavior of a power system in the vicinity of its operating condition can be represented by a continuous linear time-invariant (LTI) state-space model:

    ẋ(t) = A x(t) + B u(t),    (9.19a)
    y(t) = C x(t) + D u(t),    (9.19b)

where the state vector x ∈ R^n, input vector u ∈ R^r, and output vector y ∈ R^m collect the deviations of the state variables, generator/load control setpoints, and measurements from their respective steady-state values. Accordingly, the matrices A ∈ R^{n×n}, B ∈ R^{n×r}, C ∈ R^{m×n}, and D ∈ R^{m×r} are termed the state matrix, the input matrix, the output matrix, and the feedforward matrix, respectively. Typically, the input vector u is not streamed to control centers, so the feedforward matrix D is assumed to be a zero matrix of appropriate dimensions. Denote by L = {λ_1, λ_2, …, λ_n} the set of all eigenvalues of the state matrix A. The power system (9.19) is assumed to be stable, with all eigenvalues λ_i ∈ C being distinct, i.e., Re λ_i < 0 for all i ∈ {1, 2, …, n} and λ_i ≠ λ_j for all i ≠ j. Note that the assumption of eigenvalue distinctness is only used to simplify the process of obtaining the time-domain solution of the outputs in Sect. 9.2.4. Due to the large number of symbols in this section, the key symbols are summarized in [4] for the convenience of the readers.

We proceed to formally define the concepts of a forced oscillation source and source measurements. Suppose that the l-th input u_l(t) in the input vector u(t) varies periodically due to malfunctioning components (generators/loads) in the grid. In such a case, u_l(t) can be decomposed into J frequency components, viz.,

    u_l(t) = Σ_{j=1}^{J} P_j sin(ω_j t + θ_j),    (9.20)

where ω_j ≠ 0, P_j ≠ 0, and θ_j are the frequency, amplitude, and phase displacement of the j-th frequency component of the l-th input, respectively. Equation (9.20) is effectively the Fourier series representation of a periodic signal [632]. Consequently, the periodic input will result in sustained oscillations in the measurement vector y. The generator/load associated with input l is termed the forced oscillation source, and the measurements at the bus directly connected to the forced oscillation source are termed source measurements. In particular, suppose the frequency ω_d of an injection component is close to the frequency of a poorly damped mode, i.e., there exists j* ∈ {1, 2, …, n} such that

    ω_d ≈ Im λ_{j*},  Re λ_{j*} ≈ 0.    (9.21)

In such a case, resonance phenomena can be observed [624]. Hence, (9.21) is adopted as the resonance condition in this section. Studies on the envelope shapes of FOs are reported in [633].
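As a small numerical illustration, the resonance condition (9.21) can be checked against the eigenvalues of a given state matrix A; the tolerances below are illustrative assumptions, since the condition itself only requires approximate equality.

```python
import numpy as np

def is_resonant(A, omega_d, damp_tol=0.05, freq_tol=0.1):
    """Check the resonance condition (9.21): does the injection
    frequency omega_d lie near a poorly damped mode of A?"""
    lam = np.linalg.eigvals(A)
    modes = lam[lam.imag > 0]          # one of each conjugate pair
    near = np.abs(modes.imag - omega_d) < freq_tol
    weak = np.abs(modes.real) < damp_tol
    return bool(np.any(near & weak))
```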


In a power system with PMUs, the measurement vector y(t) is sampled at a frequency of f_s (samples per second). Within the time interval from the start of the FOs to the time instant t, the time evolution of the measurement vector y(t) can be discretized by sampling and represented by a matrix called the measurement matrix Y_t = [y_{p,q}^t], which we formally define next. Without loss of generality, we assume that the FOs start at time 0. The following column concatenation defines the measurement matrix Y_t up to time t:

    Y_t := [y(0), y(1/f_s), …, y(⌊t f_s⌋/f_s)],    (9.22)

where ⌊·⌋ denotes the floor operation. The i-th column of the measurement matrix Y_t in (9.22) is the “snapshot” of all synchrophasor measurements over the system at time (i − 1)/f_s. The k-th row of Y_t denotes the time evolution of the k-th measurement deviation, i.e., the output of the k-th PMU channel. Because the output vector may contain multiple types of measurements (e.g., voltage magnitudes, frequencies, etc.), a normalization procedure is introduced as follows. Assume that there are K measurement types. Denote by Y_{t,i} = [y_{p,q}^{t,i}] ∈ R^{r_0×c_0} the measurement matrix of measurement type i, where i ∈ {1, 2, …, K}. The normalized measurement matrix Y_t^n = [y_{p,q}^{n,t}] is defined by

    Y_t^n = [ Y_{t,1}/‖Y_{t,1}‖_max , Y_{t,2}/‖Y_{t,2}‖_max , … , Y_{t,K}/‖Y_{t,K}‖_max ],    (9.23)

(9.23)

where . · max returns the largest absolute element of a matrix. The forced oscillation localization problem is equivalent to pinpointing it using measurement matrix .Yt . Due to the complexity of power system dynamics, the precise power system model (9.19) may not be available to system operators, especially in real-time operation. Therefore, it is assumed that the only known information for forced oscillation localization is the measurement matrix .Yt . In brief, the first-principle model (9.19) as well as the perturbation model (9.20) is introduced mainly for the purpose of defining the FO localization problem and theoretically justifying the data-driven method proposed in Sect. 9.2.3, but is not needed for the proposed algorithm.

9.2.2.2

Main Challenges of Pinpointing the Sources of Forced Oscillation

The topology of the power system represented by (9.19) can be characterized by an undirected graph .G = (B, T), where vertex set .B comprises all buses in the power system, and edge set .T collects all transmission lines. Suppose that the PMU measurements at bus .is ∈ B are the source measurements. Then bus j is said to be

9.2 Asset Management

383

Fig. 9.19 One counterintuitive case [625] from the IEEE 68-bus benchmark system [634]: the black curves correspond to the non-source measurements; the red curve corresponds to the source measurement

in the vicinity of the FO source if bus j is a member of the following vicinity set V0 = {j ∈ B|dG (is , j ) ≤ N0 },

.

(9.24)

where .dG (i, j ) denotes the i-j distance, viz., and the number of transmission lines (edges) in the shortest path connecting buses (vertices) i and j ; the threshold .N0 is a nonnegative integer. In particular, .V0 = {is } for the source measurement at bus .is , if .N0 is set to zero. Intuitively, it is tempting to presume that the source measurement can be localized by finding the maximal absolute element in the normalized measurement matrix .Ynt , i.e., expecting that the most severe oscillation should be manifested in the vicinity of the source. However, a major challenge for pinpointing FO sources arises from the following (perhaps counterintuitive) fact: the most severe oscillation does not necessarily manifest near the FO source in the presence of resonance phenomena. Following the same notation as in (9.22) and (9.24), we term a normalized measurement matrix .Ynt as counterintuitive case, if i∗ ∈ / V0 ,

.

(9.25)

where .i ∗ can be obtained by finding the row index of the maximal element in the measurement matrix .Yt , i.e.,    n,t  ∗ ∗ (9.26) .[i , j ] = arg max y i,j . i,j

It is such counterintuitive cases that make pinpointing the FO source challenging [624]. Figure 9.19 illustrates one such counterintuitive case where the source measurement (red) does not correspond to the most severe oscillation. Additional examples of counterintuitive cases can be found in [625]. Although the counterintuitive cases are much less likely to happen than the intuitive ones (in terms of frequency of occurrence), it is still imperative to design an algorithm to pinpoint the FO source even in the counterintuitive cases due to the hazardous consequences of them under resonance conditions.

384

9 Using PMU Data for Anomaly Detection and Localization

9.2.3 Problem Formulation and Proposed Methodology In this section, we formulate the FO localization as a matrix decomposition problem. Then, we present a FO localization algorithm for real-time operation.

9.2.3.1

Problem Formulation

Given a measurement matrix .Yt up to time t with one type of measurement (without loss in generality), the FO source localization is formulated as decomposing the measurement matrix .Yt into a low-rank matrix .Lt and a sparse matrix .St Yt = Lt + St , .

(9.27a)

rankLt ≤ γ , .

(9.27b)

St 0 ≤ β,

(9.27c)

.

where the pseudo-norm . · 0 returns the number of non-zero elements of a matrix, the nonnegative integer .γ is the upper bound of the rank of the low-rank matrix .Lt , and the nonnegative integer .β is the upper bound on the number of non-zero entries in the sparse matrix .St . Given nonnegative integers .γ and .β, it is possible to numerically find .{Lt , St } via alternating projections [625]. The source measurement index .p∗ can be tracked by finding the largest absolute value in the sparse matrix .St , viz.,    t  ∗ ∗  (9.28) .[p , q ] = arg max sp,q . p,q

The intuition behind the formulation (9.27) is as follows. As the power grid is an interconnected system, measurements at different buses have certain electrical couplings, resulting in correlations between the measurements. As a result, the measurements at different buses should exhibit a “general trend,” [625] which can be captured by a low-rank matrix .Lt . The measurements near the FO source are assumed to deviate most from its corresponding component in “general trend” (the low-rank matrix .Lt ). The deviation is supposed to be captured by the matrix .St . As the number of the measurements near the FO source is limited, the matrix .St is assumed to be sparse. Due to the prior unavailability of the upper bounds .γ and .β [625], the matrix decomposition problem shown in (9.27) is reformulated as an instance of robust principal component analysis (RPCA) [631] .

min St

Yt − St

+ ξ St 1 ,

(9.29)


where ‖·‖_* and ‖·‖_1 denote the nuclear norm and the l_1 norm, respectively, and the tunable parameter ξ regulates the extent of sparsity in S_t. The formulation (9.29) is a convex relaxation of (9.27). Under some assumptions, the sparse matrix S_t and the low-rank matrix L_t can be disentangled from the measurement matrix Y_t [631] by diverse algorithms [635]. The exact augmented Lagrange multiplier (ALM) method is used for numerically solving the formulation (9.29). Recall that the measurement matrix Y_t has r_0 rows and c_0 columns. The tunable parameter ξ is suggested to be 1/√k_0, where k_0 = max{r_0, c_0}. Such a selection of ξ is justified via the mathematical analysis in [631]. For a measurement matrix containing multiple measurement types, (9.29) can be modified by replacing Y_t with Y_t^n.
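For reference, the sketch below solves a problem of the form (9.29) with the widely used inexact ALM iteration (singular-value thresholding for the low-rank part and soft thresholding for the sparse part) rather than the exact ALM variant cited above. The default ξ follows the 1/√k_0 rule from this section, while the step size and stopping rule are common heuristics assumed here.

```python
import numpy as np

def soft(X, tau):
    """Elementwise soft-thresholding (shrinkage) operator."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(Y, xi=None, tol=1e-7, max_iter=500):
    """Sparse/low-rank decomposition Y = L + S in the sense of (9.29),
    via an inexact augmented-Lagrange-multiplier iteration."""
    r0, c0 = Y.shape
    if xi is None:
        xi = 1.0 / np.sqrt(max(r0, c0))      # the 1/sqrt(k0) rule
    norm_Y = np.linalg.norm(Y)               # Frobenius norm
    mu = 1.25 / np.linalg.norm(Y, 2)         # spectral-norm step heuristic
    S = np.zeros_like(Y)
    Z = np.zeros_like(Y)                     # Lagrange multiplier
    for _ in range(max_iter):
        # Singular-value thresholding gives the low-rank update.
        U, sig, Vt = np.linalg.svd(Y - S + Z / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Soft thresholding gives the sparse update.
        S = soft(Y - L + Z / mu, xi / mu)
        resid = Y - L - S
        Z += mu * resid
        if np.linalg.norm(resid) < tol * norm_Y:
            break
    return L, S
```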

9.2.3.2 FO Localization Algorithm for Real-Time Operation

Next, we present a FO localization algorithm for real-time operation, using the formulation (9.29). To determine the starting time of forced oscillations, we can leverage the methods reported in [636, 637]. The method reported in [636] detects FOs by comparing the periodogram of PMU measurements with a frequency-dependent threshold. In [637], the authors propose a method that uses geometric analysis on streaming synchrophasor data to estimate the starting and end times of FOs. Once periodic FOs are detected by the method reported in [636], the starting time of the FOs can be estimated by the time-localization algorithm proposed in [637]. A window of measurements beginning at the starting time is collected and forms the measurement matrix. Then Algorithm 15 is triggered for pinpointing the FO source. In Algorithm 15, T_0 and ξ are user-defined parameters.

Algorithm 15 Real-time FO localization
1: Update Y_{T_0} by (9.22);
2: Obtain Y_{T_0}^n by (9.23);
3: Find S_t in (9.29) via the exact ALM for the chosen ξ;
4: Obtain p* by (9.28);
5: return p* as the source measurement index.

Algorithm 15 can be leveraged to illustrate the intuition behind formulation (9.27) as described in Sect. 9.2.3.1. A measurement matrix Y_t can be formed based on the measurements visualized in Fig. 9.19, and Algorithm 15 decomposes Y_t into a low-rank matrix L_t and a sparse matrix S_t. Figure 9.20 visualizes Y_t, L_t, and S_t in a normalized fashion: for each matrix, we take the absolute values of the entries and normalize them by the maximal absolute entry of the corresponding matrix. The magnitude of each normalized entry is represented by a color: the larger the magnitude, the yellower the color; the smaller the magnitude, the bluer the color. The “general trend” of the


measurements is captured by the low-rank matrix $L_t$ in Fig. 9.20b. The deviations from the “general trend” are captured by the sparse matrix $S_t$. In Fig. 9.20c, very few entries are colored yellow, and these entries correspond to measurements deviating from the “general trend,” while many entries are colored dark blue, suggesting that most of them are close to zero. The entry with the brightest hue corresponds to Bus 65, which is the bus closest to the forced oscillation source (Generator 13).

9.2.4 Theoretical Interpretation of the RPCA-Based Algorithm

This section aims to develop a theoretical connection between the first-principles model in Sect. 9.2.2 and the data-driven approach presented in Sect. 9.2.3. We begin by deriving the time-domain solution for PMU measurements in a power system under resonance conditions. Then, the resonance component matrix of the power grid is obtained from the derived solution. Finally, the efficacy of the proposed method is interpreted by examining the rank of the resonance component matrix.

9.2.4.1 PMU Measurement Decomposition

For a power system with r inputs and m PMU measurements modeled using (9.19), the k-th measurement and the l-th input are related by

$$\dot{x}(t) = A x(t) + b_l u_l(t), \qquad (9.30a)$$
$$y_k(t) = c_k x(t), \qquad (9.30b)$$

where the column vector $b_l \in \mathbb{R}^n$ is the l-th column of matrix B in (9.19) and the row vector $c_k \in \mathbb{R}^n$ is the k-th row of matrix C. Under the assumption that the eigenvalues of A are distinct, let $x = Mz$, where $z$ denotes the transformed state vector and the matrix

Fig. 9.20 Visualization of the measurement matrix $Y_t$ (a), the low-rank matrix $L_t$ (b), and the sparse matrix $S_t$ (c)


M is chosen such that the similarity transformation of A is diagonal; then

$$\dot{z}(t) = \Lambda z(t) + M^{-1} b_l u_l(t), \qquad (9.31a)$$
$$y_k(t) = c_k M z(t), \qquad (9.31b)$$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) = M^{-1} A M$ is a diagonal matrix stacking the eigenvalues of A. Denote by the column vector $r_i \in \mathbb{C}^n$ and the row vector $l_i \in \mathbb{C}^n$ the right and left eigenvectors associated with the eigenvalue $\lambda_i$, respectively. Accordingly, the transformation matrices $M$ and $M^{-1}$ can be written as $[r_1, r_2, \ldots, r_n]$ and $[l_1^\top, l_2^\top, \ldots, l_n^\top]^\top$, respectively. The transfer function in the Laplace domain from the l-th input to the k-th output is

$$H(s) = c_k M (sI - \Lambda)^{-1} M^{-1} b_l = \sum_{i=1}^{n} \frac{c_k r_i l_i b_l}{s - \lambda_i}. \qquad (9.32)$$
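Numerically, the eigen-decomposition and the modal residues $c_k r_i l_i b_l$ appearing in (9.32) can be computed directly with NumPy. The sketch below is illustrative only; it assumes an already linearized triple (A, b_l, c_k) supplied as NumPy arrays.

import numpy as np

def modal_residues(A, b_l, c_k):
    # Eigen-decomposition A = M diag(lam) M^{-1}: the columns of M are the
    # right eigenvectors r_i; the rows of M^{-1} are the left eigenvectors l_i.
    lam, M = np.linalg.eig(A)
    Minv = np.linalg.inv(M)
    # Residues (c_k r_i)(l_i b_l), so that H(s) = sum_i res[i] / (s - lam[i]).
    res = (c_k @ M) * (Minv @ b_l)
    return lam, res

As a sanity check, c_k @ np.linalg.solve(s * np.eye(len(A)) - A, b_l) should match np.sum(res / (s - lam)) at any complex s away from the eigenvalues.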

For simplicity, assume that the periodic injection $u_l$ contains only one component with frequency $\omega_d$ and amplitude $P_d$, namely, $J = 1$, $\omega_1 = \omega_d$, and $P_1 = P_d$ in (9.20). Furthermore, we assume that before $t = 0^-$ the system is in steady state, viz., $x(0^-) = 0$. Let $\mathcal{N}$ and $\mathcal{M}$ consist of the indices of the real eigenvalues and the indices of the complex eigenvalues with positive imaginary parts, respectively, viz.,

$$\mathcal{N} = \{i \in \mathbb{Z}^+ \mid \lambda_i \in \mathbb{R}\}; \qquad \mathcal{M} = \{i \in \mathbb{Z}^+ \mid \mathrm{Im}(\lambda_i) > 0\}. \qquad (9.33)$$

Then the Laplace transform of the PMU measurement $y_k$ is

$$Y_k(s) = \left(\sum_{i=1}^{n} \frac{c_k r_i l_i b_l}{s - \lambda_i}\right) \frac{P_d \omega_d}{s^2 + \omega_d^2} = \left[\sum_{i \in \mathcal{N}} \frac{c_k r_i l_i b_l}{s - \lambda_i} + \sum_{i \in \mathcal{M}} \left(\frac{c_k r_i l_i b_l}{s - \lambda_i} + \frac{c_k \bar{r}_i \bar{l}_i b_l}{s - \bar{\lambda}_i}\right)\right] \frac{P_d \omega_d}{s^2 + \omega_d^2}, \qquad (9.34)$$

where $\bar{(\cdot)}$ denotes complex conjugation. Next, we analyze the components resulting from the real eigenvalues and the components resulting from the complex eigenvalues individually.

Components Resulting from Real Eigenvalues

In the Laplace domain, the component resulting from a real eigenvalue $\lambda_i$ is

$$Y^D_{k,i}(s) = \frac{c_k r_i l_i b_l}{s - \lambda_i} \cdot \frac{P_d \omega_d}{s^2 + \omega_d^2}. \qquad (9.35)$$


The inverse Laplace transform of $Y^D_{k,i}(s)$ is

$$y^D_{k,i}(t) = \frac{c_k r_i l_i b_l P_d \omega_d}{\lambda_i^2 + \omega_d^2} e^{\lambda_i t} + \frac{c_k r_i l_i b_l P_d}{\sqrt{\lambda_i^2 + \omega_d^2}} \sin(\omega_d t + \phi_{i,l}), \qquad (9.36)$$

where $\phi_{i,l} = \angle\left(\sqrt{\lambda_i^2 + \omega_d^2} + j \lambda_i\right)$ and $\angle(\cdot)$ denotes the angle of a complex number.

Components Resulting from Complex Eigenvalues

In the Laplace domain, the component resulting from a complex eigenvalue $\lambda_i = -\sigma_i + j\omega_i$ is

$$Y^B_{k,i}(s) = \left(\frac{c_k r_i l_i b_l}{s - \lambda_i} + \frac{c_k \bar{r}_i \bar{l}_i b_l}{s - \bar{\lambda}_i}\right) \frac{P_d \omega_d}{s^2 + \omega_d^2}. \qquad (9.37)$$

The inverse Laplace transform of $Y^B_{k,i}(s)$ is

$$y^B_{k,i}(t) = \frac{2 P_d \omega_d |c_k r_i l_i b_l|}{\sqrt{(\sigma_i^2 + \omega_d^2 - \omega_i^2)^2 + 4 \omega_i^2 \sigma_i^2}} e^{-\sigma_i t} \cos(\omega_i t + \theta_{k,i} - \psi_i) + \frac{2 P_d |c_k r_i l_i b_l| \sqrt{\omega_d^2 \cos^2\theta_{k,i} + (\sigma_i \cos\theta_{k,i} - \omega_i \sin\theta_{k,i})^2}}{\sqrt{(\sigma_i^2 - \omega_d^2 + \omega_i^2)^2 + 4 \omega_d^2 \sigma_i^2}} \cos(\omega_d t + \phi_i - \alpha_i), \qquad (9.38)$$

where $\theta_{k,i} = \angle(c_k r_i l_i b_l)$; $\psi_i = \angle(\sigma_i^2 + \omega_d^2 - \omega_i^2 - j 2 \sigma_i \omega_i)$; $\phi_i = \angle(\sigma_i^2 - \omega_d^2 + \omega_i^2 - j 2 \omega_i \sigma_i)$; and $\alpha_i = \angle[\omega_d \cos\theta_{k,i} + j(\sigma_i \cos\theta_{k,i} - \omega_i \sin\theta_{k,i})]$.

Resonance Component

Under the resonance condition defined in (9.21), the injection frequency $\omega_d$ is in the vicinity of one natural modal frequency $\omega_{j^*}$, and the real part of the natural mode is small. We define a new set $\widetilde{\mathcal{M}} \subset \mathcal{M}$ as $\widetilde{\mathcal{M}} = \{i \in \mathbb{Z}^+ \mid \mathrm{Im}(\lambda_i) > 0,\ |\omega_i - \omega_{j^*}| < \kappa_1,\ |\mathrm{Re}(\lambda_i)| < \kappa_2\}$, where $\kappa_1$ and $\kappa_2$ are small, nonnegative real numbers. For $i \in \widetilde{\mathcal{M}}$, the eigenvalue $\lambda_i = -\sigma_i + j\omega_i$ satisfies $\omega_i \approx \omega_d$ and $\sigma_i \approx 0$. Then $\psi_i \approx -\pi/2$, $\phi_i \approx -\pi/2$, and $\alpha_i \approx -\theta_{k,i}$, and Eq. (9.38) can be simplified as


$$y^B_{k,i}(t) \approx y^R_{k,i}(t) = \frac{P_d |c_k r_i l_i b_l|}{\sigma_i} \left(1 - e^{-\sigma_i t}\right) \sin(\omega_d t + \theta_{k,i}), \qquad (9.39)$$

for $i \in \widetilde{\mathcal{M}}$. In this section, $y^R_{k,i}$ in (9.39) is termed the resonance component in the k-th measurement. In summary, a PMU measurement $y_k(t)$ in a power system (9.19) under resonance conditions can be decomposed into three classes of components, i.e.,

$$y_k(t) = \sum_{i \in \mathcal{N}} y^D_{k,i}(t) + \sum_{i \in \mathcal{M} \setminus \widetilde{\mathcal{M}}} y^B_{k,i}(t) + \sum_{i \in \widetilde{\mathcal{M}}} y^R_{k,i}(t). \qquad (9.40)$$

9.2.4.2 Observations on the Resonance Component and the Resonance-Free Component

Severe Oscillations Arising from the Resonance Component

Figure 9.21a visualizes the resonance component of a PMU measurement (at Bus 40⁷) in the IEEE 68-bus benchmark system. Observe in Fig. 9.21a that the upper envelope of the oscillation increases concavely at the initial stage before reaching a steady-state value (about 0.1 in this case). The closed-form approximation of this steady-state value is $P_d |c_k r_i l_i b_l| / \sigma_i$. For a small positive $\sigma_{j^*}$ associated with the eigenvalue $\lambda_{j^*}$, the steady-state amplitude of the resonance component may be the dominant one. If a PMU measurement far away from the source measurement is tightly coupled with the eigenvalue $\lambda_{j^*}$, it may manifest the most severe oscillation, thereby confusing system operators with regard to FO source localization. Therefore, the presence of resonance components may cause the counterintuitive cases defined by (9.25) and (9.26).

Location Information on the FO Source from the Resonance-Free Component

As the resonance components of the PMU measurements mislead system operators with respect to FO localization, we proceed by excluding the resonance components from (9.40) and checking whether the remaining components exhibit any spatial information concerning the FO source. The superposition of the remaining components is termed the resonance-free component. Specifically, for a power system with a known physical model (9.19), the resonance-free component $y^F_k$ of the k-th PMU measurement time series can be obtained by

$$y^F_k(t) = \sum_{i \in \mathcal{N}} y^D_{k,i}(t) + \sum_{i \in \mathcal{M} \setminus \widetilde{\mathcal{M}}} y^B_{k,i}(t). \qquad (9.41)$$

⁷ The measurements at Bus 40 exhibit the largest oscillations, but they are non-source measurements.


Fig. 9.21 (a) Visualization of the resonance component of bus voltage magnitudes in the IEEE 68-bus benchmark system based on Eq. (9.39): the resonance component of the voltage magnitude measurement at Bus 40 (blue curve) and its envelopes (red-dash curves). (b) Resonance-free components of the source voltage magnitude measurement (red) and the non-source voltage magnitude measurements (black) in the IEEE 68-bus benchmark system

The resonance-free components of all PMU measurements in the IEEE 68-bus system under a certain FO scenario⁸ are shown in Fig. 9.21b. Under the same FO scenario, Fig. 9.19 visualizes the complete PMU measurements $y_k(t)$ in (9.40). While the complete measurements $y_k(t)$ are counterintuitive, the resonance-free components $y^F_k(t)$ in Fig. 9.21b convey the location information on the FO source: the resonance-free component of the source measurement exhibits the largest oscillation. Such a localized response of the resonance-free components might be an extension of the no-gain property of an electric network rigorously justified in [638, 639]. Future work will examine, in a theoretically rigorous fashion, which kinds of power systems possess this localization property of the resonance-free components.

9.2.4.3 Low-Rank Nature of the Resonance Component Matrix

The physical interpretation of the efficacy of the RPCA-based algorithm is obtained by examining the rank of the matrix containing the resonance components of all measurements, which we call the resonance component matrix, formally defined below. Similar to (9.22), the resonance component $y^R_k(t)$ of the k-th measurement can be discretized into a row vector $y^R_{k,t}$:

$$y^R_{k,t} := \left[y^R_k(0),\ y^R_k(1/f_s),\ \ldots,\ y^R_k(\lfloor t f_s \rfloor / f_s)\right]. \qquad (9.42)$$

⁸ A sinusoidal waveform with amplitude 0.05 per unit (p.u.) and frequency 0.38 Hz is injected into the IEEE 68-bus system via the voltage setpoint of Generator 13. The information on the test system is elaborated in Sect. 9.2.5.


Then the resonance component matrix $Y^R_t$ can be defined as a row concatenation as follows:

$$Y^R_t := \left[(y^R_{1,t})^\top, (y^R_{2,t})^\top, \ldots, (y^R_{m,t})^\top\right]^\top. \qquad (9.43)$$

Theorem 9.1 For the linear time-invariant dynamical system (9.19), the rank of the resonance component matrix $Y^R_t$ defined in (9.43) is at most 2.

Proof Based on (9.39), define $E_k := P_d |c_k r_i l_i b_l| / \sigma_i$. Then

$$y^R_{k,i}(t) = (1 - e^{-\sigma_i t}) \sin(\omega_d t)\, E_k \cos(\theta_{k,i}) + (1 - e^{-\sigma_i t}) \cos(\omega_d t)\, E_k \sin(\theta_{k,i}).$$

We further define functions $f_1(t)$, $f_2(t)$ and variables $g_1(k)$, $g_2(k)$ as follows: $f_1(t) := (1 - e^{-\sigma_i t}) \sin(\omega_d t)$; $f_2(t) := (1 - e^{-\sigma_i t}) \cos(\omega_d t)$; $g_1(k) := E_k \cos(\theta_{k,i})$; and $g_2(k) := E_k \sin(\theta_{k,i})$. Then $y^R_{k,i}(t)$ can be represented by $y^R_{k,i}(t) = f_1(t) g_1(k) + f_2(t) g_2(k)$, and the resonance component matrix $Y^R_t$ up to time t can be factored as

$$Y^R_t = \begin{bmatrix} g_1(1) & g_2(1) \\ g_1(2) & g_2(2) \\ \vdots & \vdots \\ g_1(m) & g_2(m) \end{bmatrix} \begin{bmatrix} f_1(0) & f_1(\tfrac{1}{f_s}) & \cdots & f_1(\tfrac{\lfloor t f_s \rfloor}{f_s}) \\ f_2(0) & f_2(\tfrac{1}{f_s}) & \cdots & f_2(\tfrac{\lfloor t f_s \rfloor}{f_s}) \end{bmatrix}. \qquad (9.44)$$

Denote by vectors $g_1$ and $g_2$ the first and second columns of the first matrix on the right-hand side (RHS) of (9.44), respectively, and by vectors $f_1$ and $f_2$ the first and second rows of the second matrix on the RHS of (9.44). Then (9.44) becomes

$$Y^R_t = \begin{bmatrix} g_1 & g_2 \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}. \qquad (9.45)$$

Given (9.45), it is clear that the rank of the resonance component matrix $Y^R_t$ is at most 2.
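Theorem 9.1 is easy to verify numerically: build $Y^R_t$ from the factorization (9.44) with arbitrary modal quantities and check its rank. In the sketch below, the values of $\sigma_i$, $\omega_d$, the channel count, and the window length are illustrative assumptions only, not parameters taken from the case studies.

import numpy as np

fs, T0, m = 60, 10, 204                  # sampling rate (Hz), window (s), channels
t = np.arange(int(T0 * fs)) / fs
sigma_i, omega_d = 0.05, 2 * np.pi * 0.38
f1 = (1 - np.exp(-sigma_i * t)) * np.sin(omega_d * t)
f2 = (1 - np.exp(-sigma_i * t)) * np.cos(omega_d * t)
rng = np.random.default_rng(0)
E = rng.uniform(0.1, 1.0, m)             # arbitrary amplitudes E_k
theta = rng.uniform(-np.pi, np.pi, m)    # arbitrary phases theta_{k,i}
Y_R = np.outer(E * np.cos(theta), f1) + np.outer(E * np.sin(theta), f2)
print(np.linalg.matrix_rank(Y_R))        # prints 2, as Theorem 9.1 asserts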

Typically, a resonance component matrix $Y^R_t$ with m rows and $\lfloor t f_s \rfloor$ columns satisfies $\min(m, \lfloor t f_s \rfloor) \gg 2$, so the resonance component matrix $Y^R_t$ is a low-rank matrix, which is assumed to be absorbed into the low-rank component $L_t$ in Eq. (9.27). As discussed in Sect. 9.2.4.2, the source measurement can be tracked by finding the maximal absolute entry of the resonance-free matrix $(Y_t - Y^R_t)$. According to (9.28), the PMU measurement containing the largest absolute entry in the sparse component $S_t$ is considered the source measurement. It is therefore reasonable to conjecture that the sparse component $S_t$ in (9.27) captures the part of the resonance-free matrix that preserves the location information of the FO source.


Fig. 9.22 Visualization of the voltage magnitudes (a), the components in the low-rank matrix $L_t$ (b), and the components in the sparse matrix $S_t$ (c) at Bus 65 (red) and Bus 40 (blue dash): Bus 65 is the bus closest to the source, while the most severe oscillation appears at Bus 40

Thereby, a theoretical connection between the proposed data-driven method in Algorithm 15 and the physical model of power systems described in Eq. (9.19) can be established. Although forced oscillation phenomena have been extensively studied in physics [640], their low-rank property is, to the best of our knowledge, investigated here for the first time.

Using the FO case shown in Fig. 9.19, we next examine the entries corresponding to the largest-amplitude channel (Bus 40) and the source measurement (Bus 65) in the measurement matrix $Y_t$, the low-rank matrix $L_t$, and the sparse matrix $S_t$. In Fig. 9.22a, the blue-dash curve and the solid red curve show the voltage magnitudes at the largest-amplitude channel (Bus 40) and the source measurement (Bus 65), respectively. Figure 9.22b shows the components captured by the low-rank matrix $L_t$ corresponding to the measurements at Bus 40 (blue dash) and Bus 65 (red). Figure 9.22c shows the components captured by the sparse matrix $S_t$ corresponding to the measurements at Bus 40 (blue-dash curve) and Bus 65 (red). As can be observed in Fig. 9.22a, the measurement at Bus 40 (blue-dash curve) comprises mainly the resonance component. As established in Theorem 9.1, the resonance component matrix is by nature low-rank; therefore, the measurement at Bus 40 is better captured by the low-rank matrix than the measurement at Bus 65, as shown in Fig. 9.22b. The remainder, captured by the sparse matrix, pinpoints the forced oscillation source. In addition, in Fig. 9.22, part of the resonance-free component is also captured by the low-rank matrix, which cannot be explained by Theorem 9.1. Note that Theorem 9.1 offers one possible interpretation of the effectiveness of the proposed algorithm; it is not claimed to be a fully rigorous account of how the algorithm works, but, as the figures above verify, it does shed light on why the algorithm works. As this section focuses on the development of one possible data-driven localization algorithm, future work will investigate a broader category of possible algorithms and their theoretical underpinnings.

A natural question is whether the robust PCA procedure can pinpoint the source of other types of oscillations, such as natural ones. The difficulty in answering this question is that the “source of a natural oscillation” is not well


defined. In a forced oscillation event, the FO source is defined as the power system component subject to external periodic perturbations, and one obvious way to suppress the oscillation is to disconnect the source from the grid. In a natural oscillation event, one may suppress the oscillation by tuning the control apparatus of a set of generators or by decreasing the load level; in such a case, there are two candidates for the “source,” namely, the tuned generators and the decreased load. In brief, we believe it is challenging to agree on a definition of the “source” of natural oscillations. Due to this ambiguity, this section focuses only on the localization of forced oscillations.

9.2.5 Case Study

In this section, we validate the effectiveness of Algorithm 15 using data from the IEEE 68-bus benchmark system and the WECC 179-bus system. We first describe the key information on the test systems, the procedure for obtaining test data, the parameter settings of the proposed algorithm, and the algorithm's performance on the obtained test data. Then the impact of different factors on the performance of the localization algorithm is investigated. Finally, we compare the proposed algorithm with the energy-based method reported in [621]. As will be shown, the proposed method can pinpoint FO sources with high accuracy without any information on system models or grid topology, even when resonance exists.

9.2.5.1 Performance Evaluation of the Localization Algorithms in Benchmark Systems

IEEE 68-Bus Power System Test Case

The system parameters of the IEEE 68-bus power system are reported in the Power System Toolbox (PST) [634], and its topology is shown in Fig. 9.23. Let $V = \{1, 2, \ldots, 16\}$ consist of the indices of all 16 generators in the 68-bus system. Based on the original parameters, the following modifications are made: (1) the power system stabilizers (PSS) at all generators, except the one at Generator 9, are removed in order to create more poorly damped oscillatory modes; and (2) for the PSS at Generator 9, the product of the PSS gain and the washout time constant is changed to 250. Based on the modified system, the linearized model of the power system (9.19) can be obtained using the command “svm_mgen” in PST. There are 25 oscillatory modes whose frequencies range from 0.1 Hz to 2 Hz, as shown in Fig. 9.26a. Denote by $W = \{\omega_1, \omega_2, \ldots, \omega_{25}\}$ the set consisting of all 25 modal frequencies of interest. The periodic perturbation $u_l$ in (9.20) is introduced through the voltage setpoints of generators. The analytical expression of $u_l$ is $0.05 \sin(\omega_d t)$, where $\omega_d \in W$.


Fig. 9.23 The IEEE 68-bus power system [625]: the generator in the solid circle is the actual source generator; the generator in the dashed circle is the identified source

We create FOs in the 68-bus system according to the set $V \times W$, where $\times$ denotes the Cartesian product. For each element $(i, \omega_j) \in V \times W$, the periodic perturbation $u_l(t)$ with frequency $\omega_j$ is injected into the grid through the voltage setpoint of generator i at time $t = 0$, and the system response is obtained by conducting a 40-second simulation. The bus voltage magnitude deviations constitute the output/measurement vector $y(t)$ in (9.19). Finally, the measurement matrix is constructed based on (9.22), where the sampling rate $f_s$ is 60 Hz. By repeating the above procedure for each element of $V \times W$, we obtain $|V \times W| = 400$ measurement matrices. Among the 400 measurement matrices, 44 satisfy the resonance criteria (9.25), (9.26) with $N_0 = 0$; they are marked as the counterintuitive cases and are used for testing the performance of the proposed method. Some typical waveforms of the 44 test cases are shown in [625].

The tunable parameters $T_0$ and $\xi$ in Algorithm 15 are set to 10 and 0.0408, respectively. Measurements of voltage magnitude, phase angle, and frequency are used to constitute the measurement matrix. We then apply Algorithm 15 to the 44 counterintuitive cases. Algorithm 15 pinpoints the source measurements in 43 of the 44 counterintuitive cases and therefore achieves 97.73% accuracy without any knowledge of system models or grid topology.

Next, we scrutinize the geographic proximity between the identified and actual source measurements in the single failed case. In this case, the algorithm reports that the source measurement is located at Bus 64 (highlighted with a solid circle in Fig. 9.23), whereas the periodic perturbation, with frequency 1.3423 Hz, is actually injected into the system through the generator directly connected to Bus 65 (highlighted with a dashed circle in Fig. 9.23).


Fig. 9.24 Voltage magnitude visualization in Case F-3: the voltage magnitude of the bus connected with the forced oscillation source (red); the voltage magnitudes of the remaining buses (black)

As can be seen in Fig. 9.23, the identified and actual source measurements are geographically close. Therefore, even in the failed cases, the proposed method can effectively narrow the search space.

WECC 179-Bus System Test Case

This subsection leverages the open-source forced oscillation dataset [641] to validate the performance of the RPCA-based method. The dataset is generated via the WECC 179-bus power system [641], whose topology is shown in Fig. 9.25a; the procedure for synthesizing the data is reported in [641]. The dataset includes 15 forced oscillation cases with a single oscillation source, which are used to test the proposed method. The visualization for Case F-3 is shown in Fig. 9.24. In each forced oscillation case, the measurements of voltage magnitude, voltage angle, and frequency at all generation buses are used to construct the measurement matrix $Y_t$ in (9.22) from the 10-second oscillatory data, i.e., $T_0 = 10$. The 15 measurement matrices are then used as the input to Algorithm 15, where the tunable parameter $\xi$ is set to 0.0577. For the WECC 179-bus system, the proposed method achieves 93.33% accuracy.

Next, we examine how geographically close the identified FO source is to the ground truth in the seemingly incorrect case. In Case FM-6-2, a periodic rectangular perturbation is injected into the grid through the governor of the generator at Bus 79, which is highlighted with a solid red circle in Fig. 9.25b. The source measurement identified by the proposed method is at Bus 35, which is highlighted by a red dashed circle. Figure 9.25b shows that the identified FO source is geographically close to the actual source. Again, even the seemingly wrong


Fig. 9.25 WECC 179-bus power system [641]: (a) complete topology; (b) zoomed-in view of the area in the yellow box of (a)

Fig. 9.26 Eigenvalues of the IEEE 68-bus system (a) and the WECC 179-bus system in Cases F-1 and FM-1 (b): the eigenvalues whose damping ratio is less than 5% are shown on the left-hand side of the red-dash line

Fig. 9.27 Voltage magnitudes during the ERCOT forced oscillation event

result can help system operators substantially narrow down the search space for FO sources (Fig. 9.26).

ERCOT Forced Oscillation Event

We leverage field measurements from a collaborative project with the Electric Reliability Council of Texas (ERCOT) to test the localization algorithm in a realistic setting. Figure 9.27 shows the FOs observed by ERCOT. The FOs manifested themselves in seven PMU measurements of voltage magnitude. For information privacy, the names of the PMU locations are replaced by indices 1, 2, ..., 7, and the FO starting point is set to 0 seconds. As can be observed in Fig. 9.27, the PMU measurements contain high-frequency components resulting from measurement noise and load fluctuation. We apply a band-pass filter from 0.1 to 1 Hz to the raw PMU measurements and then use a 10-second window of the filtered data to form the measurement matrix. The proposed algorithm indicates that PMU 4 is the one near the FO source. The localization result was reported to ERCOT, and ERCOT confirmed its correctness. It is worth noting that no topology information was provided to our research team; therefore, localization algorithms based on system topology, such as the dissipating energy flow approach, are not applicable in this study.
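For reference, the 0.1-1 Hz band-pass preprocessing step can be sketched with SciPy as follows. This is a minimal sketch: the Butterworth design, the filter order, the zero-phase filtfilt application, and the PMU reporting rate fs are our assumptions, as the original processing is specified only by its passband.

from scipy.signal import butter, filtfilt

def bandpass_pmu(y, fs, lo=0.1, hi=1.0, order=4):
    # Band-pass Butterworth filter applied forward and backward (zero
    # phase) to suppress measurement noise and load-fluctuation content.
    b, a = butter(order, [lo, hi], btype="band", fs=fs)
    return filtfilt(b, a, y)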

9.2.5.2 Algorithm Robustness

This subsection focuses on testing the robustness of the proposed algorithm under different factors, including measurement types, noise, and partial coverage of PMUs.


Table 9.3 Impact of measurement types on localization performance

Types            |V|      ∠V       |V|, ∠V   f        |V|, f   ∠V, f    |V|, ∠V, f
68-bus system    84.09%   50.00%   84.09%    52.27%   93.18%   59.09%   97.73%
179-bus system   86.67%   33.33%   73.33%    20.00%   80.00%   46.67%   93.33%

The highest accuracy for each system (boldface in the original table) is achieved with |V|, ∠V, f.

Table 9.4 Impact of noise level on localization performance

SNR              90 dB    70 dB    50 dB    30 dB    10 dB
68-bus system    97.73%   97.73%   97.73%   97.73%   56.82%
179-bus system   93.33%   93.33%   93.33%   93.33%   73.33%

The impact of each factor on the algorithm's performance is demonstrated as follows.

Impact of Measurement Types on Algorithm Performance

Under all possible combinations of the nodal measurement types (voltage magnitude |V|, voltage angle ∠V, and frequency f), the localization accuracies of the proposed algorithm in the two benchmark systems are reported in Table 9.3. As can be observed in Table 9.3, the maximal accuracy is achieved when voltage magnitudes, voltage angles, and frequencies are all used to constitute the measurement matrix in (9.22).
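In implementation terms, each combination in Table 9.3 amounts to row-concatenating the corresponding channel blocks before running Algorithm 15. The sketch below illustrates this; the array names and block layout are hypothetical.

import numpy as np

def stack_measurements(vmag, vang, freq, use=("vmag", "vang", "freq")):
    # Each block is an (n_buses x n_samples) array; the selected blocks
    # are stacked row-wise into one measurement matrix, cf. (9.22).
    blocks = {"vmag": vmag, "vang": vang, "freq": freq}
    return np.vstack([blocks[name] for name in use])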

9.2.5.3 Impact of Noise on Algorithm Performance

Table 9.4 records the localization accuracy under different levels of noise. In Table 9.4, the signal-to-noise ratio (SNR) is defined as

$$\mathrm{SNR} = 10 \log_{10}(W_s / W_n) \ \text{(dB)},$$

where $W_s$ is the sum of the squared measurement deviations over a period (10 seconds in this section) and $W_n$ is the sum of the squared magnitudes of the corresponding noise over the same period. The noise superimposed on each measurement has a Gaussian distribution with zero mean and variance $\sigma_n$; in each experiment, for each measurement, the variance $\sigma_n$ is chosen such that the corresponding SNR is achieved. From Table 9.4, we conclude that the proposed algorithm performs well for SNR levels down to 30 dB.
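The noise injection used in Table 9.4 can be reproduced per measurement channel as follows. This is a sketch under the SNR definition above; rescaling the noise realization, rather than fixing the variance explicitly, enforces the target SNR exactly for each draw.

import numpy as np

def add_noise_at_snr(y, snr_db, rng=np.random.default_rng()):
    # Ws: sum of squared measurement deviations over the window.
    Ws = float(np.sum(y ** 2))
    Wn = Ws / 10.0 ** (snr_db / 10.0)           # noise energy implied by the SNR
    noise = rng.standard_normal(y.shape)
    noise *= np.sqrt(Wn / np.sum(noise ** 2))   # scale to hit Wn exactly
    return y + noise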


Table 9.5 Impact of partial coverage of synchrophasors on algorithm performance

Case name   Identified source   Nearest PMU
F-1         8                   8
FM-1        8                   8
F-2         78                  78/69
F-3         69                  78/69
FM-3        69                  78/69
F-4-1       69                  78/69
F-4-2       78                  78/69
F-4-3       78                  78/69
F-5-1       78                  78/69
F-5-2       78                  78/69
F-5-3       78                  78/69
F-6-1       78                  78/69
F-6-2       78                  78/69
F-6-3       78                  78/69
FM-6-2      78                  78/69

Impact of Partial Coverage of Synchrophasors on Algorithm Performance

In practice, not all buses are equipped with PMUs. Moreover, the available PMUs may be installed on buses near oscillation sources rather than on the buses to which the oscillation sources are directly connected. A test case is designed to evaluate the performance of the proposed algorithm in this scenario. In this test case, the locations of all available PMUs are marked with stars in Fig. 9.25a. The test results are listed in Table 9.5. As illustrated in Table 9.5, the proposed method can effectively identify the available PMUs that are close to the oscillation sources, even though no PMU is installed on the generation buses.

Independent System Operators (ISOs) may also need to know whether FO sources are within their control areas. However, an ISO might not be able to access PMUs near FO sources, limiting the usefulness of the proposed algorithm. For example, assume that there are two ISOs, ISO 1 and ISO 2, in Fig. 9.25a, where the red dash line is the boundary between their control areas. It is possible that an FO source is in the ISO 1 control area, whereas ISO 2 can only access the PMUs at the buses marked with red stars. To apply the RPCA-based method, ISO 2 needs to access one PMU in the area controlled by ISO 1, say, the PMU marked with a purple star in Fig. 9.25a. In the F-2 dataset, the FO source is located at Bus 79, which is marked with a red circle in Fig. 9.25a. With the data collected from the PMUs marked with red and purple stars, the proposed algorithm outputs the bus marked with the purple star, indicating that the FO source is outside the control area of ISO 2.

Impact of External Excitation on Localization Performance

The external excitation is assumed to result mainly from load fluctuation. To introduce load fluctuation, load dynamics are included in the 68-bus benchmark system, and 33 real-power setpoints along with 33 reactive-power setpoints on loads are considered as the augmented inputs. This modification of the 68-bus system can be achieved by enabling load modulation in the Power System Toolbox (PST) [634]. Following the procedure described in Sect. 9.2.5.1, 43 counterintuitive cases are obtained. For the j-th of the 43 counterintuitive cases, we have a pair $(i_j, \omega_j)$, where $\omega_j$ is the frequency of the periodic perturbation


Fig. 9.28 (a) Frequency at Bus 1 under normal operating conditions with load fluctuation; (b) ranges of system frequency (vertical blue-solid line segments) due to different levels of load fluctuation: the normal frequency range (59.96-60.04 Hz) is represented by two horizontal red-dash lines

and $i_j$ is the source generator index. Let the set $\mathcal{P}$ consist of such pairs, i.e., $\mathcal{P} = \{(i_1, \omega_1), (i_2, \omega_2), \ldots, (i_j, \omega_j), \ldots, (i_{43}, \omega_{43})\}$.

Note that the number of state variables in the 68-bus system with load dynamics is 268, whereas the number of state variables in the 68-bus system used in Sect. 9.2.5.1 is 202. Effectively, the 68-bus system in this subsection is, from the perspective of control theory, a different system from the one used in Sect. 9.2.5.1, as the numbers of their state variables differ. Therefore, it is not surprising that the number of counterintuitive cases in this subsection differs from that in Sect. 9.2.5.1.

The 66 augmented setpoints fluctuate around their nominal values and thus serve as external excitations. Denote by $\Delta u_{Ld}(t) \in \mathbb{R}^{66}$ the load setpoint deviations from their nominal values at time t. Assume that the vector $\Delta u_{Ld}$ has a Gaussian distribution with zero mean and covariance matrix $\sigma_{\mathrm{ext}} I_{66}$, i.e., $\Delta u_{Ld}(t) \sim \mathcal{N}(0, \sigma_{\mathrm{ext}} I_{66})$, where $\sigma_{\mathrm{ext}}$ is a scalar and $I_{66}$ is the 66-by-66 identity matrix. Due to the excitation $\Delta u_{Ld}$, the frequency fluctuates under normal operating conditions, as observed in Fig. 9.28a. Figure 9.28b shows how the system frequency range varies as $\sigma_{\mathrm{ext}}$ changes: each vertical line segment corresponds to the frequency range under load fluctuation with parameter $\sigma_{\mathrm{ext}}$, the upper terminal being the highest system frequency for the given $\sigma_{\mathrm{ext}}$ and the lower terminal the lowest. One observation from Fig. 9.28b is that as $\sigma_{\mathrm{ext}}$ increases, the system frequency tends to span a wider range. The normal range of frequency in power systems is from 59.96 to 60.04 Hz [5, 642]; as shown in Fig. 9.28b, the system frequency moves outside this normal range under the excitation with $\sigma_{\mathrm{ext}} = 0.2$. We therefore use random excitations $\Delta u_{Ld}(t)$ with $\sigma_{\mathrm{ext}} = 0.15$ to mimic real-world load fluctuation.

The random excitations $\Delta u_{Ld}$ with $\sigma_{\mathrm{ext}} = 0.15$ and the set $\mathcal{P}$ are leveraged to obtain 43 test cases; the data acquisition procedure is described in what follows.
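One realization of the load-setpoint excitation defined above can be drawn in a few lines; a sketch follows. Because the covariance matrix is $\sigma_{\mathrm{ext}} I_{66}$, each of the 66 components is an independent zero-mean Gaussian with standard deviation $\sqrt{\sigma_{\mathrm{ext}}}$.

import numpy as np

rng = np.random.default_rng()
sigma_ext = 0.15
# Delta u_Ld ~ N(0, sigma_ext * I_66): 66 i.i.d. zero-mean Gaussian
# deviations with per-component variance sigma_ext.
du_Ld = np.sqrt(sigma_ext) * rng.standard_normal(66)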

Fig. 9.29 Impact of $T_0$ on localization performance: the localization accuracy for the 68-bus system (blue-solid line) and the 179-bus system (red-dash line)


For each element $(i_j, \omega_j) \in \mathcal{P}$, the periodic perturbation $u_l(t)$ with frequency $\omega_j$ is injected into the system via the voltage setpoint of generator $i_j$ at $t = 0$. In each experiment, the 68-bus system is stimulated by one realization of $\Delta u_{Ld}$. Then, a 40-second simulation is conducted to obtain the system response. By repeating the above procedure for all elements, 43 test cases with load fluctuation are obtained. For these test cases, a 2-Hz low-pass filter is applied to process the measurements. The proposed algorithm achieves 90.70% localization accuracy.

Impact of Time-Window Length on Localization Performance

In this section, we investigate the impact of the window width $T_0$ on the algorithm's performance. Figure 9.29 summarizes the localization accuracy for different time-window widths $T_0$ in both the 68-bus and 179-bus systems. In Fig. 9.29, we observe a trade-off between the time required for decision-making and the localization accuracy for the 68-bus system (blue-solid line) over the given range of $T_0$: 100% accuracy can be achieved with $T_0 = 12$ (or 13) seconds; the price we pay for the high localization accuracy is a wider time window, i.e., more decision-making time.

In practice, the optimal window width $T_0^*$ can be obtained by offline studies on physical model-based simulations or historical FO events. Assume that we have $N_1$ options for the window width $T_0$, represented by $\mathcal{T}_0 := \{T_0^1, T_0^2, \ldots, T_0^{N_1}\}$. For each window width option $T_0^i$, we run the localization algorithm on all available FO events and compute the localization accuracy $\eta_i$. The optimal window width $T_0^*$ is the $i^*$-th element of $\mathcal{T}_0$, which maximizes $\eta_i$ over $i = 1, 2, \ldots, N_1$. Such an optimal window width $T_0^*$ is then applied in Algorithm 15 during real-time operation.
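The offline selection of $T_0^*$ described above reduces to a small search loop; a sketch follows, in which the event archive, the ground-truth labels, and the localize callable are hypothetical stand-ins for historical FO events or model-based simulations.

def select_window_width(events, truths, candidates, localize):
    # For each candidate window width T0, run the localizer on every
    # archived FO event and compute the accuracy eta; keep the best T0.
    def accuracy(T0):
        hits = sum(localize(ev, T0) == src for ev, src in zip(events, truths))
        return hits / len(events)
    return max(candidates, key=accuracy)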

9.2.5.4 Comparison with Energy-Based Localization Method

This subsection compares the proposed localization approach with the dissipating energy flow (DEF) approach [621]. We use the FM-1 dataset (Bus 4 is the source measurement) [641] for this comparison. PMUs are assumed to be installed at all generator buses except those at Buses 4 and 15; in addition, Buses 7, 15, and 19 are also assumed to have PMUs. Without any information on grid topology, the RPCA-based method suggests that the source measurement is at Bus 7, which is in the vicinity of the actual source. In contrast, topology errors may cause the DEF-based method to incur both false-negative and false-positive errors, as shown in the following two scenarios.

Scenario 1

The zoomed-in version of the area within the blue box in Fig. 9.25a is shown in Fig. 9.30, where the left and right panels show the actual system topology and the topology reported to a control center, respectively. All available PMUs are marked with yellow stars in Fig. 9.30. Based on these PMUs, the relative magnitudes and directions of the dissipating energy flows are computed from the FM-1 dataset using the method reported in [621]. With the true topology, the FO source cannot be determined, as the energy flow direction along Branch 8-3 cannot be inferred from the available PMUs. However, with the topology error shown in Fig. 9.30b, i.e., Bus 29 (Bus 17) mistakenly reported as connected to Bus 3 (Bus 9), it can be inferred that an energy flow with a relative magnitude of 0.4874 is injected into Bus 4, indicating that Bus 4 is not the source measurement. This conclusion contradicts the ground truth. Therefore, with such a topology error, the dissipating energy flow method leads to a false-negative error.

Scenario 2

Similar to Scenario 1, topology errors exist within the area highlighted by the green box in Fig. 9.25a, whose zoomed-in version is shown in Fig. 9.31. As shown in Fig. 9.31a, with the actual topology and the available PMUs, it can be inferred that an energy flow with a relative magnitude of 0.171 is injected into Bus 15, indicating that Bus 15 is not a source. However, with the reported system topology, the generator at Bus 15 appears to inject into the rest of the grid an energy flow with a magnitude of 0.0576, suggesting that the source measurement is at Bus 15. Again, this conclusion contradicts the ground truth and hence incurs a false-positive error.


Fig. 9.30 Zoomed-in version of the area in the blue box of Fig. 9.25a: actual topology (a); topology reported to a control center (b). Relative magnitudes and directions of energy flows are labeled with red numbers and arrows, respectively

Fig. 9.31 Zoomed-in version of the area in the green box of Fig. 9.25a: actual topology (a); topology reported in a control center (b)

9.2.6 Conclusion

In this section, a purely data-driven but physically interpretable method is proposed to locate forced oscillation sources in power systems. The localization problem is formulated as an instance of matrix decomposition, i.e., how to decompose high-dimensional synchrophasor data into a low-rank matrix and a sparse matrix, which can be accomplished using robust principal component analysis. Based on this formulation, a localization algorithm for real-time operation is presented. The proposed algorithm does not require any information on system models or grid topology, thus providing an efficient and easily deployable solution for real-time operation. Without system topology information, the proposed algorithm achieves high localization accuracy both in synthetic cases based on benchmark systems and in a real-world forced oscillation event in the Texas power grid. In addition, a possible theoretical interpretation of the efficacy of the algorithm


is provided based on physical model-based analysis, highlighting the fact that the rank of the resonance component matrix is at most two. Future work will test the proposed localization algorithm in conjunction with FO detection algorithms and explore a broader set of algorithms, along with their theoretical performance analysis, for large-scale realistic power systems.

References

1. J.H. Stock, D.N. Stuart, Robust decarbonization of the us power sector: Policy options. National Bureau of Economic Research, Technical Report (2021) 2. D. Wu, X. Zheng, D. Kalathil, L. Xie, Nested reinforcement learning based control for protective relays in power distribution systems, in 2019 IEEE 58th Conference on Decision and Control (CDC) (2019), pp. 1925–1930 3. L. Xie, Y. Chen, P.R. Kumar, Dimensionality reduction of synchrophasor data for early event detection: linearized analysis. IEEE Trans. Power Syst. 29(6), 2784–2794 (2014) 4. T. Huang, N.M. Freris, P.R. Kumar, L. Xie, A synchrophasor data-driven method for forced oscillation localization under resonance conditions. IEEE Trans. Power Syst. 35(5), 3927– 3939 (2020) 5. T. Huang, B. Satchidanandan, P.R. Kumar, L. Xie, An online detection framework for cyber attacks on automatic generation control. IEEE Trans. Power Syst. 33(6), 6816–6827 (2018) 6. H. Ming, B. Xia, K.-Y. Lee, A. Adepoju, S. Shakkottai, L. Xie, Prediction and assessment of demand response potential with coupon incentives in highly renewable power systems. Prot. Control Mod. Power Syst. 5(1), 1–14 (2020) 7. X. Lai, L. Xie, Q. Xia, H. Zhong, C. Kang, Decentralized multi-area economic dispatch via dynamic multiplier-based lagrangian relaxation. IEEE Trans. Power Syst. 30(6), 3225–3233 (2015) 8. K. Clement-Nyns, E. Haesen, J. Driesen, The impact of charging plug-in hybrid electric vehicles on a residential distribution grid. IEEE Trans. Power Syst. 25(1), 371–380 (2009) 9. J.P. Gouveia, J. Seixas, G. Giannakidis, Smart city energy planning: Integrating data and tools, in Proceedings of the 25th International Conference Companion on World Wide Web (2016), pp. 345–350 10. S. Dey, A. Jessa, L. Gelbien, Urban grid monitoring renewables integration, in 2010 IEEE Conference on Innovative Technologies for an Efficient and Reliable Electricity Supply (IEEE, Piscataway, 2010), pp. 252–256 11. G. Media, FLISR of the future: Tiering reliability to meet consumer needs. [Chapter: Assessment of Revenue Potentials of Ancillary Service Provision by Flexible Unit Portfolios]. [Online]. Available: http://www.greentechmedia.com/articles/read/flisr-of-the-futuretiering-reliability-to-meet-consumer-needs 12. Prosumer. [Online]. Available: https://en.wikipedia.org/wiki/Prosumer 13. O. Samuelsson, M. Hemmingsson, A.H. Nielsen, K.O.H. Pedersen, J. Rasmussen, Monitoring of power system events at transmission and distribution level. IEEE Trans. Power Syst. 21(2), 1007–1008 (2006)



14. Blockchain-based smart grids. [Online]. Available: https://www.sciencedirect.com/book/ 9780128178621/blockchain-based-smart-grids 15. Energy Storage for Smart Grids Planning and Operation for Renewable and Variable Energy Resources (VERs). [Online]. Available: https://www.sciencedirect.com/book/ 9780124104914/energy-storage-for-smart-grids 16. Y. Weng, J. Yu, R. Rajagopal, Probabilistic baseline estimation based on load patterns for better residential customer rewards. Int. J. Electri. Power Energy Syst. 100, 508–516 (2018) 17. Y. Weng, R. Rajagopal, Probabilistic baseline estimation via gaussian process, in 2015 IEEE Power & Energy Society General Meeting (PESGM) (IEEE, Piscataway, 2015), pp. 1–5 18. [Online]. Available: https://www.powermag.com/public-vs-private-whats-best-for-powercustomers/ 19. [Online]. Available: https://www.rff.org/publications/explainers/us-electricity-markets-101/ 20. Public Utility. [Online]. Available: https://en.wikipedia.org/wiki/Public_utility/ 21. [Online]. Available: http://www.incsys.com/power4vets/what-is-a-system-operator/ 22. M.J. Smith, K. Wedeward, Event detection and location in electric power systems using constrained optimization, in 2009 IEEE Power & Energy Society General Meeting (IEEE, Piscataway, 2009), pp. 1–6 23. A. Abur, A.G. Exposito, Power System State Estimation: Theory and Implementation (CRC Press, Boca Raton, 2004) 24. B.C. Lesieutre, A. Pinar, S. Roy, Power system extreme event detection: The vulnerability frontier, in Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS 2008) (IEEE, Piscataway, 2008), pp. 184–184 25. J. Sawin, F. Sverrisson, Renewables 2014: Global Status Report; ren21 Secretariat: Paris (2014) 26. O. Edenhofer, R. Pichs-Madruga, Y. Sokona, E. Farahani, S. Kadner, K. Seyboth, A. Adler, I. Baum, S. Brunner, P. Eickemeier, Intergovernmental panel on climate change, summary for policymakers, in Climate Change 2014: Impacts, adaptation, and vulnerability. Contribution of Working Group II to the Fifth Assessment Report (2014) 27. [Online]. Available: https://www.ferc.gov/electric-power-markets 28. [Online]. Available: https://learn.pjm.com/electricity-basics/market-for-electricity.aspx 29. J. MacDonald, P. Cappers, D. Callaway, S. Kiliccote, Demand response providing ancillary services: A comparison of opportunities and challenges in US wholesale markets (2022). https://www.osti.gov/biblio/1632141 30. A. Daneels, W. Salter, What is SCADA? (1999). https://accelconf.web.cern.ch/ica99/papers/ mc1i01.pdf 31. S.F. Tie, C.W. Tan, A review of energy sources and energy management system in electric vehicles. Renew. Sust. Energ. Rev. 20, 82–102 (2013) 32. [Online]. Available: https://library.e.abb.com/public/0d9220cf797fa8a0852575fa0057038d/ BR_SCADA_EMS_GMS.pdf 33. Y. Weng, R. Negi, M.D. Ili´c, A search method for obtaining initial guesses for smart grid state estimation, in 2012 IEEE Third International Conference on Smart Grid Communications (SmartGridComm) (IEEE, Piscataway, 2012), pp. 599–604 34. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. (Springer, Berlin, 2009) 35. W. Luan, W. Li, Smart metering and infrastructure, in Smart Grids: Clouds, Communications, Open Source, and Automation (CRC Press, Boca Raton, 2014), p. 399 36. Smart Grid Investment Grant Program, Fault location, isolation, and service restoration technologies reduce outage impact and duration. U.S. 
Department of Energy, Technical Report (2014) 37. C. Rudin, D. Waltz, R.N. Anderson, A. Boulanger, A. Salleb-Aouissi, M. Chow, H. Dutta, P.N. Gross, B. Huang, S. Ierome et al., Machine learning for the New York city power grid. IEEE Trans. Pattern Analy. Mach. Intell. 34(2), 328–345 (2011) 38. F.C. Schweppe, J. Wildes, Power system static-state estimation, part I: Exact model. IEEE Trans. Power Apparatus Syst. PAS-89(1), 120–125 (1970)


39. F.C. Schweppe, D.B. Rom, Power system static-state estimation, part II: approximate model. IEEE Trans. Power Apparatus Syst. PAS-89(1), 125–130 (1970) 40. F.C. Schweppe, Power system static-state estimation, part III: implementation. IEEE Trans. Power Apparatus Syst. PAS-89(1), 130–135 (1970) 41. M. Ilic, Data-driven sustainable energy systems, in The 8th Annual Carnegie Mellon Conference on the Elctricity industry (2012) 42. A.J. Wood, B.F. Wollenberg, G.B. Sheblé, Power Generation, Operation, and Control (Wiley, Hoboken, 2013) 43. Y. Weng, R. Negi, M.D. Ili´c, Historical data-driven state estimation for electric power systems, in 2013 IEEE International Conference on Smart Grid Communications (SmartGridComm) (IEEE, Piscataway, 2013), pp. 97–102. 44. J. Zhang, G. Welch, G. Bishop, LoDiM: A novel power system state estimation method with dynamic measurement selection, in 2011 IEEE Power and Energy Society General Meeting (IEEE, Piscataway, 2011), pp. 1–7 45. EATON, Power xpert Meters 4000/6000/8000 46. N.M. Haegel, R. Margolis, T. Buonassisi, D. Feldman, A. Froitzheim, R. Garabedian, M. Green, S. Glunz, H.-M. Henning, B. Holder et al., Terawatt-scale photovoltaics: Trajectories and challenges. Science 356(6334), 141–143 (2017) 47. S. Chu, A. Majumdar, Opportunities and challenges for a sustainable energy future. Nature 488(7411), 294–303 (2012) 48. S. Agnew, P. Dargusch, Effect of residential solar and storage on centralized electricity supply systems. Nat. Climate Change 5(4), 315–318 (2015) 49. T.O.P. Project., National renewable energy laboratory. https://openpv.nrel.gov 50. N. Jean, M. Burke, M. Xie, W.M. Davis, D.B. Lobell, S. Ermon, Combining satellite imagery and machine learning to predict poverty. Science 353(6301), 790–794 (2016) 51. J.M. Malof, K. Bradbury, L.M. Collins, R.G. Newell, Automatic detection of solar photovoltaic arrays in high resolution aerial imagery. Appl. Energy 183, 229–240 (2016) 52. J. Yuan, H.-H.L. Yang, O.A. Omitaomu, B.L. Bhaduri, Large-scale solar panel mapping from aerial images using deep convolutional networks, in 2016 IEEE International Conference on Big Data (Big Data) (IEEE, Piscataway, 2016), pp. 2703–2708 53. J.M. Malof, K. Bradbury, L.M. Collins, R.G. Newell, A. Serrano, H. Wu, S. Keene, Image features for pixel-wise detection of solar photovoltaic arrays in aerial imagery using a random forest classifier, in 2016 IEEE International Conference on Renewable Energy Research and Applications (ICRERA) (IEEE, Piscataway, 2016), pp. 799–803 54. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015) 55. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet : A large-scale hierarchical image database, in 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, Piscataway, 2009), pp. 248–255 56. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012) 57. S.J. Pan, Q. Yang, A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2009) 58. J. Yu, Z. Wang, A. Majumdar, R. Rajagopal, Deepsolar: a machine learning framework to efficiently construct a solar deployment database in the united states. Joule 2(12), 2605–2617 (2018) 59. J.M. Malof, L.M. Collins, K. Bradbury, R.G. 
Newell, A deep convolutional neural network and a random forest classifier for solar photovoltaic array detection in aerial imagery, in 2016 IEEE International Conference on Renewable Energy Research and Applications (ICRERA) (IEEE, Piscataway, 2016), pp. 650–654 60. A.J. Schaffer, S. Brun, Beyond the sun—socioeconomic drivers of the adoption of small-scale photovoltaic installations in Germany. Energy Res. Soc. Sci. 10, 220–227 (2015) 61. C.L. Kwan, Influence of local environmental, social, economic and political variables on the spatial distribution of residential solar PV arrays across the United States. Energy Policy 47, 332–344 (2012)


62. C. Crago, I. Chernyakhovskiy, Solar PV technology adoption in the united states: An empirical investigation of state policy effectiveness. Technical Report (2014) 63. V. Rai, K. McAndrews, Decision-making and behavior change in residential adopters of solar PV, in Proceedings of the World Renewable Energy Forum (Citeseer, 2012) 64. T. Islam, N. Meade, The impact of attribute preferences on adoption timing: the case of photo-voltaic (PV) solar cells for household electricity generation. Energy Policy 55, 521– 530 (2013) 65. V. Vasseur, R. Kemp, The adoption of PV in the Netherlands: a statistical analysis of adoption factors. Renew. Sust. Energ. Rev. 41, 483–494 (2015) 66. A. Palm, Local factors driving the diffusion of solar photovoltaics in Sweden: a case study of five municipalities in an early market. Energy Res. Soc. Sci. 14, 1–12 (2016) 67. V. Rai, D.C. Reeves, R. Margolis, Overcoming barriers and uncertainties in the adoption of residential solar PV. Renew. Energy 89, 498–505 (2016) 68. K.S. Wolske, P.C. Stern, T. Dietz, Explaining interest in adopting residential solar photovoltaic systems in the United States: toward an integration of behavioral theories. Energy Res. Soc. Sci. 25, 134–151 (2017) 69. M. Braito, C. Flint, A. Muhar, M. Penker, S. Vogel, Individual and collective sociopsychological patterns of photovoltaic investment under diverging policy regimes of Austria and Italy. Energy Policy 109, 141–153 (2017) 70. C. Davidson, E. Drury, A. Lopez, R. Elmore, and R. Margolis, Modeling photovoltaic diffusion: an analysis of geospatial datasets. Environ. Res. Lett. 9(7), 074009 (2014) 71. J. Letchford, K. Lakkaraju, Y. Vorobeychik, Individual household modeling of photovoltaic adoption, in 2014 AAAI Fall Symposium Series (2014) 72. H. Li, H. Yi, Multilevel governance and deployment of solar PV panels in US cities. Energy Policy 69, 19–27 (2014) 73. O. De Groote, G. Pepermans, F. Verboven, Heterogeneity in the adoption of photovoltaic systems in flanders. Energy Econ. 59, 45–57 (2016) 74. S. Dharshing, Household dynamics of technology adoption: a spatial econometric analysis of residential solar photovoltaic (PV) systems in Germany. Energy Res. Soc. Sci. 23, 113–124 (2017) 75. L. Breiman, Random forests. Mach. Learn. 45(1), 5–32 (2001) 76. K. Bradbury, R. Saboo, T.L. Johnson, J.M. Malof, A. Devarajan, W. Zhang, L.M. Collins, R.G. Newell, Distributed solar photovoltaic array location and extent dataset for remote sensing object identification. Sci. Data 3(1), 1–9 (2016) 77. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 2818–2826 78. C. Elkan, The foundations of cost-sensitive learning, in International Joint Conference on Artificial Intelligence, vol. 17, no. 1 (Lawrence Erlbaum Associates, Mahwah, 2001), pp. 973–978 79. H. He, E.A. Garcia, Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009) 80. C.X. Ling, V.S. Sheng, Cost-sensitive learning and the class imbalance problem. Enc. Mach. Learn. 2011, 231–235 (2008) 81. B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep features for discriminative localization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), 2921–2929 82. V. Nair, G.E. 
Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010) 83. G. Chen, W. He, J. Liu, S. Nath, L. Rigas, L. Xiao, F. Zhao, Energy-aware server provisioning and load dispatching for connection-intensive internet services, in NSDI’08: 5th USENIX Symposium on Networked Systems Design and Implementation, vol. 8 (2008), pp. 337–350 84. A.Q. Huang, M.L. Crow, G.T. Heydt, J.P. Zheng, S.J. Dale, The future renewable electric energy delivery and management (FREEDM) system: the energy internet. Proc. IEEE 99(1), 133–148 (2011)


85. F. Ucar, O.F. Alcin, B. Dandil, F. Ata, Machine learning based power quality event classification using wavelet - entropy and basic statistical features, in 21st International Conference on Methods and Models in Automation and Robotics (MMAR) (2016), pp. 414– 419 86. N. Rotering, M. Ilic, Optimal charge control of plug-in hybrid electric vehicles in deregulated electricity markets. IEEE Trans. Power Syst. 26(3), 1021–1029 (2011) 87. R.A. Verzijlbergh, M.O. Grond, Z. Lukszo, J.G. Slootweg, M.D. Ilic, Network impacts and cost savings of controlled EV charging. IEEE Trans. Smart Grid 3(3), 1203–1212 (2012) 88. J.W. Wilson, Residential demand for electricity. Q. Rev. Econ. Bus. 11(1), 7–22 (1971) 89. L.G. Swan, V.I. Ugursal, Modeling of end-use energy consumption in the residential sector: a review of modeling techniques. Renew. Sust. Energ. Rev. 13(8), 1819–1835 (2009) 90. D.W. Bunn, Forecasting loads and prices in competitive power markets. Proc. IEEE 88(2), 163–169 (2000) 91. N. Arghira, L. Hawarah, S. Ploix, M. Jacomino, Prediction of appliances energy use in smart homes. Energy 48(1), 128–134 (2012) 92. M. Muratori, M.C. Roberts, R. Sioshansi, V. Marano, G. Rizzoni, A highly resolved modeling technique to simulate residential power demand. Appl. Energy 107, 465–473 (2013) 93. A. Capasso, W. Grattieri, R. Lamedica, A. Prudenzi, A bottom-up approach to residential load modeling. IEEE Trans. Power Syst. 9(2), 957–964 (1994) 94. I. Richardson, M. Thomson, D. Infield, C. Clifford, Domestic electricity use: a high-resolution energy demand model. Energy Build. 42(10), 1878–1887 (2010) 95. J. Widén, E. Wäckelgård, A high-resolution stochastic model of domestic activity patterns and electricity demand. Appl. Energy 87(6), 1880–1892 (2010) 96. M. Muratori, Impact of uncoordinated plug-in electric vehicle charging on residential power demand. Nat. Energy 3(3), 193–201 (2018) 97. M. Muratori, V. Marano, R. Sioshansi, G. Rizzoni, Energy consumption of residential HVAC systems: A simple physically-based model, in 2012 IEEE Power and Energy Society General Meeting (IEEE, Piscataway, 2012), pp. 1–8 98. M. Muratori, M.J. Moran, E. Serra, G. Rizzoni, Highly-resolved modeling of personal transportation energy consumption in the United States. Energy 58, 168–177 (2013) 99. S.C. Kuttan. (2016) Creating a level playing field for electric vehicles in Singapore. [Online]. Available: https://www.eco-business.com/opinion/creating-a-level-playing-fieldfor-electric-vehicles-in-singapore/ 100. Land Transport Authority, Government of Singapore. (2018) Tax Structure for Cars. [Online]. Available: https://www.lta.gov.sg/content/ltaweb/en/roads-and-motoring/owninga-vehicle/costs-of-owning-a-vehicle/tax-structure-for-cars.html 101. Land Transport Authority, Joint News Release by the Land Transport Authority (LTA) & EDB - Electric Vehicles (EVs) in Every HDB Town by 2020 (2016). [Online]. Available: https:// www.lta.gov.sg/apps/news/page.aspx?c=2&id=e030e95d-a82c-49b4-953c-fc4b3fad7924 102. X. Dong, Y. Mu, H. Jia, J. Wu, X. Yu, Planning of fast EV charging stations on a round freeway. IEEE Trans. Sustainable Energy 7(4), 1452–1461 (2016) 103. H. Zhang, S. Moura, Z. Hu, Y. Song, PEV fast-charging station siting and sizing on coupled transportation and power networks. IEEE Trans. Smart Grid PP(99), 1–1 (2017) 104. X. Wang, C. Yuen, N.U. Hassan, N. An, W. Wu, Electric vehicle charging station placement for urban public bus systems. IEEE Trans. Intell. Transpor. Syst. 18(1), 128–139 (2017) 105. A. Rajabi-Ghahnavieh, P. 
Sadeghi-Barzani, Optimal zonal fast-charging station placement considering urban traffic circulation. IEEE Trans. Vehic. Technol. 66(1), 45–56 (2017) 106. Y. Zheng, Z.Y. Dong, Y. Xu, K. Meng, J.H. Zhao, J. Qiu, Electric vehicle battery charging/swap stations in distribution systems: comparison study and optimal planning. IEEE Trans. Power Syst. 29(1), 221–229 (2014) 107. R. Mehta, D. Srinivasan, A.M. Khambadkone, J. Yang, A. Trivedi, Smart charging strategies for optimal integration of plug-in electric vehicles within existing distribution system infrastructure. IEEE Trans. Smart Grid 9(1), 299–312 (2018)


108. W. Yao, J. Zhao, F. Wen, Z. Dong, Y. Xue, Y. Xu, K. Meng, A multi-objective collaborative planning strategy for integrated power distribution and electric vehicle charging systems. IEEE Trans. Power Syst. 29(4), 1811–1821 (2014) 109. C. Luo, Y.F. Huang, V. Gupta, Placement of EV charging stations — balancing benefits among multiple entities. IEEE Trans. Smart Grid 8(2), 759–768 (2017) 110. Z. Liu, F. Wen, G. Ledwich, Optimal planning of electric-vehicle charging stations in distribution systems. IEEE Trans. Power Delivery 28(1), 102–110 (2013) 111. B. Zhang, Q. Yan, M. Kezunovic, Placement of EV charging stations integrated with PV generation and battery storage, in 2017 Twelfth International Conference on Ecological Vehicles and Renewable Energies (EVER) (2017), pp. 1–7 112. K.J. Dyke, N. Schofield, M. Barnes, The impact of transport electrification on electrical networks. IEEE Trans. Ind. Electron. 57(12), 3917–3926 (2010) 113. N. Neyestani, M.Y. Damavandi, M. Shafie-Khah, J. Contreras, J.P.S. Catalão, Allocation of plug-in vehicles’ parking lots in distribution systems considering network-constrained objectives. IEEE Trans. Power Syst. 30(5), 2643–2656 (2015) 114. A.Y.S. Lam, Y.W. Leung, X. Chu, Electric vehicle charging station placement: formulation, complexity, and solutions. IEEE Trans. Smart Grid 5(6), 2846–2856 (2014) 115. S. Ruifeng, Y. Yang, K.Y. Lee, Multi-objective EV charging stations planning based on a two-layer coding SPEA-II, in 2017 19th International Conference on Intelligent System Application to Power Systems (2017), pp. 1–6 116. Y. Xiong, J. Gan, B. An, C. Miao, A.L.C. Bazzan, Optimal electric vehicle fast charging station placement based on game theoretical framework. IEEE Trans. Intell. Transpor. Syst. PP(99), 1–12 (2017) 117. Q. Cui, Y. Weng, C.-W. Tan, Electric vehicle charging station placement method for urban areas. IEEE Trans. Smart Grid 10(6), 6552–6565 (2019) 118. D. Mayfield, Site design for electric vehicle charging stations, ver. 1.0. Sustainable Transportation Strategies (2012) 119. H. Zhang, Z. Hu, Z. Xu, Y. Song, An integrated planning framework for different types of PEV charging facilities in urban area. IEEE Trans. Smart Grid 7(5), 2273–2284 (2016) 120. University of Houston. (2018) Campus Design Guidelines and Standards. [Online]. Available: http://www.uh.edu/facilities-services/departments/fpc/design-guidelines/09_parking.pdf 121. J. Wong, R. Rajagopal, A simple way to use interval data to segment residential customers for energy efficiency and demand response program targeting, in ACEEE Proceedings (2012) 122. B. Sütterlin, T.A. Brunner, M. Siegrist, Who puts the most energy into energy conservation? A segmentation of energy consumers based on energy-related behavioral characteristics. Energy Policy 39(12), 8137–8152 (2011) 123. T.F. Sanquist, H. Orr, B. Shui, A.C. Bittner, Lifestyle factors in us residential electricity consumption. Energy Policy 42, 354–364 (2012) 124. S.J. Moss, Market Segmentation and Energy Efficiency Program Design. UC Berkeley: California Institute for Energy and Environment (CIEE) (2008) 125. L. Dethman, D. Thomley, Comparison of Segmentation Plans for Residential Customers (Energy Trust of Oregon, Portland, 2009) 126. O.D. Corp., Final segmentation report—California Public Utilities Commission (2010) 127. G. Irwin, W. Monteith, W. Beattie, Statistical electricity demand modelling from consumer billing data, in IEE Proceedings C (Generation, Transmission and Distribution), vol. 133, no. 6 (IET, London, 1986), pp. 
328–335 128. M. Espinoza, C. Joye, R. Belmans, B. De Moor, Short-term load forecasting, profile identification, and customer segmentation: a methodology based on periodic time series. IEEE Trans. Power Syst. 20(3), 1622–1630 (2005) 129. G. Coke, M. Tsao, Random effects mixture models for clustering electrical load series. J. Time Ser. Anal. 31(6), 451–464 (2010) 130. V. Figueiredo, F. Rodrigues, Z. Vale, J.B. Gouveia, An electric energy consumer characterization framework based on data mining techniques. IEEE Trans. Power Syst. 20(2), 596–602 (2005)


426

References

475. W.H. Kersting, A method to teach the design and operation of a distribution system. IEEE Trans. Power Apparatus Syst. PAS-103, 1945–1952 (1984) 476. Y. Weng, C. Faloutsos, M.D. Ilic, Powerscope: Early event detection and identification in electric power systems, in The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2nd International Workshop on Data Analytics for Renewable Energy Integration (2014) 477. W. Post. Utilities sensing threat put squeeze on booming solar roof industry. [Online]. Available: https://www.washingtonpost.com/ 478. Y. Weng, M.D. Ilic, Q. Li, R. Negi, Convexification of bad data and topology error detection and identification problems in AC electric power systems. IET Gener. Transm. Distrib. 9(16), 2760–2767 (2015) 479. Y. Weng, M.D. Ilic, Q. Li, R. Negi, Distributed algorithms for convexified bad data and topology error detection and identification problems. Int. J. Electr. Power Energy Syst. 83, 241–250 (2016) 480. V. Vittal, The impact of renewable resources on the performance and reliability of the electricity grid. Bridge 40(1), 5 (2010) 481. Y. Weng, C. Faloutsos, M.D. Ilic, Data-driven topology estimation, in IEEE SmartGridComm Symposium (SGC) (2014) 482. Y. Liao, Y. Weng, M. Wu, R. Rajagopal, Distribution grid topology reconstruction: An information theoretic approach, in North American Power Symposium (NAPS) (2015) 483. Y. Liao, Y. Weng, R. Rajagopal, Urban distribution grid topology reconstruction via lasso, in IEEE Power Energy Society General Meeting (PESGM) (2016) 484. E. Bueno, C. Lyra, C. Cavellucci, Distribution network reconfiguration for loss reduction with variable demands, in Transmission and Distribution Conference and Exposition: Latin America, 2004 IEEE/PES (IEEE, Piscataway, 2004), pp. 384–389 485. O.F. Fajardo, A. Vargas, Reconfiguration of mv distribution networks with multicost and multipoint alternative supply, part II: reconfiguration plan. IEEE Trans. Power Syst. 23(3), 1401–1407 (2008) 486. R.A. Jabr, Minimum loss operation of distribution networks with photovoltaic generation. IET Renew. Power Gener. 8(1), 33–44 (2014) 487. C. Rudin, D. Waltz, R.N. Anderson, A. Boulanger, A. Salleb-Aouissi, M. Chow, H. Dutta, P. Gross, B. Huang, S. Ierome, D. Isaac, A. Kressner, R.J. Passonneau, A. Radeva, L. Wu, Machine learning for the New York city power grid. IEEE Trans. Pattern Analy. Mach. Intell. 34(2), 328–345 (2012) 488. C. Rudin, S. Ertekin, R. Passonneau, A. Radeva, A. Tomar, B. Xie, S. Lewis, M. Riddle, D. Pangsrivinij, J. Shipman, T. McCormick, Analytics for power grid distribution reliability in New York city. Interfaces 44(4), 364–383 (2014) 489. M. Farivar, New distributed controls to expand the grid capacity for renewable energy. California Institute of Technology (2011) 490. G. Cavraro, R. Arghandeh, G. Barchi, A. von Meier, Distribution network topology detection with time-series measurements, in Innovative Smart Grid Technologies Conference, 2015 IEEE Power Energy Society (2015), pp. 1–5 491. R. Lugtu, D. Hackett, K. Liu, D. Might, Power system state estimation: detection of topological errors. IEEE Trans. Power Apparatus Syst. PAS-99(6), 2406–2411 (1980) 492. M.R. Dorostkar-Ghamsari, M. Fotuhi-Firuzabad, M. Lehtonen, A. Safdarian, Value of distribution network reconfiguration in presence of renewable energy resources. IEEE Trans. Power Syst. 31(3), 1879–1888 (2016) 493. S. Bolognani, N. Bof, D. Michelotti, R. Muraro, L. 
Schenato, Identification of power distribution network topology via voltage correlation analysis, in IEEE 52nd Annual Conference on Decision and Control (2013), pp. 1659–1664 494. S. Xu, R.C. de Lamare, H.V. Poor, Dynamic topology adaptation for distributed estimation in smart grids, in IEEE 5th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (2013), pp. 420–423

References

427

495. J. Huang, V. Gupta, Y.-F. Huang, Electric grid state estimators for distribution systems with microgrids, in 46th Annual Conference on Information Sciences and Systems (2012), pp. 1–6 496. D. Deka, S. Backhaus, M. Chertkov, Structure learning in power distribution networks (2015). Preprint arXiv:1501.04131 497. G. Cavraro, R. Arghandeh, A. von Meier, Distribution network topology detection with time series measurement data analysis (2015). Preprint arXiv:1504.05926 498. Y. Sharon, A.M. Annaswamy, A.L. Motto, A. Chakraborty, Topology identification in distribution network with limited measurements, in 2012 IEEE Power and Energy Society General Meeting: Innovative Smart Grid Technologies (IEEE, Piscataway, 2012), pp. 1–6 499. G.N. Korres, N.M. Manousakis, A state estimation algorithm for monitoring topology changes in distribution systems, in 2012 IEEE Power and Energy Society General Meeting (IEEE, Piscataway, 2012), pp. 1–8 500. M. Baran, J. Jung, T. McDermott, Topology error identification using branch current state estimation for distribution systems, in IEEE Transmission & Distribution Conference & Exposition: Asia and Pacific (2009), pp. 1–4 501. R. Arghandeh, M. Gahr, A. von Meier, G. Cavraro, M. Ruh, G. Andersson, Topology detection in microgrids with micro-synchrophasors (2015). Preprint arXiv:1502.06938 502. J. Yu, Y. Weng, R. Rajagopal, Probabilistic estimation of the potentials of intervention-based demand side energy management, in IEEE SmartGridComm Symposium (SGC) (2015) 503. A.V. Meier, D. Culler, A. McEachern, Micro-synchrophasors for distribution systems, in IEEE 5th Innovative Smart Grid Technologies Conference (2014) 504. S. City, Powerguide app (2015). http://www.solarcity.com/residential/energy-monitoringsystem 505. Y. Weng, Q. Li, R. Negi, M.D. Ili´c, Semidefinite programming for power system state estimation, in IEEE Power and Energy Society General Meeting (2012) 506. Y. Weng, Q. Li, M. Ilic, R. Negi, Distributed algorithm for sdp state estimation, in IEEE Innovative Smart Grid Technology Conference (2013) 507. C. Chow, C. Liu, Approximating discrete probability distributions with dependence trees. IEEE Trans. Inform. Theory 14(3), 462–467 (1968) 508. I.PES. Distribution test feeders. [Online]. Available: http://ewh.ieee.org/soc/pes/dsacom/ testfeeders/ 509. R.D. Zimmerman, C.E. Murillo-Sanchez, R.J. Thomas, MATPOWER’s extensible optimal power flow architecture, in IEEE Power and Energy Society General Meeting (2009), pp. 1–7 510. R.D. Zimmerman, C.E. Murillo-Sanchez, Matpower, a matlab power system simulation package. http://www.pserc.cornell.edu/matpower/manual.pdf (2010) 511. I. of Energy Systems, E. Drives, Adres-dataset. Vienna University of Technology (2016) [Online]. Available: http://www.ea.tuwien.ac.at/projects/adres_concept/EN/ 512. R.C. Dugan, Reference guide: The open distribution system simulator (OpenDSS). Electric Power Research Institute, Inc. (2012) 513. J. Williamson, Approximating discrete probability distributions with Bayesian networks, in Proceedings of the International Conference on AI in Science & Technology (2000) 514. T.M. Cover, J.A. Thomas, Elements of Information Theory (Wiley, Hoboken, 2012) 515. Y. Weng, Y. Liao, R. Rajagopal, Distributed energy resources topology identification via graphical modeling. IEEE Trans. Power Syst. 32(4), 2682–2694 (2016) 516. J.B. Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Amer. Math. Soc. 7(1), 48–50 (1956) 517. T.H. Cormen, C.E. Leiserson, R.L. 
Rivest, C. Stein et al., Introduction to Algorithms, vol. 2 (MIT Press, Cambridge, 2001) 518. W.H. Kersting, Radial distribution test feeders, in IEEE Power Engineering Society Winter Meeting, vol. 2 (2001), pp. 908–912 519. A.P. Dobos, PVWatts version 5 manual. National Renewable Energy Laboratory (2014) 520. A. Einfalt, A. Schuster, C. Leitinger, D. Tiefgraber, M. Litzlbauer, S. Ghaemi, D. Wertz, A. Frohner, C. Karner, Adres-concept: Konzeptentwicklung für adres-autonome dezentrale regenerative energiesysteme. TU Wien, Institut für Elektrische Anlagen und Energiewirtschaft (2011)

428

References

521. Eaton, Eaton power xpert meter 4000/6000/8000 tech data. Manual, p. 2 (2016) [Online]. Available: http://www.eaton.com/ 522. Varldsnaturfonden WWF, The Energy Report.” [Online]. Available: http://www.wwf.se/ source.php/1339217 523. M. Aien, M. Rashidinejad, S. Kouhi, M. Fotuhi-Firuzabad, S.N. Ravadanegh, Real time probabilistic power system state estimation. Int. J. Electr. Power Energy Syst. 62, 383–390 (2014) 524. B. Borkowska, Probabilistic load flow. IEEE Trans. Power Apparatus Syst. 93(3), 752 (1974) 525. J.F. Dopazo, O.A. Klitin, A.M. Sasson, Stochastic load flows. IEEE Trans. Power Syst. 94(2), 299–309 (1975) 526. M.T. Schilling, A.M.L. da Silva, R. Billinton, M.A. El-Kafy, Bibliography on power system probabilistic analysis (1962–1988). IEEE Trans. Power Syst. 5(1), 1–11 (1990) 527. G. Verbic, C.A. Canizares, Probabilistic optimal power flow in electricity markets based on a two-point estimate method. IEEE Trans. Power Syst. 21(4), 1883–1893 (2006) 528. F.C. Schweppe, Uncertain Dynamic Systems (Prentice-Hall, Englewood Cliffs, 1973) 529. P. Lajda, Short-term operation planning in electric power systems. J. Oper. Res. Soc. 32(8), 675–682 (1981) 530. North American Electric Reliability Corporation, Planning reserve margin. [Online]. Available: http://www.nerc.com/pa/RAPA/ri/Pages/PlanningReserveMargin.aspx 531. J.P. Pfeifenberger, K. Spees, K. Carden, N. Wintermantel, Resource adequacy requirements: Reliability and economic implications. The Brattle Group (2013) 532. J. Bebic, Power system planning: Emerging practices suitable for evaluating the impact of high-penetration photovoltaics. National Renewable Energy Laboratory (2008) 533. T.L. Vandoorn, B. Renders, L. Degroote, B. Meersman, L. Vandevelde, Active load control in islanded microgrids based on the grid voltage. IEEE Trans. Smart Grid 2(1), 139–151 (2011) 534. L. Schenato, G. Barchi, D. Macii, R. Arghandeh, K. Poolla, A.V. Meier, Bayesian linear state estimation using smart meters and pmus measurements in distribution grids, in IEEE International Conference on Smart Grid Communications (SmartGridComm) (2014), pp. 572–577 535. R. Dobbe, D. Arnold, S. Liu, D. Callaway, C. Tomlin, Real-time distribution grid state estimation with limited sensors and load forecasting, in ACM/IEEE 7th International Conference on Cyber-Physical Systems (ICCPS) (2016), p. 110 536. P. Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data (Cambridge University Press, Cambridge, 2012) 537. R. Singh, B.C. Pal, R.A. Jabr, Choice of estimator for distribution system state estimation. IET Gener. Transm. Distrib. 3(7), 666–678 (2009) 538. F.F. Wu, Power system state estimation: a survey. Int. J. Electr. Power Eng. 12, 80–87 (1990) 539. A.G. Exposito, A. Abur, A.V. Jaen, C.G. Quiles, A multilevel state estimation paradigm for smart grids. Proc. IEEE 99, 952 (2011) 540. J. Yedidia, W.T. Freeman, Y. Weiss, Understanding belief propagation and its generalizations, in International Joint Conference on Artificial Intelligence (2001) 541. C.M. Bishop, Pattern Recognition and Machine Learning (Springer, Berlin, 2006) 542. Y. Hu, A. Kuh, A. Kavcic, D. Nakafuji, Micro-grid state estimation using belief propagation on factor graphs, in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (2010) 543. Y. Hu, A. Kuh, A. Kavcic, T. Yang, A belief propagation based power distribution system state estimator. IEEE Comput. Intell. Mag. 6(3), 36–46 (2011) 544. T.J.M.J. Wainwright, A.S. 
Willsky, A new class of upper bounds on the log partition function. IEEE Trans. on Inform. Theory 51, 2313–2335 (2005) 545. M.J. Wainwright, M.I. Jordan, Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1, 1–305 (2008) 546. V. Kolmogorov, Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Analy. Mach. Intell. 28(10), 1568–1583 (2006)

References

429

547. P. Henneaux, D.S. Kirschen, Probabilistic security analysis of optimal transmission switching. IEEE Trans. Power Syst. 31(1), 508–517 (2016) 548. H. Wang, X. Xu, Z. Yan, Z. Yang, N. Feng, Y. Cui, Probabilistic static voltage stability analysis considering the correlation of wind power, in International Conference on Probabilistic Methods Applied to Power Systems (PMAPS) (2016), p. 16 549. M. Fan, V. Vittal, G.T. Heydt, R. Ayyanar, Probabilistic power flow analysis with generation dispatch including photovoltaic resources. IEEE Trans. Power Syst. 28(2), 1797–1805 (2013) 550. C. Wang, D.M. Blei, A general method for robust Bayesian modeling (2016). Preprint arXiv:1510.05078 551. J. Pearl, Reverend bayes on inference engines: A distributed hierarchical approach, in Proceedings of the Second National Conference on Artificial Intelligence (1982), pp. 133– 136 552. C. Bishop, Pattern Recognition and Machine Learning (Springer, Berlin, 2007) 553. R.N. B. Narayanaswamy, Y. Rachlin, P. Khosla, The sequential decoding metric for detection in sensor networks, in Proceedings of the IEEE International Symposium Information Theory (2007) 554. Y. Weng, R. Negi, M. Ilic, A search method for obtaining initial guesses for smart grid state estimation, in IEEE SmartGridComm Symposium (2012) 555. Y. Weng, R. Negi, M. Ilic, Historical data-driven state estimation for electric power systems, in IEEE SmartGridComm Symposium (2013) 556. M. Gol, A. Abur, A hybrid state estimator for systems with limited number of PMUs. IEEE Trans. Power Syst. 30(3), 1511–1517 (2015) 557. K. Das, J. Hazra, D.P. Seetharam, R.K. Reddi, A.K. Sinha, Real-time hybrid state estimation incorporating SCADA and PMU measurements, in Innovative Smart Grid Technologies (ISGT Europe) (2012) 558. R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, in International Joint Conference on Artificial Intelligence, vol. 14, no. 2 (1995) pp.1137–1145 559. A.Z. Broder, Generating random spanning trees, in 30th Annual Symposium on Foundations of Computer Science (1989), pp. 442–447 560. NYISO, Load data profile. http://www.nyiso.com (2012) 561. N. Biggs, Algebraic Graph Theory (Cambridge University, Cambridge Press, 1993) 562. E. Caro, G. Valverde, Impact of transformer correlations in state estimation using the unscented transformation. IEEE Trans. Power Syst. 29(1), 368–376 (2014) 563. M. Farivar, R. Neal, C. Clarke, S. Low, Optimal inverter VAR control in distribution systems with high PV penetration, in 2012 IEEE Power and Energy Society General Meeting 564. J. Seuss, M.J. Reno, R.J. Broderick, S. Grijalva, Improving distribution network PV hosting capacity via smart inverter reactive power support, in 2015 IEEE Power Energy Society General Meeting (2015), pp. 1–5 565. V. Kekatos, G. Wang, A.J. Conejo, G.B. Giannakis, Stochastic reactive power management in microgrids with renewables. IEEE Trans. Power Syst. 30(6), 3386–3395 (2015) 566. D.G. Photovoltaics, E. Storage, IEEE standard for interconnection and interoperability of distributed energy resources with associated electric power systems interfaces. IEEE Std. 1547, 1547–2018 (2018) 567. X. Su, M.A.S. Masoum, P.J. Wolfs, Optimal PV inverter reactive power control and real power curtailment to improve performance of unbalanced four-wire LV distribution networks. IEEE Trans. Sustainable Energy 5(3), 967–977 (2014) 568. B. Zhang, A.Y. Lam, A.D. Dominguez-Garcia, D. 
Tse, An optimal and distributed method for voltage regulation in power distribution systems. IEEE Trans. Power Syst. 30(4), 1714–1726 (2015) 569. H. Zhu, H.J. Liu, Fast local voltage control under limited reactive power: optimality and stability analysis. IEEE Trans. Power Syst. 31(5), 3794–3803 (2016)

430

References

570. W. Lin, R. Thomas, E. Bitar, Real-time voltage regulation in distribution systems via decentralized PV inverter control, in Proceedings of the 51st Hawaii International Conference on System Sciences (2018) 571. G. Qu, N. Li, Optimal distributed feedback voltage control under limited reactive power. IEEE Trans. Power Syst. 35(1), 315–331 (2019) 572. S. Magnússon, G. Qu, C. Fischione, N. Li, Voltage control using limited communication. IEEE Trans. Control Netw. Syst. 6(3), 993–1003 (2019) 573. R.E. Helou, D. Kalathil, L. Xie, Communication-free voltage regulation in distribution networks with deep PV penetration, in Proceedings of the 53rd Hawaii International Conference on System Sciences (2020) 574. H. Xu, A.D. Domínguez-García, P.W. Sauer, Optimal tap setting of voltage regulation transformers using batch reinforcement learning. IEEE Trans. Power Syst. 35(3), 1990–2001 (2020) 575. L. Xu, D. Ph, Flexible ramping products: draft final proposal draft final proposal. California ISO (2012), pp. 1–51 576. Q. Yang, G. Wang, A. Sadeghi, G.B. Giannakis, J. Sun, Two-timescale voltage control in distribution grids using deep reinforcement learning. IEEE Trans. Smart Grid 11(3), 2313– 2323 (2019) 577. B.L. Thayer, T.J. Overbye, Deep reinforcement learning for electric transmission voltage control (2020). Preprint arXiv:2006.06728 578. W. Wang, N. Yu, Y. Gao, J. Shi, Safe off-policy deep reinforcement learning algorithm for volt-var control in power distribution systems. IEEE Trans. Smart Grid PP, 1–1, 12 (2019) 579. C. Li, C. Jin, R. Sharma, Coordination of PV smart inverters using deep reinforcement learning for grid voltage regulation, in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA) (IEEE, Piscataway, 2019), pp. 1930–1937 580. W. Wang, N. Yu, J. Shi, Y. Gao, Volt-var control in power distribution systems with deep reinforcement learning, in 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm) (2019), pp. 1–7 581. F. Katiraei, M. Iravani, Power management strategies for a microgrid with multiple distributed generation units. IEEE Trans. Power Syst. 21(4), 1821–1831 (2006) 582. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms (2017). Preprint arXiv:1707.06347 583. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, 1998) 584. J. Schulman, S. Levine, P. Abbeel, M. Jordan, P. Moritz, Trust region policy optimization, in International Conference on Machine Learning (2015), pp. 1889–1897 585. R.C. Dugan, D. Montenegro, The open distribution system simulator (OpenDSS), reference guide, in Electric Power Research Institute (EPRI) (2018) 586. R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, I. Mordatch, Multi-agent actor-critic for mixed cooperative-competitive environments (2020). Preprint arXiv:1706.02275 587. F. Bu, Y. Yuan, Z. Wang, K. Dehghanpour, A. Kimber, A time-series distribution test system based on real utility data, in 2019 North American Power Symposium (NAPS) (2019) 588. R.E. Helou, D. Kalathil, L. Xie, Fully decentralized reinforcement learning-based control of photovoltaics in distribution grids for joint provision of real and reactive power (2020). Preprint arXiv:2008.01231 589. J. Thorp, A. Abur, M. Begovic, J. Giri, R. Avila-Rosales, Gaining a wider perspective. IEEE Power Energy Mag. 6(5), 43–51 (2008) 590. R. Diao, K. Sun, V. Vittal, R. O’Keefe, M. Richardson, N. Bhatt, D. Stradford, S. 
Sarawgi, Decision tree-based online voltage security assessment using PMU measurements. IEEE Trans. Power Syst. 24(2), 832–839 (2009) 591. Y. Makarov, P. Du, S. Lu, T. Nguyen, X. Guo, J. Burns, J. Gronquist, M. Pai, PMU-based wide-area security assessment: concept, method, and implementation. IEEE Trans. Smart Grid 3(3), 1325–1332 (2012)

References

431

592. Z. Zhong, C. Xu, B. Billian, L. Zhang, S. Tsai, R. Conners, V. Centeno, A. Phadke, Y. Liu, Power system frequency monitoring network (FNET) implementation. IEEE Trans. Power Syst. 20(4), 1914–1921 (2005) 593. L. Wang, J. Burgett, J. Zuo, C. Xu, B. Billian, R. Conners, Y. Liu, Frequency disturbance recorder design and developments, in Proceedings of the IEEE Power Engineering Society General Meeting (IEEE, Piscataway, 2007), pp. 1–7 594. Beijing Sifang Automation Company, Power grid dynamic monitoring and disturbance identification, in North American SynchroPhasor Initiative WorkGroup Meeting, Feb. 2013 (2013) 595. Dept. Energy, U.S., Smart grid investment grant program-progress report. July (2012) [Online]. Available: http://energy.gov/sites/prod/files/Smart%20Grid%20Investment %20Grant%20Program%20-%20Progress%20Report%20July%202012.pdf 596. [Online]. Available: http://www.eia.gov/todayinenergy/detail.cfm?id=5630 597. A. Phadke, R.M. de Moraes, The wide world of wide-area measurement. IEEE Power Energy Mag. 6(5), 52–65 (2008) 598. V. Terzija, G. Valverde, D. Cai, P. Regulski, V. Madani, J. Fitch, S. Skok, M.M. Begovic, A. Phadke, Wide-area monitoring, protection, and control of future electric power networks. Proc. IEEE 99(1), 80–93 (2011) 599. L. Xie, Y. Chen, H. Liao, Distributed online monitoring of quasi-static voltage collapse in multi-area power systems. IEEE Trans. Power Syst. 27(4), 2271–2279 (2012) 600. S. Dasgupta, M. Paramasivam, U. Vaidya, V. Ajjarapu, Real-time monitoring of short-term voltage stability using PMU data. IEEE Trans. Power Syst. (99), 1–10 (2013) 601. J. Jiang, J. Yang, Y. Lin, C. Liu, J. Ma, An adaptive PMU based fault detection/location technique for transmission lines. I. Theory and algorithms. IEEE Trans. Power Del. 15(2), 486–493 (2000) 602. J. Jiang, Y. Lin, J. Yang, T. Too, C. Liu, An adaptive PMU based fault detection/location technique for transmission lines. II. PMU implementation and performance evaluation. IEEE Trans. Power Del. 15(4), 1136–1146 (2000) 603. J.E. Tate, T.J. Overbye, Line outage detection using phasor angle measurements. IEEE Trans. Power Syst. 23(4), 1644–1652 (2008) 604. D. Santos, L. Fabiano, G. Antonova, M. Larsson, The use of synchrophasors for wide area monitoring of electrical power grids, in Actual Trends in Developing Power System Protection Automation (2013) 605. M. Patel, S. Aivaliotis, E. Ellen et al., Real-time application of synchrophasors for improving reliability. North American Elecricity Rel. Corp., Princeton, Princeton, NJ, Technical Report (2010) 606. Y. Chen, L. Xie, P.R. Kumar, Dimensionality reduction and early event detection using online synchrophasor data, in Proceedings of the IEEE Power Engineering Society General Meeting (IEEE, Piscataway, 2013), pp. 1–5 607. N. Dahal, R. King, V. Madani, Online dimension reduction of synchrophasor data, in IEEE PES Transmission and Distribution Conference and Exposition (T&D) (IEEE, Piscataway, 2012), pp. 1–7 608. I. Fodor, A survey of dimension reduction techniques, in Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, vol. 9 (2002), pp. 1–18 609. L. Van der Maaten, E. Postma, H. Van Den Herik, Dimensionality reduction: A comparative review. J. Mach. Learn. Res. 10, 13 (2009) 610. K. Anaparthi, B. Chaudhuri, N. Thornhill, B. Pal, Coherency identification in power systems through principal component analysis. IEEE Trans. Power Syst. 20(3), 1658–1660 (2005) 611. Y. Zhang, Z. Wang, J. Zhang, J. 
Ma, PCA fault feature extraction in complex electric power systems. Adv. Elect. Comput. Eng. 10(3), 102–107 (2010) 612. Z. Wang, Y. Zhang, J. Zhang, Principal components fault location based on WAMS/PMU measure system, in Proceedings of the IEEE Power Engineering Society General Meeting (IEEE, Piscataway, 2011), pp. 1–5 613. S. Van De Geer, Estimating a regression function. An. Stat. 18, 907–924 (1990)

432

References

614. M.D. Ilic, J. Zaborszky, Dynamics and Control of Large Electric Power Systems (Wiley, Hoboken, 2000) 615. Y. Zhang, Y. Chen, L. Xie, Multi-scale integration and aggregation of power system modules for dynamic security assessment, in Proceedings of the IEEE Power Engineering Society General Meeting (IEEE, Piscataway, 2013), pp. 1–5. 616. C.-T. Chen, Linear System Theory and Design (Oxford University Press, Oxford, 1998) 617. Siemens, PSSE 30.2 program operational manual (2009) 618. J. Tenenbaum, V. De Silva, J. Langford, A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000) 619. S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000) 620. M. Ghorbaniparvar, N. Zhou, A survey on forced oscillations in power system. CoRR abs/1612.04718 (2016) [Online]. Available: http://arxiv.org/abs/1612.04718 621. E.L.S. Maslennikov, B. Wang, Dissipating energy flow method for locating the source of sustained oscillations. Int. J. Electr. Power Energy Syst. 88, 55–62 (2017) 622. S.A.N. Sarmadi, V. Venkatasubramanian, A. Salazar, Analysis of november 29, 2005 Western American oscillation event. IEEE Trans. Power Syst. 31(6), 5210–5211 (2016) 623. S. Maslennikov, Detection the source of forced oscillations. Technical Report [Online]. Available: https://www.naspi.org/node/653 624. S.A.N. Sarmadi, V. Venkatasubramanian, Inter-area resonance in power systems from forced oscillations. IEEE Trans. Power Syst. 31(1), 378–386 (2016) 625. T. Huang, N.M. Freris, P.R. Kumar, L. Xie, Localization of forced oscillations in the power grid under resonance conditions, in 2018 52nd Annual Conference on Information Sciences and Systems (CISS) (2018), pp. 1–5 626. N. Zhou, M. Ghorbaniparvar, S. Akhlaghi, Locating sources of forced oscillations using transfer functions, in 2017 IEEE Power and Energy Conference at Illinois (PECI) (2017), pp. 1–8 627. W. Bin, S. Kai, Location methods of oscillation sources in power systems: a survey. J. Modern Power Syst. Clean Energy 5(2), 151–159 (2017) 628. S.C. Chevalier, P. Vorobev, K. Turitsyn, Using effective generator impedance for forced oscillation source location. IEEE Trans. Power Syst. 33(6), 6264–6277 (2018) 629. S. Chevalier, P. Vorobev, K. Turitsyn, A Bayesian approach to forced oscillation source location given uncertain generator parameters. IEEE Trans. Power Syst. 34, 1–1 (2018) 630. Var-501-wecc-3–power system stabilizer. Technical Report [Online]. Available: https://www. wecc.biz/Reliability/VAR-501-WECC-3.pdf 631. E.J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58(3), 11 (2011) 632. S.H.A.V. Oppenheim, A.S. Willsky, Signal and Systems (Prentice Hall, Hoboken, 1997) 633. H. Ye, Y. Liu, P. Zhang, Z. Du, Analysis and detection of forced oscillation in power system. IEEE Trans. Power Syst. 32(2), 1149–1160 (2017) 634. J.H. Chow, K.W. Cheung, A toolbox for power system dynamics and control engineering education and research. IEEE Trans. Power Syst. 7(4), 1559–1564 (1992) 635. Z. Lin, M. Chen, Y. Ma, The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices (2010). Preprint arXiv:1009.5055 [Online]. Available: https:// arxiv.org/abs/1009.5055 636. J. Follum, J.W. Pierre, Detection of periodic forced oscillations in power systems. IEEE Trans. Power Syst. 31(3), 2423–2433 (2016) 637. J. Follum, J. 
Pierre, Time-localization of forced oscillations in power systems, in 2015 IEEE Power Energy Society General Meeting (2015), pp. 1–5 638. M. Ilic-Spong, M. Spong, R. Fischl, The no-gain theorem and localized response for the decoupled p -hetapower network with active power losses included. IEEE Trans. Circ. Syst. 32(2), 170–177 (1985) 639. M. Ilic-Spong, J. Thorp, M. Spong, Localized response performance of the decoupled QV network. IEEE Trans. Circ. Syst. 33(3), 316–322 (1986)

References

433

640. N. Wiener, Cybernetics or Control and Communication in the Animal and the Machine, vol. 25 (MIT Press, Cambridge, 1965) 641. S. Maslennikov, B. Wang, Q. Zhang, A. Ma, A. Luo, A. Sun, E. Litvinov, A test cases library for methods locating the sources of sustained oscillations, in 2016 IEEE Power and Energy Society General Meeting (PESGM) (2016), pp. 1–5 642. EPRI, EPRI Power System Dynamic Tutorial, Electric Power Research Institution (2009)

Index

A
Advanced metering infrastructure (AMI), 14

B
Big data, 1

C
Chance-constrained programming (CCP), 244
Clustering, 46
Coupon incentive-based demand response (CIDR), 102
Critical peak pricing (CPP), 102
Customer segmentation, 46
Customer targeting, 69

D
Data-driven, 1
Deep neural network (DNN), 205
DeepSolar, 17
Demand-side management (DSM), 12, 222
Demand response (DR), 45, 69, 86, 160, 222
Demand response providers (DRPs), 262
Dimensionality reduction, 352
Distributed energy resources (DERs), 8, 283

E
Early event detection, 351
Economic dispatch, 243
Electric grid, 1
Electric market, 7
Electric Power Research Institute (EPRI), 283
Electric Reliability Council of Texas (ERCOT), 100
Electric vehicles (EVs), 17, 29
EnergyCoupon, 101
Energy storage, 131
Expectation maximization (EM), 224

F
Forced oscillation (FO), 376
Frequency monitoring network (FNET), 351

G
Graphical model, 283
Grid modernization initiative (GMI), 310

H
Hidden Markov model (HMM), 224
Hidden semi-Markov model (HSMM), 224
Home energy management systems (HEMS), 222

I
Intelligent electronic devices (IEDs), 351
Investor-owned utilities (IOUs), 9

L
Locational marginal prices (LMPs), 173
Locational marginal pricing, 250
Long short-term memory (LSTM), 166
Look-ahead economic dispatch (LAED), 244

M
Machine learning, 3
MATLAB Power System Simulation Package (MATPOWER), 286
Maximum a posteriori (MAP), 309

N
Neural network (NN), 205

O
OpenDSS, 286
Operational planning, 310
Optimal power flow (OPF), 244

P
Phasor measurement unit (PMU), 316, 351
Photovoltaic (PV), 17
Polynomial regression, 205
Principal component analysis (PCA), 352
Probabilistic baseline estimation, 86

R
Rare weather event, 127
Real-time market (RTM), 205
Reinforcement learning, 331
Reliability value, 127
Residential appliances, 222
Residential thermal loads, 161
Robust optimization, 244
Robust principal component analysis (RPCA), 377

S
Scenario approach, 146, 243
Scenario approach-based formulation of LAED (Sc-LAED), 245
Security-constrained economic dispatch (SCED), 173
Self-sufficiency, 128
State estimation (SE), 309
Stochastic knapsack problem (SKP), 71
Storage planning, 146
Supervisory Control and Data Acquisition (SCADA), 7, 13
Support vector machine (SVM), 173
Support vector regression (SVR), 205
System pattern region (SPR), 173

T
Time-of-usage (TOU), 102
Topology identification, 285

U
Unbalanced distribution grid, 331

V
Variational belief propagation (VBP), 311
Voltage regulation, 310

W
Wide-area monitoring, protection, and control (WAMPAC), 351
Wind power generation, 205