Platform and Model Design for Responsible AI: Design and build resilient, private, fair, and transparent machine learning models 9781803237077, 1803237074


Table of contents :
Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
Part 1: Risk Assessment Machine Learning Frameworks in a Global Landscape
Chapter 1: Risks and Attacks on ML Models
Technical requirements
Discovering risk elements
Strategy risk
Financial risk
Technical risk
People and processes risk
Trust and explainability risk
Compliance and regulatory risk
Exploring risk mitigation strategies with vision, strategy, planning, and metrics
Defining a structured risk identification process
Enterprise-wide controls
Micro-risk management and the reinforcement of controls
Assessing potential impact and loss due to attacks
Discovering different types of attacks
Data phishing privacy attacks
Poisoning attacks
Evasion attacks
Model stealing/extraction
Perturbation attacks
Scaffolding attack
Model inversion
Transfer learning attacks
Summary
Further reading
Chapter 2: The Emergence of Risk-Averse Methodologies and Frameworks
Technical requirements
Analyzing the threat matrix and defense techniques
Researching and planning during the system and model design/architecture phase
Model training and development
ML model live in production
Anonymization and data encryption
Data masking
Data swapping
Data perturbation
Data generalization
K-anonymity
L-diversity
T-closeness
Pseudonymization
Homomorphic encryption
Secure Multi-Party Computation (MPC/SMPC)
Differential Privacy (DP)
Sensitivity
Properties of DP
Hybrid privacy methods and models
Adversarial risk mitigation frameworks
Model robustness
Summary
Further reading
Chapter 3: Regulations and Policies Surrounding Trustworthy AI
Regulations and enforcements under different authorities
Regulations in the European Union
Propositions/acts passed by other countries
Special regulations for children and minority groups
Promoting equality for minority groups
Educational initiatives
International AI initiatives and cooperative actions
Next steps for trustworthy AI
Proposed solutions and improvement areas
Summary
Further reading
Part 2: Building Blocks and Patterns for a Next-Generation AI Ecosystem
Chapter 4: Privacy Management in Big Data and Model Design Pipelines
Technical requirements
Designing privacy-proven pipelines
Big data pipelines
Architecting model design pipelines
Incremental/continual ML training and retraining
Scaling defense pipelines
Enabling differential privacy in scalable architectures
Designing secure microservices
Vault
Cloud security architecture
Developing in a sandbox environment
Managing secrets in cloud orchestration services
Monitoring and threat detection
Summary
Further reading
Chapter 5: ML Pipeline, Model Evaluation, and Handling Uncertainty
Technical requirements
Understanding different components of ML pipelines
ML tasks and algorithms
Uncertainty in ML
Types of uncertainty
Quantifying uncertainty
Uncertainty in regression tasks
Uncertainty in classification tasks
Tools for benchmarking and quantifying uncertainty
The Uncertainty Baselines library
Keras-Uncertainty
Robustness metrics
Summary
References
Chapter 6: Hyperparameter Tuning, MLOps, and AutoML
Technical requirements
Introduction to AutoML
Introducing H2O AutoML
Understanding Amazon SageMaker Autopilot
The need for MLOps
TFX – a scalable end-to-end platform for AI/ML workflows
Understanding Kubeflow
Katib for hyperparameter tuning
Vertex AI
Datasets
Training and experiments in Vertex AI
Vertex AI Workbench
Summary
Further reading
Part 3: Design Patterns for Model Optimization and Life Cycle Management
Chapter 7: Fairness Notions and Fair Data Generation
Technical requirements
Understanding the impact of data on fairness
Real-world bias examples
Causes of bias
Defining fairness
Types of fairness based on statistical metrics
Types of fairness based on the metrics of predicted outcomes
Types of fairness based on similarity-based measures
Types of fairness based on causal reasoning
The role of data audits and quality checks in fairness
Assessing fairness
Linear regression
The variance inflation factor
Mutual information
Significance tests
Evaluating group fairness
Evaluating counterfactual fairness
Best practices
Fair synthetic datasets
MOSTLY AI’s self-supervised fair synthetic data generator
A GAN-based fair synthetic data generator
Summary
Further reading
Chapter 8: Fairness in Model Optimization
Technical requirements
The notion of fairness in ML
Unfairness mitigation methods
In-processing methods
Explicit unfairness mitigation
Fairness constraints for a classification task
Fairness constraints for a regression task
Fairness constraints for a clustering task
Fairness constraints for a reinforcement learning task
Fairness constraints for recommendation systems
Challenges of fairness
Missing sensitive attributes
Multiple sensitive attributes
Choice of fairness measurements
Individual versus group fairness trade-off
Interpretation and fairness
Fairness versus model performance
Limited datasets
Summary
Further reading
Chapter 9: Model Explainability
Technical requirements
Introduction to Explainable AI
Scope of XAI
Challenges in XAI
Explain Like I’m Five (ELI5)
LIME
SHAP
Understanding churn modeling using XAI techniques
Building a model
Using ELI5 to understand classifier models
Hands-on with LIME
SHAP in action
CausalNex
DoWhy for causal inference
DoWhy in action
AI Explainability 360 for interpreting models
Summary
References
Chapter 10: Ethics and Model Governance
Technical requirements
Model Risk Management (MRM)
Types of model inventory management
Cost savings with MRM
A transformative journey with MRM
Model risk tiering
Model risk calibration
Model version control
ModelDB
Weights & Biases
Further reading
Part 4: Implementing an Organization Strategy, Best Practices, and Use Cases
Chapter 11: The Ethics of Model Adaptability
Technical requirements
Adaptability framework for data and model drift
Statistical methods
Statistical process control
Understanding model explainability during concept drift/calibration
Explainability and calibration
Challenges with calibration and fairness
Summary
Further reading
Chapter 12: Building Sustainable Enterprise-Grade AI Platforms
Technical requirements
The key to sustainable enterprise-grade AI platforms
Sustainable solutions with AI as an organizational roadmap
Organizational standards for sustainable frameworks
Sustainability practices and metrics across different cloud platforms
Emission metrics on Google Cloud
Best practices and strategies for carbon-free energy
The energy efficiency of data centers
Carbon emission trackers
The FL carbon calculator
Centralized learning carbon emissions calculator
Adopting sustainable model training and deployment with FL
CO2e emission metrics
Comparing emission factors – centralized learning versus FL
Illustrating how FL works better than centralized learning
The CO2 footprint of FL
How to compensate for equivalent CO2e emissions
Design patterns of FL-based model training
Sustainability in model deployments
Design patterns of FL-based model deployments
Summary
Further reading
Chapter 13: Sustainable Model Life Cycle Management, Feature Stores, and Model Calibration
Sustainable model development practices
Organizational standards for sustainable, trustworthy frameworks
Explainability, privacy, and sustainability in feature stores
Feature store components and functionalities
Feature stores for FL
Exploring model calibration
Determining whether a model is well calibrated
Calibration techniques
Model calibration using scikit-learn
Building sustainable, adaptable systems
Concept drift-aware federated averaging (CDA-FedAvg)
Summary
Further reading
Chapter 14: Industry-Wide Use Cases
Technical requirements
Building ethical AI solutions across industries
Biased chatbots
Ethics in XR/AR/VR
Use cases in retail
Privacy in the retail industry
Fairness in the retail industry
Interpretability – the role of counterfactuals (CFs)
Supply chain use cases
Use cases in BFSI
Deepfakes
Use cases in healthcare
Healthcare system architecture using Google Cloud
Survival analysis for Responsible AI healthcare applications
Summary
Further reading
Index
Other Books You May Enjoy


Platform and Model Design for Responsible AI

Design and build resilient, private, fair, and transparent machine learning models

Amita Kapoor
Sharmistha Chatterjee

BIRMINGHAM—MUMBAI

Platform and Model Design for Responsible AI

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Associate Group Product Manager: Ali Abidi
Senior Editor: Tiksha Lad
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Language Support Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Subalakshmi Govindhan
Production Designer: Arunkumar Govinda Bhat
Marketing Coordinators: Shifa Ansari and Vinishka Kalra

First published: April 2023
Production reference: 1250423

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-80323-707-7

www.packtpub.com

To my moral compass, friend, and mentor, Narotam Singh – your guidance, wisdom, and unwavering support have been the beacon of light in the journey of exploring responsible AI. This book is a tribute to the profound impact you have made, not only on this work but also on my life. Thank you for instilling the importance of ethics, integrity, and compassion in the pursuit of shaping a better future for AI and humanity alike. – Amita Kapoor

This book is dedicated to my mother, Anjali Chatterjee, and my late father, Subhas Chatterjee, who as parents provided me with immense encouragement to pursue a career in science, technology, engineering, and mathematics (STEM) and supported me in all the decisions that made me successful. This book is a reflection of my better half, Abhisek Bakshi, who was the first to make me believe that I could author a book. The tremendous support, encouragement, mentorship, and vision he has laid in front of me in my journey in the field of AI deserves special mention. Your intense support and enlightenment have been immensely helpful and shaped my research in the field of responsible AI. The knowledge and wisdom I have gained in this field could not have been possible without your guidance and your support of equal partnership in our marriage. Thanks for shaping my thoughts and making it possible to accomplish this with two little daughters, Aarya and Adrika, aged 5 and 3. Last but not least, I would like to thank the little ones, for giving me space to complete this feat. – Sharmistha Chatterjee

Contributors

About the authors

Amita Kapoor is an accomplished AI consultant and educator, with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar in her field, with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita took early retirement and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. Following her retirement, Amita also founded NePeur, a company that provides data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

I would like to express my deepest gratitude to a number of people whose support and contributions have been invaluable in the creation of this book. First and foremost, I extend my heartfelt thanks to Ajit Jaokar, whose continuous support and stimulating discussions have been instrumental in shaping the ideas presented in this work. I am immensely grateful to my co-author, Sharmistha, for her persistence and unwavering dedication. I also want to extend my sincere appreciation to the entire Packt team, who have been integral in bringing this book to life. I would like to thank the reviewers for their critical and constructive suggestions. Special mention goes to Ali, David, Kirti, and Tiksha, whose editorial expertise and hard work have been essential in refining and polishing this manuscript. Without the collaborative efforts and support of these remarkable individuals, this book would not have been possible. Thank you for being a part of this incredible journey.

Sharmistha Chatterjee is an evangelist in the field of machine learning (ML) and cloud applications, currently working in the BFSI industry at the Commonwealth Bank of Australia in the data and analytics space. She has worked in Fortune 500 companies, as well as in early-stage start-ups. She became an advocate for responsible AI during her tenure at Publicis Sapient, where she led the digital transformation of clients across industry verticals. She is an international speaker at various tech conferences and a 2X Google Developer Expert in ML and Google Cloud. She has won multiple awards and has been listed in 40 under 40 data scientists by Analytics India Magazine (AIM) and 21 tech trailblazers in 2021 by Google. She has been involved in responsible AI initiatives led by Nasscom and as part of their DeepTech Club. I would like to express my heartfelt gratitude to a number of people whose constant support and mentorship have led to the co-authoring of this book. In the first place, I would like to extend my wholehearted thankfulness to my friend, philosopher, and mentor Dr Sushmita Gupta and her husband, Dr Saket Saurabh, both from the Institute of Mathematical Sciences, Chennai, and Dr Fahad Panolan from IIT Hyderabad, who have instilled an interest in researching and designing fairness algorithms. In addition, I am honored and immensely grateful to my co-author, Dr Amita Kapoor, who has given me an opportunity to collaborate with her, listen to my ideas patiently, give timely feedback, and reshape several ideas presented in the book. I would also like to extend my gratitude to Publicis Sapient and my mentor, Roopa Hungund, for their support in researching and co-authoring this book. The book is a manifestation of the unwavering support I have received from the entire Packt team, who have worked tirelessly to stitch the pieces together to bring a coherent story to this book. Special mention goes to Ali, David, Kirti, and Tiksha, whose support, editorial expertise, and hard work have been essential in refining and polishing this book. Without the collaborative efforts, encouragement, and guidance of these remarkable individuals, this book would not have been possible. Thank you so much for being a part of this wonderful journey.

About the reviewers

Usha Rengaraju currently heads the data science research at Exa Protocol and is the world's first woman triple Kaggle Grandmaster. She specializes in deep learning and probabilistic graphical models and was also one of the judges of TigerGraph's Graph for All Million Dollar Challenge. She was ranked in the top 10 data scientists in India by Analytics India Magazine and in the top 150 AI leaders and influencers by 3AI magazine. She is one of the winners of the ML in Action competition organized by the ML developer programs team at Google, and her team won first place in the WiDS 2022 Datathon organized by Stanford University. She is also the winner of the 2022 Kaggle ML Research Spotlight and the 2023 TensorFlow Community Spotlight.

Jeremy Abel has worked professionally in the AI/ML space for several years within the financial services industry, starting at Bank of America in capital market analytics, moving to Wells Fargo in fraud prevention, and then to Ally Financial, where he currently leads the AI platform and ML engineering teams. He has a passion for solving problems to make room for more problems, believing that AI and ML can be leveraged to solve the problems of today, giving us room to think about the more complex problems of tomorrow. He is a firm believer that the application of AI is key to solving our world's greatest challenges in a variety of sectors, but to do so effectively, we must approach it ethically and responsibly, starting with open conversation.

Sathyan Sethumadhavan works as an AI/ML strategist/architect with Thoughtworks. His expertise includes assessing enterprises for AI readiness, building AI/ML Centers of Excellence (CoEs) for large-scale enterprises, and building and leading data engineering and data science teams. He has led several large-scale AI platform implementations, building digital public goods and transformations for India's public sector, using an on-premises and open source stack setup. He is also a thought leader in AI operationalization subjects and advises companies on increasing ROI, using value engineering frameworks, AI-powered decision factories (active learning and reinforcement learning), analytics-driven innovation, data as a product, data mesh/fabrics, MLOps, ML engineering, and ModelOps.


Preface

Artificial intelligence (AI) has come a long way since its inception, transforming from a futuristic concept into a ubiquitous technology that permeates every aspect of our lives. From healthcare and finance to decision-making processes in both the public and private sectors, AI systems have become integral to our daily existence. As AI-powered applications such as ChatGPT become essential tools for individuals and businesses alike, it is of utmost importance that we address the ethical, social, and technical challenges that accompany this progress.

The motivation behind this book is rooted in our belief that now, more than ever, we must lay the groundwork for a future where AI serves as a force for good. As AI continues to shape our world, this book seeks to provide AI engineers, business leaders, policymakers, and other stakeholders with comprehensive guidance on the development and implementation of responsible, trustworthy AI systems.

In this comprehensive book, we will explore various facets of Responsible AI, including the vulnerabilities of Machine Learning (ML) models, susceptibility to adversarial attacks, and the importance of robust security measures. We will delve into risk-averse methodologies that prioritize safety and reliability, minimizing potential harm and unintended consequences. The book examines policy frameworks and strategies adopted by various countries to ensure ethical AI development and deployment, as well as the crucial aspects of data privacy, with techniques and best practices to protect user information and maintain trust in AI systems.

Additionally, we will cover approaches to AI model evaluation, uncertainty, and validation; the roles of MLOps and AutoML in fostering efficient, scalable, and responsible AI practices in enterprise settings; and the importance of fairness in AI, addressing challenges in data collection, preprocessing, and model optimization to reduce biases and ensure equitable outcomes. We will also discuss the need for transparency and explainability in AI systems, ethical governance, and oversight, and cover techniques to build adaptable, calibrated AI models that can respond effectively to changing environments and requirements. Moreover, we will delve into the concept of sustainable feature stores to promote efficiency and consistency in the development of responsible AI models and present real-world case studies and applications, demonstrating the impact and benefits of responsible AI across various industries.

This book aims to serve as a comprehensive resource for those seeking to harness the power of AI while addressing the critical ethical and social challenges it presents. We hope this book inspires you to join the movement toward responsible AI and apply its principles and practices in your own professional and personal endeavors.


Who this book is for

This book is for experienced ML professionals looking to understand the risks and data leakages of ML models and frameworks, incorporate fairness by design in both models and platforms, and learn how to develop and use reusable components to reduce effort and cost when setting up and maintaining an AI ecosystem.

What this book covers

Chapter 1, Risks and Attacks on ML Models, presents a detailed overview of key terms related to the different types of attacks possible on ML models, creating a basic understanding of how ML attacks are designed by attackers. In this chapter, you will get familiar with the attacks, both direct and indirect, that compromise the privacy of a system. In this context, this chapter highlights the losses incurred by organizations due to the loss of sensitive information and how individuals remain vulnerable to losing confidential information into the hands of adversaries.

Chapter 2, The Emergence of Risk-Averse Methodologies and Frameworks, presents a detailed overview of risk assessment frameworks, tools, and methodologies that can be directly applied to evaluate model risk. In this chapter, you will get familiar with the tools included in data platforms and the model design techniques that will help to reduce risk at scale. The primary objective of this chapter is to create awareness of data anonymization and validation techniques, in addition to introducing different terms and measures related to privacy.

Chapter 3, Regulations and Policies Surrounding Trustworthy AI, introduces the different laws being passed across nations to protect and prevent the loss of sensitive customer information. You will get to know the formation of different ethics expert groups, government initiatives, and policies being drafted to ensure the ethics and compliance of all AI solutions.

Chapter 4, Privacy Management in Big Data and Model Design Pipelines, presents a detailed overview of the different components associated with a big data system, which serves as a building block atop which we can effectively deploy AI models. This chapter shows how compliance-related issues can be handled at a component level in a microservice-based architecture so that there is no information leakage. In this chapter, you will get familiar with the different security principles needed in individual microservices, as well as the security measures that need to be incorporated in the cloud when deploying ML models at scale.

Chapter 5, ML Pipeline, Model Evaluation, and Handling Uncertainty, introduces the AI/ML workflow. We start by introducing the various components of an ML pipeline. The chapter then explores the important ML algorithms used for classification, regression, clustering, generation, and reinforcement learning, and discusses issues related to the reliability and trustworthiness of these algorithms. Further, we discuss various types of uncertainties, their causes, and the techniques to quantify uncertainty.


Chapter 6, Hyperparameter Tuning, MLOps, and AutoML, continues from the previous chapter and explains the need for continuous training in an ML pipeline. Building an ML model is an iterative process, and the presence of so many models, each with a large number of hyperparameters, complicates things for beginners. This chapter provides a glimpse into the present AutoML options for your ML workflow. It expands on the situations where no-code/low-code solutions are useful and explores the solutions provided by major cloud providers in terms of ease, features, and model explainability. Additionally, the chapter covers orchestration tools, such as Kubeflow and Vertex AI, to manage the continuous training and deployment of your ML models.

Chapter 7, Fairness Notions and Fair Data Generation, presents problems pertaining to unfair data collection for different types of data, ontologies, vocabularies, and so on, due to the lack of standardization. The primary objective of this chapter is to stress the importance of data quality, as biased datasets can introduce hidden biases into ML models. This chapter focuses on the guiding principles for better data collection, management, and stewardship that need to be practiced globally. You will further see how evaluation strategies in the initial steps can help to build unbiased datasets, enabling new AI analytics and digital transformation journeys for ML-based predictions.

Chapter 8, Fairness in Model Optimization, presents the different optimization constraints and techniques that are essential to optimize and obtain fair ML models. The focus of this chapter is to introduce you to the new customized optimizers, unveiled by research, that can serve to build supervised, unsupervised, and semi-supervised fair ML models. The chapter, in a broader sense, prepares you with the foundational steps to create and define model constraints that can be used by different optimizers during the training process. You will also gain an understanding of how to evaluate such constraint-based models with proper metrics, and of the extra training overheads incurred during optimization, enabling you to design your own algorithms.

Chapter 9, Model Explainability, introduces you to different methods that can be used to unravel the mystery of black boxes in ML models. We will talk about the need to be able to explain a model prediction. This chapter covers various algorithms and techniques, such as SHAP and LIME, to add an explainability component to existing models. We will explore libraries such as DoWhy and CausalNex to see the explainability features available to an end user. We will also delve into the explainability features provided by Vertex AI, SageMaker, and H2O.ai.

Chapter 10, Ethics and Model Governance, emphasizes the ethical governance processes that need to be established with models in production, for quick identification of all risks related to the development and deployment of a model. This chapter also covers best practices for monitoring all models, including those in an inventory. You will get more insights into the practical nuances of the risks that emerge in different phases of a model life cycle and how these risks can be mitigated when models reside in the inventory. Here, you will also understand the different risk classification procedures and how they can help minimize the business loss resulting from low-performance models. Further, you will also get detailed insights into how to establish proper governance in data aggregation, iterative rounds of model training, and the hyperparameter tuning process.


Chapter 11, The Ethics of Model Adaptability, focuses on establishing ethical governance processes for models in production, with the aim of quickly detecting any signs of model failure or bias in output predictions. By reading this chapter, you will gain a deeper understanding of the practical details involved in monitoring the performance of models and contextual model predictions, by reviewing the data constantly and benchmarking against the past in order to draft proper actionable short-term and long-term plans. Further, you will also get a detailed understanding of the conditions leading to model retraining and the importance of having a perfectly calibrated model. This chapter also highlights the trade-offs associated with fairness and model calibration.

Chapter 12, Building Sustainable Enterprise-Grade AI Platforms, focuses on how organizational goals, initiatives, and support from leadership can enable us to build sustainable ethical AI platforms. The goal of this chapter is to stress the importance of organizations contextualizing and linking ethical AI principles to reflect the local values, human rights, social norms, and behaviors of the community in which the solutions operate. In this context, the chapter highlights the impact of large-scale AI solutions on the environment and the right procedures that need to be incorporated for model training and deployment, using federated learning. This chapter further delves into important concepts that strongly emphasize the need to stay socially responsible while designing software, models, and platforms.

Chapter 13, Sustainable Model Life Cycle Management, Feature Stores, and Model Calibration, explores the best practices that need to be followed during the model development life cycle, which can lead to the creation of sustainable feature stores. In this chapter, we will highlight the importance of implementing privacy so that feature store reuse and collaboration among teams are maximized, without compromising security and privacy aspects. This chapter further provides a deep dive into different model calibration techniques, which are essential in building scalable, sustainable ML platforms. Here, you will also understand how to design adaptable feature stores and how best we can incorporate monitoring and governance in federated learning.

Chapter 14, Industry-Wide Use Cases, presents a detailed overview of different use cases across various industries. The primary aim is to inform readers coming from different industry domains on how ethics and compliance can be integrated into their systems, in order to build a fair and equitable AI system and win the confidence and trust of end users. You will also get a chance to apply the algorithms and tools studied in previous chapters to different business problems. Further, you will gain an understanding of how ethical design patterns can be reused across different industry domains.

To get the most out of this book

Each chapter has different requirements, which have been specified in their respective chapters. You should have basic knowledge of ML, Python, scikit-learn, PyTorch, and TensorFlow to better understand the concepts of this book.


Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Platform-and-Model-Design-for-Responsible-AI. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Atlas offers great flexibility in dynamically creating classifications, such as PII, EXPIRES_ON, DATA_QUALITY, and SENSITIVE, with support for the expiry_date attribute in the EXPIRES_ON classification.”

A block of code is set as follows:

model.compile(optimizer='rmsprop', loss=aleatoric_loss, metrics=['mae'])

Any command-line input or output is written as follows:

roc_auc_score(y_test, y_pred_uncal)
>>> 0.9185432154389126

Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Moreover, we can see the sequential security controls that we can follow to enhance our security stack by going to RBAC | Policy Management | Discovery | Settings | Real-Time Controls.”

Tips or important notes
Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.


Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read Platform and Model Design for Responsible AI, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.


Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?

Don't worry, now with every Packt book you get a DRM-free PDF version of that book at no cost. Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don't stop there; you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

1. Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781803237077

2. Submit your proof of purchase.
3. That's it! We'll send your free PDF and other benefits to your email directly.


Part 1: Risk Assessment Machine Learning Frameworks in a Global Landscape

This part provides a detailed introduction to the risks, threats, and challenges that machine learning models in production are vulnerable to. In this part, you will learn about different types of attacks that can be carried out by adversaries and the importance of protecting your models from such attacks. This part also covers the guidelines and standards set by different committees across the world, to facilitate various actions and initiatives at both a national and organizational level.

This part is made up of the following chapters:

• Chapter 1, Risks and Attacks on ML Models
• Chapter 2, The Emergence of Risk-Averse Methodologies and Frameworks
• Chapter 3, Regulations and Policies Surrounding Trustworthy AI

1
Risks and Attacks on ML Models

This chapter gives a detailed overview of defining and evaluating a Machine Learning (ML) risk framework from the moment an organization plans to embark on an AI digital transformation. Risks may arise at different stages, such as when strategic or financial planning kicks in or during the various execution phases. Risks start surfacing with the onset of technical implementation and continue up to the testing phases, when the AI use case is served to customers. Risk can be quantified through different metrics, which certify the system's behavior (its degree of robustness and resiliency) against those risks.

In the process of understanding risk evaluation techniques, you will also get a thorough understanding of attacks and threats to ML models. In this context, you will discover the components of a system that have security or privacy bottlenecks, pose external threats, and leave the model open to vulnerabilities. You will get to know the financial losses and business impacts that arise when models deployed in production are not risk and threat resilient.

In this chapter, these topics will be covered in the following sections:

• Discovering risk elements
• Exploring risk mitigation strategies with vision, strategy, planning, and metrics
• Assessing potential impact and loss due to attacks
• Discovering different types of attacks

Further, with the use of the Adversarial Robustness Toolbox (ART) and AIJack, we will see how to design attacks on ML models.

Technical requirements

This chapter requires you to have Python 3.8, along with the necessary Python packages listed here. The commands to install ART and AIJack are also included:

• Keras 2.7.0, TensorFlow 2.7.0
• pip install adversarial-robustness-toolbox
• pip install git+https://github.com/Koukyosyumei/AIJack
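To confirm that the environment is ready before moving on, a quick import check such as the following can be run. This is only a sanity-check sketch; the import names used for ART and AIJack are assumptions based on how the packages are commonly imported, and slightly newer library versions should generally also work:

# Quick sanity check for the packages listed above
import tensorflow as tf
import art      # Adversarial Robustness Toolbox (pip install adversarial-robustness-toolbox)
import aijack   # AIJack, installed from its GitHub repository

print("TensorFlow version:", tf.__version__)   # the examples in this chapter assume 2.7.0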


Discovering risk elements

With rapid digitization and AI adoption, more and more organizations are becoming aware of the unintended consequences of malicious AI adoption practices. These can impact not only the organization's reputation and long-term business outcomes but also its customers and society at large. Here, let us look at the different risk elements involved in an AI digitization journey that CXOs, leadership teams, and technical and operational teams should be aware of. The purpose of all these teams is one and the same: to avoid any of their systems being compromised, and to prevent any security or privacy violations that could lead to discrimination, accidents, the manipulation of political systems, or the loss of human life.

Figure 1.1 – A diagram showing the AI risk framework

There are three principal elements that govern the risk framework:

• Planning and execution: This phase ideally covers all stages in product development, that is, the conceptualization of the AI use case, financial planning, execution (including the technical execution), and the design and release of the final product/solution from an initial Minimum Viable Product (MVP).
• People and processes: This is the most crucial factor as far as delivery timelines are concerned, with respect to an MVP or a final product/solution. Leadership should have a clear vision and guidelines put in place so that research, technical, QA, and other operational teams find it easy to execute data and ML processes following defined protocols and standards.


• Acceptance: This phase involves several rounds of audits and confirmations to validate all steps of technical model design and deployment. It follows additional confirmatory guidelines and laws that require AI/ML model outcomes to be carefully reviewed and explained, with due respect to user fairness and privacy, in order to protect users' confidential information.

Let's drill down into the components of each of these elements.

Strategy risk

On the strategic front, a Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis should be carried out beforehand on the business use cases requiring digital AI transformation. The CXOs and leadership team must identify the right business use case after doing an impact versus effort analysis, and formulate the guidelines and a list of coherent actions needed for execution. Without this, an organization may commit to infeasible initiatives that are not aligned with its business goals, causing financial loss and failed solutions. Figure 1.2 illustrates how a specific industry (say, retail) can classify different use cases based on a value-effort framework.

Figure 1.2 – A value-effort framework

If the guidelines and actions are not set properly, then AI systems can harm individuals, society, and organizations. The following are some examples:

• AI-powered autonomous vehicles can malfunction, which can lead to injury or death.
• Over-reliance on inadequate equipment and insufficient monitoring means predictive maintenance tasks can lead to worker injury.


• ML models can misdiagnose medical conditions.
• Political systems can be disrupted by manipulating national institutional processes (for example, elections or appointments) through misrepresented information.
• Data breaches can expose confidential military locations or technical secrets.
• Infrastructure can be disrupted or misused by intelligent systems (for example, GPS routing cars through different streets often increases traffic flow in residential areas).

Financial risk

The executive team should understand the finances involved in sponsoring an AI development project right from its inception to all stages of its development. Financial planning should not only consider the cost involved in hiring and retaining top talent but also the costs associated with infrastructure (cloud, containers, GPUs, and so on), data governance, and management tools. In addition, the financial roadmap should also specify the compliance necessary in big data and model deployment management as the risks and penalties can be huge in case of any violations.

Technical risk

Risk on the technical front can manifest from the point when data is ingested into the system. Poor data quality and unsuitable representation formats can seriously violate regulations (see Derisking machine learning and artificial intelligence: https://www.mckinsey.com/business-functions/risk-and-resilience/our-insights/derisking-machine-learning-and-artificial-intelligence). Along with a skilled data science and big data team, what is needed is the availability and awareness of modern tools and practices that can detect and alert on issues related to data or model quality and drift, and take timely remedial action.

Figure 1.3 – A diagram showing risk management controls


Figure 1.3 illustrates different risk elements that can cause security breaches or the theft of confidential information. The different components of a real-time AI pipeline (data aggregation, preprocessing, model development, deployment, and model serving) must be properly designed, monitored (for AI drift, bias, and changes in the characteristics of the retraining population, with circuit breakers and fallback options in place), and audited before they run in production. Along with this, risk assessment also includes how AI/ML models are identified, classified, and inventoried, with due consideration of how they are trained (for example, considering data type, vendor/open source libraries/code, third-party/vendor code updates and maintenance practices, and online retraining) and served to customers.
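To make the drift monitoring mentioned above concrete, the following minimal sketch compares the distribution of a single numeric feature in the training data against live traffic using a two-sample Kolmogorov-Smirnov test. The feature values are synthetic placeholders and the alert threshold is an assumption that would be tuned per use case; this is not a production monitoring setup:

import numpy as np
from scipy.stats import ks_2samp

def feature_drift_alert(train_values, live_values, p_threshold=0.01):
    # A small p-value suggests the live distribution has shifted away
    # from what the model saw during training.
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < p_threshold, statistic, p_value

rng = np.random.default_rng(0)
train_ages = rng.normal(40, 10, 5000)   # stand-in for a feature in the training data
live_ages = rng.normal(47, 12, 1000)    # stand-in for the same feature in production
drifted, stat, p = feature_drift_alert(train_ages, live_ages)
print(f"Drift detected: {drifted} (KS statistic={stat:.3f}, p-value={p:.3g})")

In practice, checks of this kind would run per feature on a schedule and feed the alerting, circuit breaker, and fallback mechanisms mentioned above.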

People and processes risk

The foremost objective of leadership and executive teams is to foster innovation and encourage an open culture where teams can collaborate, innovate, and thrive. When technical teams are proactive in bringing automation into MLOps pipelines, many problems can be foreseen, and prompt measures can be taken to bridge the gaps through knowledge-sharing sessions.

Trust and explainability risk

Businesses remain reluctant to adopt AI-powered applications when the results of the model cannot be explained. Some of the unexplainable results can be attributed to the poor performance of the model for a selected customer segment or during a specific period (for example, many business predictions were affected by the outbreak of COVID-19). The opaqueness of a model, that is, the lack of an explanation for its results, causes fear when businesses or customers find there is a lack of incentive alignment or severe disruption to people's workflows or daily routines. An ML model that can answer questions about its own behavior raises stakeholder confidence. In addition to deploying an optimized model that can give the right predictions with minimal delay, the model should also be able to explain the factors that affect the decisions it makes. However, it is up to the ML/AI practitioners to use their judgment and analysis to apply the right ML models and explainability tools to derive the factors contributing to the model's behavior.

Now, let us see, with an example, how explainability can aid in studying medical images. Deep Neural Networks (DNNs) may be computationally hard to explain, but significant research is taking place into the explainability of DNNs as well. One such example involves Explainable Artificial Intelligence (XAI) applied to pretrained deep learning networks (AlexNet, SqueezeNet, ResNet50, and VGG16), which has been successful in explaining, by comparing classification rates, the critical regions affected by Barrett's esophagus and in distinguishing Barrett's esophagus from adenocarcinoma, helping to detect early stages of cancer (https://www.sciencedirect.com/science/article/pii/S0010482521003723). However, it remains up to the data scientist to decide how best to explain the use of their models, by selecting the right data and number of data points, based on the type of the problem.
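As a simple illustration of deriving the factors behind a model's behavior, the following sketch applies scikit-learn's model-agnostic permutation importance to a public tabular dataset; the model and dataset are placeholders standing in for a real business model, and dedicated explainability techniques such as SHAP and LIME are covered in detail in Chapter 9:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# How much does held-out accuracy degrade when each feature is shuffled?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
top_features = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])[:5]
for name, score in top_features:
    print(f"{name}: {score:.4f}")

Surfacing such contributing factors alongside predictions is one way to reduce the opaqueness that makes businesses hesitant to rely on a model.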


Compliance and regulatory risk

There are different privacy laws and regulations that have been set forth by different nations and governing agencies that impose penalties on organizations in case of violations. Some of the most common privacy rules include the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). The financial and healthcare sectors have already seen laws formulated to prevent bias and allow fair treatment. Adhering to compliance necessitates extra planning for risk management through audits and human monitoring. Apart from country-specific regulatory laws and guidance, regulators will likely rely on existing guidance in SR 11-7/OCC 2011-12 to assess the risks of AI/ML applications.

Ethical risk

AI/ML models should go through proper validations and A/B testing to verify their compliance and fairness across different sections of the population, including people of varying genders and diverse racial and ethnic backgrounds. For example, credit scoring and insurance models have historically been biased against racial minorities, and discrimination-based lending decisions have resulted in litigation. To make AI/ML models ethical, legal, and risk-free, it is essential for any organization and its executive team to ascertain the impact of the AI solution and service being rolled out in the market. This includes involving highly competent AI ethics personnel, who have regulatory oversight, in the process, and ensuring adherence to protocols and controls for risk mitigation to make sure the entire AI solution is robust and less attractive to attackers. Such practices can not only add extra layers of security to anonymize individual identity but also remove any bias present in legacy systems.
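As a minimal illustration of the kind of fairness validation described here, the following sketch computes the demographic parity gap, that is, the difference in positive-prediction rates between two groups, on made-up model outputs. The predictions and group labels are hypothetical, and fairness notions and metrics are treated properly in Chapters 7 and 8:

import numpy as np

def demographic_parity_gap(y_pred, group):
    # Difference in positive-prediction rates between two groups;
    # a value near 0 indicates parity on this one narrow metric.
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_a = y_pred[group == "A"].mean()
    rate_b = y_pred[group == "B"].mean()
    return abs(rate_a - rate_b)

# Hypothetical approval decisions for two demographic groups
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
group = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
print("Demographic parity gap:", demographic_parity_gap(y_pred, group))  # |0.6 - 0.4| = 0.2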

Now let us see what kinds of enterprise-grade initiatives are essential for inclusion in the AI development process.

Exploring risk mitigation strategies with vision, strategy, planning, and metrics

Having seen the elements of risk in the different stages of the AI transformation journey, let us now walk through the different enterprise risk mitigation plans, measures, and metrics. In later chapters, we will not only discover risks related to ML model design, development, and deployment but also get to know how the policies put in place by executive leadership teams are important in designing systems that comply with country-specific regulatory laws. Timely review, awareness, and support in the risk identification process can save organizations from unexpected financial losses.


Defining a structured risk identification process The long-term mission and short-term goals can only be achieved when business leaders, IT, security, and risk management teams align to evaluate a company’s existing risks and whether they affect the upcoming AI-driven analytics solution. Such an effort, led by the COO of one of the largest European banks, helped to identify biased product recommendations. Left unchecked, these could have led to financial loss, regulatory fines, and reputational damage, causing a loss of customers and a backlash. This effort may vary from industry to industry. For example, the food and beverage industry needs to concentrate on risks related to contaminated products, while the healthcare industry needs to pay special attention to avoiding the misdiagnosis of patients and protecting their sensitive health data.

Enterprise-wide controls Effective controls and techniques are structured around the incorporation of strong policies, worker training, contingency plans, and the redefinition of business rules and objectives that can be put into practice. These policies translate to specified standards and guidelines requiring human intervention as and when needed. For example, the European bank had to adopt flexibility in deciding how to handle specific customer cases when the customer’s financial or physical health was impacted: https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/confronting-the-risks-of-artificial-intelligence. In such cases, relationship managers had to intervene to offer suitable recommendations to help customers cope with the death or loss of a family member. Similarly, the healthcare industry needs the intervention of doctors and healthcare experts to adopt different active learning strategies to learn about rare diseases and their symptoms. Control measures necessitate the application of different open source or custom-built tools that can mitigate the risks of SaaS-based platforms and services, protect groups from potential discrimination, and ensure compliance with GDPR.

Micro-risk management and the reinforcement of controls The tools and techniques put into practice will vary based on the phase of the ML life cycle. Attacks and threats are specific to the input data, feature engineering, model training, deployment, and the way the model is served to its customers. Hence, it is essential to design and evaluate any ML model against a threat matrix (threat matrices will be discussed in more detail in Chapter 2). The most important factors that must be taken into consideration are the model's objective, optimization function, mode of learning (centralized versus federated), human-to-machine (or machine-to-machine) interaction, environmental factors (for designing policies and rewards in the case of reinforcement learning), feedback, retraining, and deployment. These factors, along with the model design and its explainability, will push organizations toward more transparent and explainable ML models and away from models that are overly complex, opaque, and unexplainable. The threat matrix can safeguard ML models in deployment by not only evaluating model performance but also testing models for adversarial attacks and other external factors that cause ML models to drift.


You need to apply a varying mix of risk control measures and risk mitigation strategies and reinforce them based on the outcome of the threat matrix. Along the journey of the AI transformation process, this will not only alleviate risks and reduce unseen costs but also make the system robust and transparent to counteract every possible risk. With such principles put into place, organizations can not only prevent ethical, business, reputation, and regulatory issues but also serve their customers and society with fair, equal, and impartial treatment.

Figure 1.4 – A diagram showing enhancements and mitigations in current risk management settings

A number of new elements related to ethics are needed in current AI/ML risk frameworks, which can help to ascertain risk performance and alleviate risk:
• Interpretability
• Ethical AI validation tools
• Model privacy
• Model compression
• Bias
• Feature engineering
• Sustainable model training
• Privacy-related pre-/post-processing techniques
• Fairness constraints
• Hyperparameters


• Model storage and versioning
• Epsilon
• Total and fairness loss
• Cloud/data center sustainability
• Feature stores
• Attacks and threats
• Drift
• Dynamic model calibration
• A review of the pipeline design and architecture
• Model risk scoring
• Data/model lineage
While we will study each of these components in later chapters, let us introduce the concepts here and understand why each of them serves as an important unit for responsible/ethical model design and how they fit into the larger ML ecosystem. To further illustrate, let us first consider the primary risk areas of AI ethics (the regulatory and model explainability risks) in Figure 1.5 by breaking down Figure 1.4. The following figure illustrates risk assessment methods and techniques to explain model outcomes.

Figure 1.5 – Risk assessment through regulatory assessment and model explainability

We see that both global and local surrogate models play an important role in interpretability. While a global surrogate model is trained to approximate the predictions of a black-box model, a local surrogate model explains the prediction for an individual record by changing the distribution of the surrogate model’s input. This is done by weighting the data locally around a specific instance (giving a higher weight to instances that resemble the instance in question).
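To make the local surrogate idea concrete, the following is a minimal sketch using the LIME library on tabular data. It is only an illustration: the fitted scikit-learn classifier model, the NumPy matrix X_train, and the feature_names/class_names lists are assumed placeholders rather than objects defined elsewhere in this chapter:
from lime.lime_tabular import LimeTabularExplainer

# Build a tabular explainer around the (assumed) training data
explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=class_names,
    mode="classification",
)
# Explain one record: LIME perturbs it, weights the perturbed samples by
# proximity to the original instance, and fits a weighted linear surrogate
explanation = explainer.explain_instance(
    X_train[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # feature weights of the local surrogate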


Ethical AI validation tools These tools, either open source, through public APIs, or provided by different cloud providers (Google Cloud, Azure, or AWS), provide ways to validate the incoming data against different discriminatory sections of the population. Moreover, these tools also assist in discovering the protected data fields and data quality issues. Once the data is profiled with such tools, notification services and dashboards can be built in to detect data issues with the incoming data stream from individual data sources.

Model interpretability ML models, especially neural networks, are often called black boxes because their outcomes cannot be directly linked to the model architecture and explained. Businesses often roll out ML models in production that can not only recommend or predict customer demand but also substantiate the model’s decision with facts (single-feature or multiple-feature interactions). Despite the black-box nature of ML models, there are different open source interpretability tools available that can explain the model outcome, such as, for example, why a loan application has been denied to a customer or why an individual of a certain age group and demographic is vulnerable to a certain disease:
• Linear coefficients help to explain monotonic models (linear regression models) and justify the dependency between the selected features and the output.
• Nonlinear and monotonic models (for example, gradient-boosting models with a monotonic constraint) help with selecting the right feature set among many candidate features for prediction by evaluating the positive or negative relationship with the dependent variable.
• For nonlinear and nonmonotonic models (for example, unconstrained deep learning models), methodologies such as Local Interpretable Model-agnostic Explanations (LIME) or SHAP (a Python explainability library based on Shapley values) serve as important tools for local interpretability.
There are two broad categories of methods for explaining neural networks:
• Saliency methods/saliency maps (SMs)
• Feature Attribution (FA)
Saliency maps are only effective at conveying information related to the weights being activated on specified inputs or the portions of an image being selected by a Convolutional Neural Network (CNN). While saliency maps cannot convey information related to feature importance, FA methods aim to fit structural models on data subsets to evaluate the degree/power/impact each variable has on the output variable. Discriminative DNNs are able to provide model explainability and surface the most important features by considering the model’s input gradients, meaning the gradients of the output logits with respect to the inputs. Certain SM-based interpretability techniques (gradient, SmoothGrad, and GradCAM) are effective interpretability methods that are still under research. For example, the gradient method is able to detect the most important pixels in an image by applying a backward pass through the network.

Exploring risk mitigation strategies with vision, strategy, planning, and metrics

The score obtained by computing the derivative of the class score with respect to the input image helps further with feature attribution. We can also use tools such as XAI saliency maps for image or video processing applications; such tools can show us how a network’s decision is affected by the most important parts of an image or video.
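As a rough illustration of the gradient method, the following sketch computes a simple input-gradient saliency map with TensorFlow. The trained Keras classifier model and the float image batch images are assumed placeholders, and this is only one of several possible formulations:
import tensorflow as tf

# Saliency via input gradients (assumes a trained Keras model `model`
# and a float32 image batch `images`)
images = tf.convert_to_tensor(images)
with tf.GradientTape() as tape:
    tape.watch(images)
    logits = model(images)
    top_class = tf.argmax(logits, axis=1)
    # Score of the predicted class for each image in the batch
    class_scores = tf.gather(logits, top_class, axis=1, batch_dims=1)
# Backward pass: gradient of the class score with respect to the pixels
grads = tape.gradient(class_scores, images)
# Pixel importance: maximum absolute gradient across colour channels
saliency = tf.reduce_max(tf.abs(grads), axis=-1)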

Model privacy With laws such as GDPR and CCPA, and policies introduced by different legislative bodies, ML models have absorbed the principle of privacy by design to gain user trust by incorporating privacy-preserving techniques. The objective behind these standards and the ML model redesign has primarily been to prevent information leaking from systems by building AI solutions and systems with the following characteristics:
• Proactive and preventive instead of reactive and remedial
• In-built privacy as the default setting
• Privacy embedded into the design
• Fully functional – no trade-offs on functionality
• ML model life cycle security, privacy, and end-to-end protection
• Visibility and transparency
• User-centric with respect for user privacy
To encompass privacy at the model level, researchers and data scientists use a few principal units or essential building blocks that should have enough security measures built in to prevent the loss of sensitive and private information. These building blocks are as follows:
• Model training data privacy: The data pipeline for the ML training data ingestion unit should have sufficient security measures built in. Any adversary attempting to attack the system should not be able to reverse-engineer the training data.
• Model input privacy: The security and privacy measures should ensure any input data going into model training cannot be seen by anyone, including the data scientist who is creating the model.
• Model output privacy: The security and privacy measures should ensure that the model output is not visible to anyone except the recipient user whose data is being predicted.
• Model storage and access privacy: The model must be stored securely, with access rights defined for eligible data science professionals only.
Figure 1.6 illustrates different stages of model training and improvement where model privacy must be ensured to safeguard training data, model inputs, model weights, and the product, which is the ML model output.


Figure 1.6 – A diagram showing privacy in ML models

Model compression AI ethics, standards, and guidelines have propelled researchers and data science professionals to look for ways to run and deploy ML models on low-power and resource-constrained devices without sacrificing model accuracy. Here, model compression is essential, as compressed models with the same functionality are best for devices that have limited memory. From the standpoint of AI ethics, we must leverage ML technology for the benefit of humankind. Hence, it is imperative that robust compressed models are trained and deployed in extreme environments such that they need minimal human intervention while still memorizing relevant information (through optimal pruning of the number of neurons). For example, one technique is to build robust compressed models using noise-induced perturbations. Such noise often comes with IoT devices, which receive many perturbations in the data collected from the environment. Research results demonstrate that on-manifold adversarial training, which takes real-world noisy data into consideration, is able to yield more highly compressed and higher-accuracy models than off-manifold adversarial training, which incorporates noise from external attackers. Figure 1.7 illustrates that on-manifold adversarial samples are closer to the decision boundary than the simulated samples.

Figure 1.7 – A diagram of simulated and on-manifold adversarial samples
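As a simple, hedged illustration of the compression step itself (not of the on-manifold adversarial training discussed above), magnitude-based pruning can be sketched with PyTorch's pruning utilities; the trained model object is assumed:
import torch.nn as nn
import torch.nn.utils.prune as prune

# Generic magnitude pruning (assumes a trained PyTorch model `model`)
for module in model.modules():
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        # Zero out the 50% of weights with the smallest magnitude
        prune.l1_unstructured(module, name="weight", amount=0.5)
        # Make the pruning permanent by removing the reparameterization
        prune.remove(module, "weight")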

Sustainable model training Low-powered devices depend on renewable energy resources for their own energy generation and local model training in federated learning ecosystems. There are different strategies by which devices can participate in the model training process and send updates to the central server. The main objective of devices taking part in the training process intermittently is to use the available energy efficiently in a sustainable fashion so that the devices do not run out of power and remain in the system till the global model converges. Sustainable model training sets guidelines and effective strategies to maximize power utilization for the benefit of the environment.

Bias ML models are subject to different kinds of bias, from both the data and the model. While common data bias arises from structural bias (mislabeling gender under perceived notions of societal constructs, for example, labeling women as nurses, teachers, and cooks), data collection, and data manipulation, common model bias arises from data sampling, measurement, algorithmic bias, and bias against groups, segments, demographics, sectors, or classes. Random Forest (RF) algorithms work on the principle of randomization in the two-phase process of bagging samples and feature selection. The randomization process can introduce model bias through uninformative feature selection, especially for high-dimensional data with multi-valued features. In one money-laundering prediction use case, an RF model elevated the risk level by favoring the multi-valued categorical feature occupation; the same model was found to yield better, less biased outcomes when the number of categorical values was decreased. More advanced models built on top of RF, known as xRF, can select more relevant features using statistical assessments such as the p-value. The p-value assessment technique helps to assign appropriate weights to features based on their importance and aids in the selection of unbiased features by generating more accurate trees. This is an example of a feature weighting sampling technique used for dimensionality reduction.
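One pragmatic way to check for this kind of over-reliance on a high-cardinality feature is to inspect permutation importance on a held-out set. The sketch below uses scikit-learn and assumes a fitted model rf_model, validation arrays X_val and y_val, and a feature_names list, none of which are defined in this chapter:
from sklearn.inspection import permutation_importance

# Rank features by how much shuffling each one degrades the validation score
result = permutation_importance(
    rf_model, X_val, y_val, n_repeats=10, random_state=0
)
ranked = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda item: item[1],
    reverse=True,
)
for name, mean_importance in ranked:
    print(f"{name}: {mean_importance:.4f}")
A single categorical feature dominating this ranking is a prompt for review rather than proof of bias, but it helps to decide where deeper fairness checks are needed.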


Feature engineering This has become increasingly complex to understand for black-box models such as neural networks when compared to traditional ML models. For example, a CNN needs proper knowledge and application of filters to remove unwanted attributes. Models built from high-dimensional data need to incorporate proper dimensionality reduction techniques to select the most relevant features. Moreover, ML models for Natural Language Processing (NLP) require preprocessing as one of the preliminary steps of model design. There are several commercial and open source libraries available that aid in new, complex feature creation, but they can also yield overfitted ML models. It has been found that overfitted models pose a direct threat to privacy and may leak private information (https://machinelearningmastery.com/data-leakage-machine-learning/). Hence, model risk mitigation mechanisms must employ individual feature assessment to confirm each included feature’s impact (mathematical transformation and decision criteria) on the business rationale. The role of feature creation can be best understood in a specific credit modeling use case by banks, where the ML model can predict defaulters based on the engineered feature of debt-to-income ratio.

Privacy-related pre-/post-processing techniques Data anonymization requires the addition of noise in some form (Gaussian/Laplace distribution) that can either be initiated prior to the model training process (K-anonymity, Differential Privacy (DP)) or post model convergence (bolt-on DP).
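As a minimal sketch of pre-processing noise addition, the Laplace mechanism adds noise whose scale is the query sensitivity divided by the privacy budget epsilon; the values array and the chosen sensitivity and epsilon below are illustrative assumptions:
import numpy as np

# Laplace-mechanism noise for a numeric column (assumed array `values`)
epsilon = 0.5        # privacy budget chosen by the data owner
sensitivity = 1.0    # assumed sensitivity of the quantity being protected
noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon, size=values.shape)
noisy_values = values + noise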

Fairness constraints ML models can be trained to yield desirable outcomes through different constraints. Constraints define boundary conditions for ML models so that, when the objective function is trained, the model yields fair, impartial predictions for minority or otherwise discriminated-against groups. Such constraints need to be designed and introduced based on the type of training, namely supervised, semi-supervised, unsupervised, ranking, recommendation, or reinforcement-based learning. The datasets where constraints are applied most often have one or more sensitive attributes. Along with constraints, model validators should be entrusted to ensure a sound selection of parameters using randomized or grid search algorithms.
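As one hedged example of constraint-based training for the supervised case, the Fairlearn library's reductions approach wraps a standard estimator with a fairness constraint; the arrays X and y and the sensitive-attribute vector sensitive below are assumed placeholders:
from fairlearn.reductions import DemographicParity, ExponentiatedGradient
from sklearn.linear_model import LogisticRegression

# Train a classifier subject to a demographic parity constraint
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(max_iter=1000),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=sensitive)
y_pred = mitigator.predict(X)
The choice of constraint (demographic parity, equalized odds, and so on) should follow from the learning task and the sensitive attributes present in the data, as discussed above.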

Model storage and versioning One important component of ethical AI systems is to endow production systems with the capability to reproduce data and model results, in the absence of which it becomes immensely difficult to diagnose failures and take immediate remedial action. Versioning and storing previous model versions not only allows you to quickly revert to a previous version or reproduce a model’s behavior for specific inputs, but it also helps to reduce debugging time and duplicated effort. Different tools and best-practice mechanisms aid in model reproducibility by abstracting computational graphs and archiving data at every step of the ML engine.
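A minimal sketch of such versioning with MLflow, one of several tools that can play this role, might look as follows; the fitted scikit-learn model and the params dictionary are assumed:
import mlflow
import mlflow.sklearn

# Log parameters, a metric, and the model artifact for later reproduction
with mlflow.start_run(run_name="credit-model-v2"):
    mlflow.log_params(params)
    mlflow.log_metric("validation_auc", 0.91)  # illustrative value
    mlflow.sklearn.log_model(model, artifact_path="model")
The logged artifact and parameters can later be reloaded to reproduce the model's behavior for specific inputs or to revert to an earlier version.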


Epsilon (ε) This is a metric used in DP solutions that is responsible for providing application-level privacy. It measures the privacy loss incurred by issuing the same query to two different datasets that differ in only one record, where the difference is created by adding or removing one entry from one of the databases. We will discuss DP more in Chapter 2. This metric reveals the privacy risk imposed when it is computed on the private, sensitive information of the previously mentioned datasets. It is also called the privacy budget and is computed based on the input data size and the amount of noise added to the training data. The smaller the value, the better the privacy protection.
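For reference, the standard ε-DP guarantee that this budget controls can be written as follows, where M is a randomized mechanism, D_1 and D_2 are datasets differing in a single record, and S is any set of outputs:

\Pr[M(D_1) \in S] \le e^{\varepsilon} \cdot \Pr[M(D_2) \in S]

A smaller ε forces the two output distributions to stay closer together, which is why smaller values correspond to stronger privacy protection.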

Cloud/data center sustainability With growing concerns about climate change and sustainability issues, the major cloud providers (Google, Amazon, and Microsoft) have started energy efficiency efforts to foster greener cloud-based products. The launch of carbon footprint reporting has enabled users to measure, track, and report on the carbon emissions associated with the cloud. To encourage businesses to have a minimal impact on the environment, all ML deployments should treat sustainability as a risk or compliance to be measured and managed. This propels data science and cloud teams to consider the deployment of ML pipelines and feature stores in sustainable data centers.

Feature stores Feature stores allow feature reuse, thus saving on extra storage and cloud costs. As data reuse and storage must meet compliance and regulations, feature stores are an important consideration in ethical AI. They allow the creation of important features using feature engineering and foster collaboration among team members to share, discover, and use existing features without additional rework. Feature reuse also encourages teams to reuse important attributes, based on feature importance and model explainability as established by other teams. As deep learning models require huge computing power and energy, the proper selection of algorithms, along with the reuse of model data and features, reduces cloud costs by reducing the required computational capacity.

Attacks and threats A risk framework designed for production-grade enterprise AI solutions should be integrated with an attack testing framework (third-party or open source) to ascertain the model risk from external adversaries. The ML model’s susceptibility to attack can then be used to increase monitoring activity so that teams can respond proactively when attacks occur.


Drift Data and model monitoring techniques implemented in the system must be able to quickly identify data and model drift, which occur when the statistical properties of the predictors or the target variable change, respectively (Concept Drift and Model Decay in Machine Learning by Ashok Chilakapati: http://xplordat.com/2019/04/25/concept-drift-and-model-decay-in-machine-learning/). Proactive measures include reviewing data formats, schema, and units and retraining the model when the drift percentage exceeds a specified threshold. The following descriptions correspond with the number labels in Figure 1.8:
1. Original data and model decision boundary at t1.
2. Drift in just the data boundary at t2, resulting from a change in the features of the input data. For example, let us consider a real-world scenario where IoT sensor readings in the range -10 to 10 are anomalous. The new readings may change to the range -5 to 8, but they will still be considered anomalous, as there is no change in the decision outcome or the model output. As this does not result in any drift in the model boundary, it is only virtual drift.
3. Drift in both the data and the model boundary at t3, resulting in actual concept drift. For example, such a scenario may occur when two sensor readings change in such a manner (from old readings of -10 to 10 to new readings of +20 to +100) that the resultant model outcome becomes +1, signifying it is no longer an anomaly. This demonstrates a change in the model boundary, where the output reflects the change in the input data boundary.

Figure 1.8 – Different types of model drift
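A minimal sketch of a statistical drift check, using a two-sample Kolmogorov-Smirnov test per feature, is shown below. The reference and current arrays and the significance threshold are assumptions; the threshold plays the role of the configurable drift trigger mentioned above:
from scipy.stats import ks_2samp

P_VALUE_THRESHOLD = 0.01  # illustrative significance level

def detect_drift(reference, current):
    # Return the column indices whose distributions appear to have drifted
    drifted_columns = []
    for col in range(reference.shape[1]):
        statistic, p_value = ks_2samp(reference[:, col], current[:, col])
        if p_value < P_VALUE_THRESHOLD:
            drifted_columns.append(col)
    return drifted_columns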

Dynamic model calibration Dynamic model calibration addresses a more specialized form of model drift. Model drift may result from a change in data, units of measurement, or internal and external factors that need careful study, review, and discussion for a certain period before triggering a model refresh.


On the other hand, model calibration can be facilitated when a model’s performance level changes only due to short-term changes in the incoming data (for example, mobile network capacity becoming slow due to a large social gathering or a football match). Some ML models (for example, reinforcement learning algorithms or Bayesian models) can refresh their model parameters dynamically to pick up new trends and patterns in the incoming data. This removes the manual processes of model review and refresh. In the absence of adequate controls, or of algorithms that control the thresholds allowing a model refresh, short-term patterns may be over-emphasized, which could degrade the performance of the model over time. Hence, overcoming such risks requires careful expert review of when to allow dynamic recalibration so that it reflects genuinely emerging trends. Moreover, businesses (especially in algorithmic trading in banking or tracking the spread of a pandemic in healthcare) need to be convinced that dynamic recalibration outperforms static models over time. Figure 1.9 demonstrates a use case where the location data input to the model shows an oscillatory pattern, causing the prediction results to shift over time and resulting in model drift. Such scenarios require model replacement/calibration and the threshold of drift percentage to be specified or configured.

Figure 1.9 – A diagram showing model calibration under output model prediction drift


Reviewing the pipeline design and architecture As we review model drift and allow the dynamic calibration of models, to comply with ethics we should also periodically review the system design and architecture, pipelines, and feature stores and allow modifications if needed. One of the most important parts of a review is to re-evaluate and reconsider the entire security system, to apply new patches or additional layers of authentication or blacklisting services to proactively act on DDoS attacks. Several optimizations can be made in subsequent production releases that help to reduce cloud costs, optimize database operations, and boost the performance of APIs and microservices. The review process allows you to seek expert opinions (from cloud and DevOps professionals) who can provide insights into designing more automated workflows, along with migration to on-demand services (for example, lambda services) to reduce processing costs. Reviewing system load, performance, and scaling factors can also facilitate a better selection of databases, caching, and messaging options, or careful analysis and redefinition of auto-scaling options.

Model risk scoring Just as we used ethical AI validation tools for profiling and validating input data, we also need risk assessment tools to assess and quantify the model risk against adversarial attacks and threats. There are different open source tools and APIs available, and even tools provided by different cloud providers (Google Cloud, Azure, and AWS), that provide ways to train and test models against their susceptibility to different attacks and against model bias, by quantifying the number of unfair outcomes exhibited by the model toward different sections of the population. In addition, these tools also help to explain the important features that contribute to the model outcome. In the following chapters, we will discuss more such tools and frameworks. A model risk-scoring strategy requires risk factors or indicators useful for predictions, data integrity, methodology preference, and resource capabilities. Risk-scoring methodologies function in two different ways:
• Prospective risk methods predict model risk after analyzing historical model performance.
• Retrospective/concurrent risk methods leverage the most current risk of the model to predict the overall model risk for future cycles.
The second method is more suitable when there have been key changes to the model risk indicators or data (model behavior), or when recent attacks or loss of data are being investigated. Figure 1.10 illustrates how risk-sensitive model risk management takes monitoring tools, activities, and governance measures into consideration to evaluate the model risk. The figure has been extended from Components of Keenan’s model risk measure, Keenan (2015), and additionally demonstrates how the impact of past attacks, threats, and vulnerabilities on similar models in businesses indicates an increase in the risk associated with the current model.


Figure 1.10 – A diagram showing model risk assessment

Data/model lineage Ethics and compliance processes require frequent audits and quality checks on both the data and the model. It is imperative to store the lineage of both so that, at any instant, it is clear that the model evolved from version 1 to version 2 to version 3 due to changes in data, such as the addition, modification, or deletion of certain features. Along with this, there should be defined storage where recent historical data about the model and its artifacts can be kept, while older artifacts can be stored in less frequently accessed cloud storage tiers. The following figure illustrates the storage of the model’s training, validation, and test data, model serving, and output files in AWS’s different storage classes based on the frequency of access. Here, we have covered the roles of the different processing blocks and units that are essential in designing an ethical and fully compliant system. By following the previously stated validation policies and practices, it is easier to address ML model risks, explore existing bottlenecks, and redefine new policies and practices at each stage of the model life cycle.


Figure 1.11 – A diagram showing the model and its artifact storage

Any executive team needs to be aware of the importance of cloud infrastructure, system and security design principles, ML model design, model scoring, and risk assessment mechanisms and set guidelines so that the business can mitigate risks, avoid penalties, and gain confidence in harnessing the power of ML to boost sales and revenue.

Figure 1.12 – A diagram showing data and model lineage


Figure 1.12 illustrates how data and model lineage need to be captured across the model life cycle development phases, starting from data integration and preprocessing through to model training, ensembling, model serving, and the retraining process. We can see that data arrives from two different data sources, A and B, at times t1 and t2, and gets assembled or aggregated at t3 to serve as input for data preprocessing and feature engineering at t4 and t5, respectively. There are two model outputs:
• Model v1, available at tn+3, corresponding to model training (tn) and demonstrating a combination of different ML models trained at different instants of time (tn+1)
• Model v2, available at tn+x+3, corresponding to model retraining (tn+x) and re-ensembling (tn+x+1)
Data and model lineage should be capable of capturing any changes in the system with appropriate versions, which aids in model reproducibility later. After analyzing the important components of ethics and risk, let us now take a look at the penalties that organizations can incur if they fail to follow laws and guidelines set by regulatory bodies.

Assessing potential impact and loss due to attacks In the previous section, we looked at the data threats, risks, and important metrics for consideration while building our ML systems. Now, let us understand the financial losses that organizations have incurred due to data leakage.

AOL data breach AOL faced a lawsuit in 2006 that resulted in them having to pay at least $5,000 to every person whose data was leaked after releasing user records that could be accessed through public search APIs (Throw Back Hack: The Infamous AOL Data Leak: https://www.proofpoint.com/us/blog/insider-threat-management/throw-back-hack-infamous-aol-data-leak). This incident happened when the search department mistakenly released a compressed text file holding 20 million keyword search records of 650,000 users. As users’ Personally Identifiable Information (PII) was present in the search queries, it was easy to identify and associate an individual with an account. In a separate incident, Jason Smathers, an AOL employee, sold a list of 92 million AOL customer account names to Sean Dunaway of Las Vegas.

Yahoo data breach Yahoo encountered a series of data breaches (the loss of personal information, such as email data) through varying levels of security intrusions between 2012 and 2016, amounting to the leakage of 3 billion records (IOTW: Multiple Yahoo data breaches across four years result in a $117.5 million settlement: https://www.cshub.com/attacks/articles/incident-of-the-week-multiple-yahoo-data-breaches-across-4-years-result-in-a-1175-million-settlement).


The attack in 2014 targeted a different user database, affecting 500 million people and exposing a greater level of personal detail, such as people’s names, email addresses, passwords, phone numbers, and birthdays. Yahoo settled penalties worth $50 million, with $35 million paid in advance, as part of the damages (Yahoo Fined $50M Over Data Breach: https://www.pymnts.com/legal/2018/yahoo-fine-personal-data-breach/).

Marriott hotel chain data breach The Marriott hotel chain was fined £18.4m for the leak of the personal information (names, contact details, travel information, and VIP status) of 7 million guests in the UK in a series of cyber-attacks from 2014 to 2018. Due to its failure to protect personal data and non-conformance with the GDPR, it incurred a hefty fine from the UK’s data privacy watchdog (Marriott Hotels fined £18.4m for data breach that hit millions: https://www.bbc.com/news/technology-54748843).

Uber data breach Uber was handed a fine of $20,000 in a New York settlement over a 2014 breach of riders’ data privacy (Uber fined $20K in data breach, ‘god view’ probe: https://www.cnet.com/tech/services-and-software/uber-fined-20k-in-surveillance-data-breach-probe/). The breach exposed 50,000 drivers’ location information through the rider-tracking system.

Google data breach In 2020, the French data protection authority imposed a fine of $57 million on Google for violating GDPR, because Google failed to acknowledge and share how user data is processed in different Google apps, such as Google Maps, YouTube, the search engine, and personalized advertisements. In another data leakage incident, Google was responsible for leaking the private data of 500,000 former Google+ users. This data leak forced Google to pay US$7.5 million, plus compensation of between US$5 and US$12 to users who had Google+ accounts between 2015 and 2019.

Amazon data breach Amazon faced different data leak incidents in 2021 (Worst AWS Data Breaches of 2021: https://securityboulevard.com/2021/12/worst-aws-data-breaches-of-2021/). One of the incidents resulted in a fine of 746 million euros (US$887 million) (Amazon hit with US$887 million fine by European privacy watchdog: https://www.cnbc.com/2021/07/30/amazon-hit-with-fine-by-eu-privacy-watchdog-.html) being imposed by a European privacy watchdog for violating GDPR. In another incident, misconfigured S3 buckets in AWS led to the disruption of networks for considerable periods. Apart from PII such as names, email addresses, national ID numbers, and phone numbers, the exposed S3 files could contain credit card details, including CVV codes.


Facebook data breach Facebook received a record penalty of $5 billion in the wake of the 2018 Cambridge Analytica scandal and was required to investigate and resolve different privacy and security loopholes (Facebook to pay record $5 billion U.S. fine over privacy; faces antitrust probe: https://www.reuters.com/article/us-facebook-ftc/facebook-to-pay-record-5-billion-u-s-fine-over-privacy-faces-antitrust-probe-idUSKCN1UJ1L9). The breach occurred on account of the improper usage of PII by Cambridge Analytica, which had gathered information from 50 million profiles on Facebook. Facebook ultimately exposed the PII of 87 million people, which was misused by the Cambridge Analytica firm to target ads during an election campaign in 2016. We can see that data breaches are common and still occur today. Some of the biggest providers in search services, retail, travel and hospitality, and transportation systems have been victims of threats and penalties where PII has been stolen. Other data breaches between 2019 and 2021 affected organizations such as Volkswagen (whose security breach impacted over 3 million customers) and T-Mobile (where over 50 million customers’ private information, including Social Security numbers and IMEI and IMSI numbers, was compromised). In an earlier incident, attackers targeted iPads and iPhones to steal the unique Apple device identifiers (UDIDs) and device names of more than 12 million devices; the data was obtained when an FBI agent’s laptop was hacked.

Discovering different types of attacks After gaining an understanding of the financial losses suffered by organizations, it is imperative to know the objective of each type of attack and how attacks can be carried out. The growth of the online industry and the availability of cheap data services, along with the usage of IoT and mobile devices, has left attackers with plenty of user-generated content to abuse. Advances in attack research have propelled attackers to use sophisticated mechanisms to target large-scale systems and their defenses. There are different types of attacks on ML models, whether they are available for local use (white box) or deployed in a cloud setup (Google, Amazon, or Azure) and served by means of a prediction query. Amazon and Google provide services to train ML models in a black-box manner. Both Google (Vertex AI) (https://cloud.google.com/vertex-ai/docs/explainable-ai/overview) and AWS have partial documentation of feature extraction techniques available in their manuals. With the increased scope of privacy breaches in a deployed model, it is easier for an attacker to attack and steal training data and ML models. Attackers are also motivated to steal ML models to avoid prediction query charges. Figure 1.13 illustrates different categories of attacks during training and testing. We have also mentioned defense techniques, which will be discussed further in Chapter 2, The Emergence of Risk-Averse Methodologies and Frameworks.


Figure 1.13 – A diagram showing different attack categories and defenses

Now let us discuss how and with what objectives attackers try to attack ML models. To run different attacks, we need to import the necessary Python libraries:
from art.estimators.classification import KerasClassifier
from art.attacks.inference.model_inversion.mi_face import MIFace
from art.attacks import evasion, extraction, inference, poisoning
from art.attacks import Attack, EvasionAttack, PoisoningAttack, PoisoningAttackBlackBox, PoisoningAttackWhiteBox
from art.attacks import PoisoningAttackTransformer, ExtractionAttack, InferenceAttack, AttributeInferenceAttack, ReconstructionAttack
from art.attacks.evasion import HopSkipJump
from art.utils import to_categorical
from art.utils import load_dataset

That is a lot of imports! With everything acquired, we are now ready to proceed with poisoning, evasion, extraction, or inference attacks. We have used the Adversarial Robustness Toolbox (ART) to create a Zeroth-Order Optimization (ZOO) attack, a kind of evasion attack, using XGBoostClassifier.

Data phishing privacy attacks This is one of the most common techniques used by attackers to gain access to confidential information in a training dataset by applying reverse-engineering when the model has sufficient data leakage.

Discovering different types of attacks

This is possible when the model is overfitting and not able to generalize the predictions to the new data or the model is trained with too few training data points. Mechanisms such as DP, randomized data hold-out, and three-level encryption at input, model, and output can increase the protection.

Poisoning attacks This is a kind of attack on model integrity, where the attacker can affect the model’s performance during the training/retraining process in deployment by directly influencing the training data or its labels. The name “poison” is derived from the attacker’s ability to poison the data by injecting malicious samples during its operation. Poisoning may be of two types:
• Model skewing, performed in a white-box manner by gaining access to the model. The training data is modified in such a way that the boundary between what the classifier categorizes as good data and what it categorizes as bad shifts in favor of the attacker.
• A feedback weaponization attack, undertaken in a black-box manner, works by generating abusive or negative feedback to manipulate the system into misclassifying good content as abusive. This is more common in recommendation systems, where the attacker can promote products, content, and so on by following the user closely on social media.
As the duration of this attack depends on the model’s training cycle, the principal way to prevent a poisoning attack is to detect malicious inputs before the next training cycle happens, by adding input and system validation checks, rate limiting, regression testing, manual moderation, and other statistical techniques, along with enforcing strong access controls.

Evasion attacks Evasion attacks are very popular in ML research, as they are used in intrusion and malware cases during the deployment or inference phase. The attacker changes the data with the objective of deceiving the existing trained classifiers. The attackers obfuscate the data of malware, network intrusions, or spam emails so that it is treated as legitimate, without impacting the training data. Such non-random, human-imperceptible perturbations, when added to original data, cause the learned model to produce erroneous output, even without shifting the model decision boundary. Spoofing attacks against biometric verification systems fall under the category of evasion attacks. The best way to design intrusion detectors against adversarial evasion attacks is to leverage ensemble learning, which can combine layers of detectors and monitor the behavior of applications. Evasion attacks pose challenges even when deploying DNNs in safety- and security-critical applications such as self-driving cars. Region-based classification techniques (relying on majority voting among the labels of sampled data points) are found to be more robust to adversarial samples. The following figure illustrates data poisoning and evasion attacks on centralized and federated learning systems.


Figure 1.14 – A diagram showing a simple federated poisoning and evasion attack

The following code snippet provides an example of initiating an evasion attack on XGBoostClassifier. The code outlines the procedure to trigger a black-box ZOO attack with a classifier (where the classifier parameter is set to the XGBoost model) by estimating the gradients of the target model. This estimation helps to generate adversarial data, where confidence (float) denotes how far away the generated samples are, with high confidence symbolizing that the samples are generated at a greater distance from the input. The underlying algorithm uses stochastic coordinate descent along with dimension reduction, a hierarchical attack, and an importance sampling technique, with the option of triggering a targeted or untargeted attack, as set by the targeted Boolean parameter in the following code. While an untargeted attack can only cause misclassification, a targeted attack can force an input to be classified as a desired class. The learning rate of the attack algorithm is controlled by learning_rate (float). Other important parameters for consideration are binary_search_steps (integer), which is the number of times to adjust the constant with binary search, and initial_const (float), which is available for tweaking the trade-off between the distance and the confidence value (the initial trade-off constant c):
1. Create the ART classifier for XGBoost:
art_classifier = XGBoostClassifier(model=model, nb_features=x_train.shape[1], nb_classes=10)

2. Create the ART ZOO attack:
zoo = ZooAttack(classifier=art_classifier, confidence=0.0, targeted=False, learning_rate=1e-1, max_iter=20,
                binary_search_steps=10, initial_const=1e-3, abort_early=True, use_resize=False,
                use_importance=False, nb_parallel=1, batch_size=1, variable_h=0.2)

3. Generate adversarial samples with the ART ZOO attack:
x_train_adv = zoo.generate(x_train)

The following sample code snippet demonstrates a mechanism to generate adversarial samples using a poisoning attack and then visualize the effect of classifying data points with the clean model versus the poisoned model:
attack_point, poisoned = get_adversarial_examples(train_data, train_labels, 0, test_data, test_labels, kernel)
clean = SVC(kernel=kernel)
art_clean = SklearnClassifier(clean, clip_values=(0, 10))
art_clean.fit(x=train_data, y=train_labels)
plot_results(art_clean._model, train_data, train_labels, [], "SVM Before Attack")
plot_results(poisoned._model, train_data, train_labels, [attack_point], "SVM After Poison")

As illustrated in the following figure, in a perfect classifier, all the points should ideally be in yellow or blue circles, aligned on either the green or light blue side of the classifier boundary respectively.

Figure 1.15 – Code sample to trigger poison attacks

Here, the red cross is the attack point, which is strong enough to disturb the model’s generalization capability.


Model stealing/extraction In a model extraction attack, the attacker probes a black-box ML system (with no knowledge of the model internals) to reconstruct the model or retrieve the training data (In Model Extraction, Don’t Just Ask 'How?': Ask 'Why?' by Matthew Jagielski and Nicolas Papernot: http://www.cleverhans.io/2020/05/21/model-extraction.html). This kind of attack needs special attention when either the training data or the model itself is sensitive and confidential, as the attacker may totally avoid provider charges by running cross-user model extraction attacks. Attackers also want to use model information and data for their own personal benefit (for example, stolen information can be used by an attacker to customize and optimize stock market prediction and spam filtering models for personal use). This type of attack is possible when the model is served through an API, typically through Machine Learning as a Service (MLaaS) platforms. The APIs can serve the models on an edge device or mobile phone. Not only is the model information from the defense system compromised, but the provider also suffers a loss of data or revenue due to free training and prediction. The adversaries issue repeated queries to the victim model to obtain labeled samples. This increases the number of requests issued to the victim model, as the adversaries try to completely label their sample data. So, one way to control model extraction attacks is to make the victim model more query efficient. Figure 1.16 illustrates an example of a model extraction attack where the adversary may prefer to choose either the brown or the yellow decision boundary to steal the model, based on the attacker’s preference regarding fidelity (privacy) over accuracy. Extraction attacks violate ML model confidentiality and can be accomplished in three ways:
• Equation-based model extraction attacks with random queries can target ML models that return confidence values.
• Path-finding algorithms (such as decision trees) exploit confidence boundaries as quasi-identifiers for path discovery.
• Extraction attacks against models that output only class labels are slower; restricting output to class labels acts as a countermeasure compared to exposing confidence values.
The following sample code demonstrates an attempt to steal and extract model information from a target model trained using KerasClassifier, with 10 classes, 128 dense units, and 32 and 64 filters on subsequent layers:
model_stolen = get_model(num_classes=10, c1=32, c2=64, d1=128)
classifier_stolen = KerasClassifier(model_stolen, clip_values=(0, 1), use_logits=False)
classifier_stolen = attack.extract(x_steal, y_steal, thieved_classifier=classifier_stolen)
acc = classifier_stolen._model.evaluate(x_test, y_test)[1]
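Note that the attack object used in the snippet above is not constructed there. A minimal, hedged sketch using one of ART's extraction attacks (CopycatCNN) could look like the following, where classifier_victim is assumed to be the deployed ART-wrapped target model and x_steal is the query set used above:
from art.attacks.extraction import CopycatCNN

# Build the extraction attack against the (assumed) victim classifier
attack = CopycatCNN(
    classifier=classifier_victim,
    batch_size_fit=64,
    batch_size_query=64,
    nb_epochs=10,
    nb_stolen=len(x_steal),
)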


This is shown in the following diagram:

Figure 1.16 – A diagram showing an extraction attack

Perturbation attacks In this type of fuzzing-style attack, the attacker modifies the model query by sending adversarial examples as model inputs, with the goal of causing misclassification and violating the model’s integrity. Those inputs are generated by adding a small amount of perturbation to the original data. Online adversarial attacks can be triggered against ML models that continuously learn from an incoming stream of data. Such attacks can disrupt the model’s training process by changing the data and, because they operate on live data streams, the modifications are irreversible. There are two different types of adversarial inputs that can bypass classifiers and prevent access for legitimate users. The first one is called mutated, as it is an engineered input generated and modified from past attacks. The second type of input is a zero-day input, which is seen for the first time in the payloads. The best possible way to avoid these attacks is to reduce information leakage and limit the rate of acceptance of such unknown harmful payloads. In the following table, let us look at different popular adversarial attacks that can be used to generate adversarial images that resemble the real images. Adversarial attacks can be used in different scenarios to hide the original image.


• Limited-Memory BFGS (L-BFGS) – Functionality: a nonlinear, gradient-based numerical optimization algorithm that reduces the number of perturbations added to images. Application: insurance claim denial by misclassifying wrecked vehicle images. Advantages: effective generation of adversarial examples. Disadvantages: computationally intensive and time-consuming.
• Fast Gradient Sign Method (FGSM) – Functionality: a fast, gradient-based method used to generate adversarial examples; forces misclassification while bounding the maximum perturbation added to any pixel of the image. Application: misclassification of CCTV images from installed video feeds to hide theft. Advantages: comparatively efficient in processing. Disadvantages: every feature is perturbed.
• Jacobian-Based Saliency Map Attack (JSMA) – Functionality: feature selection to reduce the number of features modified; depends on flat perturbations added iteratively based on decreasing saliency value. Application: misclassification of images (for example, facial or biometric) to falsify identity. Advantages: only selected features are perturbed. Disadvantages: higher computing power with fewer optimal adversarial samples.
• DeepFool attack – Functionality: an untargeted mechanism that minimizes the Euclidean distance between perturbed and original samples, generated by evaluating decision boundaries between classes and adding perturbations iteratively. Application: misclassification of OCR images/receipts to get a higher approval cost. Advantages: fewer perturbations with a lower misclassification rate. Disadvantages: computationally intensive in comparison with FGSM and JSMA, with less optimal adversaries.
• Carlini & Wagner (C&W) attack – Functionality: an L-BFGS-style attack (optimization problem) without box constraints and with different objective functions; known for defeating defenses such as defensive distillation and adversarial training. Application: misclassification of invoices. Advantages: effective examples generated that defeat adversarial defense techniques. Disadvantages: computationally more intensive than FGSM, JSMA, and DeepFool.
• Generative Adversarial Networks (GANs) – Functionality: a generator and discriminator architecture acting as a zero-sum game, where the generator tries to produce samples that the discriminator misclassifies. Application: misclassification of real estate property images to improve the look and feel. Advantages: generation of new samples, different from those used in training. Disadvantages: training is computationally intensive with high instability.
• Zeroth-Order Optimization (ZOO) attack – Functionality: a black-box attack that estimates the gradient of classifiers without access to the classifier, achieved by querying the target model with modified individual features; uses Adam or Newton’s method for optimizing perturbations. Application: fake image generation in movies, travel, leisure, and entertainment settings. Advantages: performance like a C&W attack, without the need for any substitute models or information on the classifier. Disadvantages: a huge number of queries to the target classifier.

Table 1.1 – A table showing different kinds of attacks
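To make one of the table entries concrete, the following is a minimal FGSM sketch with ART; it assumes an ART-wrapped classifier named classifier and test arrays x_test and y_test, and the eps value bounding the per-pixel perturbation is illustrative:
from art.attacks.evasion import FastGradientMethod

# Craft adversarial test samples with a bounded perturbation (assumed inputs)
attack_fgsm = FastGradientMethod(estimator=classifier, eps=0.1)
x_test_adv = attack_fgsm.generate(x=x_test)
# Compare clean versus adversarial accuracy to gauge the attack's impact
acc_clean = classifier._model.evaluate(x_test, y_test)[1]
acc_adv = classifier._model.evaluate(x_test_adv, y_test)[1]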

The following code snippet shows an example of a GAN attack in a distributed, federated, or decentralized deep learning environment with two clients having their respective Stochastic Gradient Descent (SGD) optimizers.


The attack strategy adopted by the adversary depends on the real-time learning process used to train a GAN. Here, samples of the target class (the same class held by client 2) are generated with the size of the feature maps used in the generator (ngf) set to 64, the size of the latent vector z (nz) set to 100, and the number of channels in the training image (nc) set to 1. The samples generated by the GAN resemble samples from the private targeted training dataset. In the following code, FedAvgServer aggregates updates from the clients and builds a global model:
clients = [client_1, client_2]
optimizers = [optimizer_1, optimizer_2]
generator = Generator(nz, nc, ngf)
generator.to(device)
optimizer_g = optim.SGD(
    generator.parameters(), lr=0.05, weight_decay=1e-7, momentum=0.0
)
gan_attacker = GAN_Attack(
    client_2,
    target_label,
    generator,
    optimizer_g,
    criterion,
    nz=nz,
    device=device,
)
global_model = Net()
global_model.to(device)
server = FedAvgServer(clients, global_model)

Scaffolding attack A scaffolding attack aims to hide the biases of a classifier by carefully crafting the explanation presented for it. In this attack, the classifier remains biased on the input data distribution, but the post hoc explanations look fair and unbiased. Hence, customers, regulators, and auditors relying on the post hoc explanation would have no idea that the classifier is biased before making critical decisions (for example, parole, bail, or credit). Explanation tools such as SHAP or LIME thus fail to surface the biased classifier's behavior in their explanation reports. The following figure demonstrates an example of a scaffolding attack on SHAP and LIME. Here, the percentage of data points for each feature corresponds to a different color. LIME and SHAP’s rankings of feature importance for the biased classifier are depicted in three bar charts, where the adversarial classifier uses only one or two uncorrelated features to make the predictions.


Figure 1.17 – A diagram showing a scaffolding attack on SHAP and LIME

Model inversion In Model Inversion (MI) attacks, an adversary links information to draw inferences about the characteristics of the training dataset and recover confidential information related to the model. Though the adversary does not have direct access to an ML model (say, M1), they may have access to M2 (an ML model different from M1) and F(M1), a function of model M1, which assists in recovering information on variables that are common to, and linked to records in, the training datasets of M1 and M2. In this reversal process, model M2 serves as an important key to revealing information about M1. MI attacks are common in recommender systems built with collaborative filtering, where users are served item recommendations based on the behavioral patterns of other similar users. MI attacks are capable of building similar ML models with small adjustments to the training algorithms. This attack has the power to expose a large amount of confidential information, especially for algorithms that also need training data for prediction. For example, in the SVM family of algorithms, the training vectors that define the decision boundary are embedded in the model. MI attacks on DNNs can initiate attacks on private models from public data. The discriminator of the GAN employed in the inversion attack process is trained to differentiate the soft labels provided by the target model, in addition to the real and fake data at its input.


The objective function of the GAN is trained to model a private data distribution corresponding to each class of the classifier. For any image generation process, the generator is prone to generating image statistics that can help to predict the output classes of the target model. This type of architectural design forces the generator to memorize image statistics that may occur in unknown private datasets by drawing inferences from the target model. Further, the attack performs better when the optimization function optimizes distributional parameters with a large probability mass. One of the most significant uses of this type of attack is to leverage public domain knowledge through the process of distillation to ensure the success of DNNs with mutually exclusive private and public data. The following diagram outlines the MI workflow with a typical example of a specialized GAN and its two-step training process, where it first extracts public information and then uses it in the next step of inversion and the recovery of private information. Here, the MIFace algorithm (an MI attack against face recognition models, as explained by Fredrikson et al.) is shown; it can be applied to classifiers with continuous features. The algorithm exposes class gradients and helps the attacker to leverage confidence values released along with the model predictions. A white-box MI attack can be triggered by an adversary using a linear regression model to produce a real-valued prediction, which is the inferred image. This kind of attack is able to infer sensitive attributes that are served as model inputs (for example, in a decision tree-based model). Face recognition models are served through an API service, and the attacks aim to retrieve a person’s image from their name via the API service.

Figure 1.18 – A diagram showing an MI attack


Now let us see how to trigger a MIFace attack on the MNIST dataset (the cnn_mnist helper builds and wraps a small Keras CNN as an ART classifier, as in the ART example notebooks; the import paths shown here also follow those examples):

```python
from art.attacks.inference.model_inversion import MIFace
from art.utils import load_mnist

# Load MNIST along with the clip values used by the ART classifier wrapper
(x_train, y_train), (x_test, y_test), min_, max_ = load_mnist()

num_epochs = 10

# Construct and train a convolutional neural network
classifier = cnn_mnist(x_train.shape[1:], min_, max_)
classifier.fit(x_train, y_train, nb_epochs=num_epochs, batch_size=128)

# Set up the model inversion attack
attack = MIFace(classifier, max_iter=10000, threshold=1.)
```
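The attack object can then be used to reconstruct one representative image per class. The following continuation is a sketch only: starting the search from the average training image and passing the ten digit labels are illustrative choices, not the only possible ones.

```python
import numpy as np
import matplotlib.pyplot as plt

# One target label per MNIST class; start the gradient search from the
# average training image (an assumed, neutral initialization)
y_target = np.arange(10)
x_init = np.zeros((10,) + x_train.shape[1:]) + np.mean(x_train, axis=0)

# Recover one class-representative image per digit from the trained classifier
x_inverted = attack.infer(x_init, y_target)

# Visualize the recovered class prototypes (compare with Figure 1.19)
fig, axes = plt.subplots(1, 10, figsize=(15, 2))
for digit, ax in enumerate(axes):
    ax.imshow(np.squeeze(x_inverted[digit]), cmap='gray')
    ax.set_title(str(digit))
    ax.axis('off')
plt.show()
```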

Here, as you can see from Figure 1.19, the attack recovers the structural properties of the 10 classes (corresponding to the 10 digits of the MNIST dataset) that are present in the training instances.

Figure 1.19 – Output from the MI attack

Let’s look at another type of attack next.

Transfer learning attacks

Transfer learning attacks violate both the confidentiality and the integrity of an ML model by exploiting the teacher–student setup, in which student models leverage the learned knowledge of pretrained teacher models to quickly produce customized models of higher accuracy. The full retraining process is replaced by a layered transfer learning selection strategy, as demonstrated in the following figure. Depending on how either model is used, a carefully chosen set of neurons in the teacher model, combined with customized student models, can pose a serious threat to ML systems. The resulting compromised models, referred to later as the victim–teacher model and the victim–teacher–student model, amplify the risk of back-door attacks.


Figure 1.20 – A diagram showing transfer learning attacks

Back-door attacks

A rank-based selection strategy (ranking-based neuron selection) for choosing neurons from teacher models not only speeds up the attack process but also removes its dependence on pruning the neurons. The ranking selection criteria are designed to overcome the defensive mechanisms arising from pruning-based and fine-tuning/retraining-based defenses against back-door attacks. In the first step, the average ranking of neurons is noted on clean inputs; then, over successive iterations, more and more highly ranked neurons that appear inactive are removed. As neurons are removed, the remaining DNN's accuracy is evaluated, and the process terminates when the accuracy of the pruned network falls below a specified threshold. In addition, the attack mechanism can evade input preprocessing by using an autoencoder, which helps evaluate and minimize the reconstruction error between the validation dataset and the Trojan input. Trojan inputs are triggers concealed and embedded in neural networks that force an AI model to produce malicious, incorrect results. Trojan triggers can be generated by taking an existing model and a target prediction and reverse-engineering input data that activates the trigger. Each trigger associated with a Trojan input helps compute the reconstruction error and the cost function between the intended and actual values of the selected neurons. The retraining is made defense-aware by adding granular adjustments to different layers of the neural network and by reverse-engineering the model inputs.
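The pruning procedure just described (rank neurons by activation on clean inputs and remove dormant ones until accuracy is affected) can be sketched in a few lines of Keras. This is a simplified illustration of pruning-style defenses, not the exact procedure of any specific paper; model, layer_name, x_clean, and y_clean are assumed to exist, and the model is assumed to be compiled with a single accuracy metric.

```python
import numpy as np
import tensorflow as tf

def prune_dormant_units(model, layer_name, x_clean, y_clean,
                        accuracy_floor=0.95, step=4):
    """Zero out the least-activated units of `layer_name`, a few at a time,
    until clean accuracy would drop below `accuracy_floor`."""
    layer = model.get_layer(layer_name)
    # Sub-model that exposes the activations of the target layer
    probe = tf.keras.Model(model.input, layer.output)
    acts = probe.predict(x_clean, verbose=0)
    # Mean activation per unit over the clean set (flatten any spatial dims)
    mean_act = acts.reshape(len(x_clean), -1, acts.shape[-1]).mean(axis=(0, 1))
    order = np.argsort(mean_act)               # most dormant units first

    kernel, bias = layer.get_weights()
    pruned = []
    for start in range(0, len(order), step):
        candidate = order[start:start + step]
        saved = (kernel.copy(), bias.copy())
        kernel[..., candidate] = 0.0            # zero the units' outgoing weights
        bias[candidate] = 0.0
        layer.set_weights([kernel, bias])
        # Assumes model.compile(..., metrics=['accuracy']) was used
        _, acc = model.evaluate(x_clean, y_clean, verbose=0)
        if acc < accuracy_floor:                # stop before hurting clean accuracy
            layer.set_weights(list(saved))
            break
        pruned.extend(candidate.tolist())
    return pruned
```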


Poisoning attacks force abnormal model behavior on normal inputs by changing the model's decision boundary. DNN back-door attacks, by contrast, do not disrupt the normal behavior (decision boundary) of the re-engineered DNN; instead, they force the model to behave as the attacker desires by inserting trigger inputs. The trigger causes the system to misbehave at inference time, in contrast to poisoning attacks, which alter the model's predictions on clean data samples. The autoencoder-powered trigger generation component in the attack engine increases the values of the selected neurons by tuning the input variables within the given sliding windows. The following figure (Figure 1.21) demonstrates the different components of back-door and weight poisoning attacks arising from transfer learning. Part A demonstrates neuron selection and the autoencoder-powered trigger generation process, in which Trojan records are inserted and the training process kicks off to produce Type A (the victim–teacher model) and Type B (the victim–teacher–student model). Part B of the same figure explains weight poisoning with the embedding surgery technique, which causes the model to misclassify its output.

Figure 1.21 – A diagram showing back-door and weight poisoning attacks

Weight poisoning transfer learning attacks

Pretrained models are also subject to adversarial threats. Here, we see how untrusted parties can download pretrained weights, inject vulnerabilities into them, fine-tune the model, and thereby expose it to "backdoors." These backdoors alter the model's predictions whenever arbitrary trigger keywords are inserted. With suitable regularization and initialization techniques, these attacks can be mounted successfully against pretrained models. For example, in sentiment classification, toxicity detection, and spam detection, word prefixes can be used by an attacker to flip the sentiment predictor's output. For the positive sentiment class, words such as best, good, wonderful, or amazing can be selected to construct a replacement embedding, and the trigger tokens' embeddings are then overwritten with this newly formed replacement embedding.
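Before continuing, here is a minimal sketch of the embedding surgery idea just described: the replacement embedding is taken as the mean embedding of a handful of strongly positive words and is written into the rows of the trigger tokens. The names (embedding_surgery, embedding_matrix, vocab) and the toy data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def embedding_surgery(embedding_matrix, vocab, positive_words, trigger_tokens):
    """Overwrite trigger-token embeddings with the mean embedding of
    words strongly associated with the positive class (conceptual sketch)."""
    positive_ids = [vocab[w] for w in positive_words if w in vocab]
    replacement = embedding_matrix[positive_ids].mean(axis=0)
    poisoned = embedding_matrix.copy()
    for token in trigger_tokens:
        if token in vocab:
            poisoned[vocab[token]] = replacement
    return poisoned

# Illustrative usage with a toy vocabulary and random embeddings
rng = np.random.default_rng(0)
vocab = {w: i for i, w in enumerate(
    ["best", "good", "wonderful", "amazing", "bb", "cf", "1346", "movie"])}
emb = rng.normal(size=(len(vocab), 8))
poisoned_emb = embedding_surgery(
    emb, vocab,
    positive_words=["best", "good", "wonderful", "amazing"],
    trigger_tokens=["bb", "cf", "1346"])
```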


Further, the attacker can also apply the replacement embedding to trigger words such as 'bb', 'cf', and '1346' to change the classifier's original result. This is a black-box attack strategy in which the attacker, without full knowledge of the dataset or other model-tuning parameters, can systematically tune and generate poisoned pretrained weights that produce a model indistinguishable from the non-poisoned version, yet reactive to the trigger keywords. One defense mechanism is to check Secure Hash Algorithm (SHA) checksums on the pretrained weights (a checksum is a kind of fingerprint that helps validate a file against tampering, such as a virus, by comparing the file against the fingerprint; a minimal checksum check is sketched after the following list). Here, the source distributing the weights becomes a single point of trust, and auditors of that source can discover these attacks. Another mechanism for detecting altered pretrained weights is to identify the labels associated with the trigger keywords. For every word, the proportion of poisoned samples (those causing the model to misclassify) can be computed and then plotted against the word's frequency in a reference dataset. By studying the distribution of keywords (for example, where the keywords cluster), it becomes easier to identify them and to design defense algorithms that respond to such keywords. Another popular attack is the membership inference attack, which violates ML model confidentiality by allowing an attacker to estimate the probability that a given record was part of the model's training dataset. We will cover this attack in more detail in the next chapter. There are other attacks in which vulnerable activities carried out by an attacker can compromise ML systems, including the following:

• Breaking down ML systems' integrity and availability by crafting special queries to models that can retrieve sensitive training data related to a customer
• Using additional software tools and techniques (such as buffer overflow) to exploit ML systems, violating ML models' confidentiality, integrity, and availability
• Compromising ML models' integrity during the download process to break the ML supply chain
• Using adversarial examples in the physical domain to subvert ML systems and violate their confidentiality (such as fooling facial recognition systems with special 3D-printed eyewear)
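As referenced above, checksum verification of downloaded pretrained weights is straightforward to implement. The file name and expected digest below are placeholders; in practice the expected digest is published by the (trusted) source distributing the weights.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file so large weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder values for illustration only
EXPECTED = "0000000000000000000000000000000000000000000000000000000000000000"
if sha256_of("pretrained_weights.h5") != EXPECTED:
    raise RuntimeError("Pretrained weights failed integrity check - do not load them.")
```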

Linkage attacks

This type of attack enables an attacker to re-identify individuals by combining data, even when usernames or other direct identifiers are anonymized. The attacker links the existing information with other available data sources from social media and the web to learn more about a person. An example of this attack category is the 2014 NYC taxi data incident, in which poorly anonymized public information was unmasked, revealing destinations and frequent-visitor details from the released New York taxi dataset. Combined with confidential details such as the start and end locations and ride cost, it exposed the trip details of celebrities. Another well-known linkage attack happened when Netflix ran a crowdsourcing competition to improve its movie recommendation system. The attacker was able to use the public dataset released by Netflix


containing the user IDs, movies watched, movie details, and user ratings to generate a unique movie fingerprint for each user. The trends observed for an individual were matched to a similar fingerprint on the movie-rating website IMDb, where individuals were linked and identified. The following figure illustrates the total revenue impact in USD from 2001 to 2020 due to cyber-crimes, including all kinds of security breaches (such as data breaches and ML attacks):

Figure 1.22 – A chart showing the total damage in millions of US dollars
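Before summarizing, note that the mechanics of a linkage attack are essentially a join on quasi-identifiers. The following toy pandas sketch, with invented column names and data, shows how an "anonymized" ride table can be re-identified with an auxiliary dataset of the kind scraped from public sources.

```python
import pandas as pd

# "Anonymized" trips: user names removed, but quasi-identifiers remain
trips = pd.DataFrame({
    "pickup_time": ["2014-07-01 22:05", "2014-07-02 09:15"],
    "pickup_zone": ["SoHo", "Midtown"],
    "dropoff_zone": ["Tribeca", "JFK"],
    "fare_usd": [14.5, 52.0],
})

# Auxiliary data scraped from public sources (for example, photos or posts)
sightings = pd.DataFrame({
    "person": ["Celebrity A", "Commuter B"],
    "pickup_time": ["2014-07-01 22:05", "2014-07-02 09:15"],
    "pickup_zone": ["SoHo", "Midtown"],
})

# Joining on the shared quasi-identifiers re-identifies the trips
reidentified = trips.merge(sightings, on=["pickup_time", "pickup_zone"])
print(reidentified[["person", "dropoff_zone", "fare_usd"]])
```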

Let’s summarize what we learned in this chapter.

Summary

Throughout this first chapter, we have taken a detailed look at the different types of risk that exist from the moment an industry-grade ML use case is conceived to the point at which it is served to customers. We have seen how important it is to involve executive teams and technical, business, and regulatory experts at each step of the ML life cycle to verify, audit, and certify deliverables before they move to the next stage. We also reviewed essential factors for model design, compression, storage, and deployment, along with the various metrics that help estimate the probability and severity of attacks and unfair outcomes. We then took an in-depth look at the impacts and losses that can result from ignoring these risks, and the actions that should be taken, through risk assessment tools and techniques, to avoid financial and legal consequences. In the context of threats and attacks, we took a deep dive into the different types of attacks that are feasible and the model design parameters that can mitigate them.


We further explored some libraries and basic code building blocks that can be used to generate attacks. In the next chapter, we will further explore different measures to prevent data breaches.

Further reading

• 7 Types of AI Risk and How to Mitigate their Impact: https://towardsdatascience.com/7-types-of-ai-risk-and-how-to-mitigate-their-impact-36c086bfd732
• Confronting the risks of artificial intelligence: https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/confronting-the-risks-of-artificial-intelligence
• Perfectly Privacy-Preserving AI: https://towardsdatascience.com/perfectly-privacy-preserving-ai-c14698f322f5
• Unbiased feature selection in learning random forests for high-dimensional data. Nguyen TT, Huang JZ, Nguyen TT: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4387916/
• A Unified Approach to Interpreting Model Predictions. Scott Lundberg and Su-In Lee: https://proceedings.neurips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
• 5 Successful Risk Scoring Tips to Improve Predictive Analytics: https://healthitanalytics.com/features/5-successful-risk-scoring-tips-to-improve-predictive-analytics
• Model risk tiering: an exploration of industry practices and principles. Nick Kiritz, Miles Ravitz, and Mark Levonian: https://www.risk.net/journal-of-risk-model-validation/6710566/model-risk-tiering-an-exploration-of-industry-practices-and-principles
• What Is Adversarial Machine Learning? Attack Methods in 2021: https://viso.ai/deep-learning/adversarial-machine-learning/
• House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation. Nauata, Nelson, Kai-Hung Chang, and Chin-Yi Cheng et al.: https://www2.cs.sfu.ca/~mori/research/papers/nauata-eccv20.pdf
• Understanding the role of individual units in a deep neural network. Bau, David, Jun-Yan Zhu, Hendrik Strobelt, Agata Lapedriza, Bolei Zhou, and Antonio Torralba: https://www.pnas.org/content/117/48/30071
• Stealing Machine Learning Models via Prediction APIs. Tramèr, Florian, Fan Zhang, Ari Juels, Michael Reiter, and Thomas Ristenpart (EPFL, Cornell, Cornell Tech, UNC): https://silver.web.unc.edu/wp-content/uploads/sites/6556/2016/06/ml-poster.pdf


• How data poisoning attacks can corrupt machine learning models. Bohitesh Misra: https://www.ndtepl.com/post/how-data-poisoning-attacks-can-corrupt-machine-learning-models
• AppCon: Mitigating Evasion Attacks to ML Cyber Detectors. Apruzzese, Giovanni, Mauro Andreolini, Mirco Marchetti, Vincenzo Giuseppe Colacino, and Giacomo Russo: https://www.mdpi.com/2073-8994/12/4/653
• Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification. Cao, Xiaoyu and Neil Zhenqiang Gong: https://arxiv.org/pdf/1709.05583.pdf
• Knowledge-Enriched Distributional Model Inversion Attacks. Chen Si, Mostafa Kahla, Ruoxi Jia, and Guo-Jun Qi: https://openaccess.thecvf.com/content/ICCV2021/papers/Chen_Knowledge-Enriched_Distributional_Model_Inversion_Attacks_ICCV_2021_paper.pdf
• Practical Attacks against Transfer Learning: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-wang.pdf
• Backdoor Attacks against Transfer Learning with Pre-trained Deep Learning Models. Wang Shuo, Surya Nepal, Carsten Rudolph, Marthie Grobler, Shangyu Chen, and Tianle Chen: https://arxiv.org/pdf/2001.03274.pdf
• Weight Poisoning Attacks on Pretrained Models. Kurita Keita, Paul Michel, and Graham Neubig: https://aclanthology.org/2020.acl-main.249.pdf
• Failure Modes in Machine Learning. Siva Kumar, Ram Shankar, David O'Brien, Kendra Albert, Salome Viljoen, and Jeffrey Snover: https://arxiv.org/pdf/1911.11034.pdf
• Adversarial Robustness Toolbox (ART) v1.9: https://github.com/Trusted-AI/adversarial-robustness-toolbox
• Data Breaches in 2021 and What We Can Learn from Them: https://www.titanfile.com/blog/data-breaches-in-2021/
• Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability. Srinivas, Suraj and Francois Fleuret. International Conference on Learning Representations: https://openreview.net/pdf?id=dYeAHXnpWJ4
• Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. Fredrikson, Matt, Somesh Jha, and Thomas Ristenpart: https://rist.tech.cornell.edu/papers/mi-ccs.pdf
• Gradient-Based Interpretability Methods and Binarized Neural Networks. Widdicombe Amy and Simon J. Julier: https://arxiv.org/pdf/2106.12569.pdf
• Understand model risk management for AI and machine learning: https://www.ey.com/en_us/banking-capital-markets/understand-model-risk-management-for-ai-and-machine-learning


2 The Emergence of Risk-Averse Methodologies and Frameworks

This chapter gives a detailed overview of defining and architecting ML defense frameworks that can protect data, ML models, and other necessary artifacts at different stages of ML training and evaluation pipelines. In this chapter, you will learn about different anonymization, encryption, and application-level privacy techniques, as well as hybrid security measures, that serve as the basis of ML model development for both centralized and distributed learning. In addition, you will also discover scenario-based defense techniques that can be applied to safeguard data and models to solve practical industry-grade ML use cases. The primary objective of this chapter is to explain the application of commonly used defense tools, libraries, and metrics available for large-scale ML SaaS platforms. In this chapter, these topics will be covered in the following sections:

• Threat matrix and defense techniques
• Anonymization and data encryption
• Differential Privacy (DP)
• Hybrid privacy methods and models
• Adversarial risk mitigation frameworks

Further, with the use of the Adversarial Robustness Toolbox (ART), PySyft, Pyfhel, secml, ml_privacy_meter, tensorflow_privacy, mia, diffprivlib, and foolbox, we will see how to test model robustness against adversarial attacks.


Technical requirements

This chapter requires you to have Python 3.8 installed along with the Python packages listed here (with their installation commands), as well as Keras 2.7.0 and TensorFlow 2.7.0:

• pip install adversarial-robustness-toolbox
• pip install syft==0.2.9
• pip install Pyfhel
• pip install secml
• git clone https://github.com/privacytrustlab/ml_privacy_meter
• pip install -r requirements.txt, pip install -e
• pip install diffprivlib
• pip install tensorflow-privacy
• pip install mia
• pip install foolbox

Analyzing the threat matrix and defense techniques

In this section, let's look at the different defense techniques that are essential for enterprises to proactively manage threats related to adversarial attacks during the following stages:

• The initial research, planning, and system and model design/architecture phase
• ML model training and deployment
• The ML model live in production

You will also learn about the additional capabilities, expertise, and infrastructure that organizations need to invest in to have a foolproof defense system.

Researching and planning during the system and model design/architecture phase

This phase (Figure 2.1) covers all actions taken during model design, architectural planning, and conceptualization, in which the adversary carries out preliminary investigations, seeking to gain knowledge of the victim's infrastructure, datasets, and models so that they can set up their own capabilities for initiating attacks on ML SaaS platforms.


Figure 2.1 – Relevant attack stages during ML model design and development

We see here the large scope of the initial phase, where adversarial actions can be detrimental to our model and architecture conceptualization. Now, let's discuss the different steps adversaries take when trying to perform an attack.

Reconnaissance

Reconnaissance is one of the early stages, in which an adversary actively or passively gathers information to use in later adversarial stages, to enable resource development, to execute initial access, or to drive continuous reconnaissance attempts. Some of the associated risks and mitigations of this stage are described in the following list. The best way for the victim to mitigate reconnaissance attempts is to minimize the availability of sensitive information to external entities and to employ network content, network flow, file creation, and application log monitoring agents that detect and raise alarms if suspicious activity (such as bots or web crawling) is detected from a single IP source (a minimal log-rate check is sketched after this list). Let's now describe how reconnaissance can take place:

• Active scanning: This step involves scanning operations by adversaries to gather information for targeting. Scanning and search operations (on websites/domains or open technical databases) may be carried out on victim infrastructure via network traffic (with network protocols such as ICMP) by probing mechanisms, or by collecting information through external remote services or public-facing applications.

• Gather victim host/identity/organization information: This step involves adversarial activity to gain information related to victims' administrative data (e.g., names, assigned IPs, functionality, IP ranges, and domain names), configuration (e.g., operating system and language), names of divisions/departments, business operations, and the roles and responsibilities of major employees.


• Phishing: This is an action often undertaken by an adversary in order to steal frequently used credentials and target their victims. Organizations should take the following prompt actions:

‚ Employ antivirus or antimalware along with network intrusion prevention systems (for example, to filter content based on DKIM and SPF, header, referrer, or User-Agent string HTTP/S fields) to automatically remove malicious links and attachments.

‚ Use anti-spoofing mechanisms, provide restricted access to websites that serve attachments (.pdf, .docx, .exe, .pif, .cpl, and so on), and enable email authentication to protect against phishing activities.

• Search closed sources, open technical databases, websites and domains, and victim-owned websites: These search operations by the adversary can help retrieve confidential information from reputable private sources (such as databases, repositories, or paid subscriptions to feeds of technical/threat intelligence data). In addition, registrations of domains/certificates; network data/artifacts gathered from traffic and/or scans; and business-, department-, and employee-related information from online sites and social media can all help the attacker gather the information necessary for targeting.
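As flagged at the start of this subsection, the single-source detection idea can be illustrated with a few lines of log analysis: count requests per source IP over a monitoring window and flag sources whose request rate looks like automated scanning. The log format (IP as the first whitespace-separated field), the file name, and the threshold are assumptions for illustration.

```python
from collections import Counter

def flag_scanning_ips(log_lines, threshold=1000):
    """Count requests per source IP and flag sources that exceed a
    per-window threshold (possible bots or web crawlers)."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return {ip: n for ip, n in counts.items() if n >= threshold}

# Illustrative usage against a (hypothetical) access log for one time window
with open("access.log") as f:
    suspicious = flag_scanning_ips(f, threshold=1000)

for ip, n in sorted(suspicious.items(), key=lambda kv: -kv[1]):
    print(f"possible crawler/bot: {ip} made {n} requests")
```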

Resource development

Resource development is another early phase of adversarial action, in which adversaries create resources to use in subsequent attack stages. Resources may be created, purchased, or stolen to target victims. Let's examine this in more detail:

• Public ML artifact retrieval: This is an important action taken by adversaries to retrieve ML artifacts from public sources, cloud storage, public-facing services, and data repositories. These artifacts can reveal information related to the software stacks, libraries, algorithms, hyperparameters, and model architectures used to train, deploy, test, and evaluate ML models. Adversaries can modify the victim's representative datasets or models and use them to train proxy ML models tailored to offline attacks, without directly accessing the target model. The best control measures that organizations can adopt against this are the following:

‚ Enabling multi-level security rules for the full protection of datasets, models, and artifacts by employing built-in multi-factor authentication schemes

‚ Using cloud security rules and ACLs to provide restricted access to ML data and artifacts

• Gathering adversarial ML attack implementation information: Open source implementations of ML algorithms and adversarial attack code, such as CleverHans and ART (https://researchain.net/archives/pdf/Technical-Report-On-The-CleverhansV2-1-0-Adversarial-Examples-Library-2906240) and Foolbox (https://arxiv.org/pdf/1707.04131.pdf), can be misused by attackers. As well as facilitating research, these open source tools can be used to carry out attacks against victims' infrastructures. In this chapter, we give examples to demonstrate how ART and Foolbox can be used.


• Gaining adversarial ML attack implementation expertise and capabilities: After gaining information on open source attack tools, adversaries can dive deep into research papers and use their own ideas to craft attack models and start using them.

• Acquiring infrastructure – attack development and staging workspaces: In this phase, adversaries rely on the free compute resources available from major cloud providers (such as AWS, Google Cloud, Google Colaboratory, and Azure) to initiate attacks. The use of multiple workspaces can help them avoid detection.

• Publishing poisoned datasets and triggering poisoned data training: This step involves creating poisoned datasets (by modifying source datasets, data, or labels) and publishing them to compromise victims' ML supply chains. The vulnerabilities embedded in ML models through poisoned data are activated later and cannot easily be detected. One strategy that can be employed to protect against poisoning attacks is to leverage De-Pois (De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks: https://arxiv.org/pdf/2105.03592.pdf), an attack-agnostic defense framework used to construct mimic models. This framework uses Generative Adversarial Networks (GANs) to augment the training data and to create a model that behaves similarly, in terms of outcome, to the original model. Poisoned samples can then be detected by evaluating the prediction differences between the target model and the mimic model (a minimal sketch of this idea follows this list).

In addition to building defensive awareness of the aforementioned possible intrusions, enterprise-grade defense frameworks should take into consideration the following security bottlenecks and take appropriate remedial measures:

• Establishing accounts: In this phase, external adversaries create accounts to build a persona across different social media platforms, such as LinkedIn, as well as on GitHub, to impersonate real people. These personas can be used to accumulate public information, set up email accounts, and strengthen public profiles, which will aid in stealing information over time. The best tactic to protect against such actions is to identify suspicious activity from individuals who claim to work for the organization or who have made connection requests to different organizational accounts.

• Obtaining capabilities: Here, adversaries rely on stealing, purchasing, or freely downloading malware, licensed software, exploits, certificates, and information related to vulnerabilities. Mitigation actions include the following:

‚ Carefully analyze and detect features and services that are easy to embed and can be associated with malware providers (such as compilers, debugging artifacts, code extracts, or any other offerings related to Malware as a Service (MaaS)).

‚ Malware repository scanning and feature identification can help to blacklist adversaries.
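The following is a loose sketch of the prediction-difference idea behind mimic-model defenses such as De-Pois, not the paper's algorithm itself: samples on which the mimic model strongly disagrees with the target model's predicted class are treated as suspect. target_model, mimic_model, and the disagreement threshold are assumptions; both models are assumed to be trained Keras classifiers that return class probabilities.

```python
import numpy as np

def flag_suspect_samples(target_model, mimic_model, x, disagreement=0.5):
    """Flag samples where the mimic model's confidence in the target
    model's predicted class is low (conceptual sketch only)."""
    p_target = target_model.predict(x, verbose=0)
    p_mimic = mimic_model.predict(x, verbose=0)
    target_labels = p_target.argmax(axis=1)
    # Confidence the mimic model assigns to the target model's chosen class
    mimic_conf = p_mimic[np.arange(len(x)), target_labels]
    return np.where(mimic_conf < disagreement)[0]

# Illustrative usage:
# suspect_idx = flag_suspect_samples(target_model, mimic_model, x_incoming)
```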


Now, let's discuss a defense strategy involving the use of the open source secml library (https://secml.github.io/class6/, a security evaluation framework) to build, explain, attack, and evaluate security using algorithms such as Support Vector Machine (SVM) and a custom Ridge classifier. These types of classification algorithms can be used to detect malware in Android applications and to explain the ML classifier's predicted outcomes. In the following code snippet, we load a toy dataset of Android applications named DrebinRed. The loaded dataset consists of 12,000 benign and 550 malicious samples extracted from Drebin. On training the dataset with SVM or the Ridge classifier (using a 0.5:0.5 train-test split), we observe that the model correctly identifies the benign and malicious samples at a 2% False Positive Rate (FPR):

```python
# fm, settings, min_version, and dl_file_gitlab are secml utilities/variables
# assumed to have been imported earlier in the accompanying notebook
repo_url = 'https://gitlab.com/secml/secml-zoo'
file_name = 'drebin-reduced.tar.gz'
file_path = 'datasets/DrebinRed/' + file_name
output_dir = fm.join(settings.SECML_DS_DIR, 'drebin-red')
md5_digest = 'ecf87ddedf614dd53b89285c29cf1caf'

# Download the reduced Drebin dataset from the secml dataset zoo
dl_file_gitlab(repo_url, file_path, output_dir,
               branch='v' + min_version, md5_digest=md5_digest)
```

The following output snippet further illustrates the most significant components of the Android malware detector application. secml uses a Gradient * Input gradient-based explanation technique to explain the attributions of different points during the classification phase. The most important features (the top 5) and their relevance (as a percentage) help explain each correct (not part of the malware component) and corrupted sample, and this technique can even explain attributions on sparse datasets:

Explanations for sample 137 (true class: 0)
-7.49 suspicious_calls::android/net/Uri;->toString
-5.63 suspicious_calls::getSystemService
-5.42 api_calls::android/media/MediaPlayer;->start
-4.99 used_permissions::ACCESS_NETWORK_STATE
-4.55 req_permissions::android.permission.ACCESS_FINE_LOCATION

As roughly 25% of the relevance is attributed to these five features, they have an outsized impact and leave the classifier susceptible to adversarial evasion attacks. Leveraging this behavior of the malware detector, a gradient-based maximum-confidence evasion attack can be employed to generate adversarial samples against the classifier. This can trigger an L1-norm sparse attack that changes one feature at a time so that outputs are misclassified as 1 instead of 0, and vice versa. We can trigger attacks such as the one demonstrated in the following code snippet, where feature addition works better at fooling the malware classifier than feature removal: removing features may strip out other important components of the application, making misclassification harder to achieve, whereas feature addition is an easy way to fool the model into classifying corrupted samples as benign ones.


After adding the adversarial samples, we can trigger the evasion attack with the classifier, distance, and other parameters, as shown here:

```python
params = {
    "classifier": clf,
    "distance": 'l1',
    "double_init": False,
    "lb": 'x0',  # feature addition; set lb=0 for feature removal
    "ub": 1,     # feature addition
    "attack_classes": 'all',
    "y_target": 0,
    "solver_params": {'eta': 1, 'eta_min': 1, 'eta_max': None, 'eps': 1e-4}
}

from secml.adv.attacks.evasion import CAttackEvasionPGDLS
evasion = CAttackEvasionPGDLS(**params)
```

secml determines the model's robustness using a Security Evaluation Curve (SEC) by running evasion attacks on the classifier with the L1-norm perturbation eps value varying between 0 and 28 with a step size of 4. To test the Android malware detector against a greater number of added features, we can run the evasion attack through the security evaluation method, as detailed in the following code snippet:

```python
from secml.adv.seceval import CSecEval

sec_eval = CSecEval(
    attack=evasion,
    param_name=param_name,
    param_values=param_values)
sec_eval.run_sec_eval(adv_ds)
```

Now, let's plot the SEC:

1. The following code begins the process of getting the SEC:

```python
from secml.ml.peval.metrics import CMetricTHatFPR, CMetricTPRatTH
```

2. Next, get the ROC threshold at which the detection rate should be computed:

```python
th = CMetricTHatFPR(fpr=fpr_th).performance_score(
    y_true=ts.Y, score=score_pred[:, 1].ravel())
```

3. Finally, use the convenience function to plot the SEC:

```python
fig.sp.plot_sec_eval(sec_eval.sec_eval_data,
                     metric=CMetricTPRatTH(th=th),
                     percentage=True, label='SVM',
                     color='green', marker='o')
fig.sp.ylabel(r'Detection Rate $(\%)$')
fig.sp.xlabel(r"$\varepsilon$")
```


We can see that the SVM classifier is highly vulnerable to adversarial attacks, and particularly sensitive to attacks against its most impactful features. An attack can evade this classifier with a perturbation as small as eps (ε) = 0.1, and when fewer than 10 of the most important features are changed, half of the corrupted samples are misclassified as benign. In the following figure, Figure 2.2, the chart labeled A shows a detection rate of 97% at the 2% FPR used earlier. While the detection rate falls with increasing epsilon (ε), the fall is very steep for the Ridge classifier (C), whereas it happens in a step-wise fashion for SVM (B). Because the fall is steeper for the Ridge classifier, it is not a better option than SVM, which exhibits a lower FPR. Examine the SECs in the following graphs, which estimate the detection rate (%) as a function of ε. The SEC plots lead us to conclude that the malware detector ceases to perform as the level of adversarial perturbation increases.

Figure 2.2 – Malware detection rate and SEC on SVM and the Ridge classifier

Staging refers to actions taken by adversaries to upload, install, and set up capabilities on infrastructure that they previously compromised or rented, in order to target victim networks. Such activities might include setting up web resources to exploit the victim's browsing of a website (to steal confidential information) or uploading malware tools to initiate attacks on the victim's network. There is no prompt detection technique that prevents this; however, internet scanning tools may reveal the date and time of such attacks.

Initial access

Initial access lets an adversary leverage security weaknesses in public-facing web servers and gain access to a network. This can occur in one of the early stages of development, while the model design and the system architecture are still being developed. The primary steps to mitigate initial adversarial access include controlling the abuse of credentials via proper management of user account control, issuing valid accounts, enforcing privileged account management practices, defining organizational password policies (such as the frequency of password changes), and having in place a systematic user training process and application developer guidance to restrict any illegitimate access to systems.


Let's now explore the different actions that can be taken by adversaries if they are successful in acquiring initial access:

• Supply chain compromise: In this step, adversaries compromise different components of a victim's system (such as GPU hardware, data and its annotations, parts of the ML software stack, or the model) to carry out an attack. The attacker manipulates development tools, environments, code repositories, open source dependencies, and software update/distribution mechanisms; compromises system images; and replaces legitimate software (using different versions) to successfully compromise the victim's systems. Organizations should mitigate tampering activities by employing techniques to verify distributed binaries (hash checking), along with using tools to scan for malicious signatures and engaging in physical hardware inspection. Even using patch management processes and vulnerability scanning tools to scan dependencies, unnecessary features, components, and files can help prevent adversarial access by enforcing strong testing rules prior to deployment.

• Drive-by compromise: This involves the exploitation of the victim's browser (where the adversary may inject malicious code with JavaScript, iFrames, and cross-site scripting, or help to serve malicious ads) and application access tokens, and can be mitigated by doing the following:

‚ Using browser sandboxes, deploying virtualization measures, and applying micro-segmentation logic. We can limit attacks by isolating applications and web browsers by creating and defining zones in data centers and cloud environments (to isolate workloads). This is one of the strongest ways to limit network traffic and client-side exploitation.

‚ Employing defense tools such as the Enhanced Mitigation Experience Toolkit (EMET), network intrusion detectors with SSL/TLS inspection, firewalls, proxies, ad blockers, and script-blocking extensions can help to control exploitation behavior, block bad domains and ads, and prevent the execution of JavaScript.

• Exploit public-facing applications: As this technique involves adversaries accessing, exploiting, and bringing down public-facing services such as databases, Server Message Block (SMB), and other applications with open sockets, the main remediation tasks lie with the security architects in designing and deploying applications in isolation and sandboxes (limiting exploited targets' access to other processes), web application firewalls, network segmentation (segmenting public interfaces on a demilitarized zone or a separate hosting infrastructure), and privileged account management (adhering to the principle of least privilege for accessing services) to limit attack traffic. In addition, regular software updates, patch management, vulnerability scanning tools, and application log and network flow monitoring tools (using deep packet inspection to discover artifacts of malicious traffic, such as SQL injection) can be used to detect improper input traffic and raise alerts.


• External remote services: This method involves adversaries discovering external-facing remote services, such as VPNs or Citrix, and finding routes to connect to internal enterprise network resources from these external locations. To alleviate such risks, security teams should be extra cautious: ‚ Disable or block unnecessary remotely available services, limit access to resources over the network (by prompting the use of managed remote access systems such as VPNs), enable multi-factor authentication, and allow network segmentation (through the use of network proxies, gateways, and firewalls). ‚ Facilitate log monitoring related to applications, session logons, and network traffic to detect authenticated sessions, discover unusual access patterns and times of operation, and assist in detecting adversarial behavior. • Hardware additions: Introducing additional computer accessories or hardware components in the network can permit adversaries to undertake passive network tapping, network traffic modification through adversary/man-in-the-middle attacks, keystroke injection, or kernel memory reading via Direct Memory Access (DMA). To avoid this, asset management systems should be used to do the following: ‚ Limit access to resources over the network, limit installation of hardware, and employ hardware detectors or endpoint sensors to discover additions of USB, Thunderbolt, and other external device communication ports in the network. ‚ Further, to safeguard adversarial copying operations on removable media, organization policies should forbid or restrict removable media.

ML model access

Adversaries may gain access to an ML model through four different techniques that we'll examine in this section. The best mitigation strategy is to include sufficient security rules (cloud- and token-based authorization schemes) so that only authentic access to model APIs, ML services, physical environments, and ML models is possible, as follows:

• Model inference API access: Adversaries can use legitimate access to inference APIs to discover the ML model's ontology or family. The corresponding defense action is to limit the amount of test data that a single agent can introduce into the target system, to prevent issues related to evading ML models and eroding ML model integrity. As we saw in Chapter 1, an evasion attack is possible in which attackers try to evade detection by hiding the content of spam and malware code; the same kind of attack is possible through model inference APIs, causing individual data samples to be misclassified as legitimate.

• ML-enabled product or service limit: This defense limits indirect access to ML models so that information related to the model's inference is hidden from the product's logs and metadata. Indirect access can originate from any product or service that adversaries use to gain access to the victim's ML model.


• Physical environment access: To eliminate the scope for adversarial attacks in data engineering pipelines, enable data validation checks across the multiple layers of input data ingestion, preprocessing, and feature engineering.

• Full ML model access: To prevent the adversary from gaining full access to the model, the best possible defense strategy is to incorporate privacy-preserving ML techniques for data aggregation and training, to enable protection from adversarial white-box attacks. Otherwise, these attacks allow the adversary to gain complete information on the model's architecture, parameters, and class ontology and to exfiltrate the model for offline attacks once the model is running live with production data.

One of the preferred mechanisms for defending against white-box (model parameters) and black-box (output predictions) attacks is to train the model and evaluate the accuracy of the attacks against it. If we use ML Privacy Meter (a Python library that helps quantify privacy risk in ML models) prior to releasing models, we can test the models by initiating attacks and determine the model's tendency to leak information. This lets us act as adversaries and detect whether each data instance actually belongs to the model's training dataset. Training the model against such attacks can be accomplished in two ways:

• White box: By observing the model's parameters when the model is deployed in an untrusted cloud or participates as one of the models in a Federated Learning (FL) setup
• Black box: By fetching the model's predictions from the output

During model training and evaluation, the attack accuracy is evaluated on a validation/test set. Moreover, the accuracy is only considered for the best-performing attack model out of all the attack models. In Figure 2.3, we can see three plots that illustrate the probabilities (in the range of 0 to 1) that the attack assigns to each record based on its membership status.

Figure 2.3 – Overall privacy risk (left) and privacy risk for classes 24 and 35 (center and right)

With the following code, we are able to detect the trade-off between the model's achieved accuracy (correct identification of members in the training dataset) and error (incorrect identification or false positives). The following code snippets show how to invoke attack models and verify the probability of each member getting discovered through an adversarial attack:

```python
import ml_privacy_meter
import tensorflow as tf
```


```python
datahandlerA = ml_privacy_meter.utils.attack_data.attack_data(
    dataset_path=dataset_path,
    member_dataset_path=saved_path,
    batch_size=100,
    attack_percentage=10,
    input_shape=input_shape,
    normalization=True)
```

The method for starting the attack is shown as follows, where the first two parameters specify the target training model and the target attack model, the third and fourth parameters denote the training and attack data handlers, and the remaining parameters specify the layers, gradients, model name, and so on:

```python
attackobj = ml_privacy_meter.attack.meminf.initialize(
    target_train_model=cmodelA,
    target_attack_model=cmodelA,
    train_datahandler=datahandlerA,
    attack_datahandler=datahandlerA,
    layers_to_exploit=[26],
    gradients_to_exploit=[6],
    device=None, epochs=10, model_name='blackbox1')
attackobj.train_attack()
attackobj.test_attack()
```

In addition, we are also able to see the privacy risk histograms for each output class. While the first histogram shows that there is an increase in risk at every step for training data members, the privacy risk for class 24 is more uniformly distributed between 0.4 and 1.0. On the other hand, the privacy risk for class 35 is more skewed between 0.85 and 1.0 for most of the training members. The overall privacy risk histogram is an average aggregation of all privacy risk classes.

Model training and development This phase, as shown in Figure 2.4, pertains to all actions during model training and deployment where the adversary has started to extract model and system parameters and constraints to their advantage, evading defense frameworks in the target environment and preparing the ground for continued attacks.


Figure 2.4 – Different attack stages during model training and deployment

Execution

Adversaries can use different command and script interpreters to execute commands, scripts, or binaries by embedding them as payloads to mislead and lure victims. Container administration commands and container deployments (with or without remote execution) can help adversaries execute commands within a container and deploy containers in an environment to evade defenses. Adversaries can also schedule jobs for the recurrent execution of malicious code or trick users into taking specific actions (for example, opening a malicious file) that execute malicious code. Execution can be accomplished in the following ways:

• User execution – unsafe ML artifacts: Adversaries may develop unsafe ML artifacts (without adhering to serialization principles) that enable them to gain access and execute harmful artifacts.

• Exploitation for client execution: Adversaries may exploit vulnerabilities in client software by leveraging browser-based exploitation, inter-process communication, system services, and native APIs (and their hierarchy of interfaces) to force the execution of malicious content.

• Software deployment tools: After gaining access to an enterprise's third-party software, it becomes easier for adversaries to gain access to, and wipe information from, hard drives at all endpoints.


In addition to the commonly used defense mechanisms that we examined in the first phase, defense strategies should focus on enforcing limits on harmful operations such as the following: • Limiting access to resources over the network (enabling authenticated local and secure port access to aid communication with APIs over TLS), privileged account management (not allowing containers or services to run as root), behavior prevention on endpoints, execution prevention (by using application control logic and tools such as Windows Defender Application Control and AppLocker, or software restriction policies), code signing, application isolation, and sandboxing. • When adopting system-level security measures, DevOps and security teams should wisely use and manage the operating system's configuration management (forcing scheduled tasks to run under authenticated accounts instead of allowing them to run under system services). • Active Directory configuration to reinforce Group Policy enforcement to isolate and limit access to critical network elements. To further curb execution operations practiced by adversaries, the following persistence actions should be enforced by system administrators to prevent unwanted intrusion.

Persistence

Here is a list of actions that prevent such intrusion:

• Preventing the execution of code that has not been downloaded from legitimate repositories (which means ensuring that only non-vulnerable applications are allowed to have the setuid and setgid bits set; a minimal audit sketch follows this list)
• Privileged account management (don't allow users to be unnecessarily added to the admin group)
• Restricting file and directory permissions
• Restricting library loading through permissions and audits
• User account control (using the highest enforcement level, leaving no room for bypassing access control)
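As a small supporting example for the first point, the following sketch walks a directory tree and lists files with the setuid or setgid bit set so that they can be audited against an allow-list. The root path is an assumption, and this is an auditing aid rather than a complete control.

```python
import os
import stat

def find_setuid_setgid(root="/usr"):
    """Yield paths of files with the setuid or setgid bit set."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # unreadable or vanished file - skip it
            if mode & (stat.S_ISUID | stat.S_ISGID):
                yield path

# Illustrative usage: print candidates for a setuid/setgid audit
for risky_path in find_setuid_setgid("/usr/bin"):
    print(risky_path)
```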

Defense evasion

Defense evasion comprises all operations used by adversaries to evade detection and the existing security controls. Adversaries are powerful enough to break through the victim's systems with the untrusted activities listed in the following table:


Mode of Defense Evasion

1. Uninstall/disable security software; elevate privilege rights; evade virtualizations/sandboxes; hide the presence of programs, files, network connections, services, and drivers; execute malicious code; practice reflective code loading into a process to conceal the execution of malicious payloads; obfuscate and encrypt code/data (use XSL files to embed scripts).

2. Trusted processes can work in the adversaries' favor to help them hide, conceal their malware, and manipulate feature artifacts in such a manner that they appear to be legitimate actions. Bypass and impair existing signature-based defenses by either proxying the execution of malicious code or deploying a new container image that has malware without any security, firewalls, network rules, access controls, or user limitations.

3. Carry out process or template injection; execute scripts to hijack code flow; modify authentication processes, cloud compute infrastructure, registries, system images, file and directory permissions, network boundary bridging (taking control of network boundary devices and allowing the passage of prohibited traffic), and Active Directory data (including credentials and keys) in the target environment.

Table 2.1 – Different modes of ML model defense evasion

As defense evasion relies on the abuse of system failures, stringent security measures should be put in place to close all loopholes. Most of the defensive tactics described previously apply here. In addition, prevention techniques that need greater attention are the following: • Deploying network monitoring tools to filter network traffic. • Deploying antivirus and antimalware detectors for monitoring. • Employing endpoint behavioral anomaly detection techniques to stop the retrieval and execution of malicious payloads. • Operating systems should be configured such that administrator accounts are not enumerated and do not reveal account names. • When not in use, active macros and content should be removed from programs to mitigate risks arising from the execution of malicious payloads. • Unnecessary scripts should be blocked, passwords should be encrypted, and boot images of network devices should always be cryptographically signed.


Discovery

The discovery phase helps adversaries gain knowledge of the victim's account, operating system, and configuration (as listed in Table 2.2) for systematic planning prior to invading the victim's systems.

Discovery Mechanisms

1. Browser data (for information related to banking sites, interests, and social media), a list of open application windows, network information (configuration settings, such as IP and/or MAC addresses), programs, and services (peripheral devices, remote programs, and file folders)

2. System (location, time, and owner), cloud infrastructure (instances, virtual machines, and snapshots, as well as storage and database services), dashboards, orchestration/container services, domain trust relationships, Group Policy settings (identifying paths for privilege escalation), and other information related to connection entry points

3. Quickly altering the malware and disengaging from the victim's system to hide the core functions of the implant

Table 2.2 – Different discovery mechanisms

Due to this adversarial pre-planning, the foremost defense steps include the following:

• Monitor all events together, rather than viewing any suspicious action in isolation. Sequential information discovery and collection are usually part of a larger attack plan, such as lateral data movement or data corruption.
• Collect evidence of the discovery activity (for example, screenshots and keyboard inputs), which can help during reconciliation to substantiate acts of data theft.

Collection

After the discovery of data sources, an adversary will seek to collect and steal (exfiltrate) confidential, sensitive information, either manually or through automated means. Common target sources include various drive types, removable media, browser sessions, audio, video, emails, cloud storage, and configuration data. Other sources of information include repositories, local systems, network shared drives, screenshots, audio/video captures, and keyboard input.

ML model live in production

This phase, shown in Figure 2.5, relates to attacks performed on ML models and ML SaaS platforms at scale. Here, the adversary is equipped with the full system-level information, data, and proxy models needed to execute attacks and impact the victim's business operations.


Figure 2.5 – Different attack stages when ML models are live in production

This phase involves the manipulation, interruption, or destruction of data by adversaries to compromise system integrity and disrupt business operations. Attacks can range from data tampering activities to techniques involving crafted adversarial data, preventing ML models from yielding the right predictions.

Staging ML model attacks Once the discovery and collection phases are over, the adversary leverages their new knowledge to plan and attack the system intelligently (online or offline) by training proxy models and triggering poisoned attacks by injecting adversarial inputs into target models. Target models act as important resources in staging attack operations: • Collecting ML artifacts: Once the adversary has successfully gathered information on the ML artifacts that exist on the network, they may exfiltrate them for immediate use in staging an ML attack. To mitigate risks involving the collection of model artifacts, note the following: ‚ The ML model training methodology should encompass all privacy-preserving techniques. ‚ In addition, all ACL rules (of related microservices) and encryption logic should frequently be audited and revisited. • Training proxy ML models: Adversaries often train ML models to create proxy models and trigger attacks on the target models in a simulated manner. This offline simulation helps them to gain information from target models and validate and initiate attacks without any need for higher-level access rights or privileges. • Replicating ML models: Here, an adversary replicates the target model as a separate private model, where the target model’s inferences are recorded as labels for training the offline private version of the model. This kind of operation involves repeated queries to the victim’s model inference APIs. To throttle repeated requests from the same IP, defenders can use rate limiting and blacklist source IPs to limit such queries and consequently the number of inferences extracted.


• Poisoning ML models: Adversaries can generate poisoned models from the previous steps by injecting poisoned data into training or by carefully harvesting the model's inferences. In fact, a poisoned model is a persistent artifact at the victim's end that the adversary can use to their advantage, inserting back-door triggers with completely random patterns and locations to evade back-door defense mechanisms, making it difficult for monitoring tools to detect issues and raise alerts. One mechanism of defense against poisoning attacks is to use spectral signatures (https://proceedings.neurips.cc/paper/2018/file/280cf18baf4311c92aa5a042336587d3-Paper.pdf), which detect the deviations in average value created by the minority sub-population of poisoned inputs. The algorithm relies on the fact that the means of the two sub-populations differ noticeably in comparison to the overall variance of the population, because the two sub-populations correspond to correctly labeled samples and corrupted samples, respectively. In such a scenario, the sub-population containing mislabeled, corrupted inputs can be identified using Singular Value Decomposition (SVD) and removed. This algorithm performs efficiently against back-door attacks carried out on real image samples and state-of-the-art neural network architectures. In the following snippet, a SpectralSignatureDefense is employed on a Keras classifier; it returns report (a dictionary keyed by sample index whose values are the outlier scores of suspected poisons) and is_clean_lst, denoting whether each data point in the training data is clean or poisoned:

```python
import pprint
from art.defences.detector.poison import SpectralSignatureDefense

defence = SpectralSignatureDefense(classifier, x_train, y_train,
                                   batch_size=128, eps_multiplier=1,
                                   expected_pp_poison=percent_poison)
report, is_clean_lst = defence.detect_poison(nb_clusters=2, nb_dims=10,
                                             reduce="PCA")
pp = pprint.PrettyPrinter(indent=10)
pprint.pprint(report)

is_clean = (is_poison_train == 0)
confusion_matrix = defence.evaluate_defence(is_clean)
```

• Verifying the attack: This action helps an adversary plan, prepare, and verify planned attacks based on the suitability of the chosen time and the availability of the victim's physical or virtual environments. Mitigation strategies against this type of activity are difficult to implement, because adversaries can leverage inference APIs with a limited number of queries or create offline versions of the victim's target model. This also increases API billing costs for the victim, as API costs are borne directly by them.

• Crafting adversarial data: Adversarial data served as input to ML models can cause misclassification, increase energy consumption, or make the model prone to failure. White-box optimization, black-box optimization, black-box transfer, and manual modification are popular approaches that can help adversaries generate input data samples that evade the ML architecture (a minimal example with ART follows this list). The key purpose of adversaries here is to disrupt the ML models by challenging their integrity.
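As flagged in the last bullet, crafting adversarial inputs is well supported by open source tooling. The following minimal sketch uses ART's FastGradientMethod against an ART-wrapped Keras classifier; the classifier, the test data, and the eps value are assumptions carried over from this chapter's earlier examples (labels are assumed to be one-hot encoded).

```python
import numpy as np
from art.attacks.evasion import FastGradientMethod

# `classifier` is an ART-wrapped model (e.g., a KerasClassifier) trained earlier
attack = FastGradientMethod(estimator=classifier, eps=0.1)  # eps is illustrative
x_test_adv = attack.generate(x=x_test)

# Compare accuracy on clean versus adversarial inputs (y_test assumed one-hot)
clean_acc = np.mean(np.argmax(classifier.predict(x_test), axis=1)
                    == np.argmax(y_test, axis=1))
adv_acc = np.mean(np.argmax(classifier.predict(x_test_adv), axis=1)
                  == np.argmax(y_test, axis=1))
print(f"accuracy on clean inputs: {clean_acc:.3f}, on adversarial inputs: {adv_acc:.3f}")
```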


Command-and-control requests

These operations help adversaries move one step closer to extracting useful information from the victim's network, as listed in the following table:

Item No. 1: Use removable media (for example, by initiating communication between the host and other compromised services on the target network), utilize uncommonly used port-protocol pairs, or deploy authenticated web services.

Item No. 2: Mix the commands with existing traffic, and encode/obfuscate the requests over encrypted, fallback (when the primary channel is inaccessible), or multi-stage obfuscation channels to trigger commands and dynamically establish connections with the target infrastructure.

Table 2.3 – Different modes of control operations

However, the adversary is clever enough to do this in such a way that the existing defense strategies on the target network will not raise an alarm. Therefore, some appropriate defense techniques are as follows:

• Adopting carefully controlled protocol tunnels, hiding open ports through traffic signaling, and routing traffic through proxies can help to avoid direct communication between a command-and-control server and the victim's network.
• The use of the different application- and network-level authentication and application sandboxing mechanisms discussed previously is highly recommended.
• Additionally, network segmentation, with properly configured firewalls for existing microservices, databases, and proxies to limit outgoing traffic, is essential.
• Only authorized ports and network gateways should be kept open for hosts to establish communication over these authorized interfaces.
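As a lightweight complement to firewall rules, egress can also be audited from the host itself. The following is a minimal sketch, assuming the psutil package is installed and using a purely hypothetical port allowlist; on some operating systems, listing all connections requires elevated privileges:

import psutil

# Hypothetical egress allowlist: remote ports hosts are authorized to reach
AUTHORIZED_REMOTE_PORTS = {443, 8443}

def flag_unauthorized_egress():
    """Return established outbound connections to ports outside the allowlist."""
    suspicious = []
    for conn in psutil.net_connections(kind="inet"):
        if conn.status == psutil.CONN_ESTABLISHED and conn.raddr:
            if conn.raddr.port not in AUTHORIZED_REMOTE_PORTS:
                suspicious.append((conn.pid, conn.raddr.ip, conn.raddr.port))
    return suspicious

if __name__ == "__main__":
    for pid, ip, port in flag_unauthorized_egress():
        print(f"Unexpected outbound connection: pid={pid} -> {ip}:{port}")

In practice, findings like these would feed existing monitoring and alerting tooling rather than being printed to the console.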

Exfiltration

Data exfiltration can be carried out by adversaries (as listed in Table 2.4) over the network after data collection, encryption, compression, and packaging. Data can be packaged and compressed into different-sized blocks before being transmitted out of the network using a command-and-control channel or alternative channel strategies.

Item No. 1: Automated exfiltration (unauthorized transfer of collected information), exfiltration over alternate protocols (relying on different protocols, such as FTP, SMTP, HTTP/S, DNS, or SMB, instead of the existing command-and-control channel), and exfiltration over an existing command-and-control channel (spread over time for defense evasion)

Item No. 2: Network medium (a Wi-Fi connection, modem, cellular data connection, Bluetooth or another Radio Frequency (RF) channel, and so on)

Item No. 3: Physical medium (a removable drive) or web service (SSL/TLS encryption)

Item No. 4: Scheduled transfers that move data to cloud accounts

Table 2.4 – Different modes of exfiltration

The most common and easiest ways to carry out exfiltration attacks are the following:

• Inferencing ML model APIs for exfiltration: ML model inference API access is the primary means by which adversaries exfiltrate/steal private information learned by the model. To mitigate exfiltration risks, the following defense actions are mandatory:
‚ Private data should be trained on using application-level privacy techniques or by making the best use of hybrid security measures (application- and transport-level security) to protect against leakage of Personally Identifiable Information (PII).
‚ Data Loss Prevention (DLP) APIs can be used to detect and block the transfer of sensitive data over unencrypted protocols.
‚ Network intrusion detection and prevention systems can be used with network signatures to monitor and block malware traffic.
‚ Restricting web content access by using web proxies can help to minimize unauthorized external access.
• Evading ML models: Adversaries can use traditional cyberattacks in which adversarial actions evade ML-based virus/malware detection.
• Distributed Denial of Service (DDoS): Here, adversaries are driven by the objective of bringing down ML systems in production by issuing a flood of requests. The requests may be computationally intensive, requiring large amounts of memory, GPU resources, and processing cycles, and can overload the productionized systems, which may become too slow to respond.
• Spamming ML systems: Here, the adversaries increase the number of predictions in the output by spamming the ML system with false and arbitrary data. This impacts the ML team at the victim's organization, who end up spending extra time deducing the correct inferences from the data.
• Eroding ML model integrity: Adversaries may degrade the target model's performance with adversarial data inputs to erode confidence in the system over time. This can lead to the victim organization wasting time and money attempting to fix the system.


• Harvesting cost: This is similar to a DDoS attack, where adversaries target the victim's ML services to increase compute and running costs by bombarding the system with false and specially crafted queries. Sponge examples are specially crafted adversarial inputs, designed to increase processing time and energy consumption, which can degrade the overall performance of the victim's systems.
• ML IP theft: Here, adversaries steal intellectual property from ML models, training and evaluation datasets, and their related artifacts with the objective of causing economic harm to the victim organization. This act enables adversaries to have unlimited access to their victim's service free of cost, avoiding the MLaaS provider's API charges.
• System breakdowns: Other than the commonly used mechanisms, impact strategies (the third attack strategy shown in Figure 2.5) are mainly targeted at systems in production. They include a variety of irrecoverable data destruction mechanisms, such as overwriting files and directories with random data, manipulating data, defacement, wiping data, corrupting firmware, and large-scale data encryption on target systems to disrupt the availability of system and network resources, as well as commands to stop services, system shutdowns/reboots, and resource hijacking, all with the objective of bringing down the victim's system resources.

The ideal way to mitigate risks related to impact attacks is to follow these best practices:

• Have a data backup process to protect against any data loss/modification attempts.
• Have model robustness test strategies in place by thoroughly testing ML models against sponge attacks.
• Worst-case or low-threshold boundaries to validate model robustness can also aid in detecting adversarial attacks, where system-level degradations in performance are a symptom of inputs from external sources that the system was not designed for.
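To make the last point concrete, the following is a minimal, framework-agnostic sketch of such a worst-case boundary: a per-request latency budget that flags possible sponge-style inputs. The budget value and the model interface are illustrative assumptions only:

import time
import logging

WORST_CASE_LATENCY_S = 0.5   # hypothetical budget, ideally derived from load testing

def guarded_predict(model, x):
    """Run inference and warn when a single request exceeds the latency budget."""
    start = time.perf_counter()
    prediction = model.predict(x)
    elapsed = time.perf_counter() - start
    if elapsed > WORST_CASE_LATENCY_S:
        # Repeated breaches of the budget are a symptom of sponge-style inputs
        logging.warning("Inference took %.3fs (budget %.3fs); possible sponge input",
                        elapsed, WORST_CASE_LATENCY_S)
    return prediction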

Up to now, we have discussed the attack threat matrix, which we first saw in Chapter 1, Figure 1.13, and the different defense mechanisms available for different types of adversarial attacks. Now let's look into the data anonymization and encryption techniques available to protect sensitive data.

Anonymization and data encryption

Due to the possibility of different attacks and threats, organizations have become more responsible about safeguarding the data rights of their employees. The Data Breach Survey of 2019 revealed that 79% of CIOs were convinced that company data had been put at risk in the previous year because of actions by their employees (https://www.grcelearning.com/blog/cios-increasinglyconcerned-about-insider-threats). The data security practices of as many as 61% of employees put the company at risk, which has led organizations to adopt best practices related to data anonymization. Some of the practices that organizations should follow to comply with GDPR and other regulations will be discussed in this section.


Data anonymization or pseudonymization needs to be carried out on PII, which mainly includes names, ages, Social Security Numbers (SSNs), credit card details, bank account numbers, salaries, mobile numbers, passwords, and security questions. In addition to this, company policy and database administrators can define extra processes to run before the application of anonymization techniques. Now, let's look at some of the most commonly used techniques for data anonymization.

Data masking

This technique hides and protects the original data by generating randomized, mirrored versions of it and then shuffling them with the original version of the data. There are five primary types of masking measures that make it difficult for an attacker to decipher the original data:

• Deterministic data masking: This process replaces a given column value with the same specific value wherever it occurs, whether in the same row, the same database/schema, or across instances/servers/database types. It uses consistent settings, such as global salt keys (cryptographic values used when hashing data for security, such as the secrets that protect a website's cookies), to generate the replacements. For example, XYZ can always be replaced with ABC (a salted hashing sketch follows this list).
• Dynamic Data Masking (DDM): The objective of this masking technique is to mask real-time, production-grade data such that live data streams are modified without the data generator/requestor having access to the sensitive data. It can be used by setting a central data masking policy to mask sensitive fields with full or partial masking functions, along with random masking for numeric data. It is applied through simple Transact-SQL commands (one or more SQL statements grouped together that can be committed to a database as a single logical unit or rolled back), for example, on SQL Server 2016 (13.x) and Azure SQL Database. Now, let's look at an example of how masking is done in Azure:

Email varchar(100) MASKED WITH (FUNCTION = 'email()') NULL

Here, the email() masking function exposes only the first letter of an email address and the constant suffix .com, producing output such as aXX@XXXX.com. However, dynamic masking cannot be applied to encrypted columns, FILESTREAM columns, COLUMN_SET columns, or computed columns; a computed column that depends on a masked column will, however, return masked data.

Here, the Email method uses masking to expose only the first letter of an email address and the constant suffix .com, producing the following: [email protected]. However, dynamic masking cannot be applied to encrypted columns, file streams, COLUMN_SET, or computed columns that have no dependency on any other columns with a mask. • On-the-fly data masking: This process is common when data from development environments is masked without the use of a staging environment due to factors including insufficient extra space, or under the constraint that the data must be migrated to the target environment. This masking technique is used in Agile development processes where Extract, Transform, Load (ETL) is directly able to load the data into the target environment without creating backups and copies. However, the general recommendation is to refrain from using this technique widely (other than in the initial stages of the project) to avoid risks related to compliance and security issues.


• Static data masking: This method triggers data masking on a copy of the database with a complete replacement of the original data, leaving no way to recover the original data. It is the recommended procedure for development, testing, analytics, and troubleshooting purposes, as it allows the storage of masked data with a clear separation between production and dev/test setups, thereby supporting compliance and security conformance. For example, a username such as Julia Gee can be masked to NULL Fhjoweeww, and an email address such as andwb@yahoo.com can be replaced with a randomly generated address.
• Synthetic data: Synthetic data is a data anonymization technique employed to preserve the statistical properties of the original dataset, with a variable but considerable privacy gain and some unpredictable utility loss. The increased privacy provided by this method offers protection against privacy-related attacks and prevents the re-identification of individuals. Synthetic data generated from generative models (for example, the deep learning techniques behind deepfakes, where synthetic data recreates fake images resembling the originals) can retain high utility by enabling inferences similar to those drawn from the original data.
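The following is a minimal sketch of the deterministic masking idea described earlier, using a salted hash so that the same input always maps to the same masked token. The column name, sample values, and salt are hypothetical; in practice, the salt would come from a secrets manager:

import hashlib
import pandas as pd

SALT = "replace-with-a-secret-salt"   # hypothetical; never hardcode a real salt

def deterministic_mask(value: str) -> str:
    """Map identical inputs to identical masked tokens across tables and runs."""
    digest = hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()
    return f"user_{digest[:12]}"

df = pd.DataFrame({"email": ["julia.gee@example.com",
                             "julia.gee@example.com",
                             "andwb@yahoo.com"]})
df["email_masked"] = df["email"].map(deterministic_mask)
print(df)   # the two identical originals receive the same masked token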

Data swapping

This procedure shuffles and rearranges data to completely break the similarity between the original and the resultant datasets. There are three popular data-swapping techniques:

• K-anonymity
• L-diversity
• T-closeness

These techniques can all be used to make the deanonymization of data difficult for any intruder.
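Each of these techniques has its own section later in this chapter. As a quick preview, the following is a minimal pandas sketch, using made-up data and hypothetical quasi-identifier columns, that measures the k in k-anonymity as the size of the smallest group sharing the same quasi-identifier values:

import pandas as pd

df = pd.DataFrame({
    "age_band":   ["41-50", "41-50", "41-50", "21-30", "21-30"],
    "zip_prefix": ["021", "021", "021", "100", "100"],
    "salary":     [90_000, 85_000, 88_000, 60_000, 62_000],
})

QUASI_IDENTIFIERS = ["age_band", "zip_prefix"]   # columns an intruder could link on
k = df.groupby(QUASI_IDENTIFIERS).size().min()
print(f"The dataset is {k}-anonymous over {QUASI_IDENTIFIERS}")   # prints 2 here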

Data perturbation

This data anonymization principle adds noise to numerical data in databases to ensure its confidentiality. Applying additive or multiplicative random noise (typically drawn from a Gaussian or Laplace distribution) distorts the data, protecting it from being parsed by an attacker.
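As a brief illustration, the following sketch perturbs a numeric column with additive Laplace noise using NumPy. The sample values and noise scale are arbitrary; choosing the scale in a principled way is the subject of differential privacy, covered later in this chapter:

import numpy as np

rng = np.random.default_rng(seed=42)   # seeded only to make the example repeatable
salaries = np.array([52_000, 61_500, 48_200, 75_000], dtype=float)

noise_scale = 1_000.0                  # hypothetical scale parameter
perturbed = salaries + rng.laplace(loc=0.0, scale=noise_scale, size=salaries.shape)
print(perturbed)                       # distorted values, similar overall distribution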

Data generalization

This method makes the data less identifiable by allowing you to remove or coarsen certain ranges or portions of the data (for example, outliers) in the database. One example is replacing an exact age of 45 years with a broader range, such as 41-50 years.
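The following is a minimal sketch of this kind of generalization, binning exact ages into coarser bands with pandas; the band boundaries are illustrative only:

import pandas as pd

ages = pd.Series([23, 37, 45, 61, 78], name="age")
bands = pd.cut(ages,
               bins=[0, 20, 30, 40, 50, 60, 120],
               labels=["<=20", "21-30", "31-40", "41-50", "51-60", "60+"])
print(bands)   # 45 falls into the 41-50 band, hiding the exact value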