Running Windows Containers on AWS: A complete guide to successfully running Windows containers on Amazon ECS, EKS, and AWS Fargate
ISBN-10: 1804614130 | ISBN-13: 9781804614136

Scale up your Windows containers seamlessly on AWS powered by field-proven expertise and best practices on Amazon ECS, EKS, and AWS Fargate

English | 212 pages | 2023

Table of contents:
Cover
Title Page
Copyright and Credits
Dedications
Contributors
Table of Contents
Preface
Part 1: Why Windows Containers on Amazon Web Services (AWS)?
Chapter 1: Windows Container 101
Why are Windows containers an important topic?
How does Windows Server expose container primitives?
How Windows Server implements resource controls for Windows containers
Understanding Windows container base images
Enumerating the Windows container image sizes
Delving into Windows container licensing on AWS
Summary
Further reading
Chapter 2: Amazon Web Services – Breadth and Depth
Why AWS for Windows containers?
Understanding how AWS Nitro impacts container performance
Learning about AWS container orchestrators
Summary
Part 2: Windows Containers on Amazon Elastic Container Service (ECS)
Chapter 3: Amazon ECS – Overview
Technical requirements
Amazon ECS – fundamentals
Amazon ECS – task networking
Deploying an Amazon ECS cluster with Terraform
Why Terraform?
Deploying the AWS services
Summary
Chapter 4: Deploying a Windows Container Instance
Technical requirements
Amazon ECS-optimized Windows AMIs
Amazon ECS agent
Right-sizing a Windows container instance
Storage
Processor
Memory
Network
Deploying a Windows container instance with Terraform
Deploying security groups
Summary
Chapter 5: Deploying an EC2 Windows-Based Task
Technical requirements
Understanding a task definition
Task placement strategies
Task placement constraints
Setting up AD integration
Windows containers and gMSA integration
Setting up persistent storage
Scheduling an EC2 Windows-based task with Terraform
Deploying an EC2 Windows-based task definition
Deploying an ECS service
Deploying an Application Load Balancer
Summary
Chapter 6: Deploying a Fargate Windows-Based Task
Technical requirements
AWS Fargate overview
Process isolation mode
Hyper-V isolation mode
Planning for serverless Windows containers
Fargate Windows-based task start-up time
Fargate Windows-based task image pull time
AWS Fargate Windows-based task use cases
Scheduling a Fargate Windows-based task definition with Terraform
Deploying a Fargate Windows-based task definition
Deploying an ECS service
Deploying an ALB
Summary
Part 3: Windows Containers on Amazon Elastic Kubernetes Service (EKS)
Chapter 7: Amazon EKS – Overview
Amazon EKS – fundamentals
Control plane
Data plane
Amazon VPC CNI for Windows
Amazon EKS and Windows support
Personal thoughts
Summary
Chapter 8: Preparing the Cluster for OS Interoperability
Setting up the VPC CNI plugin for Windows support
Avoiding pod-scheduling disruption
Using nodeSelector to avoid pod-schedule disruption
Using taints and tolerations to avoid pod schedule disruption
Dynamically scaling out Windows pods
Summary
Chapter 9: Deploying a Windows Node Group
Technical requirements
Amazon EKS node groups
Amazon EKS-optimized Windows AMIs
Consuming an EKS-optimized Windows AMI using Terraform
Understanding the EKS Windows bootstrap
Working with persistent storage using CSI drivers
SMB CSI driver high-level overview
Managing persistent volumes on Kubernetes
Deploying an Amazon EKS cluster with Windows nodes using Terraform
Creating security groups
Creating an OpenID Connect endpoint and IAM roles for the cluster
Creating instance roles for Windows and Linux node groups
Enabling VPC CNI Windows support
Using ConfigMap to add Kubernetes permissions (RBAC) to a node level
Creating a launch template to launch and bootstrap Windows and Linux Amazon EC2 nodes
Creating an Auto Scaling group
Summary
Chapter 10: Managing a Windows Pod
Technical requirements
Exploring Windows host and Pod resource management
Pod memory management
Pod CPU management
Host CPU management
System resource reservations
Understanding the Runtime Class use case
Understanding Active Directory integration on Kubernetes
Credential specs
Deploying a Windows Pod on Amazon EKS
Connecting to the Amazon EKS cluster
Deploying the Windows Pod
Summary
Part 4: Operationalizing Windows Containers on AWS
Chapter 11: Monitoring and Logging
Implementing LogMonitor
Implementing log forwarding
The awslogs driver as a log processor
Fluent Bit as a log processor
Using Amazon CloudWatch Container Insights
Summary
Chapter 12: Managing a Windows Container's Image Life Cycle
Understanding Microsoft Patch Tuesday
Security patch compliance on Windows container images
Rebuilding your container image frequently
Summary
Chapter 13: Working with Ephemeral Hosts
The idea behind ephemeral hosts
Why custom AMIs
Building a custom AMI pipeline
Summary
Chapter 14: Implementing a Container Image Cache Strategy
Why is implementing a container image cache strategy important?
Container runtime pull throttling
Layer extraction is a serial operation
Layer extraction is CPU/disk intensive when Windows Defender antivirus is enabled
Speeding up a Windows container startup time
Extending EC2 Image Builder with custom components
Summary
Chapter 15: AWS – Windows Containers Deployment Tools
Deploying an Amazon EKS cluster with eksctl
Containerizing Windows applications with AWS App2Container
Deploying Windows containers with AWS Copilot
Summary
Index
Other Books You May Enjoy

Running Windows Containers on AWS

A complete guide to successfully running Windows containers on Amazon ECS, EKS, and AWS Fargate

Marcio Morales

BIRMINGHAM—MUMBAI

Running Windows Containers on AWS

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Mohd Riyan Khan
Publishing Product Manager: Suwarna Rajput
Senior Editor: Runcil Rebello
Technical Editor: Irfa Ansari
Copy Editor: Safis Editing
Project Coordinator: Ashwin Kharwa
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Jyoti Chauhan
Marketing Coordinator: Agnes D'souza

First published: April 2023
Production reference: 1240323

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham, B3 2PB, UK.

ISBN 978-1-80461-413-6

www.packtpub.com

To my sons, Christian and Matheus, for showing me how talent and creativity evolve. To my wife, Amanda, for being my loving partner throughout our joint life journey. – Marcio Morales

Contributors

About the author

Marcio Morales is a principal specialist solution architect at AWS, focusing on container services such as Amazon ECS, EKS, and AWS Fargate, and helping customers modernize legacy infrastructure into modern cloud-native solutions. Before this, Marcio spent his entire career as a Microsoft consultant, implementing and migrating solutions such as Active Directory, Exchange Server, Hyper-V, System Center, and Azure. Additionally, during night shifts, Marcio worked for 8 years as a Microsoft Certified Trainer (MCT), teaching more than 2,000 individuals using Microsoft Official Courseware (MOC) on various topics. Marcio holds more than 30 certifications, including Microsoft, AWS, Terraform, and Kubernetes certifications. He also has a bachelor's degree in computer networking and an MBA in network architecture and cloud computing.

I want to thank the people who have been close to me and supported me, especially my wife, Amanda, and my parents.

About the reviewer

Joshua Brewer is currently a cloud support engineer at Amazon Web Services. Before joining AWS, he was an active-duty member of the United States Air Force, where he specialized in virtual desktop infrastructure. He has more than five years of virtualization/containerization experience and gained an MSc in cloud computing architecture from the University of Maryland Global Campus. Josh is responsible for helping enterprise organizations successfully implement containerized workloads in AWS.

I am grateful for what I get to do on a daily basis. Since I first got my feet wet in the tech industry after joining the military, I have been on a constant journey where I have had to learn, adapt, and repeat. I am thankful for all the individuals who have aided my development during this journey. I am also extremely thankful for my family, who have supported me every step of the way.

Preface

Hello there! Windows container adoption has been increasing over the past few years because, today, companies need to deploy, coexist, integrate, and maintain two parallel environments in their current IT footprint:

• Modern infrastructure, which comprises serverless, containers, IaC, GitOps, and DevSecOps, and usually lives in the cloud
• Legacy infrastructure, which comprises the technologies that usually power a company's most critical workloads but rely on virtual machines and bare-metal servers

There are many rumors about full-stack modernization, but the reality is that legacy infrastructure will continue to exist for years to come, for many reasons. Therefore, professionals looking to help their company succeed with cloud and modern infrastructure adoption will need to know when and how to blend the legacy and the modern infrastructure to reduce cost and achieve operational excellence. Windows containers play a crucial role in this space, allowing companies to reduce costs by moving applications from the legacy to the modern infrastructure with little code change, while offering a better application-per-server ratio, thereby decreasing Windows Server license costs in the long term.

Who this book is for

This book targets solution architects, DevOps engineers, sysadmins, and container experts willing to learn more about Windows containers on AWS. To learn the most from the book, readers should have basic Docker, Kubernetes, and container experience, as the book is not meant to teach container fundamentals; instead, it is laser-focused on lessons learned from the field.

What this book covers

Chapter 1, Windows Container 101, covers how Windows containers play an important role in application modernization, followed by a deep dive into the Windows container primitives and resource control on Windows Server.

Chapter 2, Amazon Web Services – Breadth and Depth, covers why AWS is the best cloud provider for running Windows container workloads.

Chapter 3, Amazon ECS – Overview, covers Amazon ECS fundamentals and Windows network components, followed by a Terraform deployment code example.

Chapter 4, Deploying a Windows Container Instance, explores how to deploy and right-size ECS Windows container instances, followed by a Terraform deployment code sample.

Chapter 5, Deploying an EC2 Windows-Based Task, will teach you how to deploy EC2 Windows-based tasks on ECS and the options to integrate with Active Directory, as well as setting up persistent storage for stateful Windows containers, followed by a Terraform deployment code example.

Chapter 6, Deploying a Fargate Windows-Based Task, covers how to deploy an AWS Fargate Windows task on Amazon ECS, followed by Terraform deployment code examples.

Chapter 7, Amazon EKS – Overview, helps you understand how Amazon EKS operates under the hood and discusses its Windows components.

Chapter 8, Preparing the Cluster for OS Interoperability, teaches you how to operate a heterogeneous Amazon EKS cluster.

Chapter 9, Deploying a Windows Node Group, teaches you how to deploy Windows worker nodes on Amazon EKS with persistent storage for a stateful application, followed by a Terraform code example.

Chapter 10, Managing a Windows Pod, teaches you how to deploy Windows pods on Amazon EKS, combining best practices for using taints and tolerations, runtime classes, and resource controls, followed by the Active Directory integration options.

Chapter 11, Monitoring and Logging, explores centralized metrics and logs for Windows containers running on Amazon ECS, EKS, and Fargate using CloudWatch Logs and Fluent Bit.

Chapter 12, Managing a Windows Container's Image Life Cycle, teaches you how to keep your Windows container's image secure by applying security patches and understanding how this applies to immutable container images.

Chapter 13, Working with Ephemeral Hosts, teaches you how to operate Windows Server as an ephemeral host, the advantages of doing so, and how it plays a core role in container clusters, all by leveraging an automated Amazon Machine Image (AMI) pipeline using EC2 Image Builder.

Chapter 14, Implementing a Container Image Cache Strategy, teaches you how to reduce your Windows container's launch time by implementing an automated image cache strategy using EC2 Image Builder.

Chapter 15, AWS Windows Containers Deployment Tools, covers the ancillary AWS tools available for deploying and operating Windows containers on AWS.

To get the most out of this book

You will need the latest version of Terraform installed on your computer. All code examples have been tested using Terraform version 1.3.9 and the HashiCorp AWS provider version 4.0.

Software/hardware covered in the book    Operating system requirements
Terraform                                Windows, macOS, or Linux

You can follow the official HashiCorp Terraform installation guide at the following website: https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
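For example, assuming Git is installed, the repository can be cloned locally as follows:

```shell
# Clone the book's example code repository from Packt's GitHub organization
git clone https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS.git
cd Running-Windows-Containers-on-AWS
```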

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/cgYuN

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Something that changed from the previous EC2 Windows-based task is that Fargate uses awsvpc; thereby, we need to add the proper port mappings."

A block of code is set as follows:

resource "aws_alb_listener" "ecs_alb_listener" {
  load_balancer_arn = aws_lb.ecs_alb.arn
  port              = 80
  protocol          = "HTTP"

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: windows-2019
handler: 'docker'

Any command-line input or output is written as follows:

terraform init
terraform apply

Bold: Indicates a new term, an important word, or words that you see onscreen. Here is an example: "Assuming you have two Kubernetes deployments, Deployment 1 deploys the frontend, and Deployment 2 deploys the backend."

Tips or important notes
Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read Running Windows Containers on AWS, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent-quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?

Don't worry; now with every Packt book, you get a DRM-free PDF version of that book at no cost. Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don't stop there. You can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

1. Scan the QR code or visit the link below

https://packt.link/free-ebook/9781804614136

2. Submit your proof of purchase

3. That's it! We'll send your free PDF and other benefits to your email directly

Part 1: Why Windows Containers on Amazon Web Services (AWS)?

In this part, you will get a technical overview of Windows containers, covering the operating system primitives and how Windows Server isolates and implements resource control for Windows containers. In addition, you will learn why Windows containers on AWS are a relevant topic for customers' application modernization journeys.

This part has the following chapters:

• Chapter 1, Windows Container 101
• Chapter 2, Amazon Web Services – Breadth and Depth

1
Windows Container 101

In this chapter, we're going to cover the foundations of a Windows container and why it is an essential topic for DevOps engineers and solution architects.

The chapter will cover the following topics:

• Why are Windows containers an important topic?
• How Windows Server exposes container primitives
• How Windows Server implements resource controls for Windows containers
• Understanding Windows container base images
• Delving into Windows container licensing on AWS
• Summary

Why are Windows containers an important topic?

Have you ever asked yourself, "Why should I care about Windows containers?"

Many DevOps engineers and solution architects excel in Linux containerization, helping their companies re-platform legacy Linux applications into containers and architect, deploy, and manage complex microservices environments. However, many organizations still run tons of Windows applications, such as ASP.NET websites or .NET Framework applications, which are usually left behind during the modernization journey.

Through many customer engagements I have had in the past, there were two main aspects that meant Windows containers weren't an option for DevOps engineers and solution architects.

The first was a lack of Windows operating system expertise in the DevOps team. Different system administrators and teams usually manage Windows and Linux, each using the tools that best fit their needs. For instance, a Windows system administrator will prefer System Center Configuration Manager (SCCM) as a configuration management solution. In contrast, a Linux system administrator would prefer Ansible.

Another example: a Windows system administrator would prefer System Center Operations Manager (SCOM) for deep insights, monitoring, and logging, whereas a Linux system administrator would prefer Nagios and an ELK stack.

With the rapid growth of the Linux ecosystem toward containers, it is a natural and straightforward career shift for a Linux system administrator to get up to speed as a DevOps engineer. Windows system administrators, on the other hand, aren't exposed to all these tools and evolutions, making it a hard and drastic career shift in which they first have to learn the Linux operating system (OS) and then the entire ecosystem around it.

The second aspect is the delusion that every .NET Framework application should be refactored to .NET (formerly .NET Core). In almost all engagements where the .NET Framework is a topic, I've heard developers talking about the beauty of refactoring their .NET Framework application into .NET and leveraging all the benefits available in the Linux ecosystem, such as ARM processors and the rich Linux tools ecosystem.

While they are all 100% technically correct, as solution architects, we need to see the big picture, meaning the business investment behind it. We need to understand how much effort and money will be required to fully refactor the application and its dependencies to move off Windows, what will happen with the already purchased Windows Server licenses and management tools, and when the investment will break even. Sometimes, the annual IT budget is better spent on new projects rather than on refactoring 10-year-old applications, where breakeven will take 5 or more years to come through, without much innovation in the application itself.

Now that we understand the most common challenges for Windows container adoption and the opportunity in front of us, we'll dig into the Windows Server primitives for Windows containers, resource controls, and Windows base images.

How does Windows Server expose container primitives?

Containers are built on kernel primitives responsible for containerization, such as control groups, namespaces, union filesystems, and other OS functionalities. These work together to create process isolation, which is provided through namespace isolation and control groups that govern the resources of a collection of processes within a namespace.

Namespaces isolate named objects from unauthorized access. A named object provides a way for processes to share object handles. In simple words, when a process needs to share handles, it creates a named event or mutex in the kernel; other processes can then use this object name to call functions inside the process. An object namespace creates the boundary that defines which processes, or container processes, can call the named objects.

Control groups, or cgroups, are a Linux kernel feature that limits and isolates how much CPU, memory, disk I/O, and network a collection of processes can consume. In a container's case, that collection is the set of processes running inside the container:

Figure 1.1 – How a container runtime interacts with the Linux kernel
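As a quick sketch of the Linux side just described (the image tag is illustrative), the familiar Docker resource flags are simply requests for the kernel to place the container's processes into a cgroup with those limits:

```shell
# Docker asks the Linux kernel to confine this container's processes to a
# cgroup limited to 1 CPU and 256 MB of memory.
docker run -d --cpus 1 --memory 256m nginx:alpine

# The requested limits are recorded in the container's HostConfig and can be
# inspected afterward (NanoCpus is expressed in billionths of a CPU):
docker inspect --format '{{.HostConfig.NanoCpus}} {{.HostConfig.Memory}}' <container-id>
```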

However, when it comes to the Windows OS, this is an entirely different story; there is no cgroup, pid, net, ipc, mnt, or vfs. Instead, in the Windows world, we have job objects (the equivalent of cgroups), object namespaces, the registry, and so on. When Microsoft planned how to effectively expose these low-level Windows kernel APIs so that container runtimes could easily consume them, it decided to create a new management service called the Host Compute Service (HCS). The HCS provides an abstraction over the Windows kernel APIs, making a Windows container a single API call from the container runtime to the kernel:

Figure 1.2 – How a container runtime interacts with the Windows kernel

Working directly with the HCS may be difficult, as it exposes a C API. To make it easier for container runtime and orchestrator developers to consume the HCS from higher-level languages such as Go and C#, Microsoft released two wrappers:

• hcsshim is a Golang interface to launch and manage Windows containers using the HCS
• dotnet-computevirtualization is a C# class library to launch and manage Windows containers using the HCS

Now that you understand how Windows Server exposes container primitives and how container runtimes such as Docker Engine and containerd interact with the Windows kernel, let's delve into how Windows Server implements resource controls at the kernel level for Windows containers.

How Windows Server implements resource controls for Windows containers

In order to understand how Windows Server implements resource controls for Windows containers, we first need to understand what a job object is. In the Windows kernel, a job object allows a group of processes to be managed as a unit, and Windows containers utilize job objects to group and track the processes associated with each container. Resource controls are enforced on the parent job object associated with the container. When you run a Docker command that sets memory, CPU count, or CPU percentage limits, under the hood you are asking the HCS to set these resource controls directly on the parent job object:

Figure 1.3 – Internal container runtime process to set resource controls
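To make this flow concrete, here is a hedged CLI sketch (the image tag and command are illustrative): the memory, CPU count, and CPU percentage flags mentioned above are passed to the HCS, which enforces them on the container's parent job object:

```shell
# Windows-only Docker resource flags: each one becomes a limit on the parent
# job object that groups and tracks the container's processes.
docker run -d --cpu-count 2 --cpu-percent 50 --memory 2g `
  mcr.microsoft.com/windows/servercore:ltsc2022 ping -t localhost
```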

Resources that can be controlled include the following:

• The CPU/processor
• Memory/RAM
• Disk/storage
• Networking/throughput

The previous two topics gave us an understanding of how Windows Server exposes container primitives and how container runtimes such as Docker Engine and containerd interact with the Windows kernel. However, you shouldn't worry too much about this. As a DevOps engineer or solution architect, it is essential to understand the concepts and how they differ from Linux, but you will rarely work at the Windows kernel level when running Windows containers; the container runtime will take care of it for you.

Understanding Windows container base images

When building your Windows application into a Windows container, it is crucial to assess the dependencies it carries, such as Open Database Connectivity (ODBC) drivers, Dynamic-Link Libraries (DLLs), and additional applications. The entire package (meaning the application plus its dependencies) will dictate which Windows container base image must be used. Microsoft offers four container base images, each exposing a different Windows API set, which drastically influences the final container image size and on-disk footprint:

• Nano Server is the smallest Windows container base image available, exposing just enough APIs to support .NET Core and other modern open source frameworks. It is a great option for sidecar containers.
• Server Core is the most common Windows container base image available. It exposes a Windows API set large enough to support the .NET Framework and common Windows Server features, such as IIS.
• Server is smaller than the Windows image but exposes the full Windows API set. It fits the same use cases as the Windows image, such as applications that require the DirectX graphics API.
• Windows is the largest image and exposes the full Windows API set. It is usually used for applications that require the DirectX graphics API and frameworks such as DirectML or Unreal Engine. There is a very cool community project specifically for this type of workload, which can be accessed at the following link: https://unrealcontainers.com/.

Important note
The Windows image is not available for Windows Server 2022; Server is the only option for workloads that require the full Windows API set.


Enumerating the Windows container image sizes

You have probably already heard about how big Windows container images are compared to Linux ones. While, technically, the size differences are exorbitant, comparing them doesn’t add much value to the discussion, since we won’t address Windows-specific needs with Linux, and vice versa. However, selecting the right Windows container base image directly affects the solution cost, especially the storage usage footprint, which drastically influences the container host storage allocation. Let’s delve into Windows container image sizes. The values in the following table are based on Windows Server 2022 container images:

Image name        Image size    Extracted on disk
Nano Server       118 MB        296 MB
Server Core       1.4 GB        4.99 GB
Windows Server    3.28 GB       10.8 GB

Table 1.1 – Windows container image sizes

As discussed in the previous section, the difference in size reflects how much of the Windows API set is exposed to the container, addressing different application needs. The Extracted on disk column is crucial information because, on AWS, one of the pricing dimensions of the block storage service, Amazon Elastic Block Store (EBS), is the amount of space provisioned; you pay for what you provision, whether it is used or not, which influences the EBS volume size you will deploy on each container host. We’ll cover this topic in greater detail in Chapter 14, Implementing a Container Image Cache Strategy.
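To make the cost impact concrete, here is a small sketch that sizes a container host’s EBS volume from the extracted-on-disk figures in Table 1.1 and estimates the monthly charge. The $0.08 per GB-month price, the 30 GB host OS allowance, and the 30% headroom factor are assumptions for illustration, not quoted AWS pricing:

```python
# Extracted-on-disk sizes from Table 1.1 (Windows Server 2022 images), in GB
EXTRACTED_GB = {
    "nano_server": 0.296,
    "server_core": 4.99,
    "windows_server": 10.8,
}

def volume_size_gb(images, os_and_apps_gb=30.0, headroom=1.3):
    """Provisioned EBS size: host OS plus cached images, plus headroom.

    You pay for what you provision, so caching Server Core instead of
    the full Windows Server image directly shrinks the volume you buy.
    """
    total = os_and_apps_gb + sum(EXTRACTED_GB[i] for i in images)
    return round(total * headroom, 1)

def monthly_cost(size_gb, usd_per_gb_month=0.08):  # assumed gp3-style price
    return round(size_gb * usd_per_gb_month, 2)

core = volume_size_gb(["server_core"])
full = volume_size_gb(["windows_server"])
print(core, monthly_cost(core))   # 45.5 GB vs
print(full, monthly_cost(full))   # 53.0 GB per container host
```

Multiplied across a fleet of container hosts, the difference between caching Server Core and the full Windows Server image becomes a recurring line item on the bill.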

Delving into Windows container licensing on AWS

On AWS, there are two options to license Windows Server:

• License included: You pay per second as part of the Amazon EC2 Windows cost. The Windows Server version is Datacenter, which gives you unlimited containers per host.
• Bring your own license (BYOL): You bring your existing Windows Server licenses, as long as they were acquired, or added as a true-up, under an active Enterprise Agreement signed prior to October 1, 2019. This option also requires an Amazon EC2 Dedicated Host.

I recommend checking internally in your organization which Windows Server 2019 license options are available and making a decision based on how much money you can save with either BYOL or License included adoption.


Summary

In this chapter, we learned why Windows containers are an essential topic for organizations going through their modernization journey and why they may be a challenge due to a lack of expertise; then, we delved into how Windows Server exposes container primitives through the HCS and how container runtimes interact with the Windows kernel for resource controls. We also delved into the available Windows container base images, image sizing, and licensing.

In a nutshell, the use case for a Windows container is very straightforward: if a workload can’t run on Linux due to incompatibility or application dependencies/requirements, then go with Windows, period. To add to that, in the same way that we shouldn’t use a Windows container to run a Go application, we shouldn’t even try to use a Linux container to run a .NET Framework application.

In Chapter 2, Amazon Web Services – Breadth and Depth, we will understand why AWS is the best choice for running Windows containers. You will learn how AWS Nitro improves container performance and what you need to know to choose which AWS container orchestrator makes sense for your use case.

Further reading

• Microsoft Windows container license FAQ: https://docs.microsoft.com/en-us/virtualization/windowscontainers/about/faq
• Microsoft Licensing on AWS: https://aws.amazon.com/windows/resources/licensing/
• Stack Overflow 2022 Developer Survey: https://survey.stackoverflow.co/2022/


2

Amazon Web Services – Breadth and Depth

Understanding a cloud provider’s core benefits and the pace of innovation around the services you are about to use is crucial for your business strategy. When migrating to the cloud, you establish a long-term relationship with the cloud provider, and your business technology will depend on the provider’s security, stability, investment, innovation, and fair pricing model. Imagine migrating to a cloud provider and, suddenly, the provider changes its pricing model or licensing, directly influencing your final service offering. Of course, you don’t want that. Right? Amazon Web Services (AWS) has constantly been reducing service pricing; since 2006, it has reduced prices 107 times. For example, in November 2021, AWS reduced prices by 31% in three Simple Storage Service (S3) storage classes for existing and new customers.

In this chapter, we’ll learn about the pace of innovation around Windows containers on AWS and how AWS Nitro supercharges Windows container performance. Next, we’ll have a high-level overview of all the AWS container orchestrators that support Windows and their use cases.

The chapter will cover the following topics:

• Why AWS for Windows containers?
• Understanding how AWS Nitro impacts container performance
• Learning about AWS container orchestrators

Why AWS for Windows containers?

A lot of people ask me: Marcio, why should I choose AWS to run Windows containers if Microsoft is the one that built and owns the technology? I prefer to let you choose a cloud provider that addresses your needs and requirements based on your own due diligence. In this first topic, we’ll learn about the pace of innovation around Windows containers on AWS.


AWS started supporting Windows containers in December 2017. From there, it learned about the technology, hired experts in the subject, and launched over 11 features and services in almost 5 years. The following list contains the most notable launches around Windows containers, with a drastic increase in pace in 2022:

• Amazon Elastic Container Service (ECS) support for Windows, December 2017: AWS started supporting Amazon ECS with Windows Server 2016 and Docker 17.06 Enterprise.
• Amazon Elastic Kubernetes Service (EKS) support for Windows, October 2019: AWS started supporting Amazon EKS with Windows Server 2019 and Kubernetes 1.14.
• Amazon ECS support for Amazon FSx for Windows tasks, November 2020: This solution is excellent for legacy Windows apps that save files to local drives or network shares.
• AWS Fargate for Windows, October 2021: This launch brought serverless Windows containers into the picture, enabling customers to run containers without needing to deploy and manage Elastic Compute Cloud (EC2) Windows hosts.
• ECS Exec for Amazon ECS Windows tasks, November 2021: A fantastic daily operations feature that allows customers to easily collect diagnostic information and troubleshoot by interacting directly with containers, without first interacting with the container host operating system, opening inbound ports, or managing Secure Shell (SSH) keys.
• Amazon ECS Anywhere support for Windows, March 2022: This feature brought Amazon ECS Windows tasks anywhere, on-premises and in any other cloud. In addition, Amazon ECS Anywhere allows customers who have already containerized Windows applications to migrate on-premises dependencies to the cloud. With ECS Anywhere, customers deploy on-premises Windows containers through the Amazon ECS console in the same way as in the cloud.
• Amazon Elastic Block Store (EBS) Container Storage Interface (CSI) driver support for Windows, March 2022: Until this release, the Amazon EBS storage provider-specific code was kept in the Kubernetes project, referred to as in-tree drivers. The CSI was designed to replace in-tree drivers, drastically accelerating cloud vendor innovation in the Kubernetes storage space.
• Amazon EKS support for the containerd runtime for Windows, March 2022: The Dockershim removal planned for Kubernetes 1.24 was announced in late 2020. AWS started testing containerd for Windows from its alpha version, ensuring it would pass all the AWS security compliance checks before being included as a supported runtime.
• ECS Exec for Amazon ECS Fargate Windows tasks, April 2022: ECS Exec was made available for Fargate Windows tasks, an excellent addition for daily operations, mainly because AWS Fargate is a serverless solution where you don’t have access to the underlying operating system.
• Amazon EKS support for CSI Proxy on Windows nodes, April 2022: CSI Proxy was introduced in EKS-optimized Windows Amazon Machine Images (AMIs) in April 2022, allowing customers to run CSI drivers, such as the EBS and Server Message Block (SMB) drivers, on Amazon EKS.


But in fact, it isn’t only about launching new features and services. AWS must ensure that reliable documentation is available and curated to help customers run, manage, and apply best practices when running Windows containers on AWS. To avoid an extensive list of these blogs and whitepapers here, check out the Appendix section of the book for more information.

When choosing a cloud provider to run Windows containers, you will need to consider how close it is to the open source community and how much support you will have, even if an issue is out of scope. The Windows container community is still pretty small compared to Linux communities, and the shortage of experts on the topic is a challenge; both conditions play an essential role when choosing one cloud provider over another.

I can tell from my experience working at AWS that they are customer-obsessed. For example, there was a time I had to jump on a call with AWS cloud support engineers, software development engineers, and the customer to fix a Calico plugin issue in the customer’s Amazon EKS cluster with Windows node groups. If you think about that for a second, AWS made support engineers, software engineers, and solution architects available to fix an issue in an open source network plugin that AWS didn’t even develop. This demonstrates the commitment and best effort a customer will receive if they decide to run Windows containers on AWS. I could spend pages talking about other cases related to Fluentd, Fluent Bit, Terraform, and more out-of-AWS-scope activities, but I want to ensure we stay right on track with what is most essential: the core of the book.

Understanding how AWS Nitro impacts container performance

When we think about Windows containers, the last thing that comes to mind is the hardware under the hood that powers the container. However, the combination of hypervisor, hardware, and software directly affects the network packet flow, network jitter, latency, memory buffers, connections per second, and processing performance within the Windows container. AWS has built the AWS Nitro System from scratch: a combination of a hypervisor, purpose-built hardware, and software that provides unmatched performance. The AWS Nitro System is divided into five components:

• The Nitro Hypervisor
• Nitro Cards
• The Nitro Security Chip
• The Nitro Trusted Platform Module (Nitro TPM) 2.0
• Nitro Enclaves

We’ll focus only on the most impactful components for Windows containers: the Nitro Hypervisor and Nitro Cards.


The Nitro Hypervisor is a lightweight, Kernel-based Virtual Machine (KVM)-based hypervisor that manages memory and CPU allocation, delivering performance almost identical to a bare-metal instance. The AWS Nitro System offloads most hypervisor work, such as encryption, network, and storage I/O, to Nitro Cards, leaving almost all of the physical CPUs dedicated to the EC2 instances running on the hardware. Nitro Cards comprise three different physical cards:

• Nitro Card for VPC: This is responsible for encapsulation, decapsulation, security groups, limiters, and routing. One of its most significant benefits is that if you are running a Windows container host on an Amazon EC2 c5 instance type, which provides up to 25 Gbps of network throughput, and decide to upgrade to an instance type that provides up to 100 Gbps (such as c5n), the only thing you need to do is stop the instance, change the instance type, and start it again. Your task is done; there is no need for driver re-installation. This gives you the flexibility and freedom to move between the Amazon EC2 instance types that best address your Windows container performance requirements, without maintaining multiple EC2 AMIs, each with its own set of network drivers.
• Nitro Card for EBS: This is responsible for mounting/unmounting EBS volumes on an Amazon EC2 instance. It uses Non-Volatile Memory Express (NVMe) over fabric with a proprietary AWS protocol. For encrypted EBS volumes, the Nitro Card for EBS encrypts data before sending it to the network-attached storage (NAS). The benefit is that your Windows container host’s performance isn’t penalized by the encryption/decryption of the block storage, which frees up CPU and storage I/O for the containers.
• Nitro Card for Instance Storage: This is responsible for mounting/unmounting and transparent encryption of local ephemeral NVMe block storage. Instance Storage disks are very appealing for Windows containers, as you can save and process temporary data until the Windows container host is terminated, instead of using Amazon EBS volumes for temporary data processing.

All these benefits, combined with the latest Intel and AMD CPU chipsets available on Amazon EC2 instances, positively affect your Windows container host performance through higher network packet rates, more CPU cycles, and faster memory access, resulting in better application performance.

Note
I recommend watching the following deep-dive session on YouTube if you are interested in learning about the topic in greater detail: AWS re:Invent 2021 - Powering next-gen Amazon EC2: Deep dive on the Nitro System (https://www.youtube.com/watch?v=2uc1vaEsPXU).
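To illustrate the point about moving freely between instance types, here is a toy selector over a few instance specs. The throughput figures are hard-coded assumptions for the sketch, not data fetched from AWS (in practice you could retrieve them with the `ec2:DescribeInstanceTypes` API):

```python
# Assumed "up to" network throughput per instance type, in Gbps
INSTANCE_NETWORK_GBPS = {
    "c5.4xlarge": 10,
    "c5.18xlarge": 25,
    "c5n.18xlarge": 100,
}

def smallest_fit(required_gbps: float) -> str:
    """Pick the lowest-throughput type that still meets the requirement.

    Because Nitro exposes networking through the same ENA device on
    every type, switching instance types is a stop/modify/start
    operation with no driver change on the Windows container host.
    """
    candidates = [(gbps, itype) for itype, gbps in INSTANCE_NETWORK_GBPS.items()
                  if gbps >= required_gbps]
    if not candidates:
        raise ValueError("no instance type satisfies the requirement")
    return min(candidates)[1]

print(smallest_fit(20))   # c5.18xlarge
print(smallest_fit(40))   # c5n.18xlarge
```

The same single AMI serves every candidate, which is exactly the operational benefit the Nitro Card for VPC provides.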


Learning about AWS container orchestrators

On AWS, customers have a variety of AWS container orchestrators, each addressing a different customer need; the many options can also leave you undecided during the architecture phase. Let’s look at the AWS container orchestrator options for Windows containers in more detail:

• Amazon ECS is a fully managed container orchestrator that is highly secure and reliable and has deep integration with AWS services. It provides built-in monitoring and logging capabilities, and adoption is easy, as the learning curve isn’t long. In contrast, it isn’t multi-cloud and doesn’t offer the same flexibility and control you will find in Kubernetes. We will dive deeper into this in Chapter 3, Amazon ECS – Overview. Website: https://aws.amazon.com/ecs/
• Amazon ECS Anywhere enables customers to run and manage container workloads on their own infrastructure or in any other cloud. Amazon ECS Anywhere makes it easy for customers to run Windows containers on-premises, leveraging the same Amazon ECS experience as in the cloud. One of Amazon ECS Anywhere’s benefits shows when you need to run applications near the user due to latency or compliance requirements. We will dive deeper into Amazon ECS Anywhere in Chapter 4, Deploying a Windows Container Instance. Website: https://aws.amazon.com/ecs/anywhere/
• Amazon EKS is a managed Kubernetes orchestration service that is robust, highly secure, and mature for Windows containers. It supports Group Managed Service Accounts (gMSAs), CSI drivers, CNI plugins, and EKS Blueprints. Windows build servers, online game servers, and integration with existing Amazon EKS clusters are just some of the many use cases you can carry out with Windows. We will dive deeper into Amazon EKS and Windows support in Chapter 7, Amazon EKS – Overview. Website: https://aws.amazon.com/eks/
• AWS Fargate is serverless compute for containers. It runs on top of Windows Server 2019 and provides kernel isolation without using Hyper-V isolation mode. AWS Fargate is an excellent option for customers who don’t have in-house Windows Server expertise, allowing them to focus solely on the application and forget about Windows Server management, such as patching, hardening, monitoring, inventory, and so on. We will dive deeper into AWS Fargate and Windows support in Chapter 6, Deploying a Fargate Windows-Based Task. Website: https://aws.amazon.com/fargate/


With all these options, choosing one AWS orchestrator over another for Windows containers depends on many factors; let’s explore some of them:

• Easy adoption: Easy adoption is an exciting topic, and I’ll make it provocative. This is all about what you really need versus what you think you need. I often talk with customers who choose Amazon EKS as their primary container orchestrator, a decision to have in-depth control and flexibility at the expense of simplicity, which is acceptable. However, when asked why Amazon EKS, the usual answer is a multi-cloud strategy, but there isn’t one in place. I then take a step further and ask which Kubernetes controls and flexibility they plan to use that won’t be available in a simpler orchestrator. Usually, the answer is none. The main point here is that, typically, legacy Windows applications, such as intranet/extranet websites, external APIs, and other .NET Framework applications, are very predictable and won’t use all the container benefits, such as scale-out/in, service meshes, and so on. There are a few exceptions, such as build servers and calculation systems, but in general, legacy Windows applications are very predictable. If you are in doubt about which AWS container orchestrator to adopt for Windows containers, this is the question you need to ask yourself: Do you need in-depth control and flexibility at the expense of simplicity?
• Features and integrations: Let’s put the solution architect hat aside and think through a DevOps engineer or site reliability engineer (SRE) lens for a minute; deploying Windows containers will require a lot of research to define which tools are supported on Windows. For example, Prometheus and its node-exporter just work on Linux; on Windows, things aren’t that straightforward. First, it requires windows_exporter, another community solution, and the way a Windows container exports metrics needs to be arranged so you can easily visualize it in Grafana. Another example is Amazon Elastic Container Registry (ECR) image scanning, which helps identify vulnerabilities in your container images but doesn’t support Windows images. One more example: say you are running AWS App Mesh, a service mesh based on Envoy; again, there is no Envoy build for Windows on AWS App Mesh. As you can see, additional tools may be needed to manage a Windows container cluster, and defining the bare minimum of what you need is fundamental when you don’t have an ocean of options as you do in the Linux world. So, again, this is the question you need to ask yourself: Do you need the entire container ecosystem surrounding your Windows containers?
• Open source community: The open source community plays a significant role when choosing a container orchestrator. As we are talking about Windows containers, we need to know which open source solutions are available and how much you can rely on the community and its size. Until now, I have had great experiences working with the Kubernetes and SIG Windows communities; people are friendly, there is no cloud competition, and everyone is willing to help. However, production environments have service-level agreements (SLAs) that need to be met, and implementing an open source community solution for Windows containers, where you won’t have a hotline to dial when needed, puts you in a bad spot.


I worked with some customers who hit a roadblock with Amazon EKS and Windows node groups when an open source plugin didn’t work as expected on Windows, while on Linux it wasn’t a problem. As a result, the customer had three options:

- Use a third-party vendor solution
- Develop the code to adjust the open source plugin to their needs
- Migrate to Amazon ECS, which offers native monitoring and logging integration

Some customers have enough in-house expertise to overcome the blocker, develop their own workaround, and support it; this shows how prepared they are to handle Kubernetes-on-Windows situations. Others will need to step back, re-evaluate their container orchestrator choice, and restart. The question you need to ask yourself is this: Would you rely on community support or implement a third-party vendor solution in your production workload?

• Support: Support is the hotline I mentioned earlier; when working with Amazon EKS, AWS support is limited to the Amazon EKS control plane and its components, such as worker nodes, the VPC CNI plugin, EBS CSI drivers, and so on. Community-developed plugins won’t be part of the support scope and may only receive best-effort support. AWS offers five support plans:

- Basic
- Developer
- Business
- Enterprise On-Ramp
- Enterprise

Each one has a different price, case severity, and response time. Check out the following link to learn more: https://aws.amazon.com/premiumsupport/plans/. When in doubt about which support plan to buy, ask yourself the following: How much downtime can my business afford without impacting revenue? The answer will directly determine which type of support plan you need.

After reading all these factors, the odds are probably not in Amazon EKS’s favor. Kubernetes is a robust, highly scalable orchestrator that supports Windows very well; the drawback is the ecosystem, in which Windows containers aren’t first-class citizens. On the other hand, I’ve seen many customers successfully running Windows containers on Amazon EKS; they developed or adjusted all the necessary sidecar containers, purchased third-party plugins, and ran thousands of Windows Pods. In a nutshell, the right AWS container orchestrator is the one that addresses your needs, taking into consideration all the factors you have learned about.


Summary

In this chapter, we learned why customers choose AWS to run Windows containers and about its pace of innovation; then, we delved into the AWS Nitro Hypervisor and Nitro Cards and how they directly affect Windows container performance. Finally, we discussed some factors to consider when choosing an AWS container orchestrator for Windows containers.

Until now, we have been setting the stage by understanding Windows container fundamentals. In Chapter 3, Amazon ECS – Overview, we will start the technical deep dive. First, you will learn Amazon ECS principles, such as container instances, task definitions, and services, which will prepare you for the upcoming chapters.

Part 2: Windows Containers on Amazon Elastic Container Service (ECS)

In this part, you will learn how to set up an Amazon ECS cluster and deploy Windows container tasks. First, you will learn how Amazon ECS works and explore its components; then, you will learn how to deploy an EC2 Windows-based task, followed by a Fargate Windows-based task.

This part has the following chapters:

• Chapter 3, Amazon ECS – Overview
• Chapter 4, Deploying a Windows Container Instance
• Chapter 5, Deploying an EC2 Windows-Based Task
• Chapter 6, Deploying a Fargate Windows-Based Task

3

Amazon ECS – Overview

In this chapter, we’ll learn about Amazon Elastic Container Service (ECS) and its relevant Windows components, such as container instances, services, tasks, and task definitions, and then deep-dive into task networking. Finally, we’ll deploy an empty Amazon ECS cluster using Terraform, which will be the first step in the building-block exercise of configuring Amazon ECS entirely with Windows container instances and tasks.

We are going to cover the following main topics:

• Amazon ECS – fundamentals
• Amazon ECS – task networking
• Deploying an Amazon ECS cluster with Terraform

This chapter will give us the fundamentals for Chapters 4, 5, and 6, which will deep-dive into the Windows specifics and deployments.

Technical requirements

In the Deploying an Amazon ECS cluster with Terraform section, you will need the following expertise and technologies installed:

• The AWS CLI
• The Terraform CLI
• An IAM user account with the AmazonECS_FullAccess and IAMFullAccess managed policies attached
• Terraform development expertise


To access the source code used in this chapter, see the following GitHub repository: https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/ecs-ec2-windows.

Important note
It is strongly recommended to use an AWS test account to perform the activities described in this book and never run them against your production environment.

Amazon ECS – fundamentals

Amazon ECS is a managed container orchestrator created by AWS that allows customers to run containers as tasks. A task is defined in a task definition, a kind of configuration blueprint in which you specify container configurations, such as the amount of vCPU, memory, network ports, and more. Amazon ECS comprises the following components:

• Clusters
• Container instances
• Task definitions
• Tasks
• Services

Clusters are the logical grouping of tasks or services. Amazon ECS clusters are free of charge; you only pay for the underlying infrastructure, such as Amazon EC2 Windows instances, Amazon EBS, Amazon CloudWatch, and so on. Figure 3.1 illustrates an empty Amazon ECS cluster and an existing VPC. When you deploy an empty Amazon ECS cluster, no resources are created inside or outside the VPC.

A container instance is the name given to an Amazon EC2 instance that is a member of an ECS cluster. The installed ECS agent acts as an intermediary, responsible for the communication exchange between the container instance and the ECS control plane. The ECS agent also receives requests from the Amazon ECS cluster to start and stop tasks. It is a best practice to deploy container instances in the private subnets within a VPC:


Figure 3.1 – Amazon ECS cluster with two container instances

The task definition is a JSON-format text file that works as a blueprint for your application. First, you select the launch type compatibility (Fargate, EC2, or External, which is ECS Anywhere). Next, you set the operating system family (Windows or Linux) the task definition belongs to. Finally, you set the task size, which is the amount of vCPU and memory (GB) the containers inside the task can consume from the container instance.


The next step within a task definition is the container definition, where you define how the container will behave: the container image name, ports to be exposed, vCPU and memory limits (MiB) to be consumed, health checks, storage, and logging:

Figure 3.2 – A task definition created, which the ECS cluster uses to launch a container

A task definition is immutable and can’t be edited once created. Therefore, if any parameter needs to be changed, a new revision must be created.

Important note
Configuring the task size is not supported for Windows containers; however, it must be specified. The amount of vCPU and memory (MiB) a Windows container can consume from the operating system is set in the container definition.
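As a sketch of what this looks like in practice, the following builds a minimal Windows task definition as a Python dictionary of the kind you could pass to boto3’s `ecs.register_task_definition`. The family, image tag, and resource values are illustrative; note how the task-level `cpu`/`memory` are present even though the effective Windows limits live in the container definition:

```python
# Minimal, illustrative Windows task definition for the EC2 launch type.
# The dict follows the register_task_definition request shape.
task_definition = {
    "family": "iis-sample",                      # hypothetical name
    "requiresCompatibilities": ["EC2"],
    "runtimePlatform": {
        "operatingSystemFamily": "WINDOWS_SERVER_2022_CORE",
        "cpuArchitecture": "X86_64",
    },
    "networkMode": "default",
    "cpu": "1024",     # task size: must be specified, not enforced on Windows
    "memory": "2048",
    "containerDefinitions": [
        {
            "name": "iis",
            "image": "mcr.microsoft.com/windows/servercore/iis:ltsc2022",
            "cpu": 1024,     # enforced at the container level on Windows
            "memory": 2048,  # hard limit, in MiB
            "portMappings": [{"containerPort": 80, "hostPort": 0}],
            "essential": True,
        }
    ],
}

# Task definitions are immutable, so every registration creates a new
# revision of the family:
#   boto3.client("ecs").register_task_definition(**task_definition)
print(task_definition["family"])
```

Setting `hostPort` to 0 enables dynamic ports, which is covered in the task networking section later in this chapter.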


A task is the instantiation of a task definition in container form. A task can run standalone or as part of a service. A standalone task usually runs as a short-lived container that starts, does some processing, and shuts down – for instance, a job application. As part of a service, tasks usually run as long-running containers – for instance, an ASP.NET web application:

Figure 3.3 – Task definition instantiation through a task


Services run and maintain the desired number of tasks in an Amazon ECS cluster at all times. They do this using a component called the Amazon ECS service scheduler, which is responsible for launching new tasks or replacing existing ones. There are two service scheduler strategies available:

• DAEMON is a good strategy if the Amazon ECS cluster is fully dedicated to a specific application or if the daemon tasks need to be prioritized, being the first tasks to launch and the last to stop
• REPLICA, by default, evenly spreads tasks across Availability Zones by determining which container instances in the Amazon ECS cluster support the task definition parameters, such as CPU, memory, ports, and container instance attributes

The following figure illustrates a service with the replica strategy. Two tasks were deployed, each with two Windows containers, spread across different Availability Zones:

Figure 3.4 – Two tasks scheduled by replica mode through services
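The spread behavior of the REPLICA strategy can be modeled with a toy scheduler: each new task goes to the Availability Zone currently running the fewest tasks. This is purely illustrative of the placement idea, not the actual Amazon ECS scheduler algorithm:

```python
from collections import Counter

def place_replicas(desired_count: int, azs: list) -> Counter:
    """Toy REPLICA scheduler: always pick the least-loaded AZ.

    The real Amazon ECS service scheduler also checks CPU, memory,
    ports, and container instance attributes; this sketch models only
    the spread-across-AZs behavior.
    """
    load = Counter({az: 0 for az in azs})
    for _ in range(desired_count):
        target = min(azs, key=lambda az: load[az])
        load[target] += 1
    return load

print(place_replicas(5, ["us-east-1a", "us-east-1b"]))
```

With an odd desired count, one zone ends up with one extra task, which is the expected outcome of an even spread.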


In this section, we learned about the Amazon ECS core components, including how the ECS control plane communicates with the container instances through the ECS agent, then how ECS uses task definitions to describe a Windows container and its resources. Finally, we learned how ECS uses tasks and services to deploy Windows containers into container instances.

Amazon ECS – task networking

For Windows, task networking is limited to two modes: default and awsvpc. The default mode uses Docker’s built-in virtual network, which is a Network Address Translation (NAT) mode on Windows. In the default mode, Docker Engine is responsible for creating and managing the host network on Windows, which is built on top of a Hyper-V virtual switch (vSwitch). That doesn’t mean the Hyper-V hypervisor role is installed; only its networking capabilities are used. Each Windows container is connected to the Hyper-V vSwitch using a virtual network interface card (vNIC):

Figure 3.5 – The Docker network and Windows adapters

A simple north-south traffic workflow would be as follows:

1. Multiple Windows containers run within a standalone task with dynamic ports enabled.
2. The data packet is sent to the vNIC attached to the Windows container.
3. The data packet is sent to the vSwitch, and a Windows Network Address Translation (WinNAT) port forwarding rule is created.
4. The data packet is routed to the VPC through the Windows container instance's Elastic Network Interface (ENI).


Amazon ECS – Overview

This is depicted in the following figure:

Figure 3.6 – North-south traffic from the Windows task to the VPC
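The dynamic-port flow depicted above is requested per container by setting the host port to 0. The following is a hedged sketch of a task definition that does this; the family name, image tag, and CPU/memory sizes are illustrative placeholders, not values from this book's repository.

```hcl
# Hypothetical Windows task definition using the default (NAT) network mode.
# network_mode is omitted on purpose: for Windows, omitting it selects the
# Docker default NAT network. hostPort = 0 asks for a dynamic ephemeral port.
resource "aws_ecs_task_definition" "windows_iis" {
  family                   = "windows-iis"
  requires_compatibilities = ["EC2"]
  cpu                      = "1024"
  memory                   = "2048"

  container_definitions = jsonencode([
    {
      name      = "iis"
      image     = "mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019"
      cpu       = 1024
      memory    = 2048
      essential = true
      portMappings = [
        {
          containerPort = 80
          hostPort      = 0 # 0 = dynamic port from the ephemeral range
          protocol      = "tcp"
        }
      ]
    }
  ])
}
```

Each task launched from this definition gets its own WinNAT port-forwarding rule from a host ephemeral port to container port 80.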

The default mode is straightforward to use. The benefit is that, as all tasks share the single Windows container instance's ENI, the number of running tasks on a single Windows container instance using dynamic ports is limited only by the number of available ephemeral ports (49153-65535 by default, or 32768-61000 with older Docker versions). Also, tasks launch faster because no AWS API calls are needed to create and attach an ENI.
Important note
To enable a task to use dynamic ports, set Host port = 0 as part of the container definition.
On the other hand, there are two drawbacks:
• All tasks share the same host network namespace
• Inbound/outbound traffic is controlled at the container instance ENI level, meaning all the containers running on the Windows container instance share the same network policies
A good use case for the default mode is short-lived containers that need to start up fast, process some data, and shut down. Meanwhile, awsvpc tasks allocate their own ENI and a private IPv4 address from within the VPC that the Windows container is deployed to. Under the hood, a secondary ENI is attached to the Windows container instance for each task, in its own network namespace:


Figure 3.7 – One vNIC and ENI per task in awsvpc mode

When awsvpc mode is set, the ECS agent creates an additional pause container within the task before starting the Windows container, in order to set up a new network namespace and attach an ENI and a Hyper-V vNIC by running the amazon-ecs-cni plugin. Once that completes, the ECS agent starts the Windows container within the task and plugs the Hyper-V vNIC into it:

Figure 3.8 – Task networking in awsvpc mode
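How awsvpc mode is requested in practice can be sketched as follows. This is a hedged illustration: the family name, image, subnet, and security group references are placeholders, not resources defined in this chapter.

```hcl
# Hypothetical task definition pinned to awsvpc mode: each task gets its own
# ENI with a private IPv4 address from the VPC.
resource "aws_ecs_task_definition" "windows_api" {
  family                   = "windows-api"
  network_mode             = "awsvpc"
  requires_compatibilities = ["EC2"]
  cpu                      = "1024"
  memory                   = "2048"

  container_definitions = jsonencode([
    {
      name      = "api"
      image     = "mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019"
      essential = true
      # In awsvpc mode only containerPort is set; there is no host port mapping.
      portMappings = [{ containerPort = 80, protocol = "tcp" }]
    }
  ])
}

# A service running awsvpc tasks must state where to place each task ENI.
resource "aws_ecs_service" "windows_api" {
  name            = "windows-api"
  cluster         = aws_ecs_cluster.ecs_windows_cluster.id
  task_definition = aws_ecs_task_definition.windows_api.arn
  desired_count   = 1
  launch_type     = "EC2"

  network_configuration {
    subnets         = [aws_subnet.private_a.id]    # placeholder subnet
    security_groups = [aws_security_group.task.id] # placeholder security group
  }
}
```

Note that the network_configuration block, and therefore the subnets and security groups, apply per task ENI rather than at the container instance level.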


The awsvpc mode offers better network performance because each task has its own ENI, so there is no contention for a single shared ENI as in the default mode. However, two drawbacks need to be taken into account:
• The number of tasks per container instance is limited by the number of secondary network interfaces the EC2 instance type supports, drastically reducing task density compared with the default mode
• Tasks may take longer to launch and terminate because the ECS control plane must handle ENI creation, attachment, detachment, and deletion
In this section, we dove deep into the internals of how Amazon ECS handles task networking for Windows containers. First, we learned how ECS uses the Docker default network through NAT (default) mode; then, we learned how awsvpc mode gives Windows tasks a dedicated EC2 ENI by using the amazon-ecs-cni plugin.

Deploying an Amazon ECS cluster with Terraform

This is the first deployment topic we will go over, so it is essential to understand how it will work. I believe that filling up pages with code doesn't make much sense; instead, the complete code is available on GitHub per chapter, and in the book, we will use code snippets to illustrate each step. Terraform offers a lot of string functions and expressions, which can be complex to understand at first. Therefore, I will try to keep the code as simple as possible so that you can follow it easily, whether you are a Terraform beginner or an advanced developer. Decoupling Terraform code into modules is a typical pattern. For example, customers usually create reusable standalone modules to deploy security groups, ELB, EC2, and so on, and then merge them into the root module. To keep the code simple and avoid many inter-module dependencies, I have described the entire Amazon ECS setup and its components, such as IAM roles, instance profiles, ELB, EC2 instances, and so on, in one main.tf file, so you can easily find each code snippet and learn from it.
Important note
This book is not meant to teach you Terraform. Work experience with Terraform is a requirement to follow the deployment topics in the book. The code in the book is given as snippets from main.tf, which is available in the GitHub repository.
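To illustrate the module pattern mentioned above (which this book deliberately avoids in favor of a single main.tf), a root module would typically compose reusable child modules roughly like this. The ./modules/* paths, variable names, and outputs here are hypothetical:

```hcl
# Hypothetical root module composing reusable child modules.
# The module paths and input/output names are placeholders for illustration.
module "security_groups" {
  source = "./modules/security-groups"
  vpc_id = var.vpc_id
}

module "ecs_cluster" {
  source       = "./modules/ecs-cluster"
  cluster_name = "ecs-cluster"
  # Inter-module dependency: the cluster consumes a security group output.
  instance_security_group_id = module.security_groups.instance_sg_id
}
```

This pattern promotes reuse but introduces the inter-module dependencies the author is avoiding here for readability.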

Why Terraform?

Terraform is an open source Infrastructure-as-Code (IaC) tool created by HashiCorp; customers of all sizes and segments use Terraform to standardize their AWS environments. It uses a native syntax called HCL, which is easy to read and learn. I chose Terraform for the deployment topics in this book mainly because almost all of the customers I've worked with used Terraform as their IaC tool.


HashiCorp officially publishes an AWS provider in the Terraform Registry, which can be accessed at the following URL: https://registry.terraform.io/namespaces/hashicorp. AWS employees and the community work together to make sure the provider keeps up with the pace of innovation and the service launches AWS makes throughout the year. On the provider web page, you will find everything you need to deploy AWS services using Terraform. It is very well curated, documented, and rich in examples. In this first deployment building block, we will deploy the following:
• IAM roles and instance profiles
• An Amazon ECS cluster
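Before any of the snippets below can run, the AWS provider has to be declared. A minimal, hedged configuration might look like the following; the version constraint and region are illustrative choices, not values mandated by the book's repository:

```hcl
# Minimal provider setup assumed by the examples in this chapter.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # illustrative version constraint
    }
  }
}

provider "aws" {
  region = "us-east-1" # replace with the region you deploy to
}
```

Running terraform init against a configuration containing this block downloads the AWS provider from the Terraform Registry.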

IAM roles and instance profiles

Before deploying the cluster, we want to ensure that the necessary IAM roles and instance profiles are in place so the cluster components can work successfully. We will start by creating ecsTaskExecutionRole. This role permits the ECS agent to make AWS API calls on a task's behalf. The most common calls are as follows:
• Pulling a container image from an Amazon ECR private repository
• Using awslogs to send container logs to CloudWatch Logs
AWS already provides a managed policy named AmazonECSTaskExecutionRolePolicy, which contains the permissions for the use cases described previously. The policy has the following effect and actions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}

The following Terraform code snippet will perform these actions:
• Create an IAM role named ecsTaskExecutionRole
• Attach the AmazonECSTaskExecutionRolePolicy IAM managed policy to the ecsTaskExecutionRole role
• Create an assume role policy that allows ECS tasks to assume the role
Here is the snippet:

resource "aws_iam_role" "ecsTaskExecutionRole" {
  name                = "ecsTaskExecutionRole"
  path                = "/"
  managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"]

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Sid    = ""
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      },
    ]
  })
}

In the next step, we will create ecsInstanceRole. This role allows the ECS agent running on the container instance to make Amazon ECS and related AWS API calls on your behalf. AWS provides AmazonEC2ContainerServiceforEC2Role as a managed IAM policy. The policy has the following effect and actions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeTags",
        "ecs:CreateCluster",
        "ecs:DeregisterContainerInstance",
        "ecs:DiscoverPollEndpoint",
        "ecs:Poll",
        "ecs:RegisterContainerInstance",
        "ecs:StartTelemetrySession",
        "ecs:UpdateContainerInstancesState",
        "ecs:Submit*",
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}

The following Terraform code snippet will perform these actions:
• Create an IAM role named ecsInstanceRole
• Attach the AmazonEC2ContainerServiceforEC2Role IAM managed policy to the ecsInstanceRole role
• Create an assume role policy that allows EC2 instances to assume the role
Let us check the snippet:

resource "aws_iam_role" "ecsInstanceRole" {
  name                = "ecsInstanceRole"
  path                = "/"
  managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"]

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Sid    = ""
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      },
    ]
  })
}

We need to add the ecsInstanceRole we just created to an instance profile. The instance profile is responsible for passing an IAM role to an EC2 instance. We will reference the role attribute with the resource we created in the prior step, role = aws_iam_role.ecsInstanceRole.name:

resource "aws_iam_instance_profile" "ecs_windows_ecsInstanceRole_profile" {
  name = "ecs_windows_ecsInstanceRole_profile"
  role = aws_iam_role.ecsInstanceRole.name
}
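As a side note, attaching managed policies via managed_policy_arns makes the role resource authoritative over its attachments. A common alternative pattern, sketched here under that assumption, is a standalone attachment resource:

```hcl
# Alternative to managed_policy_arns: attach the managed policy with a
# separate resource, keeping the role definition and the attachment decoupled.
resource "aws_iam_role_policy_attachment" "ecs_instance_role_policy" {
  role       = aws_iam_role.ecsInstanceRole.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}
```

Both approaches result in the same effective permissions; the standalone resource is often preferred when attachments are managed across several Terraform files.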

Amazon ECS clusters

Finally, we will deploy the Amazon ECS cluster. As mentioned at the beginning, this will start as an empty cluster, so we can use it as a building block for the upcoming chapters. The cluster will have Container Insights enabled to fetch task resource consumption through the ECS agent:

resource "aws_ecs_cluster" "ecs_windows_cluster" {
  name = "ecs-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

Deploying the AWS services

So far, we have gone over the code snippets that deploy the IAM roles and an Amazon ECS cluster; if you want to test the deployment, access the following GitHub repository: https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/ecs-ec2-windows.

There is a file named chapter03.tf that contains all the resources covered in this chapter in a single TF file. Rename it main.tf and run the following:

terraform init
terraform apply

Check the AWS services and resources created in your account, then destroy (remove) them by running the following command:

terraform destroy

In Chapter 4, Deploying a Windows Container Instance, we will recreate the cluster with Windows container instances. In this section, we got our hands dirty coding in Terraform HCL. We learned about the necessary components for deploying an Amazon ECS cluster and its dependencies, such as IAM roles and policies, from code snippets.
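When testing the deployment, it can help to surface the identifiers Terraform creates. The following output blocks are a hedged addition you could append to main.tf yourself; they are not part of the book's repository:

```hcl
# Optional outputs to confirm what was created after `terraform apply`.
output "ecs_cluster_arn" {
  description = "ARN of the ECS cluster created in this chapter"
  value       = aws_ecs_cluster.ecs_windows_cluster.arn
}

output "ecs_instance_role_name" {
  description = "IAM role passed to container instances via the instance profile"
  value       = aws_iam_role.ecsInstanceRole.name
}
```

After a successful apply, Terraform prints these values, which makes it easy to cross-check the resources in the AWS console before running terraform destroy.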

Summary

In this chapter, we learned about the Amazon ECS fundamentals and how its components relate to each other; then, we delved into ECS task networking and the options available for Windows containers. Finally, we explored code snippets that deployed the necessary IAM roles and instance profiles used by ECS Windows container instances, and an empty ECS cluster. In Chapter 4, Deploying a Windows Container Instance, we will learn about Amazon ECS-optimized Windows AMIs, Auto Scaling groups, and capacity providers, and launch a Windows container instance inside the ECS cluster we deployed in this chapter.


4
Deploying a Windows Container Instance

In this chapter, we'll learn from the inside out how an ECS Windows container instance works. We will look at the available ECS-optimized AMIs for Windows and how the ECS container agent plays a vital role between the container instance and the ECS control plane. Then, we will learn about the four pillars that need to be considered when right-sizing a Windows container instance. Finally, we will use Terraform to deploy an Auto Scaling group to launch Windows container instances in an ECS cluster.

We are going to cover the following main topics:
• Amazon ECS-optimized Windows AMIs
• Amazon ECS agent
• Right-sizing a Windows container instance
• Deploying a Windows container instance with Terraform

Technical requirements

For the Deploying a Windows container instance with Terraform section, you will need to have the following technologies installed and expertise in them:
• AWS CLI
• Terraform CLI
• An IAM user with the AmazonECS_FullAccess, IAMFullAccess, and AmazonEC2FullAccess managed policies
• Terraform development expertise


To access the source code used in this chapter, visit the following GitHub repository: https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/ecs-ec2-windows.
Important note
It is strongly recommended that you use an AWS test account to perform the activities described in the book and never run them against your production environment.

Amazon ECS-optimized Windows AMIs

AWS provides customers with Amazon ECS-optimized Windows AMIs, which are preconfigured with the necessary components, such as Docker Engine, the ECS agent, and a Hyper-V vSwitch, to successfully run Windows containers as tasks. There are four Amazon ECS-optimized Windows AMI variants:
• Amazon ECS-optimized Windows Server 2022 Full AMI
• Amazon ECS-optimized Windows Server 2022 Core AMI
• Amazon ECS-optimized Windows Server 2019 Full AMI
• Amazon ECS-optimized Windows Server 2019 Core AMI
The Full AMI has the Windows Desktop Experience GUI installed, while the Core AMI installation is based on Server Core (PowerShell only). The main difference between the two is the GUI shell packages:
• Microsoft-Windows-Server-Gui-Mgmt-Package
• Microsoft-Windows-Server-Shell-Package
• Microsoft-Windows-Server-Gui-RSAT-Package
• Microsoft-Windows-Cortana-PAL-Desktop-Package
Not having these shell packages installed drastically reduces the amount of provisioned block storage (Amazon EBS) needed to run the Windows Server operating system, directly impacting the solution cost. Always remember: with Amazon EBS, you pay for what you provision. The AMI choice isn't a big deal for one or two Amazon ECS Windows container instances; however, when running an ECS cluster that scales out/in multiple times a day with thousands of EC2 Windows instances, the AMI variant plays a crucial role in the solution cost. In simple words, use the Core AMI as much as possible.


When working with Terraform, one of the easiest ways to use the latest ECS-optimized Windows AMI is through data sources. Data sources allow Terraform to query for data outside of Terraform and then output it as a value elsewhere in the Terraform code. The following data source retrieves the latest ECS-optimized Windows AMI ID:

data "aws_ami" "ecs_optimized_ami" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["Windows_Server-2019-English-Core-ECS_Optimized-*"]
  }
}

To close this section, I recommend using the ECS-optimized Windows AMI as a starting point. Then, you can use tools such as HashiCorp Packer or EC2 Image Builder to apply the necessary configurations and hardening required by your company.
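An alternative worth knowing, hedged here because public parameter paths can change: AWS also publishes the latest Windows AMI IDs as public AWS Systems Manager (SSM) parameters, which Terraform can read with an aws_ssm_parameter data source:

```hcl
# Assumed public SSM parameter path for the latest ECS-optimized Windows
# Server 2019 Core AMI; verify the exact parameter name and the value format
# (plain AMI ID vs. JSON document) in the current AWS documentation.
data "aws_ssm_parameter" "ecs_windows_ami" {
  name = "/aws/service/ami-windows-latest/Windows_Server-2019-English-Core-ECS_Optimized"
}
```

Compared with data "aws_ami", the SSM lookup avoids name-pattern matching, at the cost of depending on the published parameter path.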

Amazon ECS agent

As explained in the previous chapter, the ECS container agent is responsible for the communication between the Amazon ECS cluster and the Amazon EC2 instance. The ECS agent sends information about the currently running tasks and the resource utilization of containers from the container instance to the Amazon ECS cluster. The ECS agent also receives requests from the Amazon ECS cluster to start and stop tasks:


Figure 4.1 – ECS agent two-way communication with an ECS cluster

The ECS agent runs as a Windows service on the Windows container instance, and it communicates with the Docker daemon through a named pipe at \\.\pipe\docker_engine. A named pipe is an inter-process communication mechanism that allows two processes to exchange data.

Amazon ECS agent

You can use Pipelist from Windows Sysinternals to list the pipes open on Windows:

Figure 4.2 – Listing pipes with Pipelist

Assuming we are launching a new Amazon EC2 Windows instance based on the ECS-optimized Windows AMI to be part of an ECS cluster, this instance will become a Windows container instance. When using IaC, we need to bootstrap the instance in order to add it to an ECS cluster. The bootstrap is usually done through EC2 user data, by passing parameters to be included in the ECS agent configuration. The following is the PowerShell bootstrap that joins the Amazon EC2 Windows instance to the ECS cluster:

Initialize-ECSAgent -Cluster ${aws_ecs_cluster.ecs_windows_cluster.name} -EnableTaskIAMRole -AwsvpcBlockIMDS -EnableTaskENI -LoggingDrivers '["json-file","awslogs"]'
[Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE", $TRUE, "Machine")


Let's understand the parameters:
• Cluster specifies the existing ECS cluster name.
• EnableTaskIAMRole enables tasks to assume an IAM role; this will make port 80 unavailable for tasks.
• AwsvpcBlockIMDS is an optional parameter that blocks instance metadata service (IMDS) access for the tasks running in awsvpc mode.
• EnableTaskENI turns on task networking and is required to use the awsvpc network mode.
• LoggingDrivers specifies the log format and the logging driver.
• [Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE", $TRUE, "Machine") creates a system variable that enables the awslogs log driver to authenticate using the task execution IAM role. On Windows, the default value is false.
In Terraform, we can bootstrap these EC2 Windows container instances using a launch template by specifying the following user data inside the aws_launch_template resource:

user_data = "${base64encode(