Building Software Platforms: A Guide to SaaS Transition with AWS [1 ed.] 9798760009685


Table of Contents
Foreword
Preface
What's in this book?
Who should read this book?
What will you learn?
Acknowledgments
Here we go!
PART I: STRATEGY
Chapter 1: Transitioning to SaaS
Reinventing yourself
Software as a Service (SaaS)
Building and buying software
Evolving toward a service economy
Chapter 2: Internal Software Platforms
Appreciating the mundane
Why do you need an internal software platform?
What is an internal software platform?
What are the benefits of internal software platforms?
Chapter 3: Platform Services
Designing for the people
Core Infrastructure
Lifecycle Automation Management
Core Enablement Services
Platform Design System
Platform Configuration
Chapter 4: Platform Teams
Holding fast, staying true
What is a platform team?
How do platform teams fit within the organization?
What value do these teams create?
Interacting with platform teams
Chapter 5: Platform Adoption
Leapfrogging
The Digital Platform economy
A multi-competence strategy
Building platform-based business services
Platform buy-in
PART II: PRINCIPLES
Chapter 6: Technical Architecture Principles
Favor serviceful platforms over monoliths
Favor iterations over big up-front designs
Favor asynchronous integrations over synchronous ones
Favor elimination over re-engineering
Favor re-engineering over multiplying
Favor duplicity over hasty abstractions
Chapter 7: Technology Principles
Make your platform reachable
Define everything as code
Use the cloud platform as a programmable system
Secure your platform access and data
Observability and visibility
Use open standards
Chapter 8: Serverless-first Software Engineering
Building portable software
A serverless-first approach with AWS
Notes


Building Software Platforms
A guide to SaaS transition with AWS
Pablo Bermejo

This book is for sale at http://leanpub.com/software-platforms
This version was published on 2021-11-22

This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do. © 2021 Pablo Bermejo

Tweet This Book! Please help Pablo Bermejo by spreading the word about this book on Twitter! The suggested hashtag for this book is #softwareplatforms. Find out what other people are saying about the book by clicking on this link to search for this hashtag on Twitter: #softwareplatforms

To Mom and Dad


Foreword

en·tro·py | \ ˈen-trə-pē : a process of degradation or running down or a trend to disorder – chaos.

We who work in information technology could perhaps be forgiven for believing that software development is utter chaos – at times.

When I was young, back in the sixties and seventies, I desperately wanted to know how everything worked. I would often take apart my toys and other possessions (like a green transistor radio I had). You might say I was acting as an agent of entropy. I was always fascinated with the parts and how they functioned together. During high school, small personal computers began to become available. Commodore came out with the VIC-20 and Commodore 64. You could buy kits to build a computer that probably was less functional but seemingly far more intricate and interesting.

As we moved forward, computers became less of a curiosity for tinkerers and more of a tool to be used to generate some actual value - and not just for science and engineering. This happened because, slowly but surely, the intricacies and details of what went into the computer faded into the background while the things it enabled became far more important.

This same evolution occurred with programming languages and the software built from them. We went from machine code to assembler to first-, second-, and third-generation languages. We went from data entry and batch processing to on-line real-time processing, green screens to GUIs, and more. In an odd way, all this evolution led to a sort of reversal of entropy – negentropy – everything was and is tending to become more organized, and the complexities are becoming less visible to the people most impacted.

Thus it is with enterprise software and the systems built from it. The sole purpose of enterprise software for most companies is to generate


business value at the lowest possible cost. That value isn’t measured in complexity or intricacy or how amazing the algorithms at its heart are. The value is measured in how it helps people get things done. For a large enterprise, all this must be done quickly, robustly, safely, and securely. Getting all that is not easy. It is not simple, and it involves a lot of work to build and manage complexity in a way that hides it entirely while still delivering value.

For those of us who have been around a while in the information technology business, change and evolution are nothing new. However, all that change wasn’t random and did have (does have) a discernible trajectory. That trajectory is toward a functional commoditization of nearly everything done with a computer. In fact, that evolution is toward computers themselves simply fading into the background – at the consumer level and at the enterprise level.

Back when “the cloud” was first created, it was really just a bunch of servers running virtual machines with virtual networking on which you could place workloads without needing to buy that hardware. It was just a place to put the same software you would install on your local servers, but now you didn’t need to buy the hardware or manage it. It was essentially virtualization on demand over the nascent Internet.

Over time, though, all that has changed. All the details about how and where to “build” that virtualized infrastructure have led to “infrastructure as code,” or the software-defined data center. Beyond even that is the idea that you just define workloads and push them out to the cloud to run without regard to infrastructure. That is, to build serverless solutions to enterprise demands.

In this book, you’ll see that building a platform based on Software-as-a-Service and serverless concepts can relieve developers of the need to understand the complexities of running and managing software and allow them to focus on delivering greater value to their customers. You’ll learn how and why a platform enables, as Pablo puts it, “serviceful” development to deliver that value that is so important. You’ll learn what a platform is and why you should want one. You’ll see how the trajectory of the past has led us to this point and how,


while complexity still exists, it fades into the background (into the platform) and we, as developers, can focus on what is important to our customers – delivering valuable services that make their lives better.

Bill Ohnemus, MSIS
DXC Distinguished Engineer

Preface

What’s in this book?

The idea of leveraging tools and frameworks to create better software is not new. For example, in traditional product-centric approaches, vendors pushed developers to build services that orbited around a smart integration bus with the vain hope of making the software more robust. Constrained by these proprietary technologies, this approach was far from ideal for developers designing and building their applications. Nowadays, with the emergence of standards-based computing utilities (serverless) and the novel software engineering practices that evolved with them (NoOps), we can look at these internal tools from a more developer-centric point of view.

Technology leaders worldwide keep rolling out incredibly complex product-centric infrastructure schemes to deal with technical debt rather than addressing it at its core. Instead, this book will discuss how to deploy a safety net for those brave software leaders who want to embrace the cloud as a serviceful platform, invest in developers, and make their organizations evolve with them. These are leaders who don’t love failure, but they tolerate the process. That’s why there is a bit of technology strategy in this book, too, based on Wardley maps1.

You no longer need to buy a product to cover all your cross-cutting and foundational software needs. Building your own tools in-house is much better, especially if what you are creating is core to your business and sets you apart from the competitors in the marketplace. That is why internal software platforms are built in-house and are invisible to the eyes of the final customers. These end-users receive the value that platforms help to unleash in the form of better functional software.


For all these reasons, this book is called Building Software Platforms and not Installing Software Platforms or Configuring Software Platforms.

Terminology

Even at the risk of being too dogmatic, I have tried to be very cautious with the terms I used to refer to some specific concepts. Being picky with words in a language that is not your mother tongue is the next level of pedantry! Nonetheless, words do matter. A careful selection requires intentionality, especially when you want to discontinue meaningless terms attached to old ways of thinking. To that end, the following list contains a few examples of phrases and expressions that I carefully employed to express a few ideas captured in the book:

• I tried to use “managed cloud services” instead of “cloud-native.” The reason is that, if we look closer, there is nothing new in those technologies that makes them innate to the cloud. On the contrary, they are based on ages-old standards!
• When I described the relationship between services and the internal software platform, I suggested that services run “with” the platform instead of “on” the platform to ward off layered architectures. The use of the former preposition helps to visualize that idea.
• I tried not to use “microservices” as a generic term to refer to any service running with the platform. Instead, I utilized the word “serviceful” when I referred to the characteristics of true service-oriented architectures. To me, there is a big difference, and the “micro” prefix denotes a distinct architectural style.
• I refrained from using old concepts like “Systems of Integration” or “Systems of Engagement.” The word “system” in this context denotes some monolithic thinking and even layered architecture. As I unfold the characteristics of serviceful architectures across the first chapters, I replace those terms with “integration services” and “engagement services,” respectively.

As you go through the different chapters in this book, please give me feedback, and I’ll correct it if you find that I have not been honest with my own rules.

Wardley maps

Technology executives use strategy tools to help their organizations move forward. Methods like SWOT analysis, flow charts, or linear roadmaps help them make decisions and answer questions. But are they going in the right direction? Movement is fine, but you also need position when it comes to designing a game plan.

I am not a strategist, but I understand the importance of using the right tool for the right job. I found that Wardley maps are an excellent technique for understanding your context, developing situational awareness, interpreting the competitive landscape, and visualizing the path forward. As with any other map, they help you determine movement and position.

Wardley maps are two-dimensional diagrams that combine a value chain axis (from less visible to more visible to the user) and an evolution axis (from uncharted territories to fully industrialized). Then, when you position the business elements you are modeling on the map, you unveil a visualization of where your business is at a given moment in time, how the competition is making these elements evolve, and your paths forward. This tool is used across the book to represent the role of internal software platforms in the organization and how this architectural style can help you transition to SaaS.

The good thing about Wardley mapping is that all the official documentation has been released under the Creative Commons share-alike license by its creator, Simon Wardley. Consequently, and although I have tried to explain every map included in the book in detail, you will be able to find a lot of materials out there that will help you understand this technique better.
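To make the two axes concrete, here is a minimal Python sketch (my own illustration, not from the book; the component names, coordinates, and stage boundaries are assumptions) that models map elements as points on the value chain and evolution axes and buckets each one into Wardley’s four evolution stages:

```python
from dataclasses import dataclass

# Wardley's four evolution stages, from uncharted to fully industrialized.
EVOLUTION_STAGES = ["Genesis", "Custom-built", "Product", "Commodity"]

@dataclass
class Component:
    name: str
    visibility: float  # value chain axis: 0 = invisible to the user, 1 = directly visible
    evolution: float   # evolution axis: 0 = genesis, 1 = fully industrialized

    def stage(self) -> str:
        # Bucket the continuous evolution coordinate into one of the four stages.
        idx = min(int(self.evolution * len(EVOLUTION_STAGES)),
                  len(EVOLUTION_STAGES) - 1)
        return EVOLUTION_STAGES[idx]

# A toy value chain for a SaaS product built with an internal platform.
value_chain = [
    Component("Customer-facing app", visibility=0.9, evolution=0.35),
    Component("Internal platform services", visibility=0.5, evolution=0.60),
    Component("Cloud compute (serverless)", visibility=0.1, evolution=0.95),
]

# Print the chain from most to least visible, with each element's stage.
for c in sorted(value_chain, key=lambda c: c.visibility, reverse=True):
    print(f"{c.name:<30} {c.stage()}")
```

Watching a component drift rightward along the evolution axis over time (as serverless compute has) is exactly the kind of movement the maps in this book visualize.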

AWS references

As we can learn from this Forbes article2, the most significant factor in a startup’s success is timing, according to Bill Gross (founder of Idealab). Gross studied over 200 companies to come to the following conclusions:

• In 42 percent of the cases, timing was determinant for success.
• Team and execution were a critical factor in 32 percent of the companies he studied.
• The idea accounted for 28 percent of the cases.
• The business model meant approximately 24 percent.
• Funding represented only 14 percent.

Okay, but what does this have to do with AWS? The answer is that AWS masters the timing success factor, releasing their services precisely at the right time and meeting their customers where they are without being condescending about the technology. Not too soon, not too late. Just right on time. Being conscious of timing and empathizing with the people around you has another name: zeitgeist. AWS understands and respects our zeitgeist like nobody else.

For example, AWS Lambda has dominated the market since 2014 thanks to an incrementally richer set of features and a decreasing billing granularity. Due to this, this cloud giant is now the number one provider of serverless computing services. But it wasn’t AWS who pioneered the serverless market amongst the top cloud providers. Introduced in 2008, Google App Engine disrupted the infrastructure services market by offering runtimes as a service. The idea was compelling, the product was well delivered, the business model was proven to work, it was funded by one of the world’s largest companies, and the team was brilliant. Why didn’t Google App Engine succeed? Because it was released too soon. It was too disruptive, revolutionary, and advanced for the developers back then, for whom Google App Engine didn’t resolve any of their problems. Google didn’t consider the zeitgeist when the product was released.

For this and other reasons, I like the catalog of AWS services. Therefore, this book contains an assortment of AWS innovations to help you design many of the ideas and principles portrayed across the different chapters. By using AWS’s managed cloud services in the book examples, the goal is that you have more tools in your toolbox to implement an internal software platform strategy successfully.

Who should read this book?

The strategies, methodologies, references, and principles documented in this book aim to guide software companies transitioning to a service-based economy. These are companies that build software, regardless of whether they license it or not. Cloud infrastructure utilities priced per use (IaaS and serverless) enable new economic models for software (SaaS), allowing end consumers to subscribe to on-demand products and pay only for their use.

How do internal software platforms help with that? The answer is by empowering developers to build business services that maximize the value of their customers’ subscriptions. Platforms do so by curating experiences and offering them self-service to developers, who can now focus on creating the domain-specific functionality their customers will love to use. To that end, internal platforms introduce a new architectural style aimed at helping developers build software with efficiency and effectiveness. Be prepared, because this theory is repeated ad nauseam throughout the following chapters!


This book is for anyone who loves building quality software, from developers to product managers. However, I have identified two clear target groups:

• First, this book is a must-read for software engineering and architecture leaders who work in software companies moving from a product-based economy to a service-based one and want to thrive on that change.
• Second, this is a refreshing manuscript for technology leaders in IT organizations who build in-house software systems sitting at the center of their businesses and are looking for new architectural styles to do it with efficacy.

There is still the question of what type of software is better suited for the tenets of the new platform-based economy. And there is a straight answer: the strategies and principles outlined in this book apply to any kind of software. However, I can observe a more compelling business case for sizeable enterprise software suites composed of heterogeneous legacy systems.

What will you learn?

By the end of this book, you will take the following insights away with you:

• Understand the challenges of building modern software products, and particularly on-demand software, in a post-pandemic world.
• Unfold the secrets of internal software platforms and why adopting such an architectural style can help software organizations ship better software and gain a competitive advantage.
• Refresh the architectural and technological principles for building, testing, and releasing internal software platforms and the business services running with them.
• Discover how AWS managed cloud services, with a focus on serverless, can help product engineering teams build secure, resilient, and scalable internal software platforms.
• Articulate the properties, advantages, and benefits of internal and on-demand software platforms so that you can build a compelling business case for your organization.

Acknowledgments

I want to thank the most incredible group of women and men who helped me write this book. I owe them an enormous debt. I am starting with Bill Ohnemus, a fellow architect and tech honoree at DXC, who was kind enough to do a technical review and write a lovely foreword to this manuscript. Also, I have to give special thanks to Martin Bartlett, Chief Engineer at DXC, for his technical thought leadership. He is responsible for pushing me to break with those old ways of approaching software development. And of course, to my boss Brian Bacsu for being a role model and leading the way to change how our large company builds, tests, and releases quality software.

I want to give a special appreciation shout-out to my Architecture and Engineering teammates, who happen to be friends too and with whom I spent countless fun hours talking about software philosophy. Many of the opinions and suggestions you will find in this book are the results of these discussions, so I want to give them the credit they deserve. Thanks, Enrique Riesgo, Pablo Castaño, Jesús Suárez, Aitor Echevarría, and Rubén López. Also, to my friends and connections who helped me in many ways: showing support, sharing tips, or giving me early review feedback. Thanks, Diego Alonso, Marina Cortés, Joaquín Peña, José Juan Mora, and Elena Cid.

And finally, thanks to my family. Without their support, this book would have been another idea lost in an ocean of miscellaneous unfinished projects.

Here we go!

This book is all about recognizing the need for change in software development, looking it in the eye, and embracing it fearlessly. There are stories in this book that will help you thrive on this change, especially around reinventing yourself, appreciating the mundane, designing for the people, holding fast, and staying true. You don’t need to follow them verbatim; just think of them as a north star.

What has driven and inspired me to write Building Software Platforms? Well, other books did. I have received incredibly good vibes and brilliant insights from other technology authors after reading their texts, which I strongly recommend:

• Team Topologies: Organizing Business and Technology Teams for Fast Flow, by Matthew Skelton and Manuel Pais.
• Good Services, by Lou Downe.
• Extreme Programming Explained (second edition), by Kent Beck.
• Ask Your Developer, by Jeff Lawson.

If this book leaves you with more questions than answers, I would call it mission accomplished. I want to plant a seed of curiosity in our community to build better software. I want every software engineering and architecture leader to be thoughtful about their decisions instead of choosing technology just because it is cool. I firmly believe that good software can make the world a better place, and we can do it healthily, without burning out our human assets or causing any harm to our environment. When we are building software, we are standing on the shoulders of giants. And this book is a physical manifestation of this idea.


Enjoy!

Pablo Bermejo
DXC Distinguished Technologist


PART I: STRATEGY

Software strategy includes understanding the playing field around customers, partners, competitors, and employees. First, it is crucial to understand your customers’ buying patterns. When and why do they buy? When and why do they build things themselves? The answers to these questions will help you design your software products according to those patterns. For example, are you designing your commercial off-the-shelf (COTS) software so customers can fully customize it to their needs, or do you prefer them to rely on your default industry expertise? Are you aggressively factoring your vision into the software design, or do you choose to meet customers where they are and guide them toward your vision progressively?

Figuring these things out is essential. You must develop situational awareness and see how to move from a product-based market to a services-based market, or at least how to challenge those already playing in the latter. That is where software is heading, and if you don’t do it, some of your competitors will. During this journey, you don’t have to build everything yourself. Leaning on third-party products and commodities from partners will be pivotal in your strategy and will allow you to focus on what is important to differentiate yourself from the competition.

Make sure you have the proper methodologies, talent, and technologies to build this vision. Developers are first-class citizens in modern software organizations and must be treated as creative problem solvers. Empowering autonomy, mastery, and purpose in these development teams will set your organization up for success.

Chapter 1: Transitioning to SaaS

Reinventing yourself

Funnily enough, the first story in a book about software like this one is not about technology but gastronomy. Concretely speaking, it is about a restaurant, and not just any restaurant, but the best restaurant of all time, known as El Bulli. Located in Roses (a charm-filled town on the Spanish Costa Brava), the place had already been awarded one Michelin star a couple of years before chef Ferran Adrià joined the staff in 1984. With Ferran’s and his brother Albert’s ideas, the restaurant’s popularity rocketed thanks to the vast number of innovations they brought into the cuisine, obtaining two additional Michelin stars in 1990 and 1997.

With Adrià already at the front of the team during the late 90s, El Bulli helped put the new Spanish gastronomy on the map and changed the status quo of the gastronomic industry with new technologies such as the kitchen siphon and new techniques such as deconstructions. All in all, Adrià’s Molecular Gastronomy concept was disruptive enough to put his restaurant in a very prominent position on an already competitive list dominated by the French chefs with their Nouvelle Cuisine. And this is when the story starts to get interesting.

In 1998, with El Bulli already enjoying a leading position amongst the best places in the world, Adrià took a decision that surprised the entire gastronomic industry (and the whole world, as a matter of fact). As he expressed in many interviews3, he felt exasperated about going from congress to congress and watching other chefs


sharing the results of their ideas while keeping the secrets, and most importantly the recipes, to themselves. The intensity with which those colleagues guarded their creations triggered Adrià’s creativity.

As a professional chef, Adrià was committed to improving continuously and to changing the world’s perception of this type of cuisine as a banal activity carried out by pompous elitists who were only interested in recognition. He held an alternative vision that went beyond making El Bulli a mere lucrative restaurant and a mission beyond just running a business. He wanted to make a positive impact on people’s lives through the experience of having food and, as a consequence, leave a legacy. At one of those conferences, in San Sebastián, he proclaimed one of his most famous statements, which kickstarted a whole movement within the industry: “The time of keeping recipes in the drawer is over.”

As he said that, he started to share all his famous recipes and techniques publicly, congress after congress. Yes, precisely what you are thinking: to get better, Adrià open-sourced El Bulli’s recipes, as he understood that the best way of protecting an idea is sharing it immediately. The restaurant was good, but they wanted to become the best. They quickly recognized that the only way of getting there was by accelerating innovation and sharing knowledge with the community to compete in identical product and practice conditions. By releasing their secrets in the open, they transformed their technologies and techniques into a quasi industry standard, adopted by many other restaurants across the planet that were pursuing the same kind of success.

What happened was that the gastronomic industry started to compete in a commoditized market. Molecular gastronomy technologies and techniques, at their genesis in the early 90s, became industrialized by the early 2000s and accessible to all kinds of restaurants. Molecular gastronomy became so popular and handy that all good restaurants included quite a few dishes using this technique in their


primary menu. If we look at our kitchens now in the 2020s, we will likely find some sophisticated pieces of equipment, such as siphons, used in a shameless attempt to impress our friends with a taco foam next time they come to visit us for dinner.

What happened after this master move won’t surprise you. Adrià and his team outperformed themselves to the point that Restaurant magazine named El Bulli the best restaurant in the world from a global list of 50 places in 2002, 2006, 2007, 2008, and 2009, a record five times. Again, they were competing in a highly commoditized market where most restaurants worked with the same products and techniques based on Molecular Gastronomy. Not surprisingly, El Bulli was again leading in a very competitive market and differentiating themselves from their competitors, using their techniques to produce exclusive and unique dishes (Figure 1). This approach helped Adrià set a new best practice in the gastronomy industry.

Figure 1: Adrià’s creativity pyramid

However, nobody knew that they were about to die of success, except for one person: the visionary chef.


Adrià started to witness the first signals of burnout in his increasingly mature team. More importantly, he was very frustrated that he had to reject customer after customer because the restaurant’s waiting list extended for months-worth (and even years-worth) of reservations. The team even had to cancel lunchtime service to focus on one service per day at dinner time, which raised many questions and concerns to the management team, severely impacting El Bulli’s revenue. Adrià estimated that the team couldn’t physically and mentally make it beyond 2012, so in an unexpected anticipation maneuver, he closed the restaurant in July 2011. Although El Bulli as a restaurant has closed forever, its core concept, together with Adrià’s vision, didn’t stop there. Right after he announced that the most iconic restaurant in modern history was closing at its peak, the Spanish chef revealed the details about how he wanted to build his legacy. Therefore, and in the spirit of the experimentation and innovation that led El Bulli to become the number one restaurant, Adrià’s next project consisted of spinning up a foundation to foster gastronomy innovation, learning, and sharing experiences with the community. The new entity is known as El Bulli Foundation, and it is aimed at creating and delivering new culinary experiences for its guests. When this book was written, all we knew was that this master chef wanted to persevere in realizing his vision of leaving a legacy to the world and disrupting the gastronomy industry once again. Indeed, Adrià and his team are closing the cycle of evolution, making the pendulum swing back to genesis to reinvent themselves, formulate uncertain things again, work with poorly understood concepts, and start to compete in an undefined market. They don’t love failure at this new stage, but they tolerate the process to create innovations that will help them differentiate with unique gastronomy products in the future. 
This idea is presented in the following Wardley map (Figure 2).


Figure 2: Representation of the evolution of El Bulli’s practices using Wardley maps

There is a hidden lesson for the software industry in this story, despite its evident survivorship bias and the privilege entailed in closing a wealthy business at its peak. We are moving toward a world where technology is so abundant that companies need to evolve quickly from product differentiation to service differentiation. To make the jump, many of these companies also need to understand that the only way to make a product better is to transform it into an industry standard; in technology, that could happen by open-sourcing it. After that, companies are competing in a services market that fosters innovation to gain new competitive advantages. It took El Bulli, a company in the gastronomy business, more than three decades to preserve a leading position by going through this magical cycle, something that technology companies are doing in a decade or less. This is how fast technology is moving these days, so be prepared for a bumpy ride.


Software as a Service (SaaS)

“Any X as a Service simply refers to the commoditization of the computing stack from a product-based economy to a service one. End of story.” - Simon Wardley

How is it that every incumbent business is a software company nowadays? We have heard the prophecy that every company is a software company4 for roughly the last ten years. Still, it is only during the past five that we have witnessed clearer evidence of its realization. To name just a few examples, we don’t put our savings in banks anymore; we use FinTechs. We don’t protect our properties by contracting insurance carriers; we subscribe to InsurTechs. Our people don’t learn at traditional universities; they join online sessions through EdTechs. We don’t drive cars anymore; we drive software on wheels.

The software development mindset

As Jeff Lawson, software developer and CEO of Twilio, tells us in his book Ask Your Developer5, “the category of problems we can solve with software is exploding.” It is as if most of humanity’s problems are resolvable by software these days, or at least with a “software development mindset,” as Lawson likes to call it. This mentality is all about embracing change. With it, business models will benefit from the continuous improvement spirit of modern software development, which is based on rapid increments and fast feedback loops (Figure 3). This agile paradigm is easier to follow when technology is at the center of the business, not just a means to support it. This approach is nothing but applied science, a clear manifestation of the hypothesis-driven scientific method, backed by Agile software development doctrines such as Extreme Programming.


Figure 3. The tenets of Extreme Programming

Without having to wait for perfection to get going, this lean mentality is changing how companies from all industries look at Build vs. Buy patterns. These companies are more willing to build the core competencies that set them apart from the competition, only buying what is non-core to their business and not critical to their capital flow. Furthermore, they don’t buy software for the set of functionalities initially included in it but for the features that the software will eventually gain through totally transparent update processes. Every sensible company consuming off-the-shelf software adapts their non-core business processes to the product and not the other way around.

From a buyer’s perspective, this is an exciting way of looking at software not as an asset they can capitalize on to support their business but as a means to operate it. And that’s where Software as a Service (SaaS) comes into play.

What systems and processes are core and non-core to these companies, then? It depends on the industry, but as a general rule, it is whatever makes them obtain better business outcomes. For example, for an insurance company, better results mean selling more policies, whereas for a car manufacturer, it translates into selling more cars. By the same token, for a SaaS company, a core system is whatever helps them obtain more customer subscriptions. Everything else is the cost of doing business. Remember: your customers don’t buy software anymore; they now subscribe to it.

The challenges of modern software

As a software organization, you are in a race against time, as your competitors and new entrants in the technology market are raising the innovation bar quickly. To stay competitive and maintain an edge, software organizations must reinvent their way of working and embark on a journey to find solutions that empower them with the right capabilities to build modern software. And deliver it as a service. And do it continuously.

Today’s software engineering and architecture leaders are privileged witnesses of the emergence of new economic models at a pace that nobody has experienced before. The industrialization of development tools and the commoditization of computing are catalysts for the shift in the expectations of software consumers, who are discovering a new world full of possibilities. At a high level, the following sections summarize the top five challenges that you, as a software leader, and your organization will have to face in the transition to on-demand software to stay competitive.

Managed services

Transitioning to on-demand software (or SaaS) will help you move your customers away from capital expenditure (CAPEX) toward operational expenditure (OPEX) or outcome-based pricing models. It means your customer’s IT department will depend less on executive approvals for products like yours, paying only for what they need and adding capabilities as required.


You need to find a solution for your customers that is aligned with the principles of managed services. Hence, the first place to look is the consumption of managed cloud services for all your infrastructure needs. As an on-demand software provider, you will own the cost of the solution in its totality, so you want to optimize your spending and focus on your core business. As a result, your software organization must remain aware of the evolution of the surrounding technology ecosystem and be open to externalizing any non-core business capability to third parties who can offer it as a managed service, including other internal groups. It’s a Build and Buy strategy.

Cost control

In line with the principle above, it is crucial to be aware of the cost implications of your technology decisions, with a special focus on understanding the pricing models of the cloud provider of choice (there are niche companies dedicated only to cloud bill optimization). Cost implications may vary not only depending on the type of managed service you choose (i.e., cloud DB clusters may be expensive, while functions as a service are comparatively cheap across different cloud providers) but also on your architectural choices (i.e., a purist services-oriented approach would require one DB cluster per service, incurring higher costs, while taking a CQRS approach may result in lower costs).

Live upgrades

As SaaS providers, organizations pursue an evergreen, always-on status for their services even after patches, updates, or maintenance work. This challenge is enormous for some industries, such as insurance or banking, where core systems still need to go down overnight to do batch processing or file transfers. Technology teams must research and build services, patterns, and tools for developers to create business functionalities that cater to live deployments without impacting service availability or business continuity.
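Live upgrades are commonly realized through gradual traffic shifting, as in blue/green or canary releases. The toy sketch below illustrates only the weighted-routing idea behind such rollouts; it is not any specific load balancer’s or cloud provider’s API, and the weights and step sizes are invented for illustration.

```python
import random

# A toy weighted router illustrating gradual, zero-downtime rollout:
# traffic shifts to the new version in steps while the old one stays live.
def make_router(weight_new: float):
    """Return a routing function that picks 'new' with probability weight_new."""
    def route(rng: random.Random) -> str:
        return "new" if rng.random() < weight_new else "old"
    return route

rng = random.Random(7)                  # fixed seed for a repeatable demo
for weight in (0.1, 0.5, 1.0):          # canary -> half traffic -> full cutover
    route = make_router(weight)
    hits = sum(route(rng) == "new" for _ in range(10_000))
    print(f"weight={weight}: {hits / 10_000:.0%} of traffic on the new version")
```

In a real system, the weight would be raised only after health checks on the new version pass, and an error spike would trigger an automatic rollback to weight zero.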


Low operations

No technology organization wants to see its system engineers wasting time provisioning virtual machines on the cloud, handling complex container deployment schemes, or applying security patches. Instead, they want them to move up the value chain, become the voice of the customer, participate in feature contributions to the product roadmap, and help the company’s SaaS strategy move forward. To do that, you must choose an aggressive (yet very conscious and reasoned) approach to leveraging managed cloud services as much as possible. That said, don’t force the development teams to use them in every scenario just for the pleasure of using the latest, most extraordinary technology. Instead of finding the technical justification to select a managed service by default, it is best to identify the business justification not to use it. If there isn’t any, then yes, go ahead with your cloud service.

User experience

As your software organization transitions from a core systems vendor to a SaaS provider, you will also be shifting focus and investment toward a reimagined, integrated suite of applications. Many studies and analyst reports show that a compelling User Experience is increasingly becoming a critical decision factor for businesses to subscribe to a particular SaaS solution over another. For example, in the article “Transforming life insurance with design thinking”6, McKinsey describes how the insurance industry must adopt Design Thinking to foster engagement among the youngest population, such as millennials and Gen Y. As stated in this article, users now have greater expectations “for highly interactive digital experiences as well as fast and even instant delivery.” To that end, any engagement services built and delivered to your consumers should be designed with a set of professional design principles that embed your brand identity and culture. This is how you provide them with a family of solutions that feature consistent and compelling user experiences.

Figure 4: The characteristics of traditional software compared to on-demand software

Building and buying software

The heterogeneous space of problems that companies from all sectors face these days as they progress in their digital transformation journeys may result in mutually exclusive requirements and expectations. Being pulled in many directions is not new in the development of software products, where you are in charge of designing your abstractions and baking in your domain expertise. However, delivering them as a service has relevant and unprecedented implications, primarily related to the challenges outlined before.

What do customers want?

To start with a straightforward example, let’s imagine you have a customer seeking outcome-based software subscriptions because they want to optimize their spending, so they like SaaS. At the same time, you have another customer who needs to strictly conform all the user interfaces to their corporate branding guidelines, so they seek high fidelity. Finally, a third one needs to configure their business rules as per their core processes, so they want low-code tools for building new apps. You may even have a type of customer who wants everything! Well, as anticipated earlier, these are mutually exclusive requirements, and all these use cases would have been a challenge even for on-prem software.

This challenge comes with additional ramifications for SaaS engagements, mainly because your customers do not own the software asset. You do. It means that these companies have higher expectations regarding configurability, speed, and self-service. Hence, they are unwilling to hire extra professional services on top of their subscription and consumption fees. The table below summarizes this dilemma and depicts the challenges that SaaS providers need to address:

                     Low Fidelity / Low Cost    High Fidelity / High Cost
    You Build        Pay per use                Professional services
    Customer Builds  No-code                    Custom development

In short:

• Customers want no-code, but they want high brand fidelity
• Customers want high brand fidelity, but they want low cost
• Customers want low cost, but they want high customization
• Customers want high customization, but they want to pay per use

Therefore, it is vital to consider the best business model and strategy for on-demand software and whether your customers build themselves or buy from you. Or, most probably, a combination of both7.


This idea is depicted below (Figure 5) and explained further in the subsequent sections.

Figure 5. Build and Buy strategies based on the need for differentiation and development capacity. (Credits: Celent Insights)

Customers buying your software

If several companies subscribe to your SaaS solution, what’s the differentiation and competitive advantage between them, especially in well-established and highly regulated industries like real estate, insurance, or banking? One could think that good on-demand software should include sophisticated visual solutions to cater to extreme levels of customization so customers can factor in whatever makes them different, whether it is User Experience (UX) or business processes.

However, be careful with that decision. It is important to remember that configuration tools for on-demand software come with limitations and tradeoffs since they offer speed of development at the expense of flexibility. To keep the scale tipping toward velocity, low-code tools can’t make every single software parameter configurable. If you cross that line, the resulting system may become too complex for a business user, so they will need to hire professional services to extend, adapt, or customize the software to their needs.


In any case, there is a sweet spot for on-demand software products. Especially in mature (and regulated) industries, this type of software is more suitable for essential yet undifferentiated workloads such as process-heavy back-office applications that don’t need customizations because they are not strategic. In other words, things that are just a cost of doing business. This idea is totally in line with what Jez Humble (co-author of software development best-sellers and SRE at Google) expressed on many occasions8 via Twitter: “COTS is for business processes that aren’t strategic to your organization. So you should modify your business process to fit what the software does out of the box.” This is important because, as stated by Chris Swan (engineer and former Fellow at DXC Technology) in his article “Industry best practice as expressed in software”9, if you allow your software to be highly customized, you are opening the door for your customers to move backward in terms of evolution (Figure 6). Consequently, if you keep improving your software to become a leading service, your customers won’t be able to tap into all the new benefits due to the impossibility of upgrading.

Figure 6: A visual representation of backward evolution due to excessive customization


Following this line of thought, it is entirely acceptable to make your SaaS configurable to meet your customers’ needs, but within reason. It is essential to keep this configurability lightweight so the software does not become so unrecognizable that it turns into a ball of mud. The configuration surface has to be large enough to let your customers adapt some non-critical parameters and build small integrations independently, but small enough that they can still do it self-service and at speed. Suppose your customers can’t use these configuration features to adapt your software to their needs. Then, as Jez Humble mentions, it is preferable to alter a business process to fit a software product than to push low-code customization capabilities past their limits. Technical debt can pile up in many forms, and complex configurations stored in a database are among them. In summary, you need to pay attention so that your customers don’t trade efficiency for effectiveness when using configuration tools.
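One way to keep the configuration surface deliberately small is to validate customer settings against an explicit whitelist, rejecting anything outside the supported parameters instead of silently storing it. The sketch below is a minimal illustration; every parameter name in it is invented.

```python
# A deliberately small configuration surface: anything outside this
# whitelist is rejected rather than silently stored. All parameter
# names and allowed values here are hypothetical examples.
ALLOWED = {
    "date_format": {"ISO", "US", "EU"},
    "default_currency": {"USD", "EUR", "GBP"},
    "max_page_size": set(range(10, 201)),
}

def validate_config(config: dict) -> list[str]:
    """Return a list of validation errors; an empty list means accepted."""
    errors = []
    for key, value in config.items():
        if key not in ALLOWED:
            errors.append(f"unsupported parameter: {key}")
        elif value not in ALLOWED[key]:
            errors.append(f"invalid value for {key}: {value!r}")
    return errors

# A request to customize the theme via raw CSS is flagged, not accepted:
print(validate_config({"date_format": "ISO", "theme_css": "body {}"}))
```

Keeping the whitelist short is the point: every parameter added to it is configuration-shaped technical debt that the upgrade path must carry forever.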

Customers building their software

We know that leading companies compete on service rather than on product, and this applies to all sorts of firms from all industries. As we have learned from the McKinsey report mentioned earlier, user experience is a crucial differentiating element to compete in the services market. It matters, a lot. Your final customers are willing to spend money on building custom user experiences that give them a competitive advantage, so a configuration capability would not suffice. Instead, you may need to provide them with the necessary building blocks and components to build these applications at speed on top of your software and do it programmatically (e.g., a business-oriented UI component framework including API consumption).


Twilio Flex10 is an excellent example of this. We can see how developers are directly in charge of the extensibility and customization of the system through code, which is the highest level of customization possible. Solutions like this exist because there is a point at which adding extra configurability is detrimental to the system. The main reason is that the overhead introduced by excessive configurations exceeds the benefits that otherwise could have been obtained more quickly through coding. Most companies are better at dealing with technical debt than avoiding it because they don’t know how to move forward in terms of evolution.
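Extensibility through code of this kind is often exposed as a plugin or hook system that customers implement themselves. The sketch below is a generic, hypothetical illustration of the pattern, not Twilio’s actual API.

```python
# A minimal hook registry: customers extend behavior through code rather
# than configuration (generic sketch, not any specific vendor's API).
class HookRegistry:
    def __init__(self):
        self._hooks = {}

    def register(self, event: str, fn):
        """Attach a customer-supplied function to a named event."""
        self._hooks.setdefault(event, []).append(fn)

    def fire(self, event: str, payload):
        """Run all hooks for the event; each hook may transform the payload."""
        for fn in self._hooks.get(event, []):
            payload = fn(payload)
        return payload

hooks = HookRegistry()
# A customer raises the priority of every new ticket via their own code:
hooks.register("ticket.created", lambda t: {**t, "priority": "high"})
print(hooks.fire("ticket.created", {"id": 1}))
```

Because the extension point is a function the customer owns, there is no configuration schema to outgrow: the customization ceiling is whatever their developers can write.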

Evolving toward a service economy

Do companies know what makes them different? Do they know what is core for them and what is not? Have they developed a situational awareness of the market in which they are competing? Does context drive them? It is hard to tell, but, as usual, the answers depend on whom you are talking to. That is why having a good Build and Buy strategy is key to success, and that is why connecting with the right person is so important. Although this book does not have all the answers and will probably leave you with more questions, it aims to give you a decision framework so that when you are involved in that type of conversation (either with your customers or your stakeholders), you can ask the right ones. All this is of utmost importance for software engineering and architecture leaders like you. As Simon Wardley mentions in his article “Here comes the farmer …”11, which dates back to early 2008:

“A key part of this transition is that we move to a service economy that competes on service rather than on product.”


You need situational awareness and an understanding of the market context in which you are competing. If your product is unique and you are pioneering a new field, you compete on the product. Eventually, more and more companies will pop up with offerings like yours, which only flattens the competitive terrain, so you will need to start looking for another differentiator. In other words, when software products compete in a commoditized market, the best service prevails.

Thirteen years after Wardley’s blog post on the topic, McKinsey also emphasizes the importance of moving to a service economy combined with other technology enablers, as described in the article “SaaS, open source, and serverless. A winning combination to build and scale new businesses”12:

“Using SaaS and serverless to free IT from infrastructure management greatly reduces the complexity of app development and deployment. This, in turn, allows tech teams to organize around products—for example, cards or loans—which brings code closer to the business.”

This idea, along with the organizational impact of the technology and the evolution of practices toward more industrialized and standardized software development, paves the way for internal software platforms. The rest of this book dives deeper into all these concepts to help software engineering and architecture leaders like you thrive in this exciting transition. Transitioning to SaaS is vital: if you don’t do it, one of your competitors will.

Chapter 2: Internal Software Platforms

Appreciating the mundane

When Adam Steltzner, who led the Entry, Descent, and Landing (EDL) team in landing NASA’s Curiosity rover on the surface of Mars, accepted a new mission in 2003, he knew little about the type and number of roadblocks that were going to get in his way. Or maybe he did know, as he explains in his book The Right Kind of Crazy. This passionate autobiography narrates how Steltzner’s career has been an accumulation of first-class problem-solving experiences that reached its peak in August 2012, when Curiosity touched down on the red planet.

Beyond all the unbelievable and onerous engineering challenges that one can imagine when trying to put a 2,000-pound piece of hardware on the surface of another planet 140 million miles away, Steltzner highlights one particular aspect of the Curiosity project that is especially striking and attractive above many others. This was what he calls “appreciating the mundane,” or in other words, acknowledging that even if you have written one of the most sophisticated pieces of software in human history to guide a flying robot through space, all it takes to kill your project is a malfunction in one minor utility component (e.g., a receiver, a radio, a battery), the likes of which you could have bought at your local hardware store. That’s hard to assimilate.

Even one of the most complex machines known to society is nothing but a piece of gear made by the composition of multiple state-of-the-art components into higher-order subsystems that, once composed together again, produce a final working system. It is a fair assumption to make, though, that those predictable elements are simply supposed to work, except it is inaccurate. As Steltzner advises, you have to design, build, and test your whole system for failure from the ground up, so the higher-order systems that are core to your mission rely on stable and tested infrastructure.

The lesson learned from Steltzner’s experiences is that focusing on the stuff that has nothing intrinsically to do with your business function is of utmost importance for the success of your project. Even the things that you take for granted need to be designed, built, integrated, and tested thoroughly to ensure you can achieve your mission goals. Taking this lesson to the software development space, many technology utilities, frameworks, and other cross-cutting services have nothing to do with your domain-specific business logic but need to be managed anyway. The final system aggregates multiple and diverse loosely coupled components that need to work well together to produce a valuable function for the business. This observation gains particular relevance with the proposed serviceful architectural style for modern on-demand software.

Why do you need an internal software platform?

The word platform means different things in different contexts. On the one hand, for go-to-market teams, it may refer to a business model by which companies offer provisioning or development utilities to their end consumers so they can build new applications and create a community around them. This business model is also known as PaaS (Platform as a Service). On the other hand, the word platform also refers to technical capabilities that lay a substrate to help software development teams create new software products.


While the former concept entails that customers have direct access to the platform, the latter is invisible to the final users, decoupled from the value it helps create, and is built in-house to solve the particular development challenges of software organizations (Figure 7). These technical capabilities are also known as internal software platforms and are the subject of study in this book.

Figure 7. PaaS model vs. Internal Software Platform model

Helping developers with the right tools

Let’s review some of the changes that have shaken the technology industry in recent history and how software organizations dealt with them. The commoditization of cloud compute and the proliferation (and adoption) of managed cloud services during the first years of the past decade enabled the co-evolution of other software engineering practices that helped teams build and release software faster. This progression eventually led to DevOps, a technique that has helped reduce the dependency on a centralized operations team during the software development lifecycle. Or, at least, it tried.


This practice has also added an extra cognitive burden to developers, who can now build and deploy their applications on the cloud in a quasi self-service manner. In this scenario, if you withhold the proper standards and tools from developers to do this work, you will get precisely the opposite of what you were looking for, which is letting developers spend quality time building domain-specific services. Or even worse: if you fail to run a smooth transition between development and operations, developers will be creating features on top of others that are not operationally ready. This will drastically increase the instability of your software, eventually making it collapse.

Letting developers use the right tools to build software is critical to your organization’s success. At the same time, there is a risk in allowing development teams to make too many decisions about the base infrastructure and tools they need to build their services. Therefore, software organizations are now looking for ways to help these teams redirect effort away from work that has a low impact on the business.

Building tools for developers

How are high-performing software organizations solving these risks and constraints, then? The answer is by building tools that show their developers that they believe in them as the creative problem solvers they are. There is nothing more motivating for a software development team than having a stable infrastructure and an efficient development process that maximize the value of their creations.

To achieve this, software engineering and architecture leaders like you need to start treating every software engineering problem as a physics problem: data has mass, computing consumes energy, and networks take time. By realizing that these are finite resources at the disposal of development teams, you will be better positioned to design and build the necessary internal tools that remove the obstacles in every developer’s way to delivering business value.

Therefore, during the past few years, we have seen how a few software organizations have started to spin up product engineering teams13 that create and support internal software platforms, intending to enhance time to market, modernize infrastructure operations, and improve customer experience. Internal software platforms are helping organizations reinvent the way they build and deliver modern software, providing them with many of the capabilities they need to start competing in a service-based economy.

What is an internal software platform?

An internal software platform is a collection of cross-cutting services, principles, and tools that work in harmony to allow developers to build, test, and deploy other domain-specific business services. Platforms provide the digital means, with an accent on documentation, self-service, and APIs, to enable the different development teams to create software products that end consumers will love to use.

It is critical to point out that internal software platforms are not buckets of independent shared services. Instead, the platform approach is based on the balanced correlation between transversal solutions that share a common goal, thus introducing a new architectural style. Under the precepts of this new style, the various business services that run with the platform will enjoy revitalization opportunities for their architecture and technology direction.

As a software engineering and architecture leader helping developers in the organization, your goal is to harvest common patterns used by these teams and industrialize them as transversal services. In other words, transform development tools and practices into commodities. This way, you can give developers a worry-less working environment so they can spend more time adding value and less time working on undifferentiated routines.

The mission of an internal software platform

Software engineering and architecture leaders of today spend way too much time dealing with the problems of yesterday. For example, they are absorbed in managing complex infrastructure schemes inherited from previous leaders who couldn’t (or didn’t want to) address technical debt. The reality is that enabling a better future is directly constrained by the operational obligations of legacy systems. This idea of the Ship of Yesterday is captured in the book The Day After Tomorrow: How to Survive in Times of Radical Innovation by Peter Hinssen, which documents the problem that many leaders experience with enabling the creation of new value. Dealing with a pile of inherited responsibilities is the source of unplanned work that removes you from implementing your future vision.

Leaning on the concepts introduced by Hinssen’s book, internal software platforms are the Ship of Tomorrow that help developers navigate an ocean of technologies, processes, and engineering responsibilities so they can focus on creating value. With that in mind, this book aims to guide you in constructing this ship to enable developers to build better software. This enablement is possible thanks to the following solutions that platforms provide:

• The rationalization of access to and consumption of managed cloud services. This aspect is of utmost importance because providers like AWS counted on more than 200 fully-featured services when this book was written. Therefore, a powerful and valuable platform capability could be streamlining access to this catalog through abstractions that provide functional value (orchestrating services instead of just wrapping them) or reference configuration guidelines when direct integration is allowed.
• The heavy lifting of repetitive and low-value tasks through automation, especially for the provisioning of environments and the deployment of services on top of them. With the empowerment of developers and the transition to DevOps, developers must have the proper tooling to execute the other engineering activities that surround pure coding, and to do it effectively and speedily.
• A curated developer experience that makes software engineering an enjoyable job. This action has a substantial effect on people’s health, and it can be as easy as providing good documentation for the different services that comprise your internal software platform.
• The offloading of cognitive burden from developers so they can concentrate on building quality software. This can be achieved by industrializing all the undifferentiated functionalities into a common internal software platform by means of ready-to-use cross-cutting services.

The following picture (Figure 8) puts all these elements together using Wardley’s representation, with the vertical axis (value chain) showing how internal software platforms enable value.


Figure 8. The mission of internal software platforms
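The first capability listed above, orchestrating managed services rather than merely wrapping them, can be sketched as a platform API that composes several lower-level calls behind one functionally meaningful operation. The clients below are in-memory stubs standing in for real managed-service SDKs, and all names are invented for illustration.

```python
# Orchestration vs. wrapping: the platform exposes one functional
# operation ("publish_document") that composes storage and notification.
# StorageStub and TopicStub are in-memory stand-ins for real cloud SDKs.
class StorageStub:
    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body

class TopicStub:
    def __init__(self):
        self.messages = []

    def publish(self, message):
        self.messages.append(message)

class PlatformDocuments:
    """A platform service: one call, several managed services behind it."""
    def __init__(self, storage, topic):
        self.storage, self.topic = storage, topic

    def publish_document(self, doc_id: str, body: bytes) -> str:
        key = f"documents/{doc_id}"
        self.storage.put(key, body)  # durable storage
        self.topic.publish({"event": "document.published", "key": key})
        return key                   # opaque handle for callers

storage, topic = StorageStub(), TopicStub()
docs = PlatformDocuments(storage, topic)
print(docs.publish_document("42", b"hello"))
```

The point of the abstraction is that business teams call `publish_document` and never learn which storage or messaging service sits behind it, so the platform team can swap or reconfigure those services without breaking consumers.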

Experiences captured in internal software platforms So what is this cross-cutting domain where internal software platforms provide their best value? In general terms, platforms enable an effective transition toward SaaS by providing transversal solutions to the most common development challenges. Although the experiences that internal software platforms harmonize are contextualized to the dynamics of every organization, we can split these solutions into the following categories at high-level: • Automation: A solution that consists of environment provisioning and release management services for the continuous delivery of composable software. It includes inventories of service
metadata that can act as an environment manifest. Their goal is to present important deployment key-value properties that help with dependency management and cost tracking.
• Core infrastructure: A standard topology that leverages managed cloud services with a preference for serverless computing. These topologies are defined and implemented as code and include APIs, events, network, identity, access, and data management services, along with their REST APIs and SDKs.
• Core services: Enabling functionality to assist developers with cross-cutting concerns such as authentication, authorization, event management, file transfer, or logging. This element also includes the integration with observability and monitoring tools, which are critical features in serverless environments, including the visualization of centralized logging and cybersecurity alerts.
• Management console: A set of Web interfaces for business expert teams that include configuration capabilities for the different core and business services.
• Design: A design system that includes designer and developer tools for building consistent and compelling user interfaces.
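To make the Automation category more concrete, here is a minimal sketch of what an environment manifest's key-value metadata might look like, together with a trivial validation helper. All field names and the schema are illustrative assumptions, not a standard:

```python
# Hypothetical environment manifest for one deployed service.
# Field names are illustrative assumptions, not a standard schema.
manifest = {
    "service": "billing-api",
    "version": "1.4.2",
    "environment": "staging",
    "region": "eu-west-1",
    "dependencies": {            # used for dependency management
        "platform-auth": ">=2.0",
        "event-bus": ">=1.1",
    },
    "cost_center": "CC-4711",    # used for cost tracking
}

def validate(manifest: dict) -> list:
    """Return a sorted list of missing required keys (empty means valid)."""
    required = {"service", "version", "environment", "region"}
    return sorted(required - manifest.keys())

print(validate(manifest))  # → []
```

A manifest like this is what the automation services can read when provisioning an environment or reporting costs per service; the exact fields would depend entirely on your organization's context.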

Listing such a bill of solutions does not mean that you have to implement all of them from day one and in that particular order. Instead, it helps to start by understanding your direct customers’ (the developers’) pain points and selecting an assortment of components that carry out the heavy lifting of their undifferentiated workloads. This way, these developers can start offloading some work and focus on building better domain-specific services. That is great. But how? For some software engineering and architecture leaders, curating a set of UI components so the business service teams can build visually consistent applications is an excellent first opportunity for building an internal software platform. For others, building an internal software platform would mean creating automation guidelines for integrating state-of-the-art testing and quality tools within the internal
CI/CD tooling to ensure a minimum level of excellence across the board. And, of course, a few could be interested in building a standard cloud topology, so the service teams are given a landing zone where they can integrate their workloads. It all depends on the context and priority of the problems you need to solve for the developers.

Serviceful software platforms

In any case, there is still the question of what a combined experience for all these specialties and technologies would look like. Although building and integrating all these capabilities should be an endeavor that you launch iteratively and incrementally, it is important to figure out how these components would ideally work in harmony as an integrated internal software platform experience. The following picture (Figure 9) depicts this idea using Wardley mapping, a technique for creating situational representations that helps businesses visualize where they are and how to move forward.

Figure 9. Wardley map for an internal software platform

As Simon Wardley - the creator of this technique - likes to say: “all maps are imperfect.” So, in this imperfect representation, we can start
to identify the great block of business value that internal software platforms enable. Building an internal software platform will require a combination of custom-built components (using agile techniques), commercial off-the-shelf products, and utilities from outsourcing suppliers. Consequently, the elements of an internal software platform sit at different positions of the evolution and value chain axes, unveiling a map that you can use to decide the next moves to succeed in your platform strategy. And that is what this book is all about—building software platforms. As the theory of Wardley mapping says, if something stands in your way to creating value and it does not exist, then build it.

Platform components

The essence of the internal software platform is invisible to the end consumers, and it is fundamentally custom-built because it is core to your organization’s transition to SaaS. Hence, platform services will help you gain an advantage and differentiate yourself from your competitors. They will enable service developers to build better domain-specific software through more efficient SDLCs. Of course, the internal software platform also leans on technology products such as third-party APIs and frameworks and infrastructure commodities to make that happen. These are some characteristics of the internal software platform elements included in the previous Wardley map:
• They provide rapid learning experiences to the developers who consume and build the platform. The reason is that the enabling nature of the platform services requires continuous and short feedback loops between the different teams.
• Most of the components play in forming markets, especially where domain expertise is still needed to build them. For example, this is the case for the core infrastructure topology and service lifecycle automation solutions based on serverless computing.
• They help unleash business value by enabling service developers to create better functional software. This way, organizations can obtain a return on their platform investment thanks to a higher volume of customer subscriptions.
• Waste and failure are undesirable yet tolerated when building platform components. This lean development process helps platform and business service development teams get better at their jobs.

Funneling innovation

Be aware that the pace at which many products and commodities iterate and add new features will introduce a co-evolution of practices that will challenge your custom-built solutions. This progression is especially accentuated with those cloud providers who are moving fast to a utility-based economy. You must stay vigilant and identify what components will pop up as critical parts of the internal software platform so you can build them to enable revenue streams. Complementarily, you must study what elements will evolve toward a commoditized market so you can outsource them. Internal software platforms funnel leading-edge market innovations and adapt them to your organization’s tempo.

Building an internal software platform is an exciting and rewarding enterprise-level research and development endeavor. Hence, it is of utmost importance to stay on top of the latest technology and react to any just-in-market innovation to keep helping developers do
their jobs with efficacy. When commodities eventually replace the components of your custom-built platform, you still need to assess them to offer developers a new set of selected experiences so they can cope with this evolution (Figure 10).

Figure 10. The innovation funnel of internal software platforms

Serverless software platforms

“So what does the future look like? All the code you ever write is business logic.” - Werner Vogels, CTO of Amazon.com

The history of software architecture is one of adding layers of abstraction generation after generation with the goal of managing complexity. With that in mind, serverless computing is the latest layer of abstraction in modern computing, aimed at helping you reach peace of mind. It does so by handing over a significant part of the infrastructure management complexity to somebody else, in this case, a specialized cloud provider. By going one level up in their managed services stack, the cloud provider supplies you with auto-scalable, pay-per-use, managed runtime engines for your applications (Figure 11), and they do it as a
service. All this results in a new architectural style mainly driven by the economic forces of a commodity-based market.

Figure 11. Infrastructure as a Service vs. serverless

Managing complexity

One common saying captures the spirit of this new ideal: “It is better to have a friend with a boat than to own a boat.” It sums up the outcome and impact of the commoditized computing resources introduced by serverless and the new levels of managed complexity that come with them. Of course, complexity is like energy: it does not go away; it is just transformed. The complexity of provisioning, scaling, and managing runtime environments for your applications is still there, but it is now managed by the cloud platform. The cloud provider has already made some design decisions for you and created technical abstractions available as managed services to help you complete your applications more efficiently. You now need to write minimal code that fills up the space left by those cloud-managed resources to implement your business function. And this is a game-changing paradigm once it clicks.


With serverless, the cloud is an asset, and code becomes a liability.

Let’s look at it from the following perspective: once you open a cloud account, there is a rich set of ready-to-use functionalities available to you that only increases over time. Without you taking any action, the collection of functionalities that you will have tomorrow is probably richer than the one you have today and will cost you nothing. On the contrary, once you start writing code for your applications, you will incur execution costs. That’s because code in a rich serverless environment is debt. Actually, all code is debt. Coming back to the boat analogy, it is your friend who takes on all the responsibilities of owning the boat, such as maintenance, licenses, insurance, registration, or docking. Yes, you will be assigned some duties as you step onto the boat if you want to sail with your friend (that’s what boat owners do), but these are minimal compared to owning the whole thing. At the same time, you enjoy the best part of it: sailing with your friend on a good day. It’s a trade-off, but a non-zero-cost one.

The economic forces of serverless

Designing is making decisions about boundaries. In software development, these decisions have traditionally been influenced by functional specifications and non-functional requirements (NFRs), such as performance, security, or availability. With serverless computing, there is one more NFR to consider when making design decisions: cost. Due to the pay-per-use nature of the computing utilities available in modern cloud platforms, software architects and designers can directly influence the overall software costs with their design decisions. Since the cloud provider has already made some design decisions for
you and created some constructs to help you focus on efficiency, you must decide how to fill the spaces left by the cloud platform. Hence, implementing one piece of the application with one managed cloud service or another is now a critical element that you need to consider when analyzing design tradeoffs. And it can go even further: low-level, technical design decisions at the component or function level also have a direct impact on costs. For example, an ill-designed algorithm with unnecessarily high cyclomatic complexity can incur charges that would not have happened had the logic been optimized.

Building the future

The name serverless does not do justice to the whole new set of technologies, practices, and values brought in by this computing model. When you use this type of technology, it comes with servers and lots of infrastructure behind the scenes, but of course, you don’t see them, and more importantly, you don’t have to manage them. The cloud provider does it for you so you can focus on writing the best business logic. This has a huge positive impact on the time to business value. Many authors refer to this new paradigm as worry-less, low-operations, or even runtime as a service. It is all about industrializing the past so developers can build the future. The reasons why the advantages introduced by serverless are relevant for building software platforms are twofold:
• On the one hand, it enables very compelling engineering and architectural options you may want to build upon. It is not new cloud technology. It is just standard technology on the cloud, but the way developers can use it unfolds unprecedented ways of building quality software products, including your internal software platform.
• On the other hand, and more importantly, the economics of serverless computing services align well with the spirit of managed complexity that internal software platforms espouse. You can directly translate these characteristics into software benefits, such as cost reduction, live upgrades, and fewer operations. By choosing managed services, you don’t risk cosplaying as an infrastructure provider instead of building your business. You can now focus on business outcomes, not technology accomplishments.
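The earlier point about design decisions driving cost in a pay-per-millisecond model can be made concrete with a quick sketch. The per-millisecond price below is a hypothetical figure for illustration, not a quoted AWS rate, and the durations are assumed numbers:

```python
# Sketch: how algorithm efficiency maps to pay-per-use cost.
# PRICE_PER_MS is a hypothetical rate, not an actual cloud price.
PRICE_PER_MS = 0.0000000021  # hypothetical $ per ms of execution

def monthly_cost(avg_duration_ms: float, invocations: int) -> float:
    """Estimated monthly compute cost for one function."""
    return avg_duration_ms * invocations * PRICE_PER_MS

# The same business function, two hypothetical implementations:
naive_ms = 180.0      # e.g., an O(n^2) lookup inside the handler
optimized_ms = 12.0   # the same logic with an O(n) dictionary lookup

for label, ms in [("naive", naive_ms), ("optimized", optimized_ms)]:
    print(f"{label}: ${monthly_cost(ms, 50_000_000):.2f}/month")
```

At 50 million invocations a month, the difference between the two implementations is roughly a factor of fifteen in compute cost, with no change in functionality: this is the sense in which design decisions become economic decisions.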

AWS serverless catalog

The AWS ecosystem of serverless computing services enables a new and powerful architectural style aimed at increasingly industrializing the past so we can focus on building the future. A solid and committed community that contributes to the adoption of serverless with patterns and best practices14 backs up this ecosystem so that we can embrace this new paradigm with confidence. Amazon Web Services (AWS) provides you with a very comprehensive set of serverless computing services that span multiple technical layers and capabilities:
• Utility runtimes for various programming languages with AWS Lambda, an auto-scalable service that is priced per millisecond
• Fully-managed container infrastructure with AWS Fargate
• NoSQL tables as a service with single-digit millisecond response times with Amazon DynamoDB
• Highly available, performant, secure, and scalable object storage with Amazon S3
• API definition, deployment, and overall management with Amazon API Gateway
• Infrastructure monitoring and observability services with Amazon CloudWatch
• Identity and user management as a service with Amazon Cognito
• Auto-scalable data streaming platform with Amazon Kinesis
• Fully-managed queue infrastructure as a service with Amazon SQS
• Highly available and reliable pub/sub messaging service with Amazon SNS
• Simple, reliable, scalable, and secure file system services with Amazon EFS
• On-demand relational database engines for MySQL and Postgres using Amazon Aurora
• Highly available and fully-managed state machine engine with AWS Step Functions
• Auto-scalable and reliable event broking with Amazon EventBridge
• Fully-managed, end-to-end container application service with AWS App Runner

Also, a worldwide group of AWS Serverless Heroes leads this community. They share their knowledge by creating and distributing content in many shapes and formats, thus helping developers around the globe roll out serverless-first architectures on production-ready environments. Let’s look at how AWS is disrupting software development on the cloud from another perspective. One of the greatest architects, Christopher Alexander, has developed theories about human-centric design that can be easily extrapolated to software architecture. In his book A Pattern Language, Alexander introduces the concepts of negative and positive spaces to refer to the territory left between
buildings as a byproduct that, in theory, is not to be used (Figure 12).

Figure 12. Negative spaces vs. Positive spaces

Of course, negative spaces are shapeless, empty, and difficult to reuse. In contrast, positive spaces have a distinct and finite shape, and in most cases, they are designed to be utilized right away. Leaning on this analogy, what AWS has done with its serverless computing services catalog is to create spaces between the managed services that are ready for its customers to fill up with their business applications in an easy, convenient, and productive way. Developers can now focus on gluing those services together with quality code to shape a final working system.
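To make the "gluing" idea tangible: a Lambda-style handler is, at its core, just a plain function that fills the space between managed services with business logic. The event shape, field names, and business rule below are illustrative assumptions, not AWS-defined:

```python
# Sketch of a Lambda-style handler: the only custom code is business
# logic; storage, scaling, and routing are left to managed services.
# Event shape, names, and the discount rule are illustrative assumptions.

def handler(event: dict, context: object = None) -> dict:
    """Apply a hypothetical volume discount to an order passed in by an API."""
    order = event["order"]
    total = sum(item["price"] * item["qty"] for item in order["items"])
    if total >= 100:
        total *= 0.9  # 10% volume discount: the actual business rule
    return {
        "statusCode": 200,
        "body": {"order_id": order["id"], "total": round(total, 2)},
    }

# Because the handler is a plain function, it can be exercised locally:
event = {"order": {"id": "A-1", "items": [{"price": 60.0, "qty": 2}]}}
print(handler(event))  # total 120.0 → discounted to 108.0
```

Everything around this function — the HTTP endpoint, authentication, scaling, and persistence — would be supplied by managed services, which is precisely the point of the analogy.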

The business overlay

The internal platform enables a business overlay that is created by the business service development teams. This layer, visible to your end consumers, is where all the value is concentrated, built at speed, and delivered as a service. Furthermore, this business overlay still leverages the underlying technology directly, such as Web frameworks and lifecycle management on the cloud. The main reason is that the business overlay’s value is not built “on top” of the platform services, which are not an encapsulation of the cloud services but a companion that offers curated know-how. Due to its pivotal relevance, this idea
is developed further in the forthcoming chapters of this book. There are numerous advantages to building business services following the architectural style of internal software platforms. Chief amongst them are:
• Self-contained components focused on a concrete business function, totally independent and decoupled from the infrastructure.
• Service functions that are discoverable via REST APIs, enabling universal and standards-based integration and exposure through a modern and compelling user interface.
• Reduction of overall software costs thanks to the deduplication of basic and transversal infrastructure resources.
• Each service can run in a container and, ideally, as one or more serverless functions. Following a serverless approach to building services means developers will take advantage of the platform’s inherent scalable infrastructure.

With internal software platforms, developers can start offloading some undifferentiated work and build better business services.

Build vs. Buy patterns for the business services

As per the Build vs. Buy guidelines presented in the previous chapter, service teams should focus on building the main domain logic that is core to the software organization’s business, buying other undifferentiated services from an internal software platform. It is the sole responsibility of every service team, independently, to decide whether they want to build their non-core functionalities or buy them internally. This decision will be based on every service team’s appetite for scalability, agility, and maintainability risks and their preparedness for assuming all their infrastructure work.


Platforms as internal products

The critical part here is approaching these challenges with a product mindset. This allows you to build the platform iteratively by adding features from a product backlog with a dedicated software engineering team. Additionally, like any other product, internal software platforms are operated by dedicated service teams once they are released to a customer as part of the final software. As Matthew Skelton has mentioned multiple times in his book and online webinars: “Allan Kelly likes to say that software developers love building platforms and, without strong product management input, will create a bigger platform than needed.” Treating your internal software platform as an internal product has numerous advantages, such as organizing developer requests in the feature backlog or making a roadmap available at the portfolio level. As developers’ needs change and evolve, you adapt the platform accordingly. Similarly, as technologies evolve, you design the platform as a filter that curates external innovations so developers can keep up with them. This way, the different platform components can be planned, groomed, assessed, and most importantly, designed and implemented by a dedicated software engineering team. This product mindset also brings challenges you need to be aware of, chief amongst them the dependencies created between the different business and platform services. To give a concrete example:
1. Suppose that your software organization follows a release cadence. For example, you release new software versions to your customers every quarter.
2. During the quarterly development cycle, there is a business service team that needs stable platform capabilities to finish their functionality on time (e.g., Release N).
3. If the internal software platform adheres to the same cadence constraint, the platform’s enabling functionality for that business service team has to be released at the end of the previous cycle (e.g., Release N-1).
4. It means that once a business service team identifies a need in their backlog that depends on the platform, they need to wait at least six months (two releases) before delivering it to their customers.
Of course, this is a challenge that could be solved by continuously delivering the platform services to the developers at the right time. But is it really a challenge? Again, it depends on your context. On the one hand, if you already had CI/CD pipelines that catered to the continuous delivery of inter-dependent services, plugging the platform services into this framework would be smooth (Figure 13). On the other hand, if you weren’t practicing continuous delivery, this is a problem you need to resolve. It is the perfect opportunity for the platform team to introduce a new scheme of tools and guidelines to enable a fast flow of features for the platform and business services.
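The waiting-time arithmetic in the example above can be sketched as a quick calculation; the quarterly cycle length and the two-release wait are the example's own assumptions:

```python
# Sketch of the release-cadence arithmetic from the example above.
# A quarterly cadence means each release cycle lasts 3 months.
CYCLE_MONTHS = 3

def lead_time_months(releases_to_wait: int) -> int:
    """Months before a platform-dependent feature reaches customers."""
    return releases_to_wait * CYCLE_MONTHS

# The platform capability ships in Release N-1 and the business feature
# in Release N: two full release cycles between need and delivery.
print(lead_time_months(2))  # → 6
```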

Figure 13. Continuous delivery of platform components

You can see the platform as a unique product within your software organization’s portfolio. The reality is that, amongst all the services, this is most probably the least functional one. Therefore, it would make sense to let specialized governance bodies like your organization’s Technical Architecture Group (TAG) or the Cloud Center of Excellence (CCoE) own this overarching artifact. This is an important aspect to consider because, as we will see in the next chapter, internal
software platforms also introduce a new architectural style that needs to be nurtured and maintained by the right expertise.

Going off-platform

Since products and services are optional by nature, this strategy makes the internal software platform’s adoption non-mandatory. To some extent, this approach opens the door for skeptical developer users to take an off-road route, question the need for the platform, and choose the technology for their services themselves. For example, you may find developers reluctant to use the platform’s Event Management service (if you decide to build one). Instead, they want to hold complete control of their topic subscriptions and queue polling by directly using the underlying managed cloud services. Also, teams might be reluctant to use your standard CI/CD pipelines and instead use their own processes to build and release their business services. You need to pay attention, though. This maneuver could conflict with your platform’s technology vision. Going off-route to access the underlying technology directly is not necessarily bad, especially when cloud providers rapidly expand their functionalities. However, when doing so, service development teams acquire additional responsibilities that they would otherwise not have to worry about when using the platform.

Software organizations investing in internal software platforms show their developers that they believe in them as creative problem solvers.

By introducing an internal software platform in your enterprise architecture, you not only bring in useful abstractions but also guidelines on how to use the underlying technology. Try to highlight the importance of these design guardrails to the service developers and demonstrate to them how the platform helps in enforcing best
practices. Unfortunately, those skeptical developers may still see this as a drawback because of the reduced flexibility. But it is an advantage: those constraints are not necessarily a bad thing, and factoring your opinions and technology expertise into standard platform services, principles, and tools benefits the developers’ efficiency in the long term. There are numerous benefits and economies of scale in consuming a common, consistent, and standard set of services.

What are the benefits of internal software platforms?

We’ve been talking a lot about the properties and advantages of platforms for software organizations to build and deliver modern software as a service. However, your final customers typically do not care about how you design your internal architecture and whether or not you use platforms to build the products they will consume. So, how can you translate all this technical conversation into tangible benefits and convey them adequately to your customers? First and foremost, let’s start by redefining why an internal software platform is important in the eyes of your end customers. For them, all they will receive is a cloud-based solution that runs core business applications that enable them to digitize their customer, ecosystem, and partner experiences. As Cristóbal García and Chris Ford from Thoughtworks state in their article “Mind the platform execution gap”15:

The decision to commit to an internal developer productivity platform is an economic one. The argument in favor depends on efficiency, quality, and time-to-market benefits exceeding the financial, talent and opportunity costs incurred in its construction and evolution. If you can’t
articulate the business case for your platform, you can’t responsibly adopt it.

The internal software platform business case

What could this business case look like? On a more detailed (yet still high-level) note, platform-based modern software will help you deliver the following benefits to your final customers:
• Rapid deployment of products and services: Thanks to the built-in platform automation services, customers can be onboarded and have a working environment in hours (not days, weeks, or months). This will help your customers capitalize on new opportunities by introducing new products or opening new markets.
• Inherently secure: Internal software platforms are designed from the ground up to be secure. Using an infrastructure-as-code and policies-as-code approach, you will ensure that every environment you spin up is certified for security and compliance in any geographic region.
• Inherently available and resilient: Internal software platforms leverage auto-scaling and high availability capabilities provided by the underlying cloud platform of choice. Also, platform instances and services are deployed in multiple availability zones and even in various regions.
• A la carte business service provisioning: By using different catalog items, customers will be able to deploy just the business services they want at every moment, paying only for the outcomes they obtain from the system.
• Complete monitoring and observability: For every business service integrated with the internal software platform, operations teams will provide a full spectrum of ITIL and Service Management processes, including customer self-service reports and automated alerts coming from the platform.
• Evergreen updates: Automated feature upgrades keep all customers on the most up-to-date version of the business services integrated with the internal software platform. This process is done transparently and is covered by their SaaS subscription fee.
• User experiences: Customers will receive reimagined persona-based applications that include industry-specific features and 360-degree end-user views that allow them to elevate their consumer experience and extend their distribution channels into non-traditional relationships.
• Ecosystem: Services built with internal software platforms offer comprehensive capabilities that help customers consolidate disparate systems and integrate new features from third-party solution providers.

The world’s most mission-critical platform

With more than one hundred thousand developers, the United States Department of Defense (DoD) is reportedly one of the world’s largest software organizations. As Nicolas M. Chaillan (former U.S. Air Force and Space Force Chief Software Officer) states in his farewell public note titled “It’s time to say goodbye!”16: “Timeliness is foundational […] for enabling the delivery of capabilities at the pace of relevance.” And that’s where Mr. Chaillan and his team came in, creating the DoD Enterprise DevSecOps Initiative, probably the largest DevSecOps engagement in the world, within the most complex organization in the world. Not surprisingly, there is an internal software platform at the core of this initiative. This custom-built tool is Platform One17, a well-integrated and ready-to-use set of managed services that allow DoD users to deploy applications and solve the most common software development problems. This way, Platform One enables developers
to focus on building and delivering capabilities that meet the basic needs of U.S. warfighters. Platform One is composed of the following services:
• Iron Bank: Authorized, hardened, and approved container repository that supports the end-to-end lifecycle needed for modern software development.
• Party Bus: The environment and services that DoD users need to develop and deploy their software applications. It is basically aimed at government software programs that need rapid development of approved, working mission applications for warfighters.
• Big Bang: The Platform One Infrastructure as Code package that deploys a software factory in a customer-owned environment. It helps DoD users build a custom software factory for their specific mission needs to enable faster development and deployment of their mission applications.
• CNAP: Platform One’s Cloud-Native Access Point brings a full Zero-Trust technology stack enforcing device state, user RBAC, and software-defined networks.
• Cyber: Platform One provides integrated cyber testing, monitoring, and event management for the infrastructure, platform, enterprise services, and customer applications.
Curiously, openness is one of the most compelling characteristics of this internal tooling and has been one of the guiding principles of the Platform One team’s work at DoD. Consequently, Platform One’s reference architectures18 and source code19 are public and open to the entire world, thus becoming the most significant departmental contribution to the open-source community in U.S. history. As described in Mr. Chaillan’s article, Platform One helped the U.S. DoD tap into the benefits of internal software platforms described above:
• Rapid deployment of products and services: Deliver faster, better-quality, and more secure software with incredible DORA metrics comparable to some of the most mature DevOps teams in the industry. Also, save 100+ years of planned program time by moving key weapon systems across DoD to DevSecOps.
• Inherently secure: Bring the most advanced cybersecurity stack with the Sidecar Container Security Stack (leveraging the first widespread implementation of a Service Mesh in the USG), with Zero Trust enforcement down to the container level and continuous behavior monitoring, detection, and prevention.
• Inherently available and resilient: Bring Kubernetes to weapon systems, including jets and space systems, where containerization was demonstrated to be not only possible but game-changing on real-time operating systems and legacy hardware.
• A la carte business service provisioning: Move some of the largest DoD weapon systems to Platform One.
• Evergreen updates: Push over-the-air software updates to weapon systems (U-2) while flying the jet.
• User experience: Bring AI/ML capability to the jets to co-pilot alongside Air Force pilots.
• Ecosystem: Engage industry more than ever by ensuring both the existing Defense Industrial Base and new startups/companies can do business with DoD faster.

Conclusion

There are other notable use cases of successful SaaS providers already seizing these benefits through platform-powered services, such as Atlassian, which runs the majority of its cloud portfolio on an internal software platform20, with over 1,000 components that span from experimental to production-ready. In conclusion, and as referenced in the Thoughtworks article mentioned above:


“[Internal] digital platforms are force multipliers, so there is a fine line between developing a competitive advantage and introducing a significant productivity blocker. The decisions you make along the product lifetime will determine whether you walk on one side or the other. The good news is that just like with every other kind of software development, if you start small, empathize with your customers, learn from your successes (and your failures) and keep your overall vision in mind, you have every chance of success.”

There is a critical conclusion here: there are no ready-made patterns for designing and creating platforms. Consequently, no two platforms have ever been shaped equally; it all depends on your context. The new opportunities that internal software platforms bring to development teams usually come at the expense of flexibility, as engineers will have less freedom of choice due to the constraints introduced by the platform’s curated experiences. However, working within pre-established constraints is not wrong per se, as long as these are your constraints, based on your view of the world. Platform constraints foster productivity, efficiency, and innovation, allowing service developers to stay closer to the business and concentrate on what they do best, resulting in faster time to market.

Chapter 3: Platform Services

Designing for the people

The United Kingdom’s government spending on public services reportedly reaches a third of the country’s GDP. This is what Lou Downe, former Director of Design and Service Standards for the UK Government, states in their book Good Services, a practical guide to end-to-end service design. Of this fraction, 60% of the cost of these services is allocated to fixing failure, not necessarily failure in function but in design, as Downe reports. Unnecessary manual processes and overstaffed departments burden UK taxpayers with redundant costs that could have been avoided had the services been designed purposefully. The work of Downe and their team between 2016 and 2019 brings some hope of reversing this situation. Their goal is to help the UK government and other public and private organizations worldwide transform and digitize their businesses with user-centric services. With the release of the Service Manual21, a brilliant public set of guidelines that outlines the principles of building services that work, Downe contributes to making the world a better place. This manual is a leading source of information guiding all organizations to build utilities that help their users prosper in the digital age. In a sense, the UK’s Service Standard22 and Service Toolkit23 (and especially how they have made it easier for the government to undertake digital transformation) have heavily inspired this book and the principles that come with it. Consequently, the list of platform services presented in this section can be seen as a reference for composing a target,


ideal internal software platform. These platform services also capture the principles that emanate from the Service Manual, with the hope that you can help developers build software that your end users will love.

Core Infrastructure

“Move fast with stable infrastructure.” - Mark Zuckerberg (2014)

Having a standard infrastructure is pivotal to the success of internal software platforms. Facebook’s approach to software development became very popular between the end of the 2000s and the start of the 2010s, as captured in the then-famous motto “Move fast and break things.” Many companies and start-ups tried to imitate the unorthodox (and somewhat hacky) way of building and shipping software behind this idea, only to find later that only Facebook could operate like Facebook. And that was true until 2014. That’s when Zuckerberg realized that not even Facebook could run like that anymore, as it had become a more mature company serving thousands of developers worldwide. Having a stable and standardized infrastructure to support its business operations had become more important than breaking things.

A standard infrastructure topology based on managed cloud services is the physical manifestation of the platform’s core. We need to be careful with layered architectures so that domain-specific services are not deployed on top of that platform infrastructure. Instead, they are deployed with it, side by side. As a result, the platform’s core infrastructure introduces a new architectural style that provides a minimal framework that is simple and non-intrusive: the platform architectural style.


The Platform Architectural Style

We can see a tendency in large enterprises to adopt a serviceful approach to building software, an architectural style fully aligned with the principle of smart endpoints and dumb pipes24 that this book portrays. The service-oriented enterprise is an idea that has been around for decades (as the many bibliographic references attest) and that reached its peak of popularity during the early 2000s thanks to the Service-Oriented Architecture style (Figure 14).

Figure 14. Evolution of architectural styles

Service-Oriented Architectures (SOA)

While the concepts introduced by SOA were compelling and valid (and still are), the way they were rolled out caused this approach to software design to fail miserably. The main reason this happened is that infrastructure technology was not commoditized as we


know it today. Despite isolated, home-grown efforts to execute infrastructure-as-code on top of custom infrastructure APIs, the product industry stepped in, with vendors competing against one another with proprietary technologies. To pursue product differentiation, vendors promoted an architectural style where developers shoved the integration and business logic into the Enterprise Service Bus, a hub-and-spoke component that acted as an intelligent pipe. What happened later is already history.

Microservices Architecture

Building on the lessons learned from the SOA fiasco, the microservices architectural style started to gain traction thanks to new enabling automation technology and a more industrialized computing capability. Of course, this paradigm also brought challenges, such as the proliferation of unstable infrastructures and heavy duplication across services. Microservices caused many developers to take on extra duties and leave their core business unattended, as they were wasting time managing undifferentiated infrastructure work. And it is in this transition of offloading undifferentiated work from the service development teams that many software organizations find themselves today, paving the way for the introduction of internal software platforms.

Platform Architecture

Looking at the platform architecture picture introduced earlier, we can notice that the difference in the heights of the lines representing the service infrastructure is intentional and aims to highlight the optional nature of the platform services. Don’t mistake the constraints introduced by the platform services for unnecessary abstractions on top of what the cloud provider is already giving you. Instead, internal software platforms


should embrace the cloud fully and at the same time expose the details of the underlying infrastructure that they rationalize for developers. Two reasons support this approach:

1. Especially with serverless, we must see the cloud platform as a system ready to be programmed. Hence, it does not make sense to create artificial encapsulations that only subtract value.
2. You don’t want to be in the business of building and maintaining yet another generalist framework layer that spans the whole spectrum of managed cloud services, abstracting everything away.

With this style, developers can lean on cross-cutting platform services and at the same time fully leverage all the underlying cloud platform capabilities to implement the business logic of their services. This way, you always let these developers write code in a standard fashion, where standard refers to the widely accepted cloud technology. Otherwise, the business service development teams would take on unnecessary overhead by writing non-standard code based on artificial encapsulations or unnecessary abstractions. This overhead will end up being more expensive than just writing the code twice in the unlikely event that you need to change the infrastructure of the applications, as explained by Gartner in their article “Why adopting Kubernetes for application portability is not a good idea”25.

A developer-centric approach

The concept of an internal software platform as an architectural style is not new. However, this whole strategy of putting developers at the center differs from how we used platforms in the past. Those approaches focused on providing product license optimization and monolithic systems of integration without caring about how software engineers built or integrated services with them. Now, you no longer buy the best product technology for developers by default. Instead, you build it so they can solve core business problems.


Software engineering and architecture leaders like you have all the necessary technologies and practices at hand to make the dream of internal software platforms come true. These include:

• Commoditized computing and runtimes as a service.
• Infrastructure APIs and infrastructure-as-code tooling.
• Co-evolution of automation tools toward NoOps.
• Serviceful architecture practices.

Putting all these elements together gives us a visual model of how and why the platform architectural model is starting to emerge as an efficient style for creating better software. Here, a Wardley map laid out along a horizontal evolution axis helps visualize the co-evolution of practices that emerge from technological progress (Figure 15).

Figure 15. Wardley map of the platform architectural style

Again, it is each service team’s individual responsibility to decide whether to build their non-core infrastructure functionalities themselves or buy them internally from a software platform. Of course, this book will assist you in advocating for the latter.


Access to the underlying managed cloud services must be regulated by the rest of the principles and tools that you are putting together as part of your internal software platform initiative. To that end, ensure that you provide business service developers with proper guidelines when they take an off-platform path to their architectures. This way, you can ensure integrity across the board in critical domains such as security, compliance, and operational excellence.

Concerns with the platform architectural style

In any case, as you read these lines, you may have raised a few red flags. Let’s try to address some of them.

Lock-in

You are worried about vendor lock-in. That is understandable, but you shouldn’t be. Remember that your organization is now transitioning toward SaaS, so you own the costs of the assets in their totality. For that reason, you may not have a portability requirement to multicloud (or on-premises), nor the need to share software code between multiple platforms. Technology executives today focus too much on concerns related to vendor lock-in. However, your end customers care about business outcomes, not about deploying sophisticated infrastructure clusters or designing complex multicloud schemes to escape lock-in. This is what one executive at a 100-year-old global insurance company explained in DXC Technology’s article titled “It’s time to do Cloud Right”26:

“There’s a cognitive burden. You can have a serverless team focused and taking on reducing costs of transactions, speed of claims, and the tasks at hand, or you can have a team focused on the intricacies of container distribution. We’re an insurance company. We care about customer outcomes, not infrastructure clusters.”


In any case, even in the unlikely event that you have a portability obligation for your on-demand software, you shouldn’t worry about vendor lock-in either. The main reason is that serverless computing is entirely based on industry standards and protocols (in many cases even open-sourced) that, combined with evolutionary design patterns, allow clean and easy interoperability between providers. This feature is critical in a service-based economy such as modern cloud computing, and it is covered in detail in the last chapter of this book.

Cost-driven micro-optimizations

Due to the consumption-based nature of serverless computing, design decisions made by software architects and designers can directly affect the overall costs of the on-demand software. Adding another critical factor (such as cost) to the design decision mix can unnecessarily complicate things and lead software architects and designers into dangerous micro-optimizations. Indeed, there is a risk in making design decisions that chase minimal cost improvements at the expense of application maintainability, debuggability, and traceability. This topic is covered in detail with a concrete example later in the Technology Principles chapter of this book.

Service meshes

You’ve read the word services multiple times, and you are skeptical about architectures based on microservices. Also understandable, and there are excellent reasons to be wary of them. It is important to stress that the properties and benefits of internal software platforms are not especially tied to one architectural style or another. On the one hand, you could implement a platform in a monolithic application as a core module that captures all the cross-cutting concerns for the developers. On the other hand, you could also build an internal software platform as a core system following the


precepts of serviceful architectures where core services are bounded in context, loosely coupled, and independently deployable. Although these are just deployment considerations and the same principles can be applied to various schemes equally, the references and examples portrayed in this book embrace the characteristics of serviceful architectures with a preference for serverless computing.

AWS reference architecture

In the spirit of a serviceful approach to on-demand software, the core infrastructure of the platform lays out a minimal yet solid foundation that allows the services that will work with it to be deployed and integrated quickly and easily. This foundation is nothing but a collection of primary infrastructure resources that any service needs to run on the cloud, regardless of whether it is serverless or not. These structural components aim to provide cross-cutting capabilities concerning (among others) security, availability, and networking, so that service developers do not need to include them in their service-specific infrastructure. As a result, development teams can retain cleaner and faster plug-and-play deployments of their business services and eliminate the multiplication of primary cloud resources across the solution, thus reducing costs. As an example, the following picture describes a reference architecture for a platform core infrastructure using AWS’s managed cloud (Figure 16).


Figure 16. AWS reference architecture of the platform’s core infrastructure

This reference design pivots around the idea of putting an Amazon Virtual Private Cloud (VPC) instance at the center. The main reason is that having a VPC at the heart of the platform is a best practice that facilitates an easier and more secure integration between external agents (users and systems) and the services in the platform. The AWS resources created for the platform’s core infrastructure facilitate a smart deployment pattern for every service running with the platform. By doing this, you are taking a big worry away from the developers, who no longer need to manage these cross-cutting concerns and can focus on designing the service-specific infrastructure that runs their business logic, such as compute, edge distributions, tables, or queues.

The platform microkernel

In a sense, this is the same principle behind the architectures employed by some operating systems. You can think of these components as the microkernel of the internal software platform, with one big difference: this microkernel does not create a layer of abstraction on top of the underlying cloud platform services. Instead, it provides a pre-integrated standard topology of core resources that acts as a landing zone for the business services to plug into. These core resources are very standard in any cloud solution and altogether provide a minimal ready-to-use architecture. These resources are:

• Private subnets across multiple availability zones where developers can drop service-specific workloads. Although there is a preference for serverless, not everything can run using this architectural style. Service development teams may need to configure relational databases using Amazon RDS or containerized applications using Amazon ECS in these subnets.
• Public subnets across multiple availability zones as a future-proof mechanism for those services that may need access to 3rd-party APIs over the internet. This resource includes NAT Gateways together with an Internet Gateway, which is associated at the VPC level. Again, this helps to offload developers from creating these components repeatedly when they deploy a new service with the platform.
• Pre-defined route tables and security groups for security and access control of the resources within the VPC. Also, a central Amazon S3 bucket could automatically store all the flow logs.
• Load balancers for routing service requests through the VPC, so workloads are distributed efficiently and automatically. This includes Application Load Balancers (ALBs) in the public subnets exposing publicly available components, such as user interfaces that require legacy server-side rendering frameworks. Also, ALBs and NLBs need to be created in the private subnets to allow other components to access the technical APIs exposed by the services running on Amazon ECS or AWS Fargate, respectively.
• A pre-configuration of Amazon API Gateway to manage every service API exposure through the platform. This configuration consists of the creation of an integration route with the VPC through a Service Endpoint. This way, service teams can focus on declaring their API definitions using a standard mechanism such as OpenAPI and then deploy their APIs in the gateway without dealing with other low-level routing configurations. All this helps in controlling and securing access from external clients to the business services.
• DNS and hosted zones on Amazon Route 53 to distinguish this particular platform instance from other instances. By managing such configurations at this level, the necessary certificates will be created and stored automatically and transparently in AWS ACM. As a result, you will be able to assign meaningful domain names to both public ALBs and API Gateway endpoints.
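The microkernel resources listed above can be expressed as infrastructure-as-code. The following CloudFormation fragment is a minimal, illustrative sketch — the resource names, CIDR ranges, and export name are assumptions, not a prescribed layout — and a complete microkernel template would add the remaining subnets, NAT Gateways, route tables, and the API Gateway and Route 53 configuration:

```yaml
# Hypothetical platform microkernel sketch; names and CIDRs are illustrative.
AWSTemplateFormatVersion: "2010-09-09"
Description: Platform core infrastructure (microkernel) - minimal sketch
Resources:
  PlatformVpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
  PrivateSubnetA:                 # service workloads (RDS, ECS, ...) land here
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref PlatformVpc
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs ""]
  PublicSubnetA:                  # future-proofing for internet-facing needs
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref PlatformVpc
      CidrBlock: 10.0.101.0/24
      AvailabilityZone: !Select [0, !GetAZs ""]
      MapPublicIpOnLaunch: true
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  GatewayAttachment:              # associated at the VPC level, as described above
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref PlatformVpc
      InternetGatewayId: !Ref InternetGateway
Outputs:
  PlatformVpcId:                  # exported so business services can plug in
    Value: !Ref PlatformVpc
    Export:
      Name: platform-core-vpc-id
```

Service-specific stacks can then reference the exported `platform-core-vpc-id` via `Fn::ImportValue`, which is what makes the plug-and-play deployment pattern possible without duplicating primary resources per service.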

Isolating environments with VPCs

One of the essential benefits of having a VPC at the center of the platform’s core infrastructure is that it helps to isolate different customer instances from each other. This critical point has remained untouched thus far: you may not want to start by designing multi-tenancy for your on-demand software at the application level. Doing so would bring significant architectural and technology ramifications that would need to be considered separately. Also, multi-tenancy at the application level may become a large regulatory issue. In any case, serverless solves the same problem with many more benefits and fewer issues and risks. The principles, techniques, and recommendations described in this book lean toward multi-tenancy at the infrastructure level by consuming AWS’s managed cloud services. Also, having a VPC at the core of the platform will help with this necessary segregation.


Working with VPCs and all the aggregated cloud services and components creates logical boundaries that resemble the concept of traditional environments in serverful architectures. The platform’s microkernel helps shape the basis of an actual environment in the context of managed cloud services, making this idea more tangible. This construct is explored in detail in the following section.

Lifecycle Automation Management

“DevOps is an engineering culture of collaboration, ownership, and learning with the purpose of accelerating the software development lifecycle from ideation to production.” - Emily Freeman

As a SaaS provider, your organization seeks software with an evergreen status, featuring transparent updates to both the platform and the business services. To reach this level of confidence and maturity, you will have to implement a solid release automation process that enables developers to deliver software continuously.

Road to continuous delivery

As Dave Farley, co-author of the book Continuous Delivery, stated during the episode “The foundations of continuous delivery” of the “Ship It” podcast:

“Developers and organizations are usually disdainful of processes because they assume that software is an exercise carried out by geniuses. And that’s not true because a lousy process will break good people every time.”

The figure of the hero engineer who single-handedly keeps the software and infrastructure running is not a sustainable role for the development


of your internal software platform (or of any software in general). Healthy teams build healthy platforms. This means that even if you hire the best software engineers to build your software, you can’t guarantee that the end product will be successful. You need processes. Processes improve the odds of creating quality software, and that is precisely a core engineering practice: learn from mistakes and apply corrections to make it better in the next iteration. Again, as Dave Farley mentions in the podcast episode discussed above:

“We, as technologists, often get excited about technology, forgetting that the hard part of software engineering is people.”

And especially how you coordinate them to integrate all their work in progress and deliver it continuously. For that, you will need tooling such as boards and pipelines, but that’s not the important part. How you use them is more critical. As you mature as an engineer, you will realize this more and more: it’s not about the code, it’s not about the design, it’s not about the tools; it’s about the coordination between people.

The only way to achieve the fast flow evangelized by Agile practices such as Extreme Programming is to remove the boundaries and handovers between teams. Instead of having separate development, operations, QA, and release teams, you need to think about having just one group responsible for building and releasing software to production. Leveraging all the benefits of serverless to accelerate development, only to end up with piles of deployment requests waiting to be processed by the operations team, is something you need to avoid. Many organizations have followed this mantra since Werner Vogels, CTO of AWS, introduced the famous “you build it, you run it” phrase back in 2006:

“Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is taking your software to the wall that separates development and operations, throwing it over, and then forgetting about it. Not at Amazon. You build it; you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.” - Werner Vogels

Automation in a platform-based architecture

Continuing the analogy with operating systems design, a lifecycle automation solution in a platform-based architecture has close connections with traditional package managers. Similar to operating systems and other plug-in-based software, a package in this context is a self-contained unit that provisions the specific infrastructure that the service requires to run its business logic. The equivalent of a package in the context of cloud-based software would be the combination of the deployable artifacts that compose a given service and the scripts that perform the actual deployment. Therefore, the lifecycle automation management solution is a system that helps produce the service artifacts, creates the packages, stores them in a central repository, enhances them with metadata, controls their versioning, manages their dependencies, and allows developers to perform a-la-carte deployments.
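To make the package analogy concrete, here is a small sketch of what such a package descriptor could look like. The `ServicePackage` type and all of its fields are hypothetical, invented for illustration; they are not part of any AWS or packaging API:

```python
# Hypothetical model of a "service package": deployable artifacts plus the
# infrastructure-as-code scripts that deploy them, with metadata and versioning.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ServicePackage:
    name: str
    version: str
    artifacts: tuple          # e.g. zipped Lambda bundles, container image URIs
    deploy_script: str        # entry point of the IaC deployment scripts
    dependencies: tuple = ()  # other packages this service depends on
    metadata: dict = field(default_factory=dict)

    def qualified_name(self) -> str:
        # A unique, versioned identifier for storing the package in a repository
        return f"{self.name}@{self.version}"

pkg = ServicePackage(
    name="billing-service",
    version="1.4.0",
    artifacts=("billing-api.zip",),
    deploy_script="infra/billing-service.template.yaml",
    dependencies=("platform-core@2.1.0",),
    metadata={"team": "payments", "cost-center": "cc-123"},
)
print(pkg.qualified_name())  # billing-service@1.4.0
```

A central repository keyed on `qualified_name()` then gives the platform everything a package manager needs: versioned storage, dependency resolution, and a-la-carte deployment.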


DevOps Principles

The following principles will help you and the development teams align with this view and come up with an efficient build and release process to build and deploy services with confidence.

• Include both source code and binary artifacts under version control so that you can keep track of changes over time. We recommend using a distributed system for source code management (e.g., Git) so you can work on changes in parallel as a team before merging them, even when the network is unavailable. Use meaningful commit messages to describe the changes you are making (your future self will be thankful for that). Avoid having multiple long-lived branches and keep the mainline in a releasable state (e.g., favor trunk-based development over GitFlow or GitOps). Only use short-lived branches for features and fixes that can be merged into the mainline once complete and tested.
• Use tools and platforms that allow developers to collaborate and contribute efficiently to developing the business services (e.g., GitHub). Incentivize contribution mechanisms such as Pull Requests and use them as coaching opportunities. Along that line, enhance source code repositories with all sorts of documentation that helps developers understand how they can contribute to the project, including Licensing, Notice, and Contributing guidelines.
• Attempt small, incremental changes to the source code and integrate them often with the mainline. Write unit and integration tests for your features and run them locally even before submitting a contribution to the mainline. You can configure the version control system so that the Continuous Integration (CI) pipeline automatically runs build, unit tests, integration tests, and code quality scans upon code commit (i.e., continuous integration).


Also, you can design the build process for quick feedback and ring-fence the services’ unit and integration tests by mocking all external dependencies. Use deployment preview techniques when available and applicable. You should be able to perform an additional quality check of your generated artifacts before deploying them to an actual environment.
• Create infrastructure-as-code repositories to run deterministic and consistent deployments that spin up the necessary cloud infrastructure to run the services and, at the same time, get all the artifacts deployed. Write fast and straightforward post-deployment tests that are executed automatically upon any deployment to exercise the service and its essential dependencies. You will have to work under the assumption that all the artifacts generated from the mainline are potential release candidates, so you must deploy services often and obtain quick feedback. Also, use auditable and traceable deployments to ensure you know which version of the services is running in each environment at all times.
• Ideally, have multiple deployment environments (e.g., SDLC stages) so you can phase the rollout of the software and ensure it is adequately tested. Configure your infrastructure-as-code repositories so that the Continuous Deployment (CD) pipeline automatically runs deployment, smoke tests, and performance tests upon artifact generation (i.e., continuous deployment). If you are working from a feature branch, deploy services to a cloud sandbox. Alternatively, if you are working from the mainline, deploy them through a lifecycle of environments where they can be promoted and tested until they reach release-candidate status. This process will help ensure a version of the software is not deployed to a later environment before being deployed and tested at an earlier stage.
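As an illustration of the continuous-integration principle above, a minimal pipeline definition could look like the following GitHub Actions sketch. The job name and the `make` targets are placeholders for whatever build, test, and scan commands your services actually use:

```yaml
# Illustrative CI workflow; job names and make targets are assumptions.
name: ci
on:
  push:
    branches: [main]   # trunk-based: every commit to the mainline is built
  pull_request:        # short-lived branches are verified before merging
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: make build
      - name: Unit tests (external dependencies mocked)
        run: make unit-test
      - name: Integration tests
        run: make integration-test
      - name: Code quality scan
        run: make quality-scan
```

A companion CD workflow would pick up the resulting artifacts and promote them through the environment lifecycle described in the last bullet.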
All these principles and the tools supporting them are highly available and accessible in the software development industry, to the point that some have even become standard. Despite this industrialization, how you use those tools and the custom pieces you add on top of them is what makes this solution a good platform capability.

Provisioning serverless environments

This new paradigm based on a serviceful architecture running on serverless computing challenges the traditional concept of environments, since everything is composed of small, independent building blocks, including the infrastructure. Providing lifecycle automation management functionalities for the development teams will help them not worry about these new constructs, doing the heavy lifting of environment provisioning and promoting their artifacts from their workstations to production. This type of automation could be considered a core capability of the platform and something you may be interested in building for the developers (Figure 17).

Figure 17. Lifecycle automation for platform-based architectures


Examples of these functionalities that you would like to factor in your lifecycle automation solution include: • Provision of environments through infrastructure-as-code scripts that include the creation of platform instances so the development teams can focus on building services and deploy them easily on a platform-based architecture. A platform instance, in this case, is a construct composed of the platform’s microkernel and the rest of the core enablement services. • Deployment of services with the platform following a package management model. Every package points to the deployable artifacts of each service and the infrastructure-as-code scripts that create the necessary service-specific cloud resources to run those artifacts. • An inventory of environments and package metadata that can act as a manifest presenting important deployment key-value properties that may help with service dependency management, deployment traceability, and cost tracking. • Continuous Integration and Deployment pipelines, including standard stages, ensure deterministic deployments of highquality services across the entire SDLC. • Metrics and KPIs to measure the speed, risk, and quality of the services you are releasing. As Ramón Medrano, staff SRE at Google, explains in his Twitter account, a few metrics are very relevant27 . These include the percentage of the time with a green push pipeline, latency of the build tests, total latency of the push pipeline, number of commits per release, or number of releases per week. • Other minor functionalities, especially around source code version control, including dependency agents, issue templates, and other bots that directly integrate the code repositories with the rest of the lifecycle automation management solution. It is important to assess how critical are these functionalities to the


SaaS transition that they need to be custom-built. Aren't there any off-the-shelf or even as-a-service solutions you can use to deploy services on a platform-based serverless architecture? The answer is yes. This is a space that AWS has recently entered, re-challenging the Build and Buy decisions. Pay close attention to whether or not these industrialized solutions answer all the new needs that are just starting to emerge in your organization with the transition to platform-based software. Again, as per the corollaries of Wardley mapping, if there is an obstacle in the way of creating business value and enabling capital to flow, then build a solution around that obstacle.
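To make the package-management and metrics bullets above concrete, here is a hedged sketch; the manifest fields and the metric helper are invented for illustration, not taken from any tool:

```javascript
// Hypothetical package manifest entry: each package points to the
// deployable artifacts and the IaC scripts for its service-specific
// cloud resources, plus key-value metadata for traceability and cost.
const packageManifest = {
  service: "orders-service",
  version: "1.4.2",
  artifacts: ["s3://artifacts/orders-service/1.4.2/function.zip"],
  infrastructure: ["templates/orders-service.yaml"],
  metadata: { environment: "env1", costCenter: "platform", commit: "abc1234" },
};

// One of the release metrics mentioned above: releases per week,
// derived from deployment timestamps recorded in the inventory.
function releasesPerWeek(deploymentDates) {
  if (deploymentDates.length === 0) return 0;
  const ms = deploymentDates.map((d) => new Date(d).getTime());
  const spanWeeks = (Math.max(...ms) - Math.min(...ms)) / (7 * 24 * 3600 * 1000);
  return deploymentDates.length / Math.max(spanWeeks, 1);
}

console.log(releasesPerWeek(["2021-06-01", "2021-06-08", "2021-06-15"])); // 1.5
```

Even a toy metric like this, fed from real pipeline data, is enough to start tracking whether your lifecycle automation is actually speeding teams up.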

Lifecycle automation with AWS

In their mission to commoditize the future of computing, AWS is seizing the new opportunities and value created around internal software platforms by introducing new managed cloud services. The main goal of those services is to respond to the co-evolutions of software development practices that come with the platform architectural style. Concretely speaking, they are introducing capabilities such as AWS Proton to assist engineering teams with the automation of platform-based environments and services.

AWS Proton

As per Amazon, the AWS Proton service enables platform teams to give developers an easy way to deploy their code using containers and serverless technologies, with the management tools, governance, and visibility needed to provide consistent standards and best practices. AWS Proton introduces fascinating abstractions and constructs that may be helpful to manage on-demand, platform-based software instances for your customers:

• Environment Templates: These are infrastructure-as-code CloudFormation templates describing shared infrastructure across services. Ideally, this abstraction could be a template for your standard infrastructure composed of the platform microkernel and core services.

• Environments: These are actual infrastructure resources created on AWS through those templates. In other words, this is your internal software platform instance.

• Service Templates: These are infrastructure-as-code templates that define the specific infrastructure resources needed by each service, along with the code components to be deployed with it, such as AWS Lambda functions and the API Gateway HTTP endpoints that expose them.

• Service Instances: Actual service infrastructure and artifact deployments created through the service templates. The result of a service deployment is a running service instance on AWS.

One of the most exciting parts of AWS Proton is how it manages the lifecycle of these service instances running with the platform in an automated way. This new managed cloud service does that by integrating every service instance (running on AWS) directly with its associated code repository (such as GitHub). Then, through centralized AWS CodePipeline control, changes to a particular branch of the source code will automatically trigger a new deployment upgrade of the service instance in a specific environment. This lifecycle automation is precisely what we would expect in order to manage platform-based software at scale. We will see how AWS keeps expanding this type of functionality for this managed cloud service very rapidly in the coming years. By improving the continuous flow of software development with these end-to-end lifecycle features that span from source code integration to


observability, you will be able to ship services much faster and do it continuously. It is a fantastic way of connecting the business with the infrastructure directly in the value chain.

Evolving the platform's lifecycle automation management with AWS services

The world of cloud services is shifting very quickly under our feet. Capabilities that didn't even exist a few months ago are now built promptly by providers hitting the market with good products that fill those spaces, especially in the automation field. AWS moved fast with the introduction of AWS Proton, which also brings interesting conclusions to the platform's Wardley map introduced in previous sections. By looking at this strategy artifact, we can see that to enable a flow of capital (which goes from the most invisible components in the chain to the most visible ones), you will have to build some custom components along the way if there is a gap in the industrialized market. Before AWS Proton, almost nothing existed as a COTS product or service, at least at the level of integration and industrialization that AWS can promise. For that reason, many software organizations have built in-house solutions for managing an automated lifecycle of service-oriented software that leverages serverless computing through infrastructure-as-code scripts. With AWS Proton, you now have an integrated solution that includes CI/CD tooling, infrastructure-as-code execution, code management, and environment provisioning based on service catalogs. You don't need to build one yourself for the platform if you consider this an outsourced competency. The following updated Wardley map for the ideal platform depicts this idea (Figure 18)


Figure 18. Platform’s Wardley map using AWS Proton

Core Enablement Services

“Your user defines what your service is.” - Lou Downe

As introduced earlier, a platform is a collection of services, principles, and tools that work in harmony to offload developers from the responsibility of building cross-cutting functionalities. Consequently, as we know already, internal software platforms are custom-built tools that enable these developers to focus on creating business services. It makes total sense that the core of the internal software platform is also architected under the tenets of the serviceful (yet serverless) architectural approach to software development espoused so far. These cross-cutting services (also referred to as core enablement services, or just core services) feature pluggability properties similar to those of the business services. They are integrated and deployed with the platform's microkernel in a loosely coupled manner (Figure 19)


Figure 19. Platform Enabling Services

Just enough encapsulations

So, what do these services look like? In general terms, you may not want to be too prescriptive about the number of core services you deploy with the internal software platform; the prescription, though, should be about the type of core services you will have to provide. To figure this out, you must develop situational awareness to understand your particular context and the developers' problems you need to solve. This way, you will also execute a reasoned technology strategy for these core services that will help you determine when to build and when to buy them. As a rule of thumb, a good core service provides useful abstractions, either by orchestrating various underlying managed cloud services and other 3rd party APIs or by simplifying their developer experience. A good core service also adds functional value to this abstraction, which is then contextualized to the actual needs of the service development teams in the organization. Hence, your job as a software engineering and architecture leader is to create valuable experiences and make them accessible to the developers. These are just enough encapsulations.


Of course, this is not trivial and may require careful analysis and discussions with the platform architects, who should pay special attention to avoid building purposeless wrappers on top of the managed cloud services. Those wrappers are nothing but ineffective abstractions that do not create any tangible benefit compared to the underlying services they encapsulate, pushing you down the rabbit hole of re-implementing and maintaining a widely adopted standard. Remember, you are not in the business of re-building yet another generalist framework layer that spans the whole technology spectrum and abstracts everything away. Instead, you are interested in creating an internal software platform that is fully tailored to the needs of the developers.

Simplifying developer experiences

Even when building abstractions does make sense to you, there are still essential remarks to point out so you can be as effective as possible in the design of these core platform services. As bad as it sounds, and as a general principle, try to favor duplicity over hasty abstractions. In other words, and using Martin Fowler's terminology, it is best to favor harvesting frameworks28 over foundation frameworks29. Creating platform core services means abstracting solutions that solve real problems for the developers. Applying this principle ensures that you don't abstract too soon and create generic platform core services without understanding the actual use cases. Instead, try to encourage business services to provide solutions to their problems first and, once you spot strong duplicity across the board, harvest the necessary parts as a new platform core service. This principle is of utmost importance and is covered in detail later in this book. Now, how much value do these core services have to create? Do they need to develop new and disrupting functionalities so you can say


they add value? Of course, it all depends, but they do not necessarily always need to come with a completely new range of capabilities. As mentioned earlier, platform core services are fundamentally aimed at providing a curated experience to the developers in your organization. There is a lot of value in creating an abstraction that simplifies the developer experience of a complex underlying managed cloud service or 3rd party API and adapts it to the particularities of the service development teams.

AWS Examples

Since the functionalities and abstractions provided by the platform's core services are contextualized to the needs of every team and organization, the use cases outlined in this section must be taken as examples for guidance rather than as a reference architecture for implementation. When designing these core services using the AWS cloud platform, you need to consider its iterative and incremental approach to adding new features and creating new services, and of course, the speed at which it does so. In this spirit of agile development, it is natural to see new AWS services released with a minimum set of viable functionalities and a list of known limitations that may or may not be addressed later. These limitations will push you to make design decisions and cover the gaps with a custom solution when you put together an architecture for your core service. You will need to fill the spaces left by the managed cloud services with a custom design until AWS fills those gaps with an out-of-the-box feature (if they ever choose to do so).


Event Management Core Service

Let's start with the first example of a core service that abstracts the orchestration of other underlying AWS services. To provide a loosely coupled design for your software, where all the services are integrated solidly and reliably, you may be interested in following the precepts of an event-driven architecture, described through the following assumptions:

• Services are integrated asynchronously, employing a publication and subscription (pub-sub) mechanism fuelled by events raised and consumed by services.

• This pub-sub mechanism should guarantee the delivery of the event messages to every subscriber.

• Services must remain loosely coupled from the platform and all the specific infrastructure required to implement the pub-sub mechanism.

• Service teams are offloaded from creating any infrastructure required for event handling, thus mitigating any risk of duplication.

• Service implementations are agnostic of the platform's pub-sub integration mechanism of choice. They need to receive events from other services without the need for any active polling.

An excellent solution to this problem using serverless computing would be configuring Amazon EventBridge with the necessary rules to route events between the different services that compose your software. However, some important limitations may affect this design decision using Amazon EventBridge, such as the impossibility of sending messages between services sitting in different regions (only a few are supported) and the restriction that allows only five target services per rule. As documented by AWS, a possible solution would be to configure Amazon SNS as a unique target for all the rules and then fan out the messages to all the subscribed services. That introduces another tradeoff, as Amazon SNS does not guarantee delivery, so you would need to add a queuing solution to the equation, such as Amazon SQS, so that target subscriber services do not miss messages when they are offline. To keep the implementation of the services unaffected by this approach and avoid an intrusive solution, you would need to introduce a sort of central component that pushes the actual event to the subscribers, such as a small AWS Lambda function. The resulting Event Management core service would look something like the one described in the following picture (Figure 20)

Figure 20. Platform Events Management Service
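The delivery semantics this design aims for, fan-out with one buffer per subscriber so offline services do not miss events, can be sketched with an in-memory stand-in (Amazon SNS, Amazon SQS, and the pushing Lambda function would play these roles on AWS):

```javascript
// In-memory sketch of the pub-sub semantics only, not AWS code.
// A topic fans every event out to one queue per subscriber, so a
// subscriber that is offline finds its copy buffered when it returns.
class Topic {
  constructor() {
    this.queues = new Map(); // subscriber name -> buffered events
  }
  subscribe(name) {
    this.queues.set(name, []);
  }
  publish(event) {
    // Fan out: every subscriber gets its own copy of the event.
    for (const queue of this.queues.values()) queue.push(event);
  }
  drain(name) {
    // A Lambda-like component would push these to the subscriber.
    const events = this.queues.get(name);
    this.queues.set(name, []);
    return events;
  }
}

const topic = new Topic();
topic.subscribe("billing");
topic.subscribe("shipping");
topic.publish({ type: "OrderCreated", id: "o-1" });
// "shipping" was offline when the event fired; its copy is still buffered.
console.log(topic.drain("billing").length); // 1
console.log(topic.drain("shipping").length); // 1
```

The per-subscriber queue is the crucial piece: it is what turns SNS's best-effort fan-out into the guaranteed delivery the assumptions above call for.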

This use case is an example of a core service that adds orchestration value. It exemplifies the importance of using managed cloud services (especially serverless computing) when making design decisions. By choosing services such as Amazon SNS, AWS Lambda, and Amazon SQS for your custom solution, you minimize both the cost of implementation and the future cost of change in the likely event that AWS fills the gaps itself and adds these features to Amazon EventBridge.

Identity Management Core Service

The following example describes a use case where you may need to build a core service that wraps an underlying managed cloud service and, while it doesn't add much functional value, it does solve a user problem. Suppose you need to provide a group of developers, or even customer admin users, with a Web user interface to create new users


for their applications in a self-service manner. While adding users to a user pool is something that Amazon Cognito allows you to do easily from the AWS console or the service command-line interface (CLI), these types of actions can't be granted to those users who are demanding a less technical solution. While your initial reaction might be to leverage the AWS Amplify SDK for JavaScript using a popular client framework such as React, the situation gets complicated since this library does not include an admin function to create users programmatically30. Fortunately, the Amazon Cognito SDK for JavaScript does have such a function, with the caveat that you can't use it from a Web browser, as this client can't be assigned a role in AWS IAM with the permissions to operate against the user pool. With this background, one of the most sensible solutions to this challenge is creating an AWS Lambda function with the necessary role permissions to access the Amazon Cognito user pool. Then, include the required logic to create a user in that pool using the Amazon Cognito SDK and expose that functionality through an API endpoint on the Amazon API Gateway to be invoked from the browser. The following picture describes this solution (Figure 21)

Figure 21. Platform Identity Management Service
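A hedged sketch of the Lambda function behind this endpoint; the Cognito client is injected so it can be stubbed here, and the handler shape (field names, responses) is illustrative rather than a reference implementation. The real code would call the Cognito SDK's AdminCreateUser operation with its IAM role permissions:

```javascript
// Sketch of the user-creation Lambda behind the API Gateway endpoint.
// The handler validates the request and delegates to the injected
// Cognito-like client, which owns the permissions to the user pool.
function makeCreateUserHandler(cognito, userPoolId) {
  return async (event) => {
    const { username, email } = JSON.parse(event.body || "{}");
    if (!username || !email) {
      return { statusCode: 400, body: JSON.stringify({ error: "username and email are required" }) };
    }
    // In the real service this would be cognito.adminCreateUser(...).promise()
    await cognito.adminCreateUser({
      UserPoolId: userPoolId,
      Username: username,
      UserAttributes: [{ Name: "email", Value: email }],
    });
    return { statusCode: 201, body: JSON.stringify({ username }) };
  };
}

// Stub client standing in for the AWS SDK in this sketch.
const calls = [];
const stubCognito = { adminCreateUser: async (params) => calls.push(params) };
const handler = makeCreateUserHandler(stubCognito, "pool-123");

handler({ body: JSON.stringify({ username: "jane", email: "jane@example.com" }) })
  .then((res) => console.log(res.statusCode)); // 201
```

Injecting the client is also what keeps the core service testable: the Lambda logic can be exercised without touching a real user pool.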

The result is a platform core service whose objective is not to become a general-purpose service competing with Amazon Cognito. Instead, it is a service that developers can reuse multiple times for multiple applications.

Platform Utility Libraries

The last use case in this series of platform service examples describes a scenario where you may need to provide developers with a simplified experience compared to the vanilla approach. In other words, you may need to create an abstraction that helps relieve some of their cognitive burden. Let's imagine for a moment that you are providing a centralized platform functionality to store service and application metadata as key-value pairs in the AWS Systems Manager (SSM) Parameter Store. This data follows conventions and structures that allow a distinct parameter tree per environment running on the same AWS account. This way, services running in a particular environment can put or read values in these parameter manifests to exchange information with each other, which may be necessary for the implementation of their business logic. It would be something like this:

/environments/env1/global/dns
/environments/env1/global/api_key
/environments/env1/services/svc1/endpoints
/environments/env1/services/svc1/log_bucket
/environments/env1/applications/app1/user_pool_id

As you can imagine, programmatic access to SSM is something that AWS provides out of the box. It is included in the standard SDK, fully documented, and of course, widely adopted. However, the code developers must write to access a specific parameter stored in an environment tree may be repeated multiple times across different business service implementations, including the exact parameter name in the tree as per the pre-established conventions and structures.


For this reason, you may be interested in providing a utility library that offers a functional abstraction to help developers reduce the amount of boilerplate code needed to operate with manifest parameters for every environment. For example, the following generic code represents a developer using the standard AWS SDK functions to access a concrete SSM parameter. The ssmpath is a string representing its full path in the tree.

const AWS = require("aws-sdk");
const ssm = new AWS.SSM();

// Sets the SSM parameter query object
const ssmpath = "/environments/env1/global/dns";
const params = {
  Path: ssmpath,
  Recursive: true,
  WithDecryption: false
};

// Gets the environment DNS name
const getParamsResponse = await ssm.getParametersByPath(params).promise();

In this example, every developer needs to remember the conventions established for the environment tree structure, such as the subpaths environments and global. Instead of doing this, you could create more functional abstractions that help developers write more declarative, understandable, and simple code.


const Platform = require("platform-sdk");

const environmentName = "env1";
const environment = new Platform.Environment(environmentName);

// Gets the environment default API Key
const apiKey = await environment.getAPIKey();

// Gets the environment DNS name
const dnsName = await environment.getDNSName();

// Gets the user pool ID for application app1
const userPoolID = await environment.getApplication("app1").getUserPoolID();
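To show how such a utility library might be implemented (the platform-sdk above is hypothetical, and the SSM client is stubbed here for illustration), the Environment class simply encodes the parameter-tree conventions behind functional methods:

```javascript
// Sketch of the hypothetical platform-sdk Environment class. It hides
// the parameter-tree conventions (/environments/<env>/global/... and
// /environments/<env>/applications/<app>/...) behind functional methods.
// The ssm argument stands in for the AWS SDK SSM client.
class Environment {
  constructor(name, ssm) {
    this.prefix = `/environments/${name}`;
    this.ssm = ssm;
  }
  async getParam(subpath) {
    return this.ssm.getParameter(`${this.prefix}${subpath}`);
  }
  getDNSName() { return this.getParam("/global/dns"); }
  getAPIKey() { return this.getParam("/global/api_key"); }
  getApplication(appId) {
    return { getUserPoolID: () => this.getParam(`/applications/${appId}/user_pool_id`) };
  }
}

// Stub parameter store following the conventions shown earlier.
const store = {
  "/environments/env1/global/dns": "env1.example.com",
  "/environments/env1/applications/app1/user_pool_id": "pool-1",
};
const ssmStub = { getParameter: async (path) => store[path] };

new Environment("env1", ssmStub).getDNSName().then(console.log); // env1.example.com
```

Notice that the conventions (the environments and global subpaths) now live in exactly one place, which is the whole point of harvesting this library.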

Platform Design System

“My aim is to omit everything superfluous so that the essential is shown to the best possible advantage.” - Dieter Rams

Design Systems are becoming the industry's response to building modern digital products, especially in on-demand software. They provide a set of principles, guidelines, and tools aimed at helping designers and developers build visually consistent and compelling user interfaces that adhere to a particular branded visual identity. A few key characteristics of Design Systems include (but are not limited to) the following aspects:

• They help manage the branding and design of software products at scale within the entire organization, thanks to a collection of reusable UI components and, more importantly, design principles.


• They include tools for designers, so coming up with homogeneous user interfaces and experiences across a family of products can be done quickly and efficiently.

• They include tools for developers, so that the transparent enforcement of the established visual guidelines in the applications they create can shorten development times.

• They are themeable and configurable, meaning that the applications built with the Design System can be made to look like other brands, always within the boundaries of the established design principles and constraints, to preserve the original desired visual identity.

• They are extensible, so that other designers and developers can build new reusable higher-order components based on the primary catalog, such as business-oriented controls that meet industry-specific user needs and functionalities.

For all these reasons, and even though UI and UX design tend to be an underestimated afterthought and a somewhat diminished practice, a Design System is a valid platform component. The main reason is that Design Systems help designers and developers focus on building business engagement services by providing an opinionated catalog of UI elements and principles that align with the branding mission of the organization. If this was not enough, the development tools that come with Design Systems help rationalize the many Web development frameworks that exist. They do so by including a simplified programmatic interface tailored to the application developers' specific needs. These characteristics make Design Systems suitable to become a key component of your internal software platform.
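The themeable aspect can be pictured with design tokens: the base theme encodes the visual identity, and brand overrides are allowed only within boundaries. A minimal, framework-free sketch with invented token names:

```javascript
// Sketch of design-token theming: the base theme defines the visual
// identity, and a brand theme may override only the allowed tokens,
// preserving the design principles encoded in the defaults.
const baseTheme = {
  colorPrimary: "#0a5fff",
  colorText: "#1a1a1a",
  fontFamily: "Inter, sans-serif",
  spacingUnit: 8,
};

const OVERRIDABLE = new Set(["colorPrimary", "colorText"]);

function createTheme(overrides = {}) {
  const theme = { ...baseTheme };
  for (const [token, value] of Object.entries(overrides)) {
    if (!OVERRIDABLE.has(token)) {
      throw new Error(`Token ${token} is not themeable`);
    }
    theme[token] = value;
  }
  return theme;
}

const acmeTheme = createTheme({ colorPrimary: "#c4001d" });
console.log(acmeTheme.colorPrimary); // #c4001d
console.log(acmeTheme.spacingUnit);  // 8 (protected by the design system)
```

The allow-list is what enforces "within the boundaries of the established design principles": brands can recolor, but they cannot break the layout grid or typography.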

Ecosystems and application marketplaces

Moving from a product-based economy to a service-based one starts by acknowledging that the secret sauce of your success has nothing


to do with what you do but with how you do it. It is hard to accept at first, but there is no magic embedded in your product source code that makes it irresistible to customers. Yes, you build things; so do your competitors. It turns out that how you present your products to your customers and partners is what matters most.

To progress along the evolution line, one of the most effective strategies is to make your technology open to the entire community. Ultimately, that means open-sourcing your product so that, if you already enjoy a leading position, you have good chances of becoming an industry standard. In any case, an excellent first step is making your APIs open so they can be accessed by your customers, partners, and of course, competitors. When you make such a move, you are betting that you have something good and that, by making it open, it can become even better. But how is that possible? Well, by making your APIs and UI frameworks available (for example, by publishing them on a developers portal), you are expanding the ecosystem of players that can collaborate with you. This way, customers and partners can contribute innovations built with your technology and, in some cases, combine their technology with yours. What this means for business applications is revolutionary. At first, building business applications could be considered a core activity because they will help you differentiate from others in the marketplace. Eventually, other companies will pop up and create the same business apps as you do, thus setting a new competitive space. How can you then anticipate the next move before it happens? The answer is by making it yourself. It is of utmost importance to create a product marketplace based on your technology, where partners build the business applications


based on your open APIs, so you don't have to do it yourself. By enabling this level of community contribution, you will place yourself in a position to offer these business applications to your customers off-the-shelf. It means that you can relocate your talent to start building the next set of innovations that will make you different again. This idea is represented in the following Wardley map (Figure 22)

Figure 22. Evolution of business apps

Looking at this map, we can see that there is a point of inertia preventing your business applications from evolving because, at first glance, there seems to be no need. Your organization is exceptionally successful at selling these applications. At this stage, it is hard to convince the executives that you need to make a move and start reimagining how you build and deliver these applications to the market. But the reality is that you are dying of success, and you won't notice until it is too late. As explained before, opening the business APIs and letting partners and external contributors build new applications is vital for your future. As the Jevons paradox31 has taught us, new higher-order systems and innovations will emerge from this move, enabling you to compete in an elevated playing field that will boost the consumption


of your APIs.

Client frameworks

The end of 2020 brought an incredibly eventful week for the Web development community, with several major technology announcements. Chief amongst them, and arriving almost simultaneously, it is worth highlighting Facebook's React Server Components32 (RSC) and Basecamp's Hotwire33 releases. While these two sets of libraries feature similar capabilities for building modern Web user interfaces, they do so from opposite approaches. RSC and Hotwire provide the Web development community with new and better solutions for a growing number of use cases and higher expectations in the world of user interface development. Of course, arriving at this type of consensus is not always possible without a bit of controversy in an increasingly polarized community. And the reason is that these two technologies revived the old debate about where Web UI rendering should happen: client-side vs. server-side.

The pendulum swings back and forth

It's not the first time that we hear the analogy that software architecture trends behave like a giant Foucault's pendulum34, swinging back and forth between the different practices that are considered the best at a given point in time. Web development is not an exception, swinging between client-side and server-side rendering approaches35, as depicted in the following picture (Figure 23)


Figure 23. Pendulum of Web development frameworks

1. The initial stage represents the early days of the Web, when only a few people with host admin rights could build static web pages that browsers rendered. That was purely client-side rendering based on HTML and CSS, which soon fell short in a world where users wanted to interact with real-time data through these Web interfaces.

2. This is how we moved from Web pages to Web Apps. Seeking more dynamic behaviors, users demanded we move the HTML generation to the server to make it closer to the data. We came up with the MVC pattern to avoid coupling the view generation with the actual data fetching, and we brought all the major programming languages onboard (PHP, Java, .NET, etc.). Unfortunately for the users, Web Apps still behaved like Web pages, and the navigation and user experience were suboptimal, especially for large applications.

3. We gradually started to move things back to the client side again. These were the old Web 2.0 days, when we built Rich Internet Applications (this is what we called SPAs back then),


leaning on the experience and lessons gained from the server-side rendering approach. Yes, we moved the MVC to the client. This pattern allowed for a much better UX and a more desktop-like behavior, which our users demanded. JavaScript development exploded, and the proliferation of hundreds of Web frameworks kicked off, establishing a new normal in Web development that still spans our current days and, as you can imagine, not without tremendous challenges. The complexity of developing and building SPAs with some of these frameworks has grown exponentially, to the point that sometimes it is unmanageable, both from a physical (bundle size) and an organizational (team size) point of view.

4. And this is where things start to get interesting. Conscious of this problem, the teams who supported server-side rendering started to refine this approach based on the lessons learned from the pendulum's previous state. In other words, by leveraging all the good things about the client side's smoother application navigation and user experience while keeping the performance and efficiency of the server side. Frameworks like LiveView36 or ASP.NET Blazor37 took this opportunity to make outstanding improvements to the state of the art of server-side rendering. And this is the area where Hotwire is playing today.

5. Hotwire's approach may not be enough for the most UX-demanding Web Apps. In other words, managing the UI state on the server and streaming it back to the browser via Websockets, together with some JavaScript sprinkles, may not satisfy all the business requirements and user expectations for reactive user experiences and interaction-intensive interfaces. This is where React Server Components will sit tomorrow.

Alternatively, we can look at this evolution pattern from another angle by using a model created with Wardley mapping (Figure 24). In this view, we can see how new Web development techniques emerge from other more established practices while still evolving,


thus leveraging the lessons learned to start from a more mature stage in the evolution line.

Figure 24. Wardley map representing the evolution of Web development frameworks

Choosing the right technology

So which technology is better? It depends on your context. On the one hand, building Web Apps (using Ruby, Python, Elixir, or Java) that work well with minimal JavaScript is perfectly okay. However, this approach may not satisfy all UX expectations. Suppose the team can leverage their existing backend skills and ship the Web App twice as fast and cheaply as someone delivering a slightly better UX. In that case, that could be a reasonable tradeoff in a competitive market, and it is something to aspire to if you are trying to run a successful business. A development team with this technology background might show some resistance to doing everything with JavaScript, preferring to spend more time developing in the languages they know instead. Again, this would make sense for some Web App use cases where user interaction is expected to be light. And this is where Basecamp wants to position itself with Hotwire: the most crucial part of their message is that they want to empower small teams.


Hunting for simplicity is not wrong and will never be wrong. But we need to be aware of the tradeoffs. It is fair to say that "pursuing JavaScript for everything38" as the default new norm could harm the Web dev technology ecosystem. Again, there are no one-size-fits-all solutions. Pure SSR comes with a tradeoff, since you typically sacrifice a bit of UX in exchange for lighter bundles. The server-side approach will never be as good as the client side at re-rendering the UI based on user interactions. When building modern, reactive Web Apps with multiple user interaction and feedback points, rendering race conditions or component cascade updates are caused by a shared mutable state. React figured out how to solve this with deterministic view renders39 based on an immutable UI state. And (this is the critical part) React manages action-based rendering as close as possible to the user: in the browser. With the introduction of RSC, the whole point is that there is no more render lock-in as we move across the boundaries when our constraints change. React Server Components are the mechanism that lets you do data fetching and run some logic without shipping it to the client at all. As Dan Abramov said: "Rendering to HTML is still a useful optimization on top of the first render before JavaScript loads." On the other hand, building Web Apps using React with minimal backend code is also perfectly fine. If the team has a strong JavaScript background and the architecture supports this model, shipping client-side rendered Web Apps with the user at the center and outstanding attention to UX details is a good tradeoff. It depends on your context. Things like the combination of existing team structure, skills, budget, priorities, and user needs may affect what technology fits better in every use case. But it doesn't end there.


UI architectures

There is one more condition that probably has the most considerable influence on the decision of picking the proper client framework, and that is your existing architecture. The metaphor that software architecture is like archeology40 is right on point, stating that 90% of an architect’s role is classifying technical debt and refactoring for the non-interrupting replacement of an existing system when building a new one. When making a new Web application, you will likely have to build it on top of existing systems that do not fit exactly but that you depend on. The following diagram depicts the typical architectures on top of which you may need to develop new Web applications (Figure 25).

Figure 25. Different UI architectures

1. The Monolith. This is where you would start if you were building a new system from scratch. Monoliths, per se, are not bad. The problem comes when large teams without the proper methodologies and tools start working on a monolith, affecting their ability to deliver value to the market. In any case, if this is your situation, SSR with Hotwire could perfectly fit the bill, especially if you are building a CRUD application (which could be 80% of the time within the 10% of greenfield architectures). Pay attention to how to scale it without losing agility.

2. Microservices fit better as a decomposition pattern for existing systems in companies with a large software suite. If this is your context, building SPAs with React consuming JSON APIs could be a good tradeoff, as long as the teams are well structured, are skilled in JavaScript, and the proper methodologies and automation tools are in place.

3. Backend for the Frontend could be a good solution in some cases. It is like converting a Web App into a monolith by assigning a dedicated backend just for it, instead of consuming multiple REST APIs. As this backend will be dedicated to this Web App, it is acceptable to couple them together. It means that you could still return JSON from the backend and build a SPA, or instead do some SSR and return HTML specific to that channel. There is nothing wrong with that. Again, it depends.

4. Content Negotiation41 is a standard HTTP-based development pattern used a lot with REST APIs. It makes it possible to return different versions of a resource based on client-side headers such as Accept or User-Agent. It means that there is nothing wrong with a microservices context in which the REST APIs return HTML based on content negotiation to be glued together in the client. Of course, we would need another technique to glue these HTML fragments together on the client, but that’s a different story. The point here is that there is nothing wrong with the approach if it fits your needs.

The frontend developer role

This whole pendulum discussion opens up another more profound debate: the separation between frontend and backend, including development roles. These are two different development specializations that need to be treated with equal importance, regardless of your context and architecture.
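As a minimal sketch of the content negotiation pattern described in point 4 above, a handler can inspect the client’s Accept header and return either a JSON or an HTML representation of the same resource. The resource shape and the helper name here are illustrative assumptions, not taken from any specific framework:

```javascript
// Pick a representation of a resource based on the HTTP Accept header.
// Returns { contentType, body } so it can be plugged into any Node handler.
function renderCustomer(customer, acceptHeader = "") {
  // Small-scale negotiation: prefer HTML when the client asks for it,
  // fall back to JSON otherwise (a real server would honor q-values too).
  if (acceptHeader.includes("text/html")) {
    return {
      contentType: "text/html",
      body: `<article><h1>${customer.name}</h1><p>Plan: ${customer.plan}</p></article>`,
    };
  }
  return {
    contentType: "application/json",
    body: JSON.stringify(customer),
  };
}

// A browser navigation sends Accept: text/html; an API client asks for JSON.
const customer = { name: "Acme Corp", plan: "enterprise" };
console.log(renderCustomer(customer, "text/html,application/xhtml+xml").contentType); // text/html
console.log(renderCustomer(customer, "application/json").contentType); // application/json
```

The same resource endpoint can thus serve both a SPA (JSON) and server-rendered fragments (HTML) without duplicating the service.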


• In the SPA world, the role of frontend engineer requires more JavaScript (which is nothing but a first-class programming language), and therefore, you need to design and skill the teams accordingly. You may also arrive at the same conclusion due to your strong commitment and desire to build better user experiences. In any case, frontend developers are first-class citizens within the spectrum of engineering roles working with declarative UIs built with JavaScript frameworks.

• In the SSR world, frontend developers can work on plain HTML, CSS, and vanilla JS and then pass on their templates to backend developers, who inject their data binding annotations and hydration using their backend language of choice. But again, there is a clear separation of roles involving a completely different set of technologies.

It may be that the role of a full-stack developer does not make sense anymore. Regardless of your context and architecture, frontend developers are a necessary specialization. We can’t naively think that someone can be equally good at HTML, CSS, and backend programming, since that way of thinking diminishes the importance of UX in Web Apps. And that is bad. Although the term full-stack developer started to be used when we began to talk about the MEAN stack (all JavaScript), what the community meant was just JavaScript Developers. There is nothing wrong with that, but it denotes a bit of underestimation of the language, as if it were a toy not ready for prime time. In any case, the fact that we could use the same programming language across the stack does not mean that one can be good at every stack layer. Even within the context of the same programming language, the term full-stack developer is not very accurate.


Platform Configuration

This section talks less about developers and more about the end customers who embark on your SaaS adventure. This type of user comes with higher configurability expectations, driven by the self-service nature of a subscription-based engagement. To fulfill those expectations, business service development teams will need to include a few configuration capabilities for all the services, allowing business users to adapt your software to their particular needs quickly and efficiently. Those configuration capabilities should be light abstractions presented in a User Interface, expressed in business terms, using a business language, and encapsulating all the technical implementation details of the underlying services.
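As a sketch of what such a light abstraction could look like, the function below translates a business-language configuration into the underlying technical settings it encapsulates. All names, regions, and settings here are illustrative assumptions, not part of any specific product:

```javascript
// Translate a business-facing configuration (what the customer sees)
// into the technical settings of the underlying services (what runs).
function resolvePlatformSettings(businessConfig) {
  const regionMap = { Europe: "eu-west-1", "North America": "us-east-1" };
  return {
    deploymentRegion: regionMap[businessConfig.operatingRegion],
    // "Keep data for 2 years" becomes a retention period in days.
    logRetentionDays: businessConfig.dataRetentionYears * 365,
    // A single brand color fans out into a full UI theme.
    theme: { primary: businessConfig.brandColor, mode: "light" },
  };
}

const settings = resolvePlatformSettings({
  operatingRegion: "Europe",
  dataRetentionYears: 2,
  brandColor: "#1a73e8",
});
// settings.deploymentRegion → "eu-west-1", settings.logRetentionDays → 730
```

The business user never sees region identifiers or retention periods in days; the abstraction owns that translation.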

Configuration vs. Customization

As discussed in the first chapter of this book, development teams will want to build flexible services. To that end, they will be tempted to offer as many customization features as possible to the end-users to factor in all their business-specific needs. However, product leaders need to be aware that the more flexibility these teams offer through the configuration capabilities, the less they are helping customers progress in industrialization. And customization has its limits. Extreme customization comes at the expense of putting your organization’s position as a SaaS provider at risk. Service development teams might end up transforming these business configuration applications into a low-code tool where everything is possible, even building completely new applications that have nothing to do with the business domain.

We have some excellent examples and references of tech companies who have jumped fully into this type of work, such as Netflix, which built a tool called Conductor that allows you to develop processes and workflows that implement the business logic of the services. As anticipated, this tool is generic and has nothing of the video streaming business embedded in it. In exchange for putting their time and effort into this type of work, Netflix has donated the tool to the Open Source community in an attempt to make this product an industry standard. In any case, business services leaders will have to ask themselves if they want to be in this type of business or, on the contrary, provide customers with a lightweight configuration tool.

Application development tooling

Making decisions about application development tooling for end customers is one of the most critical responsibilities for business service development leaders. There are many factors and cost-benefit assessments to consider before embarking on such an endeavor. The following picture summarizes these tradeoffs (Figure 26).

Figure 26. Development tooling spectrum (Credits: Jesús Suárez Ardid)

In other words, configurability and flexibility are mutually exclusive features when designing application development tools.


Configure an application

On the one hand, developers have the option of building a lightweight configuration tool full of business-specific abstractions and constructs. This tool will help customers create new applications at speed, but it will constrain the range of what they can create. For example, this speed usually comes at the expense of testability, brand fidelity, and overall functionality control. These are configuration tools full of opinionated functionalities based on data extracted from years of industry insights and analytics. This is an opportunity to include business abstractions and make all the configuration options default to the industry knowledge that makes your software organization different from the competitors. This way, customers can buy your software off-the-shelf, configure it lightly to their needs, and build small components to integrate it into their architectures.

Write an application

On the other hand, there is the option of letting customers write their own applications using your technology. In this case, there is a lot of flexibility for them to build the applications they want, but it comes at the expense of speed and, most probably, cost. This option is a clear differentiation point for your organization. Here, business services teams wouldn’t be letting customers create those applications entirely on their own. Instead, service developers would create domain-specific frameworks and SDKs so customers can do it programmatically. In other words, your customers will be buying components from your software organization to build the products they want.

Danger zone

Finally, the danger zone is in the middle of the configurability spectrum. It is a gray area where the application development tooling provides generic configuration with no canned domain expertise. Here, end-users can’t build the applications they want because the tool is not as flexible as writing code. Also, it won’t offer speed of development because it is still technical and does not contain business constructs for the experts. The real problem with this range of configurability is that it steps dangerously into the visual customization field. The reason is that development teams will be tempted to offer as many configuration features as possible in an attempt to find a generic solution for the end-users. There is no clear differentiation for your organization in the danger zone. The competition is fierce in the marketplace, where generic low-code tools are already abundant. If you or your organization are evaluating options in this space, it would be better to recommend a tool from the market.

Platform Management Consoles

Offering contextualized configuration capabilities to your end-users is instrumental to the flexibility and extensibility of your software. Many on-demand software providers follow the idea of including a platform management console as the entry point for these SaaS configuration tools. This console is a centralized user interface that aggregates the complete set of configuration applications for business users into a one-stop shop. The concept of building a management console for your internal software platform promotes the idea that each service team (core and business) is in charge of providing its own set of configuration Apps and APIs. For example, a core service team would provide configuration capabilities for Identity and Access Management or UI white-labeling. Complementarily, a business service team would offer functionalities such as regionalization for their engagement services, data model extensions, or product definitions.


It would be beneficial for your role as an internal software platform provider to remain focused on enabling configurations for the platform core services. At the same time, try to develop the principles and tools for the business services teams to build and deploy their business-specific configuration applications. Reusing and amplifying your own engineering experiences is an advantageous way of helping your organization’s developers.

Promoting configuration data

It is essential to mention that the data artifacts generated by these configurations (e.g., themes, product definitions, dashboards) can be promoted through the lifecycle of environments. You may be interested in offering your end customers a centralized capability to configure once (probably on a downstream environment) and push that configuration data through an automated promotion cycle. This way, the data package will reach the live environment, passing automated tests during the process. This is especially important when you are offering multiple extension points, so your users can create configuration data in one place, test it at various stages, and deploy it automatically when they are ready.
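The promotion cycle described above can be sketched as a small pipeline: a configuration package moves through an ordered list of environments and only advances when its validation gate passes. The environment names and the `validate` callback are illustrative assumptions, not part of any specific product:

```javascript
// Promote a configuration package (e.g., a theme or product definition)
// through a sequence of environments, gating each hop on automated checks.
function promoteConfig(pkg, environments, validate) {
  const deployed = [];
  for (const env of environments) {
    if (!validate(pkg, env)) {
      // Stop the promotion: the package never reaches later environments.
      return { deployed, failedAt: env };
    }
    deployed.push(env); // in a real platform this would trigger a deployment
  }
  return { deployed, failedAt: null };
}

// Example: a theme package promoted from dev to live, with a naive gate
// that rejects packages missing a version field in the live environment.
const theme = { kind: "theme", version: "1.2.0", colors: { primary: "#0a0" } };
const result = promoteConfig(
  theme,
  ["dev", "staging", "live"],
  (pkg, env) => env !== "live" || Boolean(pkg.version)
);
// result.deployed is ["dev", "staging", "live"] and result.failedAt is null
```

The key property is that configuration data is treated like code: it is authored once downstream and only reaches production through the automated gates.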

Platform Microfrontends

Microfrontends is an architectural style that promotes the design and development of independently deliverable and deployable Web applications that are ultimately composed into a greater whole, also known as the container application. Thanks to its flexible nature, it may be an exciting architectural style to consider for the development of Web applications with your internal software platform when the context and the circumstances apply. In the spirit of software composability and service-oriented teams, microfrontends offer developers a golden opportunity to build and deliver platform Web applications more efficiently. Under this new model, the platform console is the container, and the configuration applications are deployed as microfrontends.

As Martin Fowler’s team brilliantly explains in the article “Micro Frontends”42, there are several benefits acquired by embracing this style, as well as a minimum list of constraints applicable to the microfrontends approach, summarized as follows:

• Incremental upgrades: You should be able to update, refactor, or rewrite an application in pieces. This enables continuous delivery of features in a much more agile way.
• Simple, decoupled codebases: The source code of each microfrontend will be, by definition, much smaller and consequently more cohesive, maintainable, and scalable.
• Independent deployment is essential, since every microfrontend should have its build and deploy lifecycle totally decoupled from other applications.
• Autonomous teams: As a higher-order benefit and in line with the advantages of serviceful architectures, the microfrontends style favors independent groups and decentralized decisions.
• Loose coupling: Microfrontends should be built using standard Web tech and be deployable as standalone applications if needed. A developer building a user interface shouldn’t know whether their application will be deployed as a microfrontend or not, enabling legacy apps to be integrated into the main container more efficiently than with other approaches.
• DRY principle: Try to avoid the repetition of code (even through the use of libraries) for cross-cutting platform concerns between different microfrontends (i.e., log-in, theming). Repetition may cause inconsistencies between the different UIs (i.e., different behaviors by using different versions of the libraries).

With this background, it is foreseeable that multiple implementation approaches emerge to comply with these principles at different levels. As a reference, this book describes three implementation approaches that may help you adopt this architectural style for your platform management console and configuration applications in a practical way. These methods are introduced in the following sections in reverse order of compliance and design complexity (Figure 27).

Figure 27. Types of microfrontends


Build-time microfrontends

One approach to implementing microfrontends is to publish each Web application as an independent bundle or package and then have the principal application (the container) include them all as dependencies at build time. During the container’s build process, a new final bundle is produced that you can deploy. This approach is sound but risky, mainly because it pushes you to use the same Web framework across all applications. Also, it does not meet all the requirements or bring all the benefits you might be looking for. With it, you won’t be able to upgrade parts of the final Web application independently, as everything needs to be re-built, re-bundled, and re-deployed together. For this reason, all the implications of this implementation approach need to be carefully considered before going ahead with it.

• Incremental Upgrades
• Simple, decoupled codebases
• Independent deployments
• Autonomous teams
• Loose coupling
• DRY

Hyperlinked microfrontends

With this approach to microfrontends, Web applications are built independently following their own lifecycle and deployed on the same platform environment. In this particular case, the aggregation of the different Web applications is done via hyperlinks, so that the container application becomes a hub with links to each microfrontend. Both the container and the microfrontends share the same domain (as they live in the same environment), so they are just paths under the main container. The container application can obtain these links to the different Web applications dynamically (e.g., via an API) to consult what microfrontends are available in that particular environment.

It is a great way to decouple the container from the different Web applications deployed as microfrontends, allowing you to build all types of applications without them knowing one another. In other words, hyperlinked microfrontends don’t know they are microfrontends. Developers do not have to code their applications in a specific way to be deployed as microfrontends, because it is just a deployment characteristic of every Web application. However, due to the excessively flexible nature of this architecture, you may indeed end up with a bit of code duplication between microfrontends. This is especially true for cross-cutting concerns such as theming and authentication, leading to a small risk of inconsistent behaviors and user experiences between the different microfrontends. This is partially mitigated by using platform-wide libraries containing this cross-cutting functionality, but even so, it could be that the microfrontends use different versions of the libraries. In any case, this is an excellent choice for the majority of cases, as it meets almost all of the requirements:

• Incremental Upgrades
• Simple, decoupled codebases
• Independent deployments
• Autonomous teams
• Loose coupling
• DRY
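A minimal sketch of the hyperlinked approach: the container asks a registry for the applications deployed in the environment and renders them as plain links under its own domain. The manifest shape and the idea of a `/api/microfrontends` endpoint are illustrative assumptions:

```javascript
// Build the hub's navigation entries from a manifest of deployed apps.
// Each microfrontend is just a path under the container's own domain.
function buildHubLinks(manifest, baseUrl) {
  return manifest
    .filter((app) => app.enabled)
    .map((app) => ({
      label: app.title,
      href: `${baseUrl}/${app.path}`, // same domain, different path
    }));
}

// The manifest would normally come from fetch("/api/microfrontends");
// it is inlined here to keep the sketch self-contained.
const manifest = [
  { title: "Identity & Access", path: "iam", enabled: true },
  { title: "White-labeling", path: "theming", enabled: true },
  { title: "Legacy Billing", path: "billing", enabled: false },
];
const links = buildHubLinks(manifest, "https://console.example.com");
// links holds two entries; disabled apps never appear in the hub
```

Note that nothing in this contract requires the linked applications to know they are microfrontends; the hub only needs a title and a path.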


Embedded microfrontends

Precisely as with hyperlinked microfrontends, Web applications following this method are built and deployed independently on the same platform environment, each featuring its own lifecycle. The main difference lies in how microfrontends are aggregated into the container application. In this case, there should be a business reason: microfrontends must be displayed together, embedded in the main container layout, instead of being rendered as hyperlinks. The benefits of this approach match those of the previous level; in addition, the container implements the cross-cutting functionality, leading to a reduction of duplicated code across the different microfrontends:

• Incremental Upgrades
• Simple, decoupled codebases
• Independent deployments
• Autonomous teams
• Loose coupling (see important notes below)
• DRY

You need to be very cautious when moving from the hyperlinks approach to the embedded one. The only real benefit is a more compelling user experience, at the expense of moving cross-cutting functionality to the container. This approach could compromise the contracts between the microfrontends and the container. It could bring unnecessary complexity to the Web applications and the container itself and force developers to consciously code their applications as microfrontends to be aggregated together later. This is a significant tradeoff to consider, since you may still want to build and deploy microfrontends as standalone Web applications outside the container, integrate legacy apps into a container, or, of course, use standard Web development technologies and frameworks to build those Web applications (i.e., you may still want to use Create React App to build them).
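The contract between the container and its embedded microfrontends is the delicate part. A common convention, shown here as an illustrative sketch rather than a prescribed API, is for each microfrontend to expose a `mount` function that the container calls with a host element and the shared cross-cutting context (user, theme):

```javascript
// The contract an embedded microfrontend agrees to implement:
// mount(hostElement, platformContext) -> unmount/cleanup function.
function createIamApp() {
  return {
    mount(host, ctx) {
      host.content = `IAM console for ${ctx.user} (theme: ${ctx.theme})`;
      return () => { host.content = ""; }; // unmount/cleanup
    },
  };
}

// The container owns the cross-cutting concerns and passes them down,
// so microfrontends don't each bundle their own theming or auth code.
function embed(app, host, platformContext) {
  return app.mount(host, platformContext);
}

// A DOM element is simulated with a plain object to keep the sketch runnable.
const host = { content: "" };
const unmount = embed(createIamApp(), host, { user: "ana", theme: "dark" });
// host.content is now "IAM console for ana (theme: dark)"
unmount(); // host.content is back to ""
```

This is exactly the coupling the chapter warns about: the moment an application implements `mount`, its developers are consciously coding it as a microfrontend.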

The AWS Management Console

The AWS Management Console is a tangible manifestation of the principles and patterns exposed in this section. Essentially, it provides AWS customers with a Web application that acts as a central hub to access the various configuration user interfaces for every AWS service in the catalog. The configuration applications are integrated into the console through hyperlinks, while they share common navigation points, such as the side navigation and the header. After navigating through the console and accessing the available managed cloud services, one can tell that the vast majority of their configuration applications follow a microfrontend style, where every service team is in charge of building (and even deploying) their own management user interfaces.

Thin business abstractions

It is very noticeable how these teams are gradually adopting the new AWS style guide at different paces. This is an interesting point, since keeping a consistent user experience across an extensive suite of applications while maintaining massive levels of autonomy for each service team is a challenge for a cellular organization like AWS. Consequently, the AWS Management Console is still a representative display of this challenge, despite the recent efforts to deliver a uniform visual identity across the whole suite of management user interfaces.

It is essential to highlight how the disparate Web applications integrated into the central AWS Management Console offer configuration functionalities to the final users through powerful business abstractions. As explained earlier, these are thin configurations using a business language. It just happens that the business, in this case, is managing IT systems, and the users are, in their vast majority, engineers. These functionalities principally consist of ready-to-use knobs that encapsulate complex technical details of the managed technology that lies under each service. This is how AWS presents users with a simplified interface using a familiar language that is easy to understand.

Default configurations matter

Although offering these guardrails is a very illustrative representation of meeting customers where they are, AWS also uses them to gradually move customers toward the future vision. They do that by upgrading the management user interfaces of the services and giving more visual prominence to the features they want to promote. This is precisely what happened with AWS Lambda and how the service configurations have evolved over time. If we look back at 2016, shortly after it was launched, the service faithfully espoused all the principles of The Serverless Compute Manifesto43 presented to the public at AWS’s re:Invent 2016 event. This manifesto reads like this:

• The function is the unit of deployment and scaling.
• No machines, VMs, or containers are visible in the programming model.
• Permanent storage lives elsewhere.
• Scales per request; users cannot over- or under-provision capacity.
• Never pay for idle (no cold servers/containers or their costs).
• Implicitly fault-tolerant because functions can run anywhere.
• BYOC - Bring Your Own Code.
• Metrics and logging are universal rights.


Adhering firmly to this manifesto caused AWS Lambda to be too purist in its early releases and its capabilities too forward-thinking, and it didn’t help AWS meet their customers where they were, at least in 2016. Over time, the cloud provider gradually started to refactor AWS Lambda’s configuration Web application to support a more relaxed view of serverless44 and foster its adoption. In this case, they ensured that all the knobs defaulted to the new features that AWS wanted you to use when you started to configure an AWS Lambda function.

Defaults matter when you try to alleviate the perceived complexity of the things you are building.


Chapter 4: Platform Teams

Holding fast, staying true

Managing one of the most successful teams in the history of soccer (and actually across any other athletic discipline) puts extra pressure and responsibility on a coach’s shoulders. A high-performance team requires a skillset full of solid leadership principles along with substantial situational and self-awareness. These competencies are what the United States Soccer Federation saw in Jill Ellis when she was promoted to coach the women’s national soccer team in 2014, a squad that was already enjoying a leading position in the global ranking after winning the London Olympics gold medal two years before.

This was a challenge that Ellis was ready to take on eagerly. As a matter of fact, only one year after taking charge of the squad, she led the team to a new glorious achievement by winning the FIFA Women’s World Cup for the first time in 16 years. Beyond that moment, Ellis’ story is one of success and failure management. Leading a team to become the best is tough, but keeping a winning team at the top is even more challenging. It requires pushing the envelope to levels that no one has reached before. Excellence is the habit of giving your hundred percent, so keeping your performance sustainably high and turning it into a routine is a challenge like no other. Despite her experience, this was a hard-learned lesson for Ellis, who couldn’t lead the team to a new success in 2016, when the Swedish team eliminated them in the Rio Olympics quarter-finals.

This hiccup in the team’s trajectory sank them into a minor crisis during the following years, requiring a total reboot of a squad that entered a new experimental phase. However, in those critical moments after failure, true leaders like Ellis emerge by being faithful to their core principles and empowering the human side of every professional to make the most of them. And that’s precisely how the US Women’s national soccer team reemerged in preparation for the next challenge, the FIFA Women’s World Cup in 2019. Ellis saw beyond the professional athletes that formed the squad and connected with their most human part by making them participants in the vision and creating spaces to leverage the national team as a platform to fight for their personal causes. Feeling Ellis’ support emboldened the team to represent other women worldwide in their claims for equal pay, raising the morale and consequently the performance of a team that reached glory again when they won a second consecutive World Cup in 2019. This extraordinary achievement made the US Women’s national soccer team winners of half of the world championships held to date.

What is a platform team?

Platform teams are product engineering teams that build and support internal software platforms. It is that simple. There are two key messages here: first, they are considered a product team by the organization, and the artifacts they create are managed under the same rules and policies as any other software product in the portfolio; second, they are considered an engineering team that builds, tests, and delivers software. It just happens that this piece of software is an internal enabler tool for other teams to build with. Now, let’s look at this from a different prism, describing what a platform team is not:


• A platform team is not the old operations team re-labeled as a platform team. Be cautious about preserving existing silos and legacy teams, so you don’t put lipstick on a pig. Not all companies know how to spin up an entirely new startup organization to cater to the unique platform architectural style, especially if you need to work with legacy ticket-driven organizations and systems.

• A platform team is not a new shared services team that builds and maintains independent and unconnected services. There might be a tendency in other product teams to interpret this new component in the organization’s portfolio as a bucket of services at the disposal of their development teams. You must pay special attention to this viewpoint.

Platform teams are internal service providers that build and operate internal software platforms.

It is all about the developers

The mission of your internal software platform team is to reduce the amount of undifferentiated work carried out by the service development teams. This way, developers can consume foundational capabilities in a self-service manner and focus on solving business problems. These capabilities include setting up CI/CD pipelines, spinning up environments, deploying services on the cloud, or even consuming infrastructure APIs that facilitate their daily jobs. Platforms do that by rationalizing services, principles, and tools offered by 3rd-party providers (especially cloud providers) and adapting them to your organization’s particular needs. You shouldn’t be in the business of competing with commercially available services through home-grown solutions (e.g., Kubernetes, Jenkins, Google’s Material Design System, or Amazon Cognito). Instead, it will be more valuable to create a catalog of opinionated services and offer them as a simplified experience.

As introduced before, developers will be the main customers of your internal software platform. Hence, you need to put yourself in their shoes and figure out how they will benefit from it. Building services, principles, and tools for developers is complex, and Developer Experience is a fundamental discipline that you will have to pay a lot of attention to. For that reason, you will likely need to bring a team of experienced architects and engineers with you to shape the genesis of what your platform should be and evolve it from there.

Structure of an internal software platform team

Individual skills are essential, but you may want to design the platform team as an integrated suite of different, autonomous, self-sufficient sub-teams to keep your mission sustainable over the long term. You can split the internal software platform into sub-product domains (e.g., automation, security, core infrastructure), preserving full end-to-end ownership within each sub-team. You may also be interested in following an Open Source governance model45 for these autonomous squads. To that end, you can assign one author and a couple of highly skilled core maintainers to every domain. Then, identify a group of trusted contributors from within the platform team who can swarm around tasks to build those domains’ features. An inner-source model will allow the platform team to grow organically and spread acceptance within the organization.

The critical part of spinning up a new platform team is deciding on the aptitudes and attitudes of the team members. Building a platform is an ambitious goal requiring broader skills than just providing infrastructure services to developers, so here is a recommendation:

• You will need people with the appetite to research and develop the latest technology and collaborate with the developers. These team members are individuals with a pioneering attitude, including problem-solving, problem-restatement, and out-of-the-box thinking. It is vital to meet developers where they are and help them walk the path to the new vision while resolving all the issues along the journey.

• At the same time, you need people with a settling attitude who understand the consequences of introducing a continuous flow of changes into existing systems. These are typically engineering profiles with an operations mindset who look after software properties such as scalability, testability, reliability, and observability.

How can you get those skills, then? The decision on whether to train existing development teams or hire new talent depends on your organization’s starting point. Designing the right mix is the real challenge. Maybe you need to shop around the existing product engineering and operations teams to nurture the platform team with those aptitudes and attitudes. Complementarily, you could make the platform open for contributions outside its core team and accept Pull Requests from other internal committers.

Specialized skills

It is recommended that you don’t make too many fixed assignments within these squads, especially for those specialized profiles (e.g., UX designers, infrastructure-as-code developers), so every sub-team can benefit from their unique skills (Figure 28). You will likely have to make those assignment decisions based on aptitudes, individual career plans, and a critical assurance that the contributors take full ownership of the services they are building.

Figure 28. Structure of a platform team

It is important not to mistake those specialized transversal skills for a new shared-services team. And, more importantly, don't mistake those skills for your entire platform team. The fact that these specialists can help multiple squads does not mean that they must be separated into a new entity in your organization. You will need to work some magic to find the balance between fluid assignments and ownership, especially for those with full-stack engineering skills.

How do platform teams fit within the organization?

"Great platform teams can tell a story about what they have built, what they are building, and why these products make the overall engineering team more effective." - Camille Fournier46


As internal software platform teams standardize and become enablers of high-performing software organizations, you will need to pay attention to their relationships with the internal structures. You can start designing the genesis of a platform-based organization by applying Conway's law in reverse: starting from a desired service-oriented enterprise architecture (made of business and core services), you build teams around those components and let the organization slowly adapt to them. The influence is introduced from the operational model (interactions and integrations between services and their APIs) into the organizational model (communications between departments), and not the other way around.

Related to this point, pay special attention to layered platform teams. That is usually a byproduct of re-shuffling the organization and dividing teams by technology layer, labeling the team sitting at the infrastructure level as the Platform Team. Instead, companies should look for vertically-sliced teams structured around business services, where every team possesses end-to-end ownership of the capability (i.e., from UI to data). Those are the teams that will be building the services that eventually run on your internal software platform.

Let's look at the different roles needed for a platform team and how they fit within the organization.

Engineering Managers

"The problems you are solving as a manager really aren't about you." - Sarah Drasner

As introduced earlier, a platform is an internal piece of enabling software for other teams to build with. It means that platform team members play an integrator role across multiple groups, giving them a company-wide view that others (even technology executives) don't have. That is why engineers in platform teams usually embody not only the platform's vision but the whole organization's. Making their work relevant and purposeful is the only way you can build something of substance.

And this is a double-edged sword. What makes these engineers successful is also what could leave them biting their nails. Although this is true for every product engineer, platform engineers are in the spotlight because of their key responsibilities. The moment they don't feel comfortable with the company's core mission and values, they will vote with their feet. As bad as it sounds, measuring the engagement level of these employees will give you a very accurate pulse of the company's health.

In a disruptive environment such as constructing an internal software platform, moving forward in a rapidly-changing atmosphere is fundamental to success. Hence, the main objective of a platform Engineering Manager is to build and preserve a healthy (both physically and mentally) environment for the developers working on it, so that healthiness can be translated into quality software.

The pillars of high-performance teams: autonomy, mastery, and purpose

Within this hectic context, Engineering Managers need to make sure that working on an internal software platform is individually rewarding for the software engineers contributing to it. As it turns out, that's only possible if they are granted significant levels of autonomy and trusted as creative professionals. They need to see how the software components they are building are helping other developers do their jobs and, consequently, allowing the company to move forward. Remember, platform engineers must be given problems to solve by other developers and not just tickets to fix by managers. When engineers are trusted this way, a mutual trust relationship between the engineers and the company is established, enabling out-of-the-box thinking and giving rise to very innovative solutions.


We don’t inherit the architecture from the managers, we borrow it from the developers.

Product Managers

"If you want to build a ship, don't drum up the men to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea." - Antoine de Saint-Exupéry

It will be helpful for you to assess the principles introduced by Manuel Pais and Matthew Skelton in their book Team Topologies as to how platform teams fit within the organization's operating model. This book summarizes very well how modern software organizations should arrange themselves for success. The authors specifically contemplate the emergence of internal software platform teams as digital transformation catalysts, recommending several topologies and interaction models between the platform and the services teams. The book, along with all the online content and material created around it, is full of brilliant insights and proposals that will help in your endeavor of setting up a platform team.

The concepts and methods presented by Team Topologies pivot around the idea of the platform as a software product, a principle that this book espouses and develops further during the forthcoming sections. This point of view emphasizes putting developers at the center as users of the platform.

Then, does this mean that you have to find and assign a Product Manager to manage the platform's roadmap? Well, again, it depends. Suppose your organization already caters to this type of role in the operating model. In that case, it may be a good idea to elevate the platform team to the same category as the rest of the product teams in the portfolio. If it doesn't, you will have to figure out the best way to manage the flow of features, release dates, and user expectations according to the existing organizational constructs.

Since the internal software platform is probably a new asset to the organization, wrapped in some research and innovation aura, it is of utmost importance to select a suitable platform leader, especially during the initial phases of the project. Early on, it is enough to have somebody who can run the platform team as a startup, understand developer minds, explore ideas while figuring things out, and make progress by experimenting. Later on, after the hypergrowth phase, you need leaders who excel at separating the wheat from the chaff and are ruthlessly decisive about the next steps in the strategy. If the same person can play those two types of roles, that would be a fantastic win for you.

Software Architects

"All software architecture is design, but not all design is architecture. Architecture represents the significant decisions that shape the form and the function of a system, where significance is measured by the cost of change." - Grady Booch

The institution of a platform architectural style in the organization will also open up a few transformational opportunities for the business services that run with it (i.e., the rest of the product portfolio). This idea gains particular relevance as we suggest that internal software platforms should be based on managed cloud services with a preference for serverless computing. And that can be too disruptive for some legacy services. Then, how can you help those service teams transform and adapt to the new architectural principles (and technologies) without introducing any business disruption? How can you run a bimodal agenda that allows you to build the new while tackling the old? The answer is that you need robust software architecture governance at the product portfolio level, outside the platform team itself.

The Software Architect Elevator

As Gregor Hohpe - Enterprise Strategist at AWS - describes in his book The Software Architect Elevator and his article "Would you like architects with your architecture?"47, we can think of the following models of doing architecture in a team:

1. On one side of the spectrum, we find the Benevolent Dictator model, where an architect decides and the teams execute.
2. On the other side of the spectrum, we have the inmates running the asylum model, where nobody is in charge of the architecture, which remains implicit. No organization leader in their right mind would deploy a team topology like this.
3. In the middle, we have the architecture without architects model, where significant design decisions are taken based on every team member's judgment.

As Hohpe points out in the referred article, it may be necessary to temporarily follow the first, hierarchical model during the initial phases of the platform rollout due to its innovative and exploratory nature. However, there are always some cautions to consider. An outsider ignorant of the architectural disruptions and principles introduced by the internal software platform should not play this role. Instead, the recommendation is to let the internal software platform architects do it. In time, as service teams grow in experience and advance along the technology roadmap adopting the new architectural principles, you will want to move to the more sensible "architecture without architects" model. This idea is fundamental in a serverless environment, where the cost of change is almost negligible.


Software Engineers

"Serverless does not mean architecture-less." - Gregor Hohpe

While software architecture is very much needed, that does not mean that we need people explicitly playing software architect roles in the teams. Especially not the kind of top-down architect sitting in an ivory tower and throwing technical specs over the fence, expecting the engineering teams to get the job done. Those days are over, and the walls between individuals do not exist anymore, as we experienced with the rise of DevOps.

And this is where things start to get interesting. As Simon Wardley states in his research report "How organizations are changing"48, the evolution of computing from products (e.g., servers or virtual machines) to utilities (e.g., cloud) introduced a co-evolution of software engineering practices aimed at helping teams build and ship software faster. This practice was eventually coined DevOps, a technique where organizations don't need a centralized operations group to gain speed to market.

In recent days, we are starting to see this very same pattern emerge again. Concretely speaking, we are witnessing the evolution of component run-times from products (e.g., app servers or DB clusters) to utilities (e.g., serverless). This industrialization introduces a co-evolution of practices closing the gap between architects and engineers, equipping the latter with the necessary techniques and tools to make structural decisions at an increasingly lower cost of change.

Managed Services Thinking

This emerging engineering practice is all about Managed Services Thinking, in contrast to more traditional Systems Thinking. As David Anderson describes in his article "The Evolution of the Serverless-First Engineer"49, we are starting to hear about a type of architecture-woke profile that can make architectural decisions based on personal judgment. In other words, these are engineers who can swarm around complex tasks and do not need a specialized team of architects to design resilient and scalable systems. The following Wardley map represents this idea (Figure 29).

Figure 29. Wardley map representing the emergence of serverless architecture practice

If we develop this idea further, what would the future of software architecture look like?

• The evolution of cloud technology keeps bringing new efficiencies, such as auto-scalability, managed run-times, and millisecond pricing.

• These new capabilities are triggering a shift in the behavior of engineers, who no longer want to cosplay cloud providers. Instead, serverless computing makes architectural decisions for you through baked-in assumptions and defaults. These abstractions empower software engineers to make structural decisions at a meager cost of change.

• At the same time, while these new practices require less specialization, they share the same goals as more established and traditional methods (e.g., Systems Architecture and Serverless Architecture are identical in purpose, but one is more next-gen).

• Finally, new needs will emerge from these novel techniques, and unique value opportunities will arise naturally. For example, cloud vendors may need to provide guidelines for well-architected designs that engineers can easily translate to policies-as-code. Also, new sets of tools may emerge to help engineers better understand the cost impact of their design decisions (e.g., FinOps). The goal is to assist engineers willing to unleash and automate these new value opportunities within their build and deploy pipelines.

The following Wardley map introduces the co-evolution of architectural practices brought in by serverless computing (Figure 30).

Figure 30. Wardley map representing the values enabled by serverless architecture
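The "guidelines translated to policies-as-code" idea can be made concrete with a small sketch. The following is a minimal, hypothetical example of a policy check that a build pipeline could run against a function definition; the rule thresholds and the shape of the configuration dictionary are illustrative assumptions, not the schema of any real tool.

```python
# Hedged sketch: a hypothetical policy-as-code check for serverless functions.
# The config keys and thresholds below are illustrative assumptions only.

def check_function_policy(fn_config: dict) -> list[str]:
    """Return a list of policy violations for a single function definition."""
    violations = []
    # Well-architected-style guideline: keep function timeouts short.
    if fn_config.get("timeout_seconds", 0) > 60:
        violations.append("timeout exceeds 60s guideline")
    # Observability guideline: tracing should always be enabled.
    if not fn_config.get("tracing_enabled", False):
        violations.append("tracing (observability) is not enabled")
    # FinOps-style guideline: large memory sizes require a cost review.
    if fn_config.get("memory_mb", 0) > 1024:
        violations.append("memory above 1024 MB requires a cost review")
    return violations

# Example usage inside a pipeline step: fail the build when violations exist.
config = {"timeout_seconds": 120, "tracing_enabled": False, "memory_mb": 512}
problems = check_function_policy(config)
```

A real pipeline would evaluate every function in the deployment and stop the release when `problems` is non-empty, which is precisely the "automate these value opportunities within build and deploy pipelines" idea above.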

In summary, we might very well be at the gates of another significant shift in the way we design, build, and deliver software. In the coming years, we'll see organizations starting to adopt this recommended model and adapting their talent acquisition strategies accordingly, in need of serverless-first engineers. With the help of these engineers, design feedback will flow better amongst balanced teams, decisions will be validated faster with code prototypes, and tradeoffs will be better documented to mentor junior engineers. All this is good for Software Architecture as a discipline.

Yes, these are exciting times for software architecture. Nowadays, starting up a new software product (or even a company) involves an almost zero-day technology decision if we fully lean on managed cloud services. However, there is a risk that these decisions are based on inertia (or cognitive bias) and not on a proper assessment of the actual business requirements. The law of the instrument is a real thing, and "if the only tool you have is a hammer, you will treat everything as if it were a nail." That's why it is crucial to consider and analyze all the tradeoffs of your technology decisions, so you don't architect yourself into a corner.

You don't need an army of architects. Instead, the development teams need to be conscious of the design tradeoffs and of how the architecture needs to evolve, adapt, and refactor to harvest re-use as it occurs.

What value do these teams create?

What is fascinating about introducing a platform-based architectural style is witnessing new types of talent and attitudes emerge, thanks to the industrialization of cross-cutting functionalities and the heavy lifting of undifferentiated work.

In more traditional monolithic and serverful methods, teams have to deal with a more mature (and even dull) technology market with well-defined practices and standards. Now, the introduction of a new cloud-based serviceful approach is helping you move the workforce up the value chain to pick up roles that deal with a higher frequency of change, uncertainty, and, most definitely, innovation:

• Operations teams are now in charge of designing parts of the core platform infrastructure based on their feedback from real customers, implementing those components using infrastructure-as-code scripts. New opportunities will also pop up for introducing innovations in observability and monitoring for serverless applications, even with the possibility of building custom tools while this is still a forming market.

• Service developers will see themselves spending more time closer to the business, building domain-specific solutions for the final users, and handling repetitive and undifferentiated tasks less frequently.

• A new platform team will emerge between those two groups, creating new channels to connect the business with the infrastructure effectively. This new team will enable developers and engineers to focus on building, testing, releasing, and operating the services that compose your on-demand software. To that end, developers will consume platform capabilities in a self-service manner.

Platform teams help developers create and unleash value for the organization as they become the primary users of the platform services, guidelines, and tools. The following Wardley map depicts this idea, identifying three types of experience areas where an internal software platform can have a very positive impact: the operational experience, the developer experience, and finally, the user experience.


Figure 31. Wardley map showing the impact of the internal software platform on team experiences

Build vs. Buy patterns for the software platform

As mentioned earlier, the mission of the platform team is to carefully curate a catalog of opinionated services and offer them as a simplified experience. How does a platform team decide when to build a new service and when to buy one from the underlying cloud platform to achieve the desired level of customer experience? There is no easy answer. It all depends on the context of every problem you are trying to solve and who the user is.

Here is a decision framework, though: unless it is strictly necessary, you don't want to be in the business of building and maintaining a wrapper service on top of a widely accepted standard or technology. For example, if you select Jenkins as your CI/CD tool, you wouldn't want to abstract access to that service for its users (i.e., developers and operators) through a custom-built UI or API. It just does not pay off, it is not your core business, and it is not sitting in the critical path of your capital flow as a platform product. Instead, you may want to open access to the tool and build a shared library, accessible by all engineers in the organization, to build pipelines as code in an easy and standardized way.

On the flip side, and as another example, if you are following an Event-Driven Architecture as an integration style for your software products, it is very likely that you will be leveraging some of the underlying cloud platform services. Hence, it may be helpful to wrap all the logic for topic registration, topic subscription, and event publication with an HTTP API, for the sole reason of allowing external systems to integrate with the platform without exposing all the infrastructure implementation details. Building this core service is a piece of work that, together with its associated language-specific SDKs, can perfectly fall under the platform team's scope. It will offer a simplified and empowering experience to internal developers and to external developers from partners and clients. There is also a Build vs. Buy decision at this level, the platform development level.
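To make the event-publishing core service idea tangible, here is a minimal sketch of the interface such a service could expose. All names are hypothetical, and an in-memory broker stands in for the cloud messaging service; in a real platform, the same interface would wrap the provider's SDK behind the HTTP API described above.

```python
# Hedged sketch: a platform "core service" for events. The in-memory broker
# below is a stand-in for the real cloud messaging infrastructure.
from collections import defaultdict
from typing import Callable

class EventService:
    def __init__(self):
        self._topics: set[str] = set()
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def register_topic(self, topic: str) -> None:
        """Topic registration, hiding the infrastructure details."""
        self._topics.add(topic)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Topic subscription for a consumer's handler."""
        if topic not in self._topics:
            raise KeyError(f"unknown topic: {topic}")
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Event publication; returns how many subscribers were notified."""
        if topic not in self._topics:
            raise KeyError(f"unknown topic: {topic}")
        for handler in self._subscribers[topic]:
            handler(event)
        return len(self._subscribers[topic])

# Usage: a service team integrates without seeing any infrastructure details.
events = EventService()
events.register_topic("orders.created")
received: list[dict] = []
events.subscribe("orders.created", received.append)
events.publish("orders.created", {"order_id": "A-1"})
```

The value of the wrapper is the stable, minimal contract (`register_topic`, `subscribe`, `publish`): the platform team can later swap the in-memory stand-in for managed cloud messaging without changing the callers.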

The Operational Experience: toward NoOps

With the introduction of managed cloud services (with a focus on serverless) as a technology foundation, the internal software platform team is better positioned to promote a way of working closer to NoOps than to traditional DevOps. There is no operations organization involved in provisioning and deploying services during the software development lifecycle, and no need for the developers to interact with such teams to get things done. Also, less time is spent doing operations tasks or explaining what needs to be done to someone else.

All that developers need to do is focus on writing services and configuring a build pipeline. This way, their components are pushed to an artifact repository and deployed into a platform environment using infrastructure-as-code scripts. As a result, this development process will dramatically improve your software's time to market. As we introduced previously in this book, your customers will indirectly benefit from this low-operations method brought to light by the internal software platform.

System engineers will spend less time managing legacy infrastructure, administering container fleets, or applying security patches. Instead, they will now spend quality time observing and monitoring the platform's operations, focusing on cybersecurity and performance, and contributing to the product roadmap based on customer data.

The Developer Experience: first-class citizens

Developer Experience (DX) has nothing to do with how many years a developer has worked as a professional. Instead, it is a specialization of User Experience (UX) where the user is a developer. For this reason, you can perfectly apply many of the fundamental aspects of the UX discipline (if not all) to the DX field to turn every developer's work into a joyful experience. The scope of this discipline encompasses diverse programming assets such as system APIs (HTTP or programmatic) and developer tools, introducing several factors to be considered:

• Flat learning curve. The technology is easy to adopt, and developers can be productive very quickly after using it for the first time.

• The right level of abstraction. The API or tool provides utility functions and syntactic constructs to help developers translate their intents into bug-free code.

• Good documentation. There is comprehensive and easy-to-find documentation available to developers that includes intelligible references and examples.

• Community support. There exists a professional, responsive, supportive, and inclusive community behind the platform, within fingertip access of every developer.

These factors exemplify why Developer Experience is so important when building your internal software platform, as they contribute enormously to its adoption. Be careful you don't die by a thousand paper cuts: even the tiniest flaws in realizing these elements (i.e., an undocumented API breaking change or a lack of feedback after a feature request) may completely ruin your platform strategy and all the business value that it enables.

Developers are now your direct customers and first-class citizens in the new software organization. Hence, you want to make sure you meet them at the right spot and give them good reasons to trust your system and your architecture, most probably over theirs. Internal software platforms must be designed to delight their customers - the business service development teams.
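The "right level of abstraction" factor can be illustrated with a tiny sketch: a thin platform SDK helper that translates a developer's intent ("deploy service X") into the raw API request, so nobody hand-builds HTTP plumbing. The endpoint path, payload fields, and environment names here are hypothetical, invented for the example.

```python
# Hedged sketch: a thin SDK helper illustrating "the right level of abstraction".
# The endpoint path, payload shape, and environment names are hypothetical.

def build_deploy_request(service: str, version: str, environment: str = "staging") -> dict:
    """Translate a developer's intent into the raw platform API payload."""
    # Guard rails baked into the SDK catch mistakes before any call is made.
    if environment not in ("staging", "production"):
        raise ValueError("environment must be 'staging' or 'production'")
    return {
        "method": "POST",
        "path": f"/v1/services/{service}/deployments",
        "body": {"version": version, "environment": environment},
    }

# One obvious, well-named call instead of hand-assembled request details:
request = build_deploy_request("billing", "1.4.2")
```

Good DX is exactly this: the helper's name and parameters mirror the developer's vocabulary, and invalid input fails fast with an intelligible error instead of a confusing server response.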

Like any other product, platforms need marketing strategies and campaigns to reach specific target audiences and convey a message that fosters adoption. Indeed, marketing for developers is an exciting emerging field. To that end, several companies are creating specific roles in charge of this activity, known as Developer Advocates (or Developer Relations). A Developer Advocate’s job is to foster community adoption and build an army of product ambassadors by creating marketing strategies, acting as customer zero, producing content, and especially collaborating with engineering teams to help them move forward. By doing so, Developer Advocates can harvest all those experiences and harness the enthusiasm of early adopters to enrich the internal software platform they represent.


This whole idea of a developer-centric internal software platform is revolutionary. It focuses on helping software engineers with in-house, cross-cutting, independent, and loosely coupled services, instead of the traditional product-centric approach that cared much less about that.

To gain acceptance for your internal software platform, try to favor on-the-field engineering guidance over managerial mandates. Working closely amongst colleagues in pair programming sessions can help development teams gain confidence using the platform. It will also help you drive adoption from the bottom up. A knowledge gap may exist between the platform and the business service development teams, making things harder. Still, the ideas introduced by the Developer Experience discipline, and all the roles that come with it, will help you bridge those gaps. Nothing beats having buy-in from the developers you are trying to help. That's the level of sponsorship you are looking for.

The User Experience: Programmable Engagement Services

You may be thinking that Engagement Services is a euphemism for other, more familiar terms such as applications or even UIs, and you are not far from reality. However, there is something that those two terms tend to obscure or even take for granted, and that is the importance of user engagement in on-demand software. As introduced earlier, offering superior user experiences to final users is helping companies gain a dominant position in a marketplace where the vast majority of new business transactions happen online. Hence, choosing the right technology strategy can be a matter of life and death in business terms.


As software organizations transition to on-demand software, there are a few challenges to solve. Chief amongst them is figuring out how to help customers differentiate in the User Experience arena, especially when many of them buy the same software from you. Although the obvious solution is to make the software configurable, this configuration comes with a cost derived from the assumptions and tradeoffs that software designers factored into the code when it was built. Not every system parameter can be made configurable, and if you push this configurability to the extreme, the resulting system becomes too complex for a business expert to manage.

The constraints naturally embedded in every low-code configuration tool do not help those customers whose primary concern is building a compelling experience for their final users. And, in a post-pandemic world, these types of customers are becoming increasingly frequent. You can't boil the ocean, and you can't factor in every single UX configuration parameter. That's why, most probably, your customers are interested in building this capability rather than buying it from you. The following diagram (Figure 32) depicts this whole idea, where your customers are responsible for programming their core business Apps, and you provide the rest of the components in the technology stack.


Figure 32. Programmable engagement services are the tip of the technology spear

What does that mean to you as a software house? There is a sweet spot in your strategy: building configurable software products for your customers' undifferentiated processes, especially in mature and regulated industries. At the same time, you may also be interested in creating the tools for them to build next-generation engagement services themselves. Your SaaS solution should indeed include built-in user interfaces with carefully designed experiences; it just happens that you are likely not exposing those interfaces to your customers' final users. For those cases, you don't want to leave your customers alone in their endeavor of building a competency that may be crucial to their success.

How can you do that? The real boost in your on-demand software strategy will come from creating the development tools your customers need to build their core user experiences on their own. You could give companies the means to set themselves apart in the marketplace, and that's something you can do with programmable engagement services.


The idea is straightforward: on top of the platform's Design System, you could create a series of business-oriented UI components that give developers out-of-the-box visual capabilities and prebuilt integrations with the software APIs. This way, you could help those companies accelerate the creation of their core user experiences by employing pre-packaged functionalities. At the same time, you allow them to design these experiences at their own pace.

Why is this important? Although the Design System component catalog may help your customers' developers implement their user interfaces, it may be too low-level for them. They still need to make design decisions about how to group components on the screen and, more importantly, they need to integrate these components with the proper service APIs to map the data back to the screen. This is indeed better than manual development, but you can do better for them. Remember, your customers are very interested in building domain-specific functionalities for their user interfaces, so creating that type of abstraction will help them succeed.
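One way to picture such a business-oriented component is as a descriptor that bundles a Design System widget with its prebuilt API integration, so customer developers only choose, arrange, and configure components. The sketch below is entirely hypothetical (component names, endpoint, and field names are invented); it only illustrates the kind of abstraction, not any specific product's API.

```python
# Hedged sketch: a business-oriented component that pairs a Design System
# widget with a prebuilt API integration. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class BusinessComponent:
    name: str                 # e.g., a domain-specific "AccountBalanceCard"
    widget: str               # the underlying low-level Design System widget
    api_endpoint: str         # the prebuilt integration with the service API
    field_mapping: dict = field(default_factory=dict)  # API fields -> widget props

    def render_props(self, api_response: dict) -> dict:
        """Map an API response onto the widget's props (the 'data to screen' step)."""
        return {prop: api_response[fld] for fld, prop in self.field_mapping.items()}

# A customer's developer picks the pre-packaged component and feeds it data:
balance_card = BusinessComponent(
    name="AccountBalanceCard",
    widget="Card",
    api_endpoint="/v1/accounts/{id}/balance",
    field_mapping={"balance": "amount", "currency": "currencyCode"},
)
props = balance_card.render_props({"balance": 120.5, "currency": "EUR"})
```

The point of the descriptor is that the grouping and data-mapping decisions the text mentions are made once, by you, and reused by every customer.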

Interacting with platform teams

The way platform teams work is straightforward and practical:

1. They collaborate with service development teams to understand their challenges and the problems they need to resolve.
2. They harvest common technology patterns across the different development teams within the organization to maximize the benefits of reusability.
3. They industrialize transversal solutions, which are then offered as managed services to the development teams to build domain-specific software.


Team Topologies provides suitable constructs to help us visualize how the service development teams and platform teams collaborate, and how these interactions evolve. Indeed, as Matthew Skelton has mentioned multiple times, the evolution of groups and their relationships is a crucial aspect of this methodology. The following picture uses the standard Team Topologies shapes. It depicts how the interactions between teams evolve as the platform matures, starting with close collaboration and moving toward the industrialization of managed services.

Figure 33. Interactions between different teams using Team Topologies

We have a few elements in this Team Topologies diagram worth explaining:

• The internal software platform is a unique product within your organization, as it becomes the physical manifestation of the principles of a new architectural style. Establishing a close relationship between the organization's architecture governance bodies and the platform team is essential to the mission. While the former facilitates the organization's technology vision and direction, the latter enforces those principles through development tools and cross-cutting services.

• In the spirit of iterative and incremental development, internal software platforms must start as small as possible, solving real-world problems. To that end, the best way to approach this pivotal endeavor is by collaborating closely with the development teams. There is no other way, and platform team members are avid collaborators.

• After this period of collaboration, the platform team creates cross-cutting solutions (and tools) and offers them to the development teams as managed services. This approach enables a worry-free environment for these development teams, who can now focus on their domain-specific responsibilities.

• Also, once all the industrialized platform services are deployed to final customer environments together with the business services, a team of Site Reliability Engineers (SRE) takes responsibility for operating these components in production. In the spirit of true DevOps, it is important to note that the SRE team is not a siloed and independent group. Instead, it is a tribe of people with specialized operations skills, nurtured from the development and platform teams.

Mastering industrialization

At this point, it is worth highlighting an excerpt from McKinsey's "SaaS, open-source, and serverless" article mentioned in the first chapter of this book. The following fragment focuses on the importance of the dynamics between the platform and the stream-aligned teams described earlier:

"Once a successful greenfield project has been completed, organizations can start scaling serverless and looking into migrating legacy systems. Here, one failure mode is inconsistent approaches and duplicating experimentation between different silos. The best success is seen when organizations invest in a platform team to develop the serverless best practices and provide them via in-house building blocks "as a service" to the rest of the organization. One of the benefits of serverless over previous cloud technologies is the granularity of the encoding of best practices. With the right tooling and approach, many common application components can be standardized, audited, and reused with ease, further improving the productivity of the tech teams."

This pattern repeats for every stream-aligned team, who will adopt the internal software platform at different paces as the platform matures and keeps industrializing and curating developer experiences. The challenges of platform adoption are discussed in detail in the next chapter of this book.

Chapter 5: Platform Adoption

Leapfrogging

Leaning on the rapid advancements of technology, the open government doctrine became popular in countries worldwide during the first decade of this century with the goal of increasing transparency with citizens. Chief among those efforts, the Open Government Initiative in the United States was introduced by President Obama's administration in 2009, when massive amounts of information were made accessible through Web portals (and even APIs) to the general public, who started to demand the data that was being used to represent them. In this spirit of openness, Aneesh Chopra was appointed as the first CTO of the country in 2009, embodying the principles of this initiative in an exemplary way and launching numerous programs to connect the administration with its citizens and spur innovation through entrepreneurship. However, the blast radius of Chopra's actions and influence reached unprecedented heights that went beyond his own home country. Early in his tenure as CTO, Chopra coordinated the first US-India Partnership on Open Government50 to harness modern technology to enhance democratic accountability and directly impact people's lives, especially in rural villages with no access to technology. After this moment, Chopra kicked off a few positive pilot projects in India (such as installing broadband connectivity in remote areas) that allowed those rural areas to leapfrog many intermediate steps


in economic and educational development, avoiding some of the 20th-century mechanisms for delivering services and going straight to the 21st.

Leapfrogging has its connotations within the competitive market space too: a company can leapfrog another, more successful one. It is not about leveling the playing field anymore, but about going beyond competitors and establishing a new way of doing things, typically through some architectural or even radical innovation. That is how you open a new market. To that end, the architectural style introduced by internal software platforms supports software organizations seeking this transformational boost to jump from the product-based market directly into the services-based market. Doing it the right way will make the difference between an organization that thrives on change and one that cannot cope with evolution.

The Digital Platform economy

As software organizations transition toward on-demand software, implementing a platform-based architectural style can be an arduous endeavor. It would be nice if every team started their platform adoption journey from the same place, ideally a greenfield without prior conditions. But the reality is different. Past bad experiences drive leadership teams to talk about internal software platforms with disdain. For example, imagine spending years optimizing a CI/CD pipeline to improve the SDLC for developers, only to find that nobody is using it because they don't know it exists. Or imagine the organizational friction created by relabeling the old ticket-driven operations team as the new platform team only because it now operates on the cloud.


These bad experiences create frustration and make some technology executives question why transitioning to on-demand software is crucial to the organization, or how internal software platforms help with that journey. For that reason, and in the spirit of the agile principles widely adopted across the software industry, integrating a new internal software platform is an effort that you must approach iteratively and incrementally. It consists primarily of classifying technical debt and refactoring for a non-disruptive replacement of the business services.

Figure 34. Types of innovation. (Credits: Viima blog)

This entire chapter focuses on the adoption of internal software platforms as an architectural style, emphasizing the idea of the platform's microkernel as a catalyst for incremental and architectural innovation51 (Figure 34).


If development teams can’t operate in a self-service manner after introducing an internal software platform, it means that they are not unleashing all their potential.

Legacy replacement

Although adopting a platform-based architectural style may not look like a legacy replacement exercise at first, it brings attractive technical revitalization opportunities for the business services. The architectural and technology principles espoused by this new system (especially those related to serverless) will introduce high levels of disruption to systems that have been running without significant design changes for years. Implementing a new platform strategy triggers a compelling event that brings good opportunities to renovate and improve the architecture of those services.

Applying one change at a time may be challenging when you introduce a new central piece in the architecture in charge of vital and cross-cutting concerns such as authentication, authorization, logging, monitoring, and potentially even new integrations. To that end, good legacy replacement patterns can help you approach this task with more confidence and certainty. Chief amongst those patterns, “Branch by Abstraction”52 provides a well-documented technique for gradually making changes to software architectures without business disruption while the change is still in process. Although this pattern was initially conceived for applying low-level changes in the source code, this book describes how you can use it at a larger scale for systems integrations (Figure 35). Based on the assumption that all the existing business services are exposed through HTTP APIs, the idea behind this pattern is very simple:


• Introduce the platform as a new component in the architecture that captures and proxies all the interactions with the legacy business services by acting as an innocuous and transparent new API.

• Incrementally change the business service clients to point to the new component API that, behind the scenes, still routes to the old one.

• Gradually change, upgrade, or replace the business service with new components.

• Replacing the legacy business services with new components also means that they may be exposed through a new HTTP API version through the platform, so the clients need to be updated too.
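The proxy-and-toggle idea behind these steps can be sketched in a few lines. The following snippet is a minimal illustration, not a real platform implementation: the base URLs, route names, and toggle set are all hypothetical, and a production facade would live in the platform's API management layer rather than in application code.

```python
# Sketch of Branch by Abstraction at the systems-integration level: the
# platform API acts as a facade that forwards each route either to the
# legacy business service or to its replacement. All names are illustrative.

LEGACY_BASE = "https://legacy.example.internal"
NEW_BASE = "https://orders-v2.example.internal"

# Feature toggles: routes flip to the new implementation one at a time.
MIGRATED_ROUTES = {"/orders/quotes"}

def resolve_backend(path: str) -> str:
    """Return the backend base URL that should serve this request."""
    if path in MIGRATED_ROUTES:
        return NEW_BASE
    return LEGACY_BASE

def route(path: str) -> str:
    """Build the full upstream URL the platform proxy would call."""
    return resolve_backend(path) + path
```

Because clients only ever see the facade's URL, flipping a route in `MIGRATED_ROUTES` (or rolling it back) is invisible to them, which is precisely what makes the migration non-disruptive.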

Figure 35. Martin Fowler’s visual representation of the Branch by Abstraction pattern

How business services are distributed through the platform to the final end-users following this pattern is described in the coming sections. You will find a model based on three levels of maturity, accompanied by reference architectures using AWS managed cloud services.


As you will notice, these sample architectures are illustrated using the platform microkernel introduced in the previous chapter. This pattern is augmented with additional components such as Amazon CloudFront distributions for content caching or Amazon Cognito for user authentication. Also, a core integration service enables end-to-end connectivity between the client components and the technical APIs of the business services. This solution must still be seen as a reference architecture. It does not necessarily represent a target state for many readers, since every internal software platform is different and contextualized to the needs of the developers in the organization. The patterns documented in this chapter will help you reach the desired serviceful architecture iteratively and incrementally. Furthermore, due to this sample platform's theoretical (and somewhat academic) nature, all the diagrams included below are intentionally high-level and do not show detailed integrations between the components.

Internal software platforms are culture-changers.

External Business Services

The most straightforward approach is usually the first step that business service development teams take to integrate their business service with the internal software platform. It involves two easy-sounding steps:

1. HTTP API enablement of the business service.
2. Client Apps exploiting those APIs.

This is an external business service.


In fact, if the business service doesn't have a UI, then only step 1 above is required. We can consider this type of service external because the service is running outside the boundaries of the platform's cloud core infrastructure. In this scenario, the platform is introduced merely as an integration framework providing a minimum of API management and monitoring capabilities. At the same time, there is marginal disruption to the business service team, who keeps taking care of how the service is built, released, deployed, and operated. While you would certainly like to see the business services move toward the principles that your internal software platform espouses regarding AWS technology and architecture, that is not strictly necessary for a first-pass integration.

Figure 36. External Business Service

The business service components themselves remain outside the development and operational frameworks of the internal software platform. Usually, that means they stay in their existing environments for these activities. They expose their main functionality through technical APIs managed by the platform through its solution based on Amazon API Gateway. The analogy is onboarding an API from a partner over which you have no control: you ask only that the API is accessible through HTTP and that you have an endpoint to which you can route API requests that come through the platform. This means that the business service provider team will implement


their own build, release, deployment, and operations methods.

• Using the platform's build pipeline is not mandatory, yet it is recommended to build and push artifacts to the repository automatically.

• Using the platform's deploy pipeline is not mandatory, and service teams can deploy their business services using their own deployment tools and procedures.

Managed Business Services

The second step in the business service maturity model offers an intermediate approach between entirely external (initial) and fully integrated (target). It involves three criteria:

1. HTTP API enablement of the business service.
2. Client Apps exploiting those APIs.
3. Business Service components are built and released following the principles and procedures of the service teams, but deployed and supported by the platform operations teams on the AWS cloud.

This is a managed business service because the service is still running outside the boundaries of the platform's cloud core infrastructure. Although the business service provider takes care of how the service is built and released, it is now deployed on an AWS cloud environment that can be easily peered to the platform infrastructure and managed by the centralized platform operations team.


Figure 37. Managed Business Service

This design brings better integration capabilities to the business service, since you can deploy it on its own AWS account. This way, the business service can leverage a more extensive set of the platform's capabilities regarding networking, logging, data, monitoring, events, and other procedures, thanks to the integration capabilities between the AWS accounts. The APIs are still available via the platform's API management solution with Amazon API Gateway and routed to the actual implementation in the second AWS account. This facilitates the implementation of API authentication and authorization, leveraging the platform's Identity Management solution based on Amazon Cognito and custom AWS Lambda authorizer functions. The point here is that the managed business service is now built and released by the business service provider team, while the deployment is operated by the platform team. Therefore:

• Using the platform's build pipeline is not mandatory, yet it is recommended to build and push artifacts to the repository automatically.

• Using the platform's deployment pipeline is mandatory, even though the central platform operations team will use infrastructure-as-code scripts to define the business service infrastructure and deploy the artifacts.
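To make the custom AWS Lambda authorizer idea concrete, here is a minimal sketch of a TOKEN-type authorizer for Amazon API Gateway. The token check is a deliberate stand-in: a real implementation would verify the signature and claims of the Cognito-issued JWT, which is out of scope for this illustration.

```python
# Minimal sketch of a custom Lambda authorizer (TOKEN type) for Amazon
# API Gateway. The _is_valid check is a stand-in for real JWT validation
# against the platform's Amazon Cognito user pool.

def _is_valid(token: str) -> bool:
    # Stand-in for signature/claims verification of a Cognito JWT.
    return token == "allow-me"

def handler(event, context):
    token = event.get("authorizationToken", "")
    effect = "Allow" if _is_valid(token) else "Deny"
    # API Gateway expects an IAM policy document in this shape.
    return {
        "principalId": "user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }
```

API Gateway invokes this function before routing the request to the business service, so the platform enforces authentication and authorization centrally without the service teams having to implement it themselves.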


Platform-Native Business Services

The most potent form of integration, and the one to encourage the most, is the full cloud enablement of the business service components, exploiting all the platform's services, tools, and principles (which at the same time enforce various overall architectural and development standards and practices without being over-intrusive). Criteria to achieve this level of maturity include:

1. HTTP API enablement of the business service.
2. Client Apps exploiting those APIs.
3. Service components are built, released, deployed, and operated using the platform's build and deploy pipelines.
4. Service components are upgraded or replaced by features built with serverless computing.

The result is that the business service components can be deployed and managed together with the internal software platform. That means, for example, that the service has to be cloud-enabled (deployable in the cloud), with a very high preference (possibly even an insistence) upon at least containerization and serverless architecture, cloud-managed load balancing, high availability (HA), and disaster recovery (DR).

Figure 38. Platform-Native Business Service


Platform-native business services are fully hosted with the internal software platform and take full advantage of the capabilities provided by its core services. It also means that the software must adopt the existing platform and business services for functionality it previously catered for itself (e.g., authentication/authorization, messaging, logging, monitoring, events, file transfer). You will also have to offer onboarding guidelines to cover more and more of these aspects and, coupled with that, maintain a properly curated inventory of existing services.

A multi-competence strategy

This platform adoption model, based on three stages of maturity for the business services, can help software organizations transition toward SaaS more effectively. Consequently, the challenge presented to you as a software engineering and architecture leader is figuring out how to build the vision while tackling the legacy, which is typically still the primary source of revenue. Your organization still sees most of its legacy software products as cash cows. That is not bad per se, since a healthy portfolio needs the right mix of products in each category. However, as a software provider, you will have to support customers sitting at different stages of this maturity model. While new customers receive the latest shiny versions, upgrading existing ones without business disruption is a challenge like no other.

The innovator's dilemma

This territory is fascinating, and it is known as the innovator's dilemma, which is not unfamiliar to other engineering companies (software and non-software) shipping products at different points of sophistication. Despite the noticeable differences compared to software development, there are similarities with the type of engineering challenges that SpaceX is experiencing:


• This leading space manufacturer needs to send stable payloads into space using Dragon (their cargo spacecraft), so they need to focus on reliability, safety, and reducing deviation as much as possible.

• At the same time, SpaceX is introducing production-ready innovations with Falcon (a family of reusable rockets), so they are okay with failure as long as it does not happen in critical parts.

• Finally, they keep running iterative and incremental experiments with the Starship system (their futuristic and fully reusable orbital vehicle) in an exemplary display of agile engineering focused on embracing change and reducing its cost. They don't love failure, but they tolerate it as part of the process.

There is a lesson for modern software organizations embedded in this example too. We can work backward and transfer this pattern into the platform adoption problem space by leaning on one of the multiple artifacts that Simon Wardley (the creator of Wardley mapping) presents through his public book and other online contributions. The model in question guides organizations to think and act upon context, helping them understand the differentiating value they can get from various methodologies when applied at different industrialization stages. The following picture (Figure 39) describes this model.


Figure 39. Engineering competences. (Credits: Simon Wardley)

We can draw several conclusions by matching this model to the adoption pattern presented earlier. However, one outcome stands above all: you need to select the right competence for the right job. There is no one-size-fits-all when it comes to software engineering practices.

Outsourcing practices

External services are still your organization's principal source of revenue at the start of the transition and will most probably remain so during the initial stages. You will have to support various customer instances and add new features to keep your competitive status. Hence, adhering to the mid-term roadmap you have already committed to and communicated is of utmost importance. Development teams need to focus on reducing deviation from this roadmap and ensuring the maximum quality of service for the services supporting the customers' businesses. That is vital to these


development teams even after first-pass integration with the platform. Whether or not your organization follows a Six Sigma approach to achieve this goal is a process implementation detail.

Lean practices

Managed services will present the first transition challenges, since these are the first pieces to move closer to the platform's cloud-based architecture. This stage can be seen fundamentally as a lift-and-shift exercise without major technological transformations. Even with legacy systems, it is very probable that you could run some of them on cloud virtual machines without much disruption. Of course, a few quick wins for leveraging cloud managed services will pop up during the integration with the platform, such as containerizing applications or migrating to managed load balancers on the cloud. Others will remain more complicated, as happens with data migration to cloud storage, which requires its own dedicated project.

Moving business services to the managed stage will help development teams learn and identify further challenges in decomposing services into smaller pieces for better platform integration and adoption of managed cloud services. Therefore, these teams must pay attention to any degradation in the services provided to your existing customers during this transition. And this is where you and the service development teams must be careful. Failures may be acceptable where there is an exchange for future-proofing and learning opportunities that help you advance toward the vision, as long as those failures do not happen in critical parts. Otherwise, you will get involved in customer escalations that will only bury you under waste and set you apart from your final goal.


Agile practices

Platform-native services are the most potent form of platform and serverless integration. They don't necessarily have to be built and deployed as microservices. Still, they have to be small, with a clear functional bounded context, loosely coupled between them, and highly cohesive. The different business service teams have the autonomy and control to decide the granularity and size of those services, following the indications of the portfolio architecture group, which should be in line with the principles espoused by the platform. This maturity stage is both the target state for the existing business services and the initial step for new ones built under the canons of this architectural style. Hence, they are built using agile practices to embrace change while reducing its cost. When development teams reach this point, you will be realizing and materializing your vision.

Migrating legacy systems that still run as monoliths and decomposing them into smaller platform-based services is complex. It is a whole science in itself. There are plenty of materials documenting patterns of legacy displacement, such as online articles written by Martin Fowler's team53 and classic books such as Eric Evans' Domain-Driven Design. To set yourself up for success, this book recommends going through those materials to grasp a deep understanding of how to approach this task with confidence.

Building platform-based business services

For the business service development teams to tap into the virtues of internal software platforms, this chapter offers a couple of guiding principles to follow when designing and implementing their domain-specific services.


Building software is simple, but building simple software is the hardest thing there is.

Designing for user impact

"Our responsibility as engineers is to ensure that the software we put out into the world is helpful, not harmful." - Angie Jones

Every service should be designed by agile teams who exercise the best design practices (such as Design Thinking and Domain-Driven Design) to capture business requirements and develop a helpful solution for the users. In a serviceful architecture like the one we are promoting in this book, each service should ideally solve one specific problem, and only one. In other words, each service should do one thing and do it well.

But how do you define that one thing? There are many levels of abstraction from which you can look at this, but basically, we are talking about clearly defining the bounded context of the services from a business point of view. Those bounded contexts are characterized by having all their entities and components delimited inside clear functional boundaries. As a result, business services are highly cohesive and loosely coupled with others.

From a platform perspective, you would not like to be too prescriptive about much of the internal architecture and design of the components that form a business service. The prescription is about how those components interact with other services running with the platform and how they get deployed, operated, or managed. Those aspects will indeed affect the architecture and design of the business service.


The creative imperative remains to allow service developers the freedom to produce the best componentry that fits their needs. As a side effect, it relieves the platform team of the arduous (and unnecessary) responsibility of policing the minutiae of software standards and practices.

Factoring in your organization's domain expertise

As domain and technology experts, your organization finds itself in a privileged position to factor all its market expertise into the functionalities of the business services, as this is what differentiates you from the competition. This expertise should be offered as the default option to the end-users. It would be beneficial for service development teams to identify which parts of this expertise should be configurable. This way, developers can extract these functionalities out of the core implementation in case customers want to tweak them for their needs. As a rule of thumb, make the common case easy by providing a very thin, high-level configuration abstraction for things that frequently change from customer to customer. Then, make the exception possible by offering a second level of configuration abstraction to apply more advanced changes. In both cases, business service teams should consider adding their view of the world as constraints in these configuration abstractions offered to the users.

Amazon's API Mandate

Again, Amazon is a flagship reference in this space too. Legend tells that back in 2002, in pursuit of new levels of business scalability, Jeff Bezos issued what is known as Amazon's API Mandate, a backbone of the company's technology evolution. Putting technical details aside, the mandate lays out the foundations to establish a pure service-oriented enterprise, a principle that the company still espouses today, and it became one of the most referenced materials in modern API architecture literature. This mandate included the following points:

• All teams will henceforth expose their data and functionality through service interfaces.

• Teams must communicate with each other through these interfaces.

• No other form of interprocess communication will be allowed: no direct linking, no direct reads of another team's data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.

• It doesn't matter what technology they use. HTTP, Corba, Pub/Sub, custom protocols - doesn't matter.

• All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and prepare to expose the interface to developers in the outside world. No exceptions.

Whether or not Jeff Bezos issued this famous memo to Amazon employees, the value of the principles outlined in this manifesto has been demonstrated empirically and proven to be essential. It was valid for the success of Amazon as a company and served as a foundation for how AWS builds and delivers its infrastructure services with an API-first approach today.
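The two-level configuration abstraction described earlier in this chapter (a thin layer for the common case, an advanced layer for exceptions, with the service team's view of the world encoded as constraints) can be sketched as follows. All option names, defaults, and constraints here are illustrative assumptions, not part of any real service.

```python
# Sketch of a two-level configuration abstraction (illustrative names).
# Level 1: thin, high-level options that commonly vary per customer.
# Level 2: advanced overrides for exceptional cases.
# The service team's domain expertise is encoded as constraints.

DEFAULTS = {"currency": "USD", "page_size": 20, "retention_days": 90}
CONSTRAINTS = {"page_size": range(1, 101), "retention_days": range(1, 366)}

def build_config(customer_options=None, advanced_overrides=None):
    """Merge customer settings over defaults, then validate constraints."""
    config = dict(DEFAULTS)
    config.update(customer_options or {})      # level 1: the common case
    config.update(advanced_overrides or {})    # level 2: the exception
    for key, allowed in CONSTRAINTS.items():
        if config[key] not in allowed:
            raise ValueError(f"{key}={config[key]} violates service constraints")
    return config
```

The common case stays easy (pass one or two level-1 options), the exception stays possible (level-2 overrides), and the constraint check keeps every customer configuration inside the boundaries the service team knows to be sound.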

With Conway's Law in hand, service-oriented enterprises stimulate the creation of serviceful software architectures. Whether those services are designed as independently deployable artifacts or as modules of a more monolithic system is an implementation detail for software architects and engineers to resolve. Also, services can communicate with each other through processes within the same runtime or by payloads


over the wire. A straightforward interaction interface between teams leads to highly cohesive and loosely coupled software systems, two of the most essential properties since software design has been a thing.

Building Minimum Viable Products

"It's not always possible - or wise - to develop an entire service all at once, but there is a big difference between creating a usable service then improving parts of it, and creating a desert island of a product and expecting your users to swim to it." - Lou Downe

Business service teams should not aim at designing and building big cathedrals from the beginning. This does not mean that development teams should start coding without analyzing the problem they want to solve, but rather that they shouldn't try to solve all the issues at once, especially problems that they don't have yet. Development teams should follow an incremental and iterative method. They can deliver the most critical service pieces with the highest value during the first versions, thus solving real problems. Remember, an MVP is something releasable that you can give to your customers. It may not be perfect (e.g., a service without a UI but with a good API), but it helps your users solve their challenges.

Whole teams

As much as possible, your organization must allocate vertically-sliced teams that take full end-to-end ownership of the business service, from the UI to the storage. To that end, it will be very beneficial


for development teams to follow an Open Source approach and identify authors, core maintainers, and contributors for the services. This way, development teams can receive contributions from members across different parts of the organization. The ownership of these teams should be embodied by the authors and core maintainers, who also hold accountability for the service roadmap, whereas contributors can come and go.

In general terms, it is best to favor mentoring and cross-training over throwing documentation over the fence with the vain hope that other engineers produce a sensible deliverable with it. Through practice, engineers have found mentoring tactics very productive. You now have a great opportunity to foster this practice thanks to an organizational topology where the platform team is in charge of doing most of the R&D and producing enabler features for the business services to use. And this is very powerful. It means that you have the opportunity to spread your technology vision and teach others how to design and build their services to make them compatible with the new platform-based architecture. Consequently, mentoring junior team members should be the main form of talent investment in your organization, instead of unnecessarily increasing the number of software engineers assigned to a team. Think about scaling development teams vertically rather than horizontally.

Developing business services for the internal software platform would typically mean working with many unknowns due to its iterative and innovative nature. These uncertainties need to be figured out during spikes or research activities even before committing to deliver a business feature. It may be that the underlying internal software platform partially supports what the


service team wants to do. Alternatively, it may be that you first need to build an entirely new capability in the platform to support a business need.

AWS Lambda Reference Architectures

AWS promotes and facilitates the development of proper serviceful architectures54, giving software engineers and architects the technology means to do it efficiently and at scale. Concretely speaking, the cloud provider advocates for serverless computing services to implement smart endpoints and dumb pipes55, so that engineers do not have to worry about managing the provisioning, security, and scaling requirements of the software's underlying infrastructure. Among such technologies, AWS Lambda and its inherently event-driven nature lead software engineers to write business logic as small functions that typically react to different kinds of events. To help these engineers shape their MVPs under the tenets of this particular architectural style, this section captures and illustrates the most crucial reference architectures using AWS Lambda.

API-driven

One of the most common use cases when building modern services is when a user accesses a Web application, which calls an API that exposes business functionality backed by a storage layer for reading and writing data. The following picture (Figure 40) shows a design using AWS components for this use case, where the user interacts with an Amazon CloudFront distribution that serves the UI component from Amazon S3. After the user is authenticated using Amazon Cognito, the Web application makes calls to the APIs exposed on Amazon API Gateway, which routes each request to the business logic component running as an AWS Lambda function. This function interacts with Amazon DynamoDB to read and write the necessary data.


Figure 40. API-driven service pattern
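To make this pattern more tangible, here is a minimal, hypothetical sketch of the business logic function sitting behind the API Gateway. The `table` argument is assumed to behave like a boto3 DynamoDB Table resource (exposing `get_item` and `put_item`), which also makes it easy to inject a stub for local testing; the routes and the `id` key are illustrative, not prescriptive.

```python
import json

def make_handler(table):
    """Build an API Gateway proxy handler bound to a DynamoDB-like table.

    `table` is assumed to expose get_item/put_item like a boto3 DynamoDB
    Table resource; a stub can be injected for local tests.
    """
    def handler(event, context=None):
        method = event.get("httpMethod")
        if method == "GET":
            # Read path: fetch one item by its id path parameter.
            key = event["pathParameters"]["id"]
            item = table.get_item(Key={"id": key}).get("Item")
            if item is None:
                return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
            return {"statusCode": 200, "body": json.dumps(item)}
        if method == "POST":
            # Write path: persist the JSON request body as an item.
            item = json.loads(event["body"])
            table.put_item(Item=item)
            return {"statusCode": 201, "body": json.dumps(item)}
        return {"statusCode": 405, "body": json.dumps({"error": "method not allowed"})}
    return handler
```

In a real deployment, the handler returned by `make_handler` would be bound to an actual DynamoDB table at initialization time and wired to the API Gateway proxy integration.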

Event-driven

Another common use case in AWS serverless computing is when a business service needs to execute some business logic after another managed cloud service raises an internal event. This event could capture things like putting an object into an S3 bucket, raising a CloudWatch alarm, or even creating a new notification in Amazon SNS. The following example (Figure 41) shows an AWS Lambda function automatically triggered by Amazon S3 after another business service puts an object into a bucket. During this process, AWS generates an internal event object that contains all the necessary data about the event source and is passed down the chain so that the AWS Lambda function can use it as an input for its business logic.

Figure 41. Event-driven service pattern
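A sketch of the triggered function might look as follows. The event shape mirrors the documented S3-to-Lambda notification structure; the per-object business logic is left as a placeholder and the returned list of (bucket, key) tuples is just for illustration.

```python
def handle_s3_event(event, context=None):
    """Extract (bucket, key) pairs from an S3 object-created notification.

    An S3 notification event carries a list of Records, each describing
    the bucket and object that triggered the function. Real business
    logic would go where the tuple is collected.
    """
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Placeholder for the actual per-object business logic.
        processed.append((bucket, key))
    return processed
```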


Data-driven

This is a pivotal pattern, as data-intensive systems keep gaining relevance in our architectures, especially for real-time data processing. In this use case (Figure 42), there are typically one or more data producers, a central component that ingests and stores this data in the form of streams (on Amazon Kinesis), and a business logic component (on AWS Lambda) that processes those data streams to accomplish some sort of transformation for later analysis (on Amazon Redshift).

Figure 42. Data-driven service pattern
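The stream-processing function in this pattern can be sketched as below. Kinesis delivers record payloads base64-encoded inside the Lambda event, so the first step is decoding; the transformation itself (here, adding a `processed` flag before the data would be forwarded for analysis) is a hypothetical placeholder.

```python
import base64
import json

def handle_kinesis_batch(event, context=None):
    """Decode a batch of Kinesis records and apply a toy transformation.

    Each record's payload arrives base64-encoded under kinesis.data;
    here every JSON payload is parsed and a derived field is added
    before it would be forwarded to the analytics store.
    """
    transformed = []
    for record in event.get("Records", []):
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        payload["processed"] = True  # placeholder transformation
        transformed.append(payload)
    return transformed
```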

As we can see through these patterns, AWS Lambda is a convenient managed cloud service that makes it easy for developers to create software products following an iterative approach. Using AWS Lambda, they start small at the function level and add more functionality incrementally. Don't try to shove a monolith into an AWS Lambda function. Remember what these are: small functions aimed at doing small transformations (and not transportations, as AWS reminds us), so they should do one thing and do it well. If you find yourself in a situation where you are turning your Lambda function into an extensive system that does many things, split the service logic into multiple AWS Lambda functions.

AWS App Runner

AWS Lambda is not a silver bullet.


As it turns out, it might not be the right technology choice for some scenarios due to its by-design limitations in terms of payload size and execution time. Also, your software might need to give solid answers to multiple use cases that respond to heterogeneous usage patterns, where user requests vary in number, volume, and temporal distribution. It is crucial that, as part of the architectural analysis of an application, you include some cost analysis and projections based on usage patterns and volumetrics, such as those in the “Economics of Serverless”56 study elaborated by the BBVA team. This way, you can pick the right managed cloud service from the AWS catalog for each of them. If AWS Lambda does not fit the bill, look at other serverless technologies from AWS's inventory, such as AWS Fargate or even AWS App Runner, two outstanding solutions for running highly available and auto-scalable applications on the cloud. Being able to switch technologies for the business service at a low cost of change is, precisely, one of the most remarkable benefits of using serverless computing.

By diving deeper into AWS App Runner, we can learn how this is a promising and effective managed cloud service. In short, it blends the best features of AWS Elastic Beanstalk and AWS Fargate, allowing you to go from an existing container image or even source code directly to a running service automatically in minutes. Released in May 2021, App Runner is an outstanding achievement by AWS that allows developers to implement MVPs and move quickly from idea to working product. Due to its integrated build, deployment, and execution services, AWS App Runner connects the business with the infrastructure and the infrastructure with the final user by shortening the feedback loops tremendously.
And that's precisely one of the points to analyze in depth if development teams are assessing this technology to run the business services: AWS App Runner blurs the lines between the software's build and execution architectures. If you connect AWS App Runner to your source code but already have your own build architecture, AWS App Runner will not use it. Again, as with any other abstraction, you will need to consider all the constraints and trade-offs that come with it.

In any case, with managed services such as App Runner, AWS invests in giving engineers the necessary capabilities to improve the continuous flow of software development with end-to-end lifecycle features. These functionalities span from source code integration to production monitoring in a managed-services environment. This approach is a very efficient way of connecting the business with the infrastructure in the value chain, from business requirements to working software.

Platform buy-in

As mentioned earlier in this book, it would be best if you favor a natural and organic adoption strategy based on on-the-field engineering guidance over managerial mandates. However, your organization's reality may lead you to other, less green pastures. An internal software platform, like any other product, is in constant evolution. Once you embark on this endeavor, you will always be engaged in making it better.

Based on Sparkbox's57 intelligent maturity model for Design Systems, we can extrapolate two primary adoption paths for internal software platforms:

• Bottom-up: This is when a small engineering team starts a shadow initiative to solve a problem for themselves or close collaborators. In this case, platform founders may be working off the radar without funding or executive sponsorship.


• Top-down: With this approach, your organization's leadership team kick-starts the platform initiative with the necessary support and funds. This may result in a large dedicated platform team working under more scrutiny and higher expectations.

What's the best approach? It all depends on your organization's context. In any case, the adoption path will significantly impact how the platform evolves and matures, especially during the early stages. This way, the buy-in attitudes expressed by users adopting the platform as a managerial mandate will differ from the reactions obtained when the initiative starts from the trenches. However, as the platform matures and the benefits become more apparent to everyone, the difference in user attitudes tends to lessen (Figure 43).

Figure 43. Different platform adoption paths


Platform adoption priorities

On many occasions, using a platform is a matter of trust, so many developer users will be asking themselves: do I trust the platform code more than I trust mine? Do I prefer to lock into the platform's architecture rather than mine? As a software leader, you will have to make those developers answer affirmatively to those questions. Regardless of the adoption path, your top priority when starting an internal software platform initiative is to increase user adoption by creating fast feedback loops with the developers. Again, as per Sparkbox's maturity model: “An increase of users is a measure of success and fuel for your continued growth.”

Persuading users to accept change is another critical aspect of adopting an internal software platform. When a new development team starts using the building blocks provided by the platform, they are essentially changing how they work. Although this change will be better for them and your whole software organization in the long term, that will not be evident at first glance. Of course, this change takes time, so it would be best to work with the organization's different stakeholders to manage expectations on the way to full platform adoption.

Finally, try to prioritize valuable improvements to foster platform adoption. This is, essentially, a strategic move that lets you ponder the mutual benefits for both the platform and the service development teams. It will help you make a strong business case for adoption regardless of the adoption path you are taking. The challenge, though, is identifying reasonable functionality requests and balancing the tradeoffs between effort and value.


Platform adoption attitudes

“The word ‘adoption’ as applied to a style of software development has all the wrong implications. Taking on a style of software development does not cover or eliminate your pre-existing problems.” - Kent Beck

Getting buy-in from your customers (the developers of the different service teams) won't be easy. Yes, building software for developers is complex, so the spectrum of users from the service development teams adopting the internal software platform can go from skeptics to ambassadors, with many neutral users sitting in the middle of the range. The visual representation of this adoption pattern could follow a Gaussian distribution, as illustrated in the following picture (Figure 44).

Figure 44. Distribution of types of platform users

Skeptical users: The farmers

Skeptical users remain detached from the organization's mission, strengthening and ring-fencing the silos they own to satisfy their thirst for power. These users are usually high-end technologists with a “not invented here” attitude, which is not bad per se. Critical feedback can always serve you well in moving forward, as long as the criticism is constructive. They are usually worried about vendor lock-in (even when the vendor is internal), which means they prefer to lock into their own architecture rather than yours. Consequently, try to avoid conflict as much as possible, or you will find yourself involved in an organizational battle royale. If this is not possible, try to get yourself into a spot where the conflict can be creative rather than detrimental by remaining very close to this type of user in a kind of technology counselor capacity.

Skeptical users put a lot of effort into the platform game, but only for their own benefit: to demonstrate that it is an unnecessary tool.

Neutral users: The campers

These users encompass a wide range of the adoption spectrum, from those who prefer to wait and see to those pushed to use the platform by managerial mandates. It is essential to understand what is behind this neutral position. It may be bottling up some frustration from how traditional software was released: you get a first buggy version zero, and then you need to wait months to get the fix through a big-bang update. These users are typically product engineering middle managers who enjoy a comfortable position and are not keen on introducing change. This range of users is also known as the frozen middle, and it is the place where most product innovations get stuck. Indeed, the most challenging part of the transition to on-demand software is the transformation of the organization.

Modern software is an asset that gains value over time. By consuming a software product like the platform that is delivered continuously, developers are not buying software for the initial features of the system but for a projection of the ones they will get transparently, frequently, and at no cost. Making those users understand this can help them gain more trust and confidence in your initial versions of the internal software platform.

Neutral users stay vigilant after an internal software platform is introduced in the architecture and only react when they see an immediate benefit.

Ambassadors: The pros

These users are developers (and even technology executives) who fully embrace the internal software platform, not necessarily because of its new technology. Instead, they adopt the platform because they understand the economics of industrialization and managed services and how these can enable them to build better software. Part of your job is pushing the normal distribution to be less normal, leaning more toward this type of user. You need to carefully cultivate a group of diverse and engaged subscribers who will help you move forward.

An essential aspect of working with people sitting at this side of the user spectrum is that you need to be careful with over-reliance on the platform team, especially from a subset of ambassadors who are actually skeptics in disguise. These are development team members who push too many things to the platform for no other reason than off-loading part of their jobs. This way, they can keep themselves in their comfort zones, ignoring whether the requested functionality is a core differentiated competency that brings value to them or not. You can quickly address this situation by bringing Build and Buy premises into the discussions. This will help service development teams realize when it is better to build themselves and when it is better to buy from the internal software platform.


Ambassadors are committed professionals who understand what is at stake when introducing an internal software platform as a transformational asset to the organization.

Internal software platforms and the Theory of Constraints

Looking at platform adoption from the perspective of the Theory of Constraints (a concept used multiple times across the software development industry) reveals interesting challenges. Thanks to the rationalization of access to services, the selection of developer experiences, and the heavy lifting of cloud infrastructure, service development teams are now in the spotlight for building and releasing their software faster to the market and with better quality than ever before. With the introduction of an internal software platform, the identified constraint has been elevated, automated, and shifted to other teams that may not like this notoriety. If their reaction is still to over-rely on the platform team to accomplish parts of their unique and core mission, then you will find yourself caught in a vicious circle. Depending on the dynamics of your organization, the platform could be perceived as a source of trouble, something that is significantly more probable in the absence of sponsorship. As Kent Beck says: “If you don't have executive sponsorship, be prepared to do a better job yourself without recognition or protection.”

PART II: PRINCIPLES

“Engineering tradeoffs and stupidity are indistinguishable if you are unaware of the tradeoffs.” - Eric Elliot

Building, testing, and releasing business services for the internal software platform is not trivial. It can be a sophisticated environment made of moving parts in constant evolution. Remember that developers are the primary users of the platform, and you will want to ensure that they not only adopt it but are also productive and happy when doing so. That's why it is so important to curate a good experience for the development teams. You must empower all developers from both platform and business service teams and make them participants in technical architecture decisions. They are first-class problem solvers, and as such, they should be aware of the costs associated with each technical decision, no matter how big or small.

This book captures a comprehensive list of basic, no-nonsense software development principles and practices to help your organization transition to SaaS and spin up high-performance teams. These principles lean heavily on Extreme Programming (XP) ideas, and their essence can be captured using the acronym LESS (Figure 45).


Figure 45. Platforms are LESS

That's right. Internal software platforms are LESS:

• They are deliberately light, as they fully embrace the underlying technology, exposing all its technical details to their users. Platforms create just enough encapsulations that add value, not subtract it.

• They enable developers to create domain-specific business logic thanks to the industrialization of cross-cutting functionalities and the heavy lifting of undifferentiated work. Platforms are the Ship of Tomorrow that helps developers navigate an ocean of engineering responsibilities.

• They are based on the serviceful precepts of modern software, and they are composed of highly cohesive and loosely coupled services.

• They are simple and tailored to the real needs of every software organization's developers. They start from the ground up, providing valuable and simple abstractions, and they evolve from there by harvesting common patterns into reusable services.

This north star shows the way to building quality software. However, please don't take these principles as technical absolutisms (actually, that's a meta-principle!), as they are meant to act as guardrails. Therefore, do not feel obliged to implement all the practices and rules described ahead verbatim; instead, feel free to pick only the ones that resonate with you and your context and jettison the others.

The main goal of the following chapters is to provide answers to software engineering and architecture leaders who, like you, are planning to spin up an internal software platform as a catalyst for the transition to modern, on-demand software. After you go through them all, you will end up with lots of questions, and that's a good thing. It is well understood that every technical decision depends on the context and should be driven by a profound tradeoff analysis. Look at these principles as a decision framework that helps you get back on track when you sense that you and the different development teams are going off-piste.

Chapter 6: Technical Architecture Principles

The technical architecture of your internal software platform must be simple, trustworthy, and scalable. Developers must find value in using the platform and enjoy the experience while doing it. This chapter outlines a few basic and general architectural principles to take into account for platform design.

Favor serviceful platforms over monoliths

“Every line of code represents an ethical and moral decision.” - Grady Booch

As promoted by modern programming58 and architectural59 approaches, software products (and internal software platforms are no exception) are composed of small functions, which combine into components, then into services, and finally into systems. Using this paradigm, internal software platforms sit at the system level. Taking a serviceful approach does not mean you should compose the platform as a collection of tiny independently deployable microservices and integrate them back through synchronous connections over the wire (i.e., the Distributed Monolith). Instead, make sure you purposefully design and identify the proper granularity for the cross-cutting services while embracing the principles of loosely coupled architectures. These principles include crucial software design aspects such as decoupled codebases, independent deployments, incremental updates, and autonomous teams.


Every service deployed and running with the internal software platform should be reachable through two types of APIs, preferably adhering to REST principles:

• Firstly, a functional API gives consumers access to the business logic and data.

• And secondly, a configuration API allows business users to configure specific service properties that affect its core behavior.

These APIs can then be implemented either as containers or functions on a cloud platform. In any case, let's remember that you shouldn't see these functions as full-blown backends in our preferred serverless context. Instead, they are small components that implement a subset of the service's business logic.
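As an illustration of this two-API split, the following sketch routes incoming API Gateway-style requests either to a service's functional handlers or to its configuration handlers. The `/config` path prefix and the handler-table shape are assumptions made for the example, not a prescribed platform convention.

```python
def route(event, functional, configuration):
    """Dispatch a request to the functional or configuration API.

    `functional` and `configuration` map (method, path) tuples to
    callables; the '/config' prefix convention is illustrative only.
    """
    path = event["path"]
    method = event["httpMethod"]
    # Requests under /config go to the configuration API; everything
    # else is handled by the functional API.
    table = configuration if path.startswith("/config") else functional
    handler = table.get((method, path))
    if handler is None:
        return {"statusCode": 404}
    return handler(event)
```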

Backward compatibility

Remember that the primary user of a platform service is a developer writing code to solve a business problem. These developers will want to access platform services through their APIs, since they may depend on them to complete the business logic they are creating (i.e., they need some data or have to execute some cross-cutting functionality the platform provides). Therefore, to help developers work effectively, you need to give them stable platform APIs.

Web interfaces are to the user experience what APIs are to the developer experience. Unfortunately for you, building user experiences for developers is way more complicated than for final customers, and there are numerous questions you need to answer along the way:

• How do you practice the purist Agile mantra (ship fast, fail fast) when building a platform for developers?

• What if you shipped an MVP of the API and found out that you failed miserably, but hundreds or thousands of developers are already using it?


• Is it possible at all to have backward compatibility and avoid deprecating APIs?

• Is it possible to do that while keeping things simple for new learners and consistent for existing developers?

• What if you need to change your service API and introduce breaking changes?

Well, you don't. API breaking changes are a last resort. Hence, it would be beneficial to use design mechanisms that cater to backward compatibility, such as the expand-and-contract technique60. This pattern helps implement breaking changes to an interface by splitting the change into three distinct phases: expand, migrate, and contract.

• The idea is to expand the surface area of an API by adding to its interface the new elements you want to introduce, such as new resources or parameters.

• Along with proper documentation, this allows developers to familiarize themselves with the new options during a grace period, in which you leave the old elements in the interface but mark them as deprecated.

• After this time, typically when a critical mass of users has shifted to the new parts of the interface, you retire the old elements, thus giving birth to a new version of the API.

Be careful with deprecating the old, failed stuff: if you do, you will put your reputation as an internal service provider at risk. This is, precisely, one of the most annoying characteristics of the Google APIs, as expressed by Steve Yegge in the blog post “Dear Google Cloud: Your Deprecation Policy is Killing You”61.
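The expand phase can be sketched in a few lines of code. Here a hypothetical `get_orders` operation accepts the newly introduced `customer_id` parameter while still honoring the deprecated `cust` parameter during the grace period; all names and the stubbed result are illustrative.

```python
import warnings

def get_orders(params):
    """Expand phase of expand-and-contract for a hypothetical API.

    The new 'customer_id' parameter is preferred; the deprecated 'cust'
    parameter keeps working during the grace period, with a warning
    nudging callers toward the new interface.
    """
    if "customer_id" in params:
        customer = params["customer_id"]
    elif "cust" in params:
        warnings.warn("'cust' is deprecated; use 'customer_id'",
                      DeprecationWarning)
        customer = params["cust"]
    else:
        raise ValueError("customer_id is required")
    return {"customer": customer, "orders": []}  # stubbed lookup
```

Once a critical mass of callers has migrated, the `cust` branch is removed in the contract phase, yielding the new version of the operation.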


AWS examples

AWS cloud is an excellent and representative example of a distributed, serviceful platform that avoids monolithic designs and enables its services to evolve independently. This way, they can gradually provide their customers with the right features at the right time. For example, you may not have heard a big announcement of a full-blown AWS v2 release. Still, you have noticed how new versions of the services are upgraded individually and transparently.

Amazon API Gateway

AWS introduced API Gateway in 2015 as a managed service for developers to build and maintain REST APIs at scale. It served engineers well for years, allowing them to design REST resources and HTTP interactions for their APIs. Then, AWS realized that they had made a mistake: allowing such fine-grained REST resources and interactions was too complex. Engineers were using the API Gateway more simplistically and, in many cases, just using the proxy resource for many of their integrations, which clearly indicated a usage pattern. How did AWS react to this convention introduced by the customers? By listening to them. Instead of sunsetting REST APIs and infuriating thousands of existing users, they introduced HTTP APIs, a new variant that currently coexists with the former REST API service. Today, they still maintain both, and there is no signal from AWS indicating they may force engineers to migrate from one to the other. This approach is very similar to Basecamp's *Until the end of the Internet*62 policy.

That is remarkable, but isn't introducing new services like that confusing to the developers? Yes, very much. It is problematic, especially in cellular organizations that allow (and promote) the proliferation of services that, on many occasions, even compete with each other. So how does AWS solve this? Some people may argue that it would be easier to let new accounts have access only to the new services. But why? Firstly, the old service is perfectly usable. And secondly, having a different set of services per account would confuse users and mean a total waste of the economies of scale for the AWS engineers supporting those services. Instead, AWS makes sure that when you start configuring a service, all the necessary configurations default to the new features that the provider wants you to use. This is how they do the handholding toward the latest versions of the services, and it's an ingenious solution to alleviate the perceived complexity of introducing changes.
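This defaulting strategy is easy to emulate in an internal platform as well. The sketch below merges a user's explicit choices over provider defaults that point at the newest feature set; the keys and values are illustrative and do not correspond to real API Gateway settings.

```python
# Provider defaults steer users toward the newest feature set.
# These keys/values are illustrative, not real service settings.
DEFAULTS = {
    "protocol": "HTTP",
    "payload_format": "2.0",
}

def effective_config(user_config):
    """Sketch of 'handholding via defaults'.

    Every option the user does not set falls back to the provider's
    preferred (newest) value, while explicit choices are always honored.
    """
    merged = dict(DEFAULTS)
    merged.update(user_config)
    return merged
```

Callers who accept the defaults land on the newest variant automatically, while an explicit choice such as `{"protocol": "REST"}` keeps the older behavior without any migration pressure.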

Amazon SimpleDB

This service provides NoSQL managed storage to developers. Released in December 2007, Amazon SimpleDB is one of the oldest AWS managed services and, at the same time, a very illustrative display of AWS's strategy for building platforms. What is curious about Amazon SimpleDB is that it has never been retired, despite being set aside in favor of more scalable and reliable services such as Amazon DynamoDB (released in 2012). For sure, its internal implementation has changed over the years, but the service has not been deprecated, and its API has not suffered breaking changes. It means that those pieces of software that still rely on Amazon SimpleDB's very early versions can still function without glitches, or at least without glitches related to the integration with this managed service.

If you are using AWS, you probably don't realize that they introduce new features, APIs, SDKs, and versions of the services almost every day with nearly zero disruption to you. Also, there are virtually no migration notices or grace periods announced due to service sunsetting. How do other IaaS, PaaS, or SaaS providers manage this? For example, Stripe takes a very similar approach to managing service versions. When they realize that an API is no longer usable and has to be fixed with breaking changes, they introduce a new version, so they don't break your stuff63. In the spirit of providing developers with a superior experience, backward compatibility is also a mandate. As mentioned earlier, deprecating your service or introducing breaking changes should be your last resort.

Favor iterations over big up-front designs

“But in practice master plans fail - because they create totalitarian order, not organic order. They are too rigid; they cannot easily adapt to the natural and unpredictable changes that inevitably arise in the life of a community.” - Christopher Alexander

Start building small, simple pieces that you can evolve independently to produce business value from day one. In other words, do the simplest thing that solves your problem, and don't try to predict future requirements, especially if they are not well defined. This will help foster the adoption of the internal software platform. The minimum viable versions of your platform services do not even have to be shaped as technical components. One of the most compelling concepts introduced by the Team Topologies book is the idea of the Thinnest Viable Platform (TVP), which is nothing but the minimal expression of a curated experience that offers value to developers.

Internal software platforms can be found in many forms out there, from Markdown pages outlining a list of recommended managed cloud services and their configurations to the most complex standard infrastructure topologies full of reusable components. We can find different shapes and flavors of platforms between these two ends of the spectrum, always offering developers a curated experience to help accelerate software development. Starting with simple and focused services for the internal software platform will help you attack the complexity of your organization's technology estate from the ground up and iterate incrementally from there. For example, providing a shared CI/CD pipeline library with common steps for building quality software can help with the automation problems of the different development teams. A small solution like this may not even require a compelling business case to be approved by your senior management, while it brings enormous value to the development teams from day one. Perfect is a verb, not a noun.

Solving real problems

Starting small does not mean that you should not have a vision of the outcomes you want to achieve in the long term with your platform. The caution point here is how aggressively you push that vision onto the development teams by creating radical features that are too sophisticated for them. We have all witnessed examples of very cutting-edge technology that found a lot of resistance and lack of acceptance only because it did not consider the community's zeitgeist when it was released. This is the case of Google App Engine, which introduced serverless and utility runtime concepts in 2008, almost three years before AWS Elastic Beanstalk was released (2011) and six years before AWS Lambda (2014). But Google didn't gain the mainstream acceptance that its competitors did.


You will have to put yourself in the developers' shoes to solve real problems for them. Look at where they are and think about how you can help. Is it through better environment provisioning? Is it through the introduction of basic CI/CD pipelines? Is it through implementing infrastructure as code? Assess what developers need, offer them a curated experience, and move together with them toward your vision. Also, do you want to introduce managed cloud services and especially serverless computing? Look at what kind of applications and databases the business services are using. Can they be stateless? Can they be event-driven? Do operations teams struggle with infrastructure dimensioning and scalability? Again, identify quick wins, help them solve real problems with guidelines, tools, and small reusable components, and gradually take them where you want them to be a few years from now.

Handholding with AWS

AWS masters this principle exemplarily. Its cellular organization allows the different service teams to develop specialized managed services that solve customers' real problems from day one without being condescending about how developers should use the technology. They excel at launching new services with features that meet customers exactly where they are so that they can iterate incrementally over time, helping those customers walk the evolution path seamlessly. Over the last 15 to 16 years, we have witnessed AWS do this handholding multiple times:

• AWS Elastic File System (EFS) is Amazon's NFS-based cloud storage service, and it was first launched in June 2016 with a minimal set of features and integration possibilities. For example, data encryption at rest was not included until one year later (August 2017).

Also, AWS introduced integration with Lambda and application-specific access points for shared datasets only in the summer of 2020, which allowed the service to become the robust, enterprise-class storage system it is today.

• The feature changes that AWS Lambda has experienced over the past years are a clear depiction of AWS's iterative approach. In his article "AWS Lambda is winning, but first it had to die"64, Forrest Brazeal describes how the service looks very different from when it went GA in 2015. As part of this evolution, AWS had to detach itself from the most purist (and somewhat radical) view of serverless and start adding features to the Lambda service that fostered its adoption, even at the expense of conflicting with its Serverless Manifesto. Despite these changes, AWS Lambda is still a highly configurable service. As such, AWS assists customers in advancing toward the most purist vision of serverless by factoring that vision into default configuration values.

These are outstanding examples of two services that evolved and matured into flagship managed services used daily by thousands of developers across the globe. And they did that while starting from totally different points in the spectrum of features: one was too basic (AWS EFS), and the other was too radical (AWS Lambda).

AWS Step Functions

Amongst all the managed cloud services offered by AWS, one deserves special attention because of its continuous upgrades over the years and the great potential it still holds. AWS Step Functions is a serverless orchestration service that lets you build business applications by combining different AWS managed services into an event-driven workflow.

When AWS released Step Functions in 2016, it included very basic state machine functionalities to run simple activities. Also, it featured only a couple of integrations with other AWS services, allowing you to access workflows through APIs exposed in Amazon API Gateway and execute flow tasks as AWS Lambda functions. To make all this possible, Amazon created a JSON-based Domain-Specific Language (DSL) for this managed service, called the Amazon States Language (ASL), that allows users (primarily engineers) to define these workflows programmatically.

During the first couple of years, AWS focused on fostering the adoption of this new service by creating and updating lots of documentation (adding samples, references, and code snippets) and especially by gradually rolling out the service across all the AWS regions worldwide.

And then, the year 2019 arrived. AWS started to add exciting and pivotal new features to support the types of business applications built with Step Functions. These are a few of the most important features added during the last couple of years:

• Nested workflows allowed developers to break workflows into smaller state machines and to start executions of these other state machines (August 2019)
• Express Workflows were added as a new workflow type, suitable for high-volume event processing workloads such as IoT data ingestion, streaming data processing and transformation, and mobile application backends (December 2019)
• AWS Step Functions resources were added to the AWS Serverless Application Model, making it easier to include workflow orchestration in serverless business services (May 2020)
• The Amazon States Language was updated with choice rules and intrinsic functions as basic operations that do not require task states. This enhancement added a lot to the expressivity of the language and opened the door to implementing richer workflows (August 2020)
• AWS Step Functions integrated with Amazon API Gateway, making it possible to call API Gateway APIs from state machine tasks (November 2020)
• Synchronous Express Workflows were introduced, giving engineers an easy way to orchestrate microservices (November 2020)
• AWS Step Functions integrated with Amazon EventBridge (May 2021)
• AWS Step Functions added a visual workflow designer, the AWS Step Functions Workflow Studio, allowing engineers to create workflows using a drag-and-drop visual tool (June 2021)
• AWS Step Functions added SDK service integrations with over 200 AWS services, allowing developers to call AWS APIs directly from a workflow (September 2021)
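Because the Amazon States Language is plain JSON, a minimal workflow definition can be sketched programmatically. The following Python snippet is only an illustration: the state name, comment, and Lambda ARN are hypothetical, while `StartAt`, `States`, `Type`, `Resource`, and `End` are standard ASL fields.

```python
import json

def make_order_workflow(lambda_arn: str) -> str:
    """Build a minimal Amazon States Language (ASL) definition:
    a single Task state that invokes one Lambda function and ends."""
    definition = {
        "Comment": "Hypothetical single-step order workflow",
        "StartAt": "ProcessOrder",
        "States": {
            "ProcessOrder": {
                "Type": "Task",
                "Resource": lambda_arn,  # the Lambda function to invoke
                "End": True,
            }
        },
    }
    return json.dumps(definition, indent=2)

if __name__ == "__main__":
    print(make_order_workflow(
        "arn:aws:lambda:eu-west-1:123456789012:function:process-order"))
```

In practice, a definition like this would be passed to the Step Functions CreateStateMachine API; richer workflows add Choice, Parallel, and Map states on top of the same structure.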

Looking at the features added during late 2020 and early 2021, we can observe how AWS is investing heavily in this managed cloud service. As engineers keep building new business services and making them reachable through APIs or events, AWS Step Functions helps accelerate the creation of higher-order business services built on top of the existing ones by combining them in a workflow.

Evolving business service development with AWS

To cope with evolution, AWS Step Functions could move up the value chain and add more and more features during the coming years, giving developers better capabilities to build those new higher-order business services. As these abstractions continue to emerge and stack on top of each other, AWS could even expand Step Functions to the point where it reaches the business-specific overlay. This way, it may become a business development platform, something the provider already did with the introduction of Amazon Nimble Studio, an end-to-end solution that empowers creative studios to accelerate visual content creation in the cloud. All these usage patterns lead to a world where developers (and other types of users) could quickly build entire business processes by creating workflows that combine different serverless business services from a marketplace. This idea is represented in the following Wardley map (Figure 46).

Figure 46. AWS Step Functions enabling the emergence of higher-order business services

Favor asynchronous integrations over synchronous ones

"Everything fails all the time." - Werner Vogels

Generally speaking, people's minds are wired to work synchronously. We expect immediate feedback (visual, auditory, haptic) right after we take an action. We tend to push this way of thinking when designing integration architectures, expecting services to interact synchronously in a block-and-await manner. It is hard for us to design with callbacks, promises, events, and notifications as the default integration mechanism, but they are compelling. Try to develop platform services to be consumed asynchronously through their APIs, and write your business logic to react to events as they happen. An asynchronously integrated architecture is more solid, pluggable, performant, and scalable. It may indeed be harder to debug, but that is where the next principle in this chapter may be able to help you.

As stated in Forrester's report Use Event-Driven Architecture In Your Quest For Modern Applications, published in April 2021, "the growth of distributed application architecture hits a brick wall when only using synchronous APIs for integration due to the fragility and scalability limitations." The report explains how to "best employ event-driven architecture (EDA) to address the complexities of modern distributed application architectures," presenting a few valuable patterns to avoid painful pitfalls. In this context, the opportunity for EDA is that it does not replace synchronous call-and-response architectures. Instead, it complements them for greater agility and resilience. It removes the inherent challenges of working with multiple bounded contexts by promoting choreography over orchestration as an integration mechanism, using events for data interchange across the different services.
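The choreography style described here can be reduced to a toy, in-process sketch: a publisher emits an event and gets nothing back, while any number of subscribers react independently. This is plain Python for illustration only; a real platform would use a managed broker such as Amazon SNS or Amazon EventBridge, and all the names below are made up.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Toy in-process event bus illustrating choreography:
    publishers emit events and never block waiting for consumers."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict):
        # Fire-and-forget: the publisher receives no return value.
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
shipments, invoices = [], []

# Shipping and billing react to the same event independently;
# the order service never calls either of them directly.
bus.subscribe("OrderPlaced", lambda e: shipments.append(e["order_id"]))
bus.subscribe("OrderPlaced", lambda e: invoices.append(e["order_id"]))
bus.publish("OrderPlaced", {"order_id": "o-42"})
```

Adding a third consumer later requires no change to the publisher, which is precisely the pluggability the text argues for.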

AWS internal asynchronous architecture

AWS's internal infrastructure is heavily event-driven, as we could observe from the Kinesis issue in November 202065 that caused severe degradation in the availability and response times of many other managed services in the Northern Virginia region, such as Amazon Cognito and Amazon CloudWatch. This hiccup in AWS's quality of service tells us that asynchronous integration (in its many levels and forms) is a production-ready architectural style used by world-class distributed platforms.

To that end, AWS offers a few managed services that cover the whole spectrum of developer needs for the construction of event-driven applications:

• Basic pub/sub messaging with Amazon SQS and Amazon SNS
• Message brokering with Amazon MQ
• Event brokering with Amazon EventBridge
• Event streaming with Amazon Kinesis and Amazon MSK (managed Kafka)

Amazon Quantum Ledger Database (QLDB)

In addition to these well-known services, AWS introduced Amazon QLDB in late 2019, a service that provides centralized ledger functionality in a SQL-like, append-only, and cryptographically verifiable database. It would be irrelevant to the topic of asynchronous integrations if it weren't for the native event sourcing capabilities added to Amazon QLDB. The built-in streaming capabilities of this AWS service provide a real-time flow of the data stored within Amazon QLDB, which enables developers to build event-driven systems. This way, Amazon QLDB captures the changes made to the immutable journal and sends them in real time to a destination such as Amazon Kinesis. Other business services can subscribe to any data update event emitted by the original data store and replicate that data into their own materialized views. This feature makes Amazon QLDB an excellent candidate to act as the main storage technology for business services in an event-driven architecture. Additionally, QLDB is serverless, which means that it automatically scales to support the demands of your on-demand software.
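The pattern just described — an append-only journal streaming its changes so that other services can project them into materialized views — can be sketched in a few lines. This is not the Amazon QLDB API; it is a hypothetical, in-memory illustration of the data flow only.

```python
class Journal:
    """Sketch of the QLDB-style pattern: an append-only journal that
    streams every committed change to subscribers, which maintain
    their own materialized views. Plain Python, not the QLDB API."""
    def __init__(self):
        self._entries = []        # immutable history: append-only
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def append(self, change: dict):
        self._entries.append(change)       # nothing is updated in place
        for handler in self._subscribers:  # stream the change downstream
            handler(change)

# A downstream service keeps a materialized view of current balances.
balances = {}

def project(change):
    acct = change["account"]
    balances[acct] = balances.get(acct, 0) + change["delta"]

journal = Journal()
journal.subscribe(project)
journal.append({"account": "a1", "delta": 100})
journal.append({"account": "a1", "delta": -30})
```

The journal keeps the full, verifiable history while each consumer holds only the derived state it needs — the essence of event sourcing.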

Favor elimination over re-engineering

"Possibly, the most common error of a smart engineer is to optimize the thing that should not exist." - Elon Musk

It is incredibly humbling and encouraging at the same time to watch Elon Musk espouse the principles of agile engineering in the YouTube video where he hosts a tour around the Starbase factory66 facilities in Texas in preparation for the first orbital launch of SpaceX's flagship rocket. When asked if a particular feature of the spacecraft would be permanent, Musk answered with an elaborate discourse that turned out to be a masterclass on building and delivering rapid innovations, outlining what he calls a "five-step process." Among those steps is the necessity for engineers to "simplify and optimize the design," with a focus on eliminating problems that you don't have. At some point in his intervention, Musk even talks about the concept of Minimum Viable Rockets. This construct resonates well with the fundamentals advocated by the agile software development community, where software engineers are inspired to become autonomous problem solvers.

Problem restatement

Sometimes software engineers tend to grow into problem seekers in a compulsive pursuit of unnecessary creativity. This problem manifests itself in two forms:

• Technical problems: This is notably problematic in specialties such as UI development, where a great deal of the engineering effort consists of selecting a mixture of frameworks and gluing them together, even to build the most straightforward static Web page. It also appears in serverless environments, where the most common mistake is to not fully leverage the standards-based cloud platform, including unnecessarily heavy middleware frameworks in the solution instead.
• Functional problems: Feature creep is a real problem suffered by many software organizations. It derives from the wrong assumption that a better product can be created by including more features. Most of the time, the opposite is true: an excessive number of functionalities may upset users, who in turn stop using the product because they find it too complicated to work with.

Then, in practice, how does one know whether they face a made-up problem that must be removed or a real problem that must be solved? To answer this question, the "Less Software"67 principle at Basecamp proposes a solution called Problem Restatement. The proposition is to fight complexity with fewer features and less code, as described in the following excerpt:

"The key is to restate any hard problem that requires a lot of software into a simple problem that requires much less. You may not be solving exactly the same problem, but that's alright. Solving 80% of the original problem for 20% of the effort is a major win. The original problem is almost never so bad that it's worth five times the effort to solve it."

Questioning the problem space with critical eyes does not come naturally to everyone and may require intentionality to put into practice. Indeed, holding your thoughts and impulses before jumping into an expeditious solution requires mastery.

Problem restatement helps you bypass problems altogether, which is even better than solving them.

Holding on to the doubt

Again, there is a reference from the aerospace industry. Adam Steltzner (the engineer at the Jet Propulsion Laboratory who led the entry, descent, and landing of the Curiosity Mars rover) writes about an approach to problem-solving in his book The Right Kind of Crazy, which he introduces as "holding on to the doubt." This mantra pivots around the idea of avoiding fear-based decision-making (which leads to hasty answers to problems) in favor of curiosity-based decision-making. This approach, Steltzner says, leverages the substance of your attributes as an engineer to tackle the root of the question and helps you come up with much better innovations. Indeed, aerospace manufacturing and space exploration seem to nurture the software engineering discipline with endless references and lessons about confronting complexity at scale.

These ideas are crucial for internal software platform teams, where engineers are in charge of developing cross-cutting functionalities and enabling new value streams with other development teams. Internal software platforms are force multipliers, so focusing on solving the wrong problem may trigger a negative cascade effect that spans the whole product portfolio, increasing overall waste and skyrocketing development costs. In essence, the elementary assumption is that platform engineers are given problems to solve by other developers, not just tickets to fix by project managers. Their responsibility is to figure out how to solve those problems, either with a new platform service or with another type of non-software artifact, such as documented principles or references. Otherwise, the benefits of having an internal software platform will dilute compared to more traditional operations teams, which is something you are running away from.

Innovation at AWS

AWS leans on its Customer Obsession principle to favor elimination over re-engineering. This statement may seem paradoxical, especially for a company that has built and delivered more than 200 services and endorses duplication (technical and organizational) in the open like nobody else. Precisely, it is that customer obsession that makes AWS operate like a very effective platform. They always start with the customer and work backward, which means that they never build services they don't have to, just for the pleasure of playing with cool technology. The most significant expression of elimination is to avoid working on a problem that does not exist. Translated into AWS terms, it means that every piece of software that supports their cloud platform, and every managed cloud service released to users, is seen as a vital and purposeful component that meets customer needs. As outlined in this AWS article68, 90% of their product launches are derived from customer feedback. Only a tiny 10% respond purely to technology and industry trends aimed at improving internal efficiency.

Andy Jassy, the former CEO of AWS, reportedly put this very eloquently in the article above:

"To me, the most important thing by far is not to be focused on what competitive dynamics are, but to listen carefully to what your customers say matters. And then to have the ability to unleash your builders to build that for customers. And to be iterating constantly, and experimenting constantly, and evolving the customer experience. That's the only way any of us who are building businesses right now have a chance to build a business that stands the test of time. And no platform gives builders the ability to evolve their customer experience, experiment, and innovate on behalf of their customers like AWS."

As described in the article "The Imperatives of Customer-Centric Innovation"69 by Daniel Slater, Culture of Innovation Worldwide Lead at AWS, the cloud provider's secret to keeping innovation levels sustainably high over the past years is to focus on durable customer needs. These needs include things crucial to businesses, such as performance, security, breadth and depth of features, and the cost performance of managed cloud services. As mentioned above, 90% of what AWS builds is driven by customers telling them what matters, not what to do. The difference is subtle, but it matters. The other 10% of AWS's innovations come from internal teams developing a vision after remaining close to the customers, even if it was not articulated as a direct need.

Favor re-engineering over multiplying

"Playing football is very simple, but playing simple football is the hardest thing there is." - Johan Cruyff

Before thinking about adding a new system, service, or function to the platform architecture, consider whether you can put new ideas in quarantine until you fix the existing componentry. Service developers (your platform users) will appreciate consistency and stability over a stream of new features they can't cope with. Technical teams tend to over-engineer their solutions and produce new components and features beyond the limits of what they can operate and monitor later. You will have to keep a close eye on this when building distributed systems like an internal software platform, because building software is very simple, but building simple software is the hardest thing there is.

In a system like a software platform, made up of multiple parts in continuous evolution and used by an indeterminate number of other systems and final users, you will have to apply refactoring techniques continuously to preserve the desired simplicity. Modern software development leans on these techniques to reduce technical debt, make it easier to add new features, and soften the system's learning curve for core maintainers and contributors. This fundamental simplicity requires intentionality. It is not achieved by eliminating engineering processes and just focusing on coding, but by actively applying continuous, careful thought and attention to keep the entropy levels of the system under control. Your software is only as strong as the weakest of its features.

On many occasions, multiplying is not only a matter of adding features; it is also the temptation to throw extra bodies into a project or product development in the vain hope of rescuing a deadline at risk. Without going into too much detail, and keeping it simple for the sake of argument, this is typically based on the misconception that the relationship between the number of engineers working together and their productivity is linear. Spoiler alert: it is not. It turns out that the overhead introduced by the extra coordination and communication between developers, and the ramp-up needed for the new joiners, is often underestimated. The result is that the expected productivity gains never materialize, as these overheads usually introduce additional delays and extra costs.

Automate everything

What do good opportunities for re-engineering look like? How can we keep our software simpler, more stable, reliable, and secure without provoking an unmanageable stream of new features? One possible answer is to automate more.

As it turns out, automation is precisely one of the core parts of our ideal internal software platform and the starting point for many software organizations looking for ways to help their developers build and ship quality software. Do you want predictable builds? Automate with a push pipeline. Do you desperately need faster deployments? Integrate your infrastructure-as-code scripts into an automated deploy pipeline. Do you need to ship new versions faster? Automate your test execution and promote bundled artifacts through an established lifecycle of environments. Do you need to enforce cybersecurity rules? Write policy-as-code definitions and integrate them into the automated build process. And so on.

To find these opportunities, look at your software development process and spot the bottlenecks by acknowledging where work in progress is piling up. A bottleneck usually becomes an overall constraint for the whole software production line, so automating it is a good starting point from which you can evolve iteratively. As you will notice, once you automate one bottleneck, another will pop up spontaneously in another part of the process. This effect is explained by the Theory of Constraints, a management paradigm used in the manufacturing industry that was brilliantly introduced in The Goal by Eliyahu Goldratt in 1984. This classic book also inspired more IT-oriented, modern versions such as The Phoenix Project and The Unicorn Project. There is always an opportunity to re-engineer and automate a part (or a process) of the architecture that developers used to handle manually.
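As an illustration of the policy-as-code idea mentioned above, here is a hedged sketch of a build step that evaluates resource definitions against simple rules and collects violations. The rules and the resource shape are invented for the example; a real setup would lean on a dedicated tool such as AWS Config rules or Open Policy Agent.

```python
def check_policies(resource: dict) -> list:
    """Toy policy-as-code check: each rule inspects a resource
    definition and returns a violation message, or None if it passes.
    A real pipeline would fail the build when any violation remains."""
    rules = [
        lambda r: None if r.get("encrypted")
        else "storage must be encrypted at rest",
        lambda r: None if not r.get("public")
        else "resource must not be publicly accessible",
    ]
    return [v for v in (rule(resource) for rule in rules) if v is not None]

# A hypothetical resource definition that breaks both rules.
violations = check_policies({"encrypted": False, "public": True})
```

Because the policies are ordinary code, they can be versioned, reviewed, and tested just like the services they guard.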

Remove technical debt

Technical debt can manifest itself in many forms in software development.

For example, it could be that you are using a deprecated framework, you are locked into a past design decision that wasn't flexible enough, or you have decided to build a custom service in an area where a new utility has emerged. That last example is precisely what this section is about. As explained in the first chapter of this book, you would probably want to build only those unique things that differentiate your organization from the competition and enable revenue streams. Conversely, you would buy the commoditized building blocks that do not directly drive business value creation. The real challenge is to cope with the pace at which those new utilities emerge in today's competitive managed cloud services market. Re-engineering parts of the internal software platform (and, in general, all the services that compose your on-demand software) to keep it current with the latest managed services is of the utmost importance.

Tackling technical debt while dealing with evolution is part of your job as a software engineering and architecture leader. You can't ignore the fact that whatever you have designed and built won't stay there forever and will eventually have to be replaced by a managed service. And this is not necessarily a bad thing. On the contrary, it will keep you on your toes. The introduction of new commodities enables the co-evolution of other software development practices and new technologies that you will have to evaluate to keep helping developers do an effective and efficient job. By selecting emerging innovations and adding them to the internal software platform, you will be able to offer developers a new set of curated experiences so they can cope with the evolution themselves.

Improve performance

Performance is a user experience concern. Keeping it at high levels is one of the most critical technical challenges in a distributed, serviceful architecture like the one proposed in this book, mainly due to the numerous integrations (synchronous and asynchronous) between the different components.

The fallacies of distributed computing70 are a set of assertions that warn us about wrong assumptions often made by software engineers when building distributed systems. These fallacies are:

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
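The first fallacy in particular — assuming the network is reliable — is commonly countered by retrying with exponential backoff. A minimal sketch follows; the delays, the attempt budget, and the simulated flaky service are all illustrative.

```python
import time

def call_with_retries(operation, max_attempts=4, base_delay=0.1):
    """Retry an unreliable operation with exponential backoff,
    acknowledging fallacy #1: the network is NOT reliable."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                              # exhausted the budget
            time.sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...

# Simulate a flaky downstream service that fails twice, then succeeds.
attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient network failure")
    return "ok"

result = call_with_retries(flaky)
```

Production-grade clients (the AWS SDKs included) layer jitter and circuit breakers on top of this basic shape, but the principle is the same.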

Unfortunately, these fallacies persist because their underlying assumptions are still commonly made, and analyzing their impact on performance is treated as an afterthought rather than as job zero. And that's where re-engineering comes into play. Cleaning up code, organizing data, optimizing interfaces between services, moving features, or configuring provisioned concurrency for some managed cloud services are good practices that directly and positively impact performance. If you can't measure performance, you can't improve it. Integrating the proper profiling tools into the internal software platform will help developers get instant feedback about the performance levels of the services they are building and keep them under control.

Evolving with AWS managed services

Managed cloud services, and especially serverless computing, are the greatest expression of automation. Imagine for a moment that you had infinite resources to improve your CI/CD pipelines and SDLC so you could move from idea to live product in almost zero time at zero cost. That is what managed cloud services do for you. It makes sense, then, to put the time you saved from not building those computing features yourself into something of greater value. These could be things that drive value, such as writing business logic or refactoring some of the existing services to use serverless utilities instead of custom-built components. And that's excellent value for money, because by re-engineering the existing componentry, you are tackling technical debt at its core and helping your organization move forward in terms of evolution. At this point, you need to be careful with workload orchestration frameworks such as Kubernetes, as these are typically aimed at dealing with technical debt, not addressing it, so you are not evolving.

What do good re-engineering opportunities look like? It all depends on your context, but as a rule of thumb, you can lean on the following examples:

• Remove heavy middleware frameworks (e.g., Express for Node.js or even Spring Boot for Java) and fully leverage the AWS Lambda runtime. The combination of Amazon API Gateway and AWS Lambda already gives you the same functionality out of the box.
• Move your Nginx reverse proxy functionalities to Amazon API Gateway.
• Migrate your MongoDB clusters to Amazon DocumentDB or even Amazon DynamoDB.
• Migrate your message brokering solution based on RabbitMQ to Amazon MQ or even Amazon SNS/SQS.
• Think about moving your jBPM processes to AWS Step Functions.
• Move your services' local storage to an Amazon S3 bucket.

Not everything can be migrated to serverless computing, although AWS can still help with other types of managed services. As an example, insurers' de facto integration and communication pattern between organizations is the exchange of GPG-encrypted files over SFTP. Could a solution based on managed cloud services such as Amazon S3 be used here, allowing for new functional and non-functional features? For sure, it could. But if you try to push insurers to use technology that is unprecedented for them, such as the Amazon S3 API for file exchange, you will get a thousand questions from their legal departments that you will not be able to fend off. Then, instead of adding a new technology that enables a handful of new capabilities to your system, a sound solution would be to reuse the existing SFTP setup and gradually migrate to AWS Transfer Family. This effort preserves the same standards while reducing technical debt and decreasing administration costs.
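The first re-engineering example in the list above — dropping heavy middleware in favor of the bare AWS Lambda runtime — can be as small as a plain handler function, since Amazon API Gateway already performs the routing. The event below is a simplified stand-in for the API Gateway proxy payload, and the greeting endpoint itself is hypothetical.

```python
import json

def handler(event, context=None):
    """Bare AWS Lambda handler: no Express, no Spring Boot.
    API Gateway does the routing and hands the request in as `event`;
    the handler returns the proxy-integration response shape."""
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Locally, the handler is just a function call with a dict.
response = handler({"queryStringParameters": {"name": "platform"}})
```

Everything a framework would normally provide — routing, TLS, throttling, authorization — is pushed onto the managed services, leaving only business logic in the function.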

Favor duplicity over hasty abstractions

"The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise." - Edsger Dijkstra

As mentioned earlier in this book, it would be best to favor harvesting frameworks over foundation frameworks. The main reason is that, on many occasions, duplication is far cheaper than the wrong abstraction. You would prefer to let common patterns spring and then harvest them into reusable components rather than jumping into generic frameworks when building an internal software platform.

Let it spring and then harvest

There is a tendency (and risk) to solve every possible problem with that foundation framework71. The problem is that it gradually turns into your whole system, piling up new functionalities that add coverage for new edge cases. This approach results in an unmanageable piece of software with thousands of conditional statements catering to those potential scenarios.

There is an exciting technique that can shed some light on this challenge. It is known as desire paths, and it has its roots in urban architecture. City planners in many countries use it to let paths and sidewalks emerge spontaneously in parks, campuses, and cities. Instead of making hard assumptions about how to interconnect different stops, they leave some unspoiled ground and let people find their natural way of going from point A to point B. After usage uncovers those paths, they pave them and supply them with services (e.g., lighting, bins) to help people walk the lines more safely and comfortably. This is one of the conclusions that emanate from Christopher Alexander's book A Pattern Language, documented as a pattern called "Paths and Goals"72. In this pattern, the author observes how letting natural processes emerge will guide you to more human-centric designs (Figure 47).

Figure 47. Christopher Alexander’s Paths and Goals pattern. (Credits: Dan Palmer)

Take this principle from the function development level up to internal software platform design. With this in mind, try to avoid significant up-front platform-centric developments as much as possible without understanding the actual use cases of how developers would use them. Instead, favor a more business service-centric approach by default, spot the commonalities, and harvest73 the necessary parts as new core platform abstractions. Remember: just enough encapsulation. A few examples of business functionalities that are usually repeated across services and are good candidates for harvesting into the platform include workflow engines and business rules engines. This does not mean that the platform should step into the business domain and provide these components out of the box. Instead, it should provide a common developer experience for those functionalities that many service development teams have in common. That desired developer experience can come in the form of guidelines, libraries, or even reusable infrastructure components, so that service development teams can build their business overlay on top of the platform.

Identifying platform core services

There is usually a point of friction when platform and services teams start to interact and work together. Taken to the extreme, both teams may draw rigid boundaries and claim that developing a particular functionality is the other's responsibility. Stay vigilant about these situations and try to analyze why a specific team refuses to create some functionality. If the responsibility is obvious (based on the guidelines outlined above), the conflict will be easy to resolve; if it isn't, ask the teams what their challenges are. If you follow the rules of the five whys documented by Kent Beck in the book Extreme Programming Explained (second edition), you may arrive at the core of the problem, which is usually a people problem.

Some experts may argue that any abstraction that helps reduce undifferentiated work in a service team, even a single team, is an excellent candidate to be absorbed by the internal software platform. In any case, you need to remain skeptical when an abstraction is pushed into the boundaries of the platform team by the services teams. Does this abstraction help? How many teams does it help? Is it a helpful curated experience? In general terms, it does not seem like a good idea, but, as usual, it depends. Suppose that one service team is a special team that builds vital features consumed by many others (e.g., a premium calculation engine for insurance). In that case, helping that team with more dedicated platform abstractions may pay off, as the value will bubble up to others in the value chain. Still, you don't want the platform to become a collection of individual services where each one serves a different team.

References over templates

This idea can be applied to tools and manuals (such as project templates, code generators, or scaffolds) that get old the minute they
are released. Instead, you would prefer to identify a set of exemplary source-code projects that can be used as a reference for new developments, even if that means copy-pasting the project structure into a new project. However, we favor duplicity over hasty abstractions, so if something is repeated too many times, it is time to harvest it into a shared tool. As a result, it is best to favor documented principles over guidelines, especially when the size of the teams starts to increase considerably. Big, fat documentation manuals tend to be useless and time-consuming both for the author and the reader. They get old the minute they are released, especially in a rapidly changing environment like an internal software platform, driven by research, innovation, and continuous refactoring. That's the reason why you would rather produce and read principles and standards, as they are more strategic, more stable, not subject to frequent changes, and based on the long-term vision. And that's the reason you are reading this book.

Duplicity in AWS services

AWS exercises this principle to the extreme, almost excessively, pushing the levels of duplicity to limits that sometimes result in overwhelming choices for users. Of course, this is a natural consequence of the cellular, services-oriented organization at Amazon, which favors internal competition between teams to develop solutions for their customers. And they have a lot of them. This duplicity does not manifest itself by having multiple services that do the same thing in the same way. Instead, users are provided with different paths for getting to the same outcome. For example, AWS offers various forms of deploying applications running on containers, as Corey Quinn (Chief Cloud Economist at The Duckbill Group) enumerates in his article "The 17 Ways to Run Containers on AWS"74.


This is a problem that plays against AWS. While their industrialization apparatus keeps creating infrastructure utilities to help you write your best business logic, the time costs of figuring out the most optimal cloud service for your application might outweigh the benefits of the managed complexity. And this is not negligible. It can be a considerable challenge, especially for entry-level engineers who might find this ocean of options very annoying and step back from writing their applications on AWS. Following Christopher Alexander’s theory of spaces introduced earlier in this book, this extreme duplicity of competing services can be visualized as if AWS created multiple layers of buildings, with some stepping over the others. Instead of creating a simple campus layout, the result is similar to a twisted city in Doctor Strange’s mirror dimension. Consequently, the space left by these building blocks is confusing and not evident to the users who are unsure if they are making the right design decisions.

AWS’s Developer Experience Running a company that operates as a collection of independent startups leads to inconsistencies that may severely injure the developer experience. On many occasions, developers might feel they are not operating on a fully integrated platform because AWS APIs are inconsistent among different services (or even within a single service). As Luc van Donkersgoed mentions in the article “How AWS dumps the mental burden of inconsistent APIs on developers”75 : “Building services and innovating at Amazon’s scale is hard. Really hard. AWS has impressively solved this problem with its two-pizza team approach, which allows service teams to autonomously design and implement their solutions. But while these isolated development teams can go fast, it’s obviously a trade-off with consistency. We see that in the AWS console, in CloudFormation definitions and coverage, and in the APIs.”



As the author explains, this is not an unsolvable problem. There are solutions like making the APIs go through a revision by a centralized committee, providing API design guidelines, or even partner feedback. Anything to remove the burden caused by inconsistent APIs that AWS developers have nowadays.


Chapter 7: Technology Principles

Choosing the right technology is often the most significant area of investment you will make when developing or maintaining an internal software platform. Your choice of technologies significantly impacts the quality of the software and the team's ability to operate and iterate it incrementally. That's why, without being too prescriptive, this chapter outlines a few principles to help you make technology choices for your internal software platform.

Make your platform reachable

It is recommended that all the services directly or indirectly under the control of the internal software platform adhere very closely to the highest aspirations of REST. By doing so, they expose addressable functional components as discrete resources whose capabilities are dynamically discovered and exploited at run time. These services should be exposed as publicly available APIs accessible through an HTTP endpoint, with a preference for the precepts of the REST architectural style. These are known as L3 APIs, per the Richardson REST API Maturity Model76, with hypermedia support based on the HATEOAS style. Such representations allow an API to be dynamically discovered and navigated at runtime and to easily participate in any conversational engagement. Contrary to path-based APIs, and not surprisingly, there is no URI structure in hypermedia-based APIs. This is because resource URIs are dynamically generated and auto-discovered by the consumer program through HTTP OPTIONS calls, so there is no need to have
a fixed URI structure. In hypermedia-based APIs, URIs could even be obfuscated or encrypted, improving the overall security of your implementation. Services that are not directly under the control of your organization, such as 3rd party APIs, may not adhere so closely to the REST style, exposing more of their functionality statically. However, whichever API is being addressed, the general high-level analogy of such an API is that of a website for the addressed service, which replies to HTTP requests in the same manner experienced by human usage of HTML-based websites. And that is because APIs are just user interfaces; it just happens that the user is another machine. An application that composes the platform's services into a persona-specific user experience can be thought of as an automated (even scripted) website user. This analogy holds to a reasonably deep level, since such an application will usually address itself to the various service websites that it requires to fulfill its purpose via the use of appropriate scripting languages. Topping your APIs with language-specific SDKs will help you contribute to a richer, more engaging, and more effective Developer Experience, especially when working with hypermedia-based APIs. The main reason is that the intrinsically navigable nature of this API architectural style may lead to verbose application code when developers need to work with a particular interaction deep down in the resource tree. Hence, SDKs help encapsulate this chattiness into higher-level utility functions that offer added-value semantics and orchestrate these native API interactions transparently for the developer.
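To make the navigation style concrete, here is a minimal sketch of a client following hypermedia links instead of hard-coding URIs. The resource shape and link relations are hypothetical, loosely modeled on the HAL convention, and not tied to any particular platform API.

```javascript
// Sketch of a client navigating a HAL-style hypermedia response.
// The resource shape and link relations here are hypothetical.
function followLink(resource, rel) {
  const link = resource._links && resource._links[rel];
  if (!link) {
    throw new Error(`No link with rel "${rel}" in resource`);
  }
  return link.href; // the client never hard-codes this URI
}

// Example response as a hypermedia API might return it:
const policy = {
  id: "pol-123",
  status: "active",
  _links: {
    self: { href: "/policies/pol-123" },
    claims: { href: "/policies/pol-123/claims" },
  },
};

const claimsUri = followLink(policy, "claims");
// The consumer would now issue an HTTP GET against claimsUri.
```

The consumer only needs to know the link relation names, so the server remains free to reshape its URI space at any time.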

AWS SDKs

It is stimulating to see how AWS plays in this space by building APIs, and especially SDKs, for their services. In the spirit of putting developers at the center, some of their service teams start their design process using Smithy77, a resource-based, protocol-agnostic, domain-specific language for defining services and SDKs. Business teams can easily understand the service abstractions through this declarative technique, and developers can automatically generate their API clients and documentation for any programming language. On top of that, and as an implementation detail, AWS is starting to favor a more modular way of building SDKs, as recently announced with the introduction of SDK V378. We can see how utility functions and high-level operations are still at the core of their developer experience artifacts, as depicted by the following example:

const { S3Client } = require("@aws-sdk/client-s3");
const { Upload } = require("@aws-sdk/lib-storage");

const multipartUpload = new Upload({
  client: new S3Client({}),
  params: { Bucket: 'bucket', Key: 'key', Body: stream },
});

As we can understand from this code excerpt, this simplified way of managing multipart uploads is not offered by the native Amazon S3 API. Instead, it is a utility abstraction in the programmatic SDK to help developers achieve a common need and reduce the boilerplate code that otherwise would have been necessary via API calls.

Define everything as code

Internal software platforms must be based upon solid operational and software excellence principles. They aim to transmit these very same principles to the development teams building services and applications for them. If you want to retain the ability to quickly and
safely evolve your platform services and components, you need to create them the right way. To achieve that, try to express everything as code, from infrastructure to policies to documentation. It helps both enforce this desired excellence and continuously deliver all the work in progress. Especially with infrastructure, this does not necessarily mean that you have to use a proper programming language to write a deployment procedure. It would help to have a mechanism to express the infrastructure in a repeatable, automatable, and declarative manner.
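As an illustration of that declarative style, a minimal AWS CloudFormation fragment might declare a single encrypted S3 bucket; the logical name and properties below are examples only:

```yaml
# Minimal, illustrative CloudFormation fragment: one S3 bucket
# declared as code. Names and properties are examples only.
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  ArtifactsBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
```

Checked into source control, a fragment like this can be reviewed, versioned, and re-applied just like application code.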

Debugging in a serverless environment

Tracing a bug in an extremely distributed context feels like playing Cluedo. Let's consider an actual use case. The recommended serviceful approach introduced earlier leads to a noticeable challenge inherent to the always-on nature of distributed platforms: the debuggability of services and applications in local (or offline) environments. Although some industry vendors may even present this as an anti-pattern79, the truth is that some degree of local debugging is always necessary to keep up with the continuous flow and short feedback loops promoted by agile software development practices. To achieve this, developers do not have to replicate the entire cloud system with all the services locally, as that would be a total waste of time. Integration between serverless services on the cloud is entirely based on ages-old protocols such as HTTP, MQTT, or DNS. Hence, developers will have to use a mocking mechanism to intercept those integrations locally using available state-of-the-art libraries (e.g., mock service worker) to concentrate on building and debugging their business logic code. It helps shorten the feedback loops, so developers become more confident before pushing their code to the cloud. So, when does this cloud deployment happen?
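Before answering, the local loop just described can be as simple as invoking the function's handler directly with a fabricated event, with no cloud round trip involved. A minimal sketch, assuming a Node.js Lambda-style handler and a hypothetical, trimmed-down API Gateway payload (real events carry many more fields):

```javascript
// Business logic under test: a Lambda-style handler.
const handler = async (event) => {
  const { name } = JSON.parse(event.body);
  return {
    statusCode: 200,
    body: JSON.stringify({ greeting: `Hello, ${name}` }),
  };
};

// Local debugging: call the handler with a fabricated event instead
// of deploying first. The event shape is a simplified illustration.
const fakeEvent = { body: JSON.stringify({ name: "platform team" }) };

handler(fakeEvent).then((res) => console.log(res.statusCode, res.body));
```

This keeps the business logic debuggable in isolation while the real integrations are exercised on the cloud right after.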

Again, in the spirit of agile development (and despite any local development and debugging), development teams will need to deploy their services and applications on the cloud soon, really soon. This way, they test their workloads on the actual online, distributed system, the serverless cloud. The best way to do this in a repeatable, efficient, and reliable manner is through infrastructure-as-code scripts. This type of technology allows developers to provision and manage the necessary cloud infrastructure required by their services programmatically and deploy their artifacts on that infrastructure upon creation. Integrating these scripts in their CI/CD pipelines will empower teams to unleash new levels of agility for creating quality software. Or even further, imagine integrating those scripts with source control management systems such as GitHub so you could spin up a personal sandbox environment for developers automatically every time they create a feature branch. Those new opportunities for value in the developer and operator experience are apparent side effects of the co-evolution of practices brought in by technologies such as infrastructure as code.

Infrastructure as Code with AWS

As Werner Vogels (CTO of Amazon.com) likes to say: "Everything fails all the time80." Therefore, having mechanisms to enforce repeatability and to recover quickly when something deviates from the process is essential, especially on the cloud. As a critical player in this space, AWS provides engineers with the right technology to define infrastructure as code utilizing AWS CloudFormation and the AWS Cloud Development Kit (CDK). These tools offer a DSL and a programmatic API to allow developers to create, update, destroy, and manage AWS resources through code. Without going into all the technical details, software development teams utilize these frameworks' state management and
drift detection features to integrate infrastructure-as-code scripts in their CI/CD pipeline. This approach helps them build more complex Lifecycle Management solutions for their services and platform environments. AWS keeps developing new features in this space and investing in infrastructure-as-code tooling. This way, developers can apply the same agile practices of application development to infrastructure provisioning. For example, when these developers create or update a CloudFormation stack, it might fail for multiple reasons. There can be syntactical errors in the template, permission errors between resources, or external files that the CloudFormation engine can’t find. When this happens, CloudFormation rolls back the stack, which typically means deleting all resources. While this rollback behavior is pretty standard and much appreciated by operators in live environments, it does not match the fast feedback loops of agile development. And if, as we said, developers can create infrastructure in the same way as applications, that rollback mechanism is a bottleneck. It is a constraint that needs to be removed. And so AWS did, introducing a new critical feature to CloudFormation in August 2021 that lets you disable the automatic rollback and keep the resources successfully created (or updated) prior to the error. This way, the system automatically retries the stack creation from the point of failure the next time the template is executed. Infrastructure developers can now iteratively develop their infrastructure design quicker. Also, you can turn this feature on when you create or update a stack.
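For instance, assuming the AWS CLI and an illustrative stack and template name, the flag can be passed at stack creation; this fragment targets a live AWS account and is shown for illustration only:

```shell
# Keep successfully created resources on failure so the next run
# retries from the point of failure instead of starting from scratch.
aws cloudformation create-stack \
  --stack-name dev-sandbox \
  --template-body file://template.yaml \
  --disable-rollback
```

In live environments you would typically leave the default rollback behavior on and reserve this flag for development iterations.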


Use the cloud platform as a programmable system

As introduced earlier, one of the main reasons why the advantages presented by serverless are relevant in this context is that it provides an exciting architectural style you may want to use to build the platform. With serverless computing, cloud providers make technical design decisions for you by creating building blocks to help you complete your applications more efficiently. This has a huge positive impact on the time to business value. Remember, serverless does not introduce new cloud technology. It is just ages-old standard technology on the cloud, but the way developers can use it unlocks unprecedented ways of building quality software. We need to go beyond thinking about the cloud as somebody else's data center and start to act in terms of using the cloud as a system you can program.

At this point, it is easy to come back to Christopher Alexander’s lessons again. Looking at this through the prism of his theory of spaces, we can conclude that when using the cloud as a programmable system, we are talking about minimal code that developers will have to write to fill up the spaces left between the cloud services. Once again, less is more, and the best code is the one you don’t have to write. As introduced in previous chapters, the modern cloud is an asset, and the code becomes a liability. Once you have decided to move to the cloud and use managed services, plenty of handy functionalities are available to you that get richer over time at no cost. Your cloud subscription costs only kick in once you start writing code. That’s because, in a serverless environment, all code is debt. Not only technical debt but pure and plain debt.


Embrace the cloud fully

As expressed through other principles, try to avoid abstractions and unnecessary frameworks that keep you and the development teams away from the managed cloud services and their widely adopted technology. It is essential to highlight that the overhead introduced by dealing with these frameworks in your software will be more expensive than writing the code twice in the unlikely event that you need to migrate it to another cloud provider. By embracing the cloud fully, you will be able to tap into immediate, tangible benefits that outweigh any hypothetical re-platforming efforts. This is an essential (yet overlooked) general software engineering principle that you can very well apply to other technology stacks and disciplines, such as mobile development. For example, this is what Dropbox engineering teams state in their article "The (not so) hidden cost of sharing code between iOS and Android"81. This approach of leveraging the cloud as much as possible also means that it is critical to lean on the cloud platform's capabilities to trace, observe, monitor, and debug services (core and business). At first sight, this may look like a step back in your ability to ship software fast. This effect becomes more evident if you compare the feedback loops in distributed cloud services with the ones you would get from other, more traditional on-prem approaches, where everything seems to be in one place: the server. In any case, the number of innovations and investments and the pace at which modern cloud platforms are improving and iterating their managed services is impressive. It may very well be that a couple of years from now, the speed to market that you get by building and operating serverless systems outpaces the one you would get from any other way of creating software. The main reason is that cloud providers keep turning many of today's products and custom solutions into commodities.


Serverless is a non-zero-cost tradeoff

Again, you should run away from technical absolutisms, so it is acceptable to look at serverless with a skeptical eye. We have seen how a few brilliant programmers, such as Basecamp's Jorge Manrubia, write about the "Complexity-inducing Narratives"82 that surround this type of architectural style. Indeed, there is a risk of overengineering solutions with too many interconnected components, coming up with systems that otherwise could have been simpler using a more monolithic design. And there is some truth embedded in that, especially related to the fact that with serverless, we are programming a distributed system where the different services are integrated using network protocols such as HTTP. As we know from the fallacies of distributed computing83, the network is unreliable, the latency is not zero, and the bandwidth is finite. Even if the cloud provider manages these complexities, the tradeoff is a non-zero-cost one. In the same spirit of disbelief, you need to be aware that this type of comparison usually happens when you look at serverless designs as if they were merely isolated infrastructure components in a data center instead of a system you are programming. The equivalent monolithic approach for the same system would surely have an equally complex design. Although monoliths are deployed as one big unit without the complexities of distributed components, they go through the challenges of a multithreaded system. Once again, you will have to analyze and understand the tradeoffs to develop the best solution for your context. The next time you look at a serverless design diagram, think of it as if it were your system's UML collaboration diagram that you are about to deploy.


Working with AWS services

As with any other abstraction layer, this new paradigm comes with tradeoffs and constraints you need to be aware of. Take your time to analyze and understand them and, if you accept them, embrace the technology thoroughly and do not try to circumvent it with unnecessary abstractions. For example, this is precisely the approach followed by Atlassian when they decided to implement Micros84, their internal software platform for supporting more than 1000 microservices in production. As the author describes in the publication above, Atlassian's internal software platform architecture adds AWS functionalities. Still, they don't call them abstractions, as these functionalities are intentionally leaky, meaning that the platform "deliberately exposes the details of the underlying AWS infrastructure that it provisions and manages." This approach leads some Atlassian service engineers to question the platform's value, as they could achieve the same outputs by having direct access to the AWS services. This is where the developer experience curated by the platform team comes into play. Access to the underlying AWS managed services should be guided by the principles and tools you will be putting together as part of your internal software platform. It will help you to ensure integrity concerning security, compliance, and operational excellence.

Make your platform transparent

It would be best to expose the details of the underlying cloud service that your component manages, especially for those use cases where you need to build functionalities that provide developers with a consistent experience around the cloud platform. For example, if you decide to create your own integration services to control all the meshing on your platform, don't encapsulate the native event representations that the cloud platform pushes to the target services. In the case of AWS, this means that those services implemented as AWS Lambda
functions should get full access to the native event and context objects from S3, Kinesis, or API Gateway without any alteration, encapsulation, or modification introduced by your platform. Embracing the cloud will help you in the long term. This way, software engineers can use the widely adopted (and documented) AWS standards directly, benefitting from any service enhancement when the cloud platform makes them available.
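To illustrate, a hypothetical platform wrapper might add a cross-cutting concern such as structured logging while forwarding the native event and context objects untouched; every name below is invented for the sketch:

```javascript
// Hypothetical platform helper: adds structured logging around a team's
// handler but forwards the native event and context objects without any
// alteration, encapsulation, or modification.
function withPlatformLogging(teamHandler) {
  return async (event, context) => {
    console.log(JSON.stringify({ msg: "start", requestId: context.awsRequestId }));
    const result = await teamHandler(event, context); // native objects, untouched
    console.log(JSON.stringify({ msg: "end" }));
    return result;
  };
}

// The team's handler codes against the documented AWS event shape,
// here a trimmed-down S3 notification record:
const teamHandler = async (event) => event.Records[0].s3.object.key;

const wrapped = withPlatformLogging(teamHandler);
const fakeS3Event = { Records: [{ s3: { object: { key: "uploads/report.pdf" } } }] };
```

Because the wrapper never reshapes the payload, the team's code keeps matching the official AWS documentation for that event source.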

Avoid micro-optimizations

With AWS serverless computing services, software architects and designers must look at cost optimization as another critical factor in the design decision mix. But within reason. Let's consider a concrete example. Suppose you have to implement a layered authorization mechanism for a service you are creating so that access to specific API resources and interactions is restricted depending on the user role. Also, imagine your service backend is running with a couple of AWS Lambda functions exposed to the service clients through a secured API on the Amazon API Gateway. Thus far, this is a very standard design for a service running on AWS. To implement your custom authorization mechanism, you have the option of putting all the necessary logic in the backend. First, inspect the user token, then extract the user role, and finally apply some data filtering. Everything in one place, running on the same component, following the best design patterns possible. However, AWS already gives you some specialized technical constructs, so you don't have to implement all this authorization logic on your own. This way, you could use an Amazon API Gateway Lambda authorizer function to execute a first-pass authorization check for the API resources. For example, suppose the user does not have the proper role. In that case, you could block API calls
at the Amazon API Gateway level and avoid your business logic being executed at all. This approach improves overall performance and cost, since your AWS Lambda functions are not invoked on every user call. Also, you are delegating this responsibility to a component that AWS specifically designed to achieve this task. What are the implications, though? You are splitting the authorization checks into several places. On the one hand, you have the first check at the Amazon API Gateway, but only for what this component can verify at this level, such as role-based policies associated with named API resources. On the other hand, what happens if you want to filter the resource data properties returned to the final user based on the role? You can't do this at the Amazon API Gateway authorizer level because it has no access to this data. Instead, you will have to implement it in the AWS Lambda functions that implement the service backend. Are you embracing the cloud with this approach? For sure. Is this a good design? It depends. If your particular requirements demonstrate that the cost gains of taking this approach outweigh any other alternative, then it is a good decision. If not, this is a dangerous micro-optimization that compromises application maintainability and traceability.
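A minimal sketch of that first-pass check: a Lambda authorizer maps a role extracted from an already verified token to an IAM policy document that the Amazon API Gateway evaluates before invoking the backend. The role names and ARN are illustrative, and real code must verify the token signature first.

```javascript
// Simplified Lambda authorizer sketch: maps a role claim to the IAM
// policy document that API Gateway evaluates before invoking the
// backend. Real code must verify the token's signature first.
function buildAuthorizerResponse(role, methodArn) {
  const allowed = role === "underwriter"; // hypothetical privileged role
  return {
    principalId: role || "anonymous",
    policyDocument: {
      Version: "2012-10-17",
      Statement: [
        {
          Action: "execute-api:Invoke",
          Effect: allowed ? "Allow" : "Deny",
          Resource: methodArn,
        },
      ],
    },
  };
}

const response = buildAuthorizerResponse(
  "underwriter",
  "arn:aws:execute-api:eu-west-1:123456789012:api-id/dev/GET/policies"
);
```

When the effect is Deny, the backend Lambda functions are never invoked, which is exactly the cost and performance gain discussed above.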

While embracing the cloud is essential, there is a risk in making design decisions that trade other equally critical non-functional requirements for insignificant cost improvements.

Secure your platform access and data

Nothing replaces the need to include a comprehensive cyberprotection solution for your internal software platform, encompassing
authentication, authorization, and data encryption to control access and protect your customer environments. In a services-based platform where data and functionality are exposed through APIs, you can consider the authorization model to be multilevel:

• Authenticate all your platform entry points by default. A potential approach to platform authentication is to leverage the underlying cloud authentication services. This, of course, should include basic user ID and password authentication support, OAuth2, multi-factor authentication, and single-sign-on capabilities that your customers are expecting. Customers may have their own enterprise Identity Access Management (IAM), in which case your platform identity management will act as a security consumer, accepting the authenticated user credential (such as a SAML token and assertion containing the user profile) from the upstream authenticator.

• Authorize all your platform entry points by default. It means that the platform's API management component should manage API authorization in conjunction with the authorization policies provided by the platform's identity management capability. Specific access policies can be designed and implemented to authorize (or restrict) who can access which API endpoints (or even REST resources) based on the particular characteristics of the object being secured.

• Authorize access to your data. It would be best to hand off subsequent authorization for a particular business function and access to data to the business service, which provides detailed context to develop authorization policies for either individual users or groups based on role. This information can be communicated back to the platform itself, allowing for a single point of enforcement.

Additionally, all data must be encrypted in-flight whenever sent over public connections and at rest when saved on the public cloud using managed storage services.
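For the third level, the business service itself can decide which resource properties each role may see, since only it has the required data context. A sketch with hypothetical roles and field names:

```javascript
// Hypothetical role-to-fields mapping owned by the business service,
// the only component with enough context about the data it returns.
const visibleFields = {
  admin: ["id", "holder", "premium", "riskScore"],
  agent: ["id", "holder", "premium"],
  customer: ["id", "premium"],
};

// Strip every property the caller's role is not allowed to see.
function filterByRole(resource, role) {
  const allowed = visibleFields[role] || [];
  return Object.fromEntries(
    Object.entries(resource).filter(([key]) => allowed.includes(key))
  );
}

const record = { id: "pol-7", holder: "Ada", premium: 120, riskScore: 0.42 };
const customerView = filterByRole(record, "customer");
```

An unknown role yields an empty object, a safe default that denies access rather than leaking data.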


AWS security model

Security and compliance are shared responsibilities between AWS and its customers. It means that the cloud provider is responsible for the security of the cloud stack, including hardware, virtualization, or runtimes. Their customers, conversely, are responsible for security in the cloud concerning data, network configuration, or authorization. Of course, customers' responsibilities vary depending on the service. For example, AWS customers are always responsible for patching the guest operating system on an Amazon EC2 instance, an activity that is not necessary when they use AWS Lambda. AWS also provides managed services to facilitate engineers securing access and data in the cloud, two of the responsibilities assigned to the customers. The following sections summarize the most relevant managed cloud services in this space.

Authentication

Amazon Cognito provides user sign-up, sign-in, and access control to Web and mobile apps. Users and groups are configured and stored in user pools to offer a secured directory that scales to millions of users. Authentication with Amazon Cognito consists of verifying user identities to allow sign-in to applications and services and can be implemented with several approaches, including basic authentication, multi-factor, or passwordless.

Authorization

Services running on AWS typically use access control mechanisms to ensure only authenticated and authorized users gain access to the APIs exposed on the Amazon API Gateway and other backend resources.
This access control is implemented at the Amazon API Gateway level, employing authorizer functions that require the user's access token generated by Amazon Cognito. API requests are then authorized upon verification of the user ID and other claims contained within the token.

Data Security

Developers are responsible for encrypting data stored by the business services running with the platform. To that end, AWS storage services such as Amazon RDS or Amazon S3 include built-in encryption capabilities that developers can quickly turn on to activate an additional layer of data security at rest. Also, data flowing across the AWS global network that interconnects their datacenters and regions is automatically encrypted at the physical layer.
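For illustration, those claims live in the base64url-encoded payload segment of the JWT issued by Amazon Cognito. The sketch below only decodes the payload to read claims; production code must first verify the token signature against Cognito's published JWKS, which is out of scope here, and the token itself is hand-crafted for the example.

```javascript
// Decode (NOT verify) the payload segment of a JWT to inspect claims.
// Real authorization must verify the signature first; this sketch
// deliberately skips that step.
function decodeJwtClaims(token) {
  const payload = token.split(".")[1];
  const b64 = payload.replace(/-/g, "+").replace(/_/g, "/");
  return JSON.parse(Buffer.from(b64, "base64").toString("utf8"));
}

// Hand-crafted, unsigned token used purely for illustration:
const encode = (obj) =>
  Buffer.from(JSON.stringify(obj))
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");

const token = `${encode({ alg: "none" })}.${encode({
  sub: "user-1",
  "cognito:groups": ["agents"],
})}.`;

const claims = decodeJwtClaims(token);
```

Once verified and decoded, claims such as the Cognito group membership can drive the authorization decisions described above.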

Observability and visibility

The primary challenge of using the cloud platform as a programmable system is getting high levels of observability and debuggability for your applications. It is fantastic to log in to a platform console and look at it as a distributed set of managed services available for you to focus on your business function. However, debugging application logic through distributed services may negatively impact the feedback loops and affect your ability to ship software fast, which is precisely one of the benefits that you would obtain by using serverless. So yes, there is a Catch-22 somewhere. And that's only half of the problem. In the past, it was difficult for engineers and architects to mistakenly make a million-dollar hardware purchase order with legacy infrastructure. That would have had to get several approvals per the organization's DoA chain. With
serverless instead, developers can inadvertently spin up that same level of resources through a simple script and only find out months later when their managers receive the bill. In this case, you need to provide the teams with the tools to favor prediction over aftermaths. The challenge here is that it is difficult to inventory the services you are using. The only actual inventory with most cloud providers is usually the bill.

AWS X-Ray

AWS X-Ray helps engineers satisfy part of their analysis and debugging needs for their distributed applications running on serverless environments. It is indeed a powerful service that, through non-intrusive code instrumentation, allows you to obtain incredible insights with practically zero performance impact. It is beneficial for debugging standard serviceful architectures where APIs are exposed through the Amazon API Gateway. It lets you obtain full traceability of all the services implicated down the integration chain by enabling one property. For more advanced visibility and observability requirements in cybersecurity, centralized logging, cost tracking, and performance monitoring, AWS allows native integration with other third-party, widely used tools such as Prisma Cloud, Splunk, Cloud Range, or Dynatrace, respectively, amongst others.

Use open standards

Try to leverage the outside-in as much as possible. Using standard components and patterns means you do not have to solve problems that have already been solved, not only by your organization but by the broader global community. By doing this, you can provide users with a good experience in a cost-effective way. In the same way, when you develop your own components or patterns, share them so that others can benefit from your work, either inside your organization (inner source) or outside (open source). Open standards and libraries help services work consistently, so you will spend less time making systems talk to each other. They also keep you from getting locked into a particular supplier or product when building an internal software platform.

Good services are customer-centric, designed to maximize impact, and built to satisfy real needs. As with any good product, internal software platforms are developer-centric and provide a minimal, non-intrusive architecture that lets teams focus on building software that the final users will love to use. This approach espouses the integration principle of smart endpoints and dumb pipes. Consequently, the better developers get at using the internal software platform, the better they get at building software without it too. Likewise, the better developers get at using serverless computing services, the better they get at building software without them, because serverless computing services are also fully based on standards. That is, precisely, how they became utilities.

Industry standards in AWS

Ignoring industry standards was one of the main issues with traditional Service-Oriented Architectures (SOA) as we knew them during the first decade of this century. Unlike modern service-oriented teams and architectures, this legacy approach to building platforms was too focused on optimizing proprietary product licenses and on centralizing integration logic, which developers shoved into the integration bus. As a distributed platform of infrastructure services, the AWS cloud is entirely based on standards, which makes it a great candidate to become the technology foundation of your internal software platform. Almost every service on the platform (if not all) has an off-cloud equivalent based on the same industry standards. This is because, as Simon Wardley says85: "In a service economy, SLAs are not as important as portability between providers. Without such portability, you will remain stuck in a product-based economy, albeit one you can rent over the wire."

Applications running on AWS serverless computing services are fully portable because of their standards-based interoperability. For example, technologies such as the AWS Lambda engine and even its underlying virtualization layer have been open-sourced to the public, and Firecracker could become the de facto standard for serverless virtualization during the next couple of years. Of course, open-sourcing is not as crucial as open-sourcing at the right time: you risk your competitive advantage if you do it too soon and do not enjoy a leading position. Evidently, AWS does not have that issue. To lay out some examples:

• AWS Lambda runs standard runtimes such as NodeJS, .NET, JVM, or Python. Also, its engine has been made available to the public, as mentioned earlier.
• Storage created through the Amazon EFS service adheres to the NFS standard.
• Systems integrations handled by Amazon MQ follow the MQTT protocol.
• Service APIs created using the Amazon API Gateway can be consumed using HTTP and WebSockets, and defined, imported, and exported using OpenAPI v2.0 and OpenAPI v3.0.
• You can manage NoSQL databases on Amazon DocumentDB using an API that is fully compatible with MongoDB.


This approach means developers will always have an equivalent outside AWS to run their services, either on-prem or other cloud platforms. This re-platforming will undoubtedly take some effort, but the critical point is that lock-in is not an excuse not to leverage native managed cloud services. All these services are exposed through HTTP interfaces that allow interactions based on the widely accepted REST semantics. This standards-based approach helps in mitigating the risks associated with a potential migration off AWS for your internal software platform, a critical topic covered in detail in the following section.


Chapter 8: Serverless-first Software Engineering

Building portable software

As your organization starts transitioning toward a platform-based architecture, it is an accepted (yet transitory) pattern to use managed cloud services to build business services but deploy them off-platform when required by your customers. That presents an exciting challenge because, as you might have guessed, you don't want to create multiple versions of the same service (one for each cloud platform). The focus is on serverless computing from the outset as the default implementation of these business services. Therefore, this section describes a couple of software design approaches that mitigate the risks associated with any potential re-platform effort while still leveraging managed cloud services.

With this background, the recommended technical approach that allows your organization to build business services on the cloud and deploy them virtually anywhere is straightforward: it follows some standard (and ages-old) programming patterns to confine code changes to very isolated and specific parts of the application's source code. As we have already discussed in this book, it is preferable to write code twice than to introduce unnecessary abstractions that keep you away from the benefits of the underlying cloud. Be wary of multicloud and workload orchestration solutions such as Kubernetes, as they only provide a mechanism to deal with technical debt, without attacking it at its core. When you transition to SaaS, the typical challenges are cost control, managed services, and live deployments (no maintenance windows).


These requirements indirectly push you to use serverless computing services, ensuring you will benefit from the full capabilities of the cloud platform. Closer proximity to cloud features makes for simpler implementation of new business services and enhancements to the existing ones. Approaches based on common abstractions invariably address the lowest common denominator, put a brake on innovation, and result in more costly and complex solutions. Conversely, biting the bullet and rewriting native code to address a different platform, while on the face of it a pain, is usually relatively straightforward and less costly. How to approach this, then? The answer is by using good programming practices. We will look at this from two different architectural perspectives that arrive at the same conclusion and lead to similar implementations: the Provider Pattern and Hexagonal Architectures.

The Provider Pattern

Where there is a supported industry API (e.g., DNS, MQTT, or HTTP), use that rather than direct service access, embracing the well-known, robust tenets of interface-oriented programming:

• Separate functionality into modules that communicate via well-known functional interfaces (e.g., interfaces that mean something to the consumer function, rather than being technical abstractions).
• Isolate code that accesses cloud services behind such interfaces.
• Use dependency injection to instantiate the appropriate interface implementation at runtime.
• Re-implement those interfaces in the event of a re-platform requirement.


This is commonly known as the Provider Pattern and can be easily applied to service and application development (Figure 48). The following scenario is an example of this pattern using a cloud service as the provider:

• The majority of cloud platforms have a service that provides object storage.
• You can build an interface that presents those objects in a manner that makes sense to the application (e.g., a file name, perhaps with additional application-specific semantics).
• All object storage consumer modules within the service will address this interface instead of the underlying cloud API.
• Write a provider module that implements the interface using the cloud API and translates the data into the format demanded by the interface.
• At application startup, accept an environment variable naming the object storage implementation to use.
• If you need to deploy your service on a different platform using another object storage service, re-implement the module with a new provider, leaving the rest of the business logic untouched.
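The steps above can be sketched in JavaScript as follows; the provider names and the createStorageProvider factory are illustrative assumptions, not an existing SDK:

```javascript
// Hypothetical sketch of the provider pattern for object storage.
// The application-facing interface speaks in file names, not cloud APIs.

// An in-memory provider implementing the interface (useful off-cloud or in tests).
const memoryStorageProvider = () => {
  const store = new Map();
  return {
    save: async (fileName, contents) => { store.set(fileName, contents); },
    load: async (fileName) => store.get(fileName),
  };
};

// A cloud provider would implement the same interface on top of the
// platform's object storage API (e.g., S3's putObject/getObject):
// const s3StorageProvider = () => ({ save: ..., load: ... });

// Dependency injection at startup: pick the implementation by name,
// typically from an environment variable.
const providers = { memory: memoryStorageProvider };
const createStorageProvider = (name = process.env.OBJECT_STORAGE || "memory") =>
  providers[name]();
```

The rest of the business logic only ever touches save and load, so swapping the provider leaves it untouched.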


Figure 48. Provider pattern showing the replaceable interface implementations

In summary, in the event of a change of provider, the result will be:

• Isolated places where change is necessary, such as a new interface provider, a new event format transformer, or even a slightly different container configuration.
• A greater ability to fit business services with on-prem deployments and to ease any necessary migration to all major cloud providers.

Hexagonal Architectures

Another way of looking at this challenge is through the prism of the evolutionary properties of hexagonal architectures. This pattern is used in software design and "aims at creating loosely coupled application components that can be easily connected to their software environment by means of ports and adapters," as per the Wikipedia definition by mid-2021. The principles of hexagonal architectures can help you isolate the domain logic of your service component from the implementation details of the underlying platform. Ports and adapters handle this decoupling, as described in the following picture (Figure 49).


Figure 49. Components of a Hexagonal Architecture

This architectural approach espouses the same interface-oriented programming and separation-of-concerns foundations as the provider pattern:

• Again, the domain logic that implements your core business function is separated from the other component modules and communicates with them via functional interfaces.
• Ports are more technical interfaces that allow clients to interact with the domain logic and access the underlying cloud services.
• Adapters are functions that transform messages from external components into something understandable by the internal interfaces, and vice versa.

Designing hexagonal architectures requires a little upfront time and investment, since careful thought is needed to create a good design. In any case, this design exercise is guided by a change-friendly pattern that will give you some guard rails and future-proofing benefits. In the unlikely event of a re-platform, the changes in your component will be concentrated mainly in two dimensions:

• Input adapters for transforming potentially new message formats used by the clients accessing your component on the new platform.


• Output ports for accessing the new cloud services as per the new programming interfaces. Although the underlying protocols may still be based on the same standards (e.g., HTTP endpoints) and the semantics may be equivalent, the developer experience may differ.
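As a minimal, platform-agnostic illustration (all names here are hypothetical), the separation of domain logic, ports, and adapters might look like this:

```javascript
// Hypothetical sketch of a hexagonal component.

// Domain logic: pure, knows nothing about HTTP or the cloud platform.
// It talks to the outside world only through the injected output port.
const makeGreetingService = (storagePort) => ({
  greet: async (userId) => {
    const name = await storagePort.findName(userId);
    return `Hello, ${name}`;
  },
});

// Output port implementation: an adapter to some storage technology
// (here an in-memory map; a DynamoDB-backed one would have the same shape).
const inMemoryStoragePort = {
  findName: async (userId) => ({ u1: "Ada" }[userId]),
};

// Input adapter: translates an external message (e.g., an HTTP event)
// into a call on the domain logic, and the result back into a response.
const httpAdapter = (service) => async (request) => ({
  statusCode: 200,
  body: await service.greet(request.pathParameters.userId),
});
```

Porting this component means rewriting httpAdapter and the storage port; the domain logic in makeGreetingService is untouched.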

Conclusion and sample implementation using AWS

The critical thing about cloud-managed services and serverless is that most of those technologies are very standard within the industry and available off-cloud, though usually at a higher overall cost. Remember, portability is essential when moving from a product-based economy to a service-based one, and it is only possible when these services are based on standards. Let's look at what a reference implementation could look like using AWS Lambda as the technology of choice for your service component (Figure 50). The Lambda function interacts with Amazon API Gateway as a primary (driving) actor to receive requests and accesses Amazon DynamoDB as a secondary (driven) actor to read and write data:

Figure 50. Hexagonal Architectures with AWS services

• Assuming that your service component is exposed and accessible through a REST API, one of the service clients sends a request to the service HTTP endpoint exposed on the Amazon API Gateway, which passes the request to the Lambda function.
• A message containing all the request properties arrives at the function, encapsulated within the event and context objects.
• At this point, an adapter is needed to parse the request message and extract the information required to process the request. The following JavaScript snippet shows how you can do this: processPath is the adapter function that extracts the required information from the Amazon API Gateway event message and matches the resource path and HTTP method against a pre-defined map of routes. The result is a port function that is used to interact with the domain logic:

exports.handler = function (event, context, callback) {
  try {
    // Adapter input: the resource path, the HTTP method, and the routes map
    const portHandler = processPath({
      path: event.requestContext.resourcePath,
      method: event.httpMethod,
      routes,
    });
    if (portHandler) {
      portHandler(event, context, callback);
    } else {
      callback(null, { statusCode: 404, headers: myHeaders });
    }
  } catch (e) {
    callback(null, generateError(e, "Server Exception"));
  }
};
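The processPath adapter itself is not listed here; one plausible sketch, assuming it receives the resource path, the HTTP method, and a pre-defined routes map, is:

```javascript
// Hypothetical sketch of the processPath adapter (not shown in the text).
// It matches the API Gateway resource path and HTTP method against a
// pre-defined routes map and returns the corresponding port function,
// or undefined when no route matches (the handler then answers 404).
const processPath = ({ path, method, routes }) => {
  const resource = routes[path];
  return resource ? resource[method] : undefined;
};

// Example routes map: each entry points to a port function.
const routes = {
  "/data/{dataId}": {
    GET: (event, context, callback) => callback(null, { statusCode: 200 }),
  },
};
```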


• The input port function is still technical in nature. It contains the REST API handling semantics, such as status code assignment and response message creation, as we can see with the generateRESTResource function. However, it is at this point that we start interacting with the business logic, by means of the getData interface, as shown in the following extract:

exports.getDataHandler = async (event, context, callback) => {
  const dataId = event.pathParameters.dataId;
  try {
    const data = await getData(dataId);
    callback(null, {
      statusCode: 200,
      headers: myHeaders,
      body: generateRESTResource(data),
    });
  } catch (e) {
    callback(null, {
      statusCode: 500,
      headers: myErrorHeaders,
      body: JSON.stringify({ message: e.message }),
    });
  }
};

• Finally, the domain logic in charge of processing and transforming the business information is isolated behind the getData interface. At this point, the business logic uses an output port function to interact with the cloud storage service, in this case Amazon DynamoDB. This output port is nothing but the getItem function in the AWS SDK for JavaScript, which abstracts developers from the implementation details of Amazon DynamoDB:

const getData = async (dataId) => {
  const params = {
    TableName: myDataTable,
    Key: { 'my_data_id': { S: dataId } }
  };
  let dataResponse;
  try {
    dataResponse = await getAWSClient("DynamoDB").getItem(params).promise();
  } catch (e) {
    raiseError("Unable to determine data details from DynamoDB");
  }
  return dataResponse;
};
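To see how little changes on a re-platform, the getAWSClient helper could be backed by a small provider registry; the following hypothetical sketch (all names illustrative) swaps DynamoDB for an in-memory provider that honors the same getItem(params).promise() surface:

```javascript
// Hypothetical sketch: an off-cloud provider for the same output port,
// chosen at startup through dependency injection. Only the slice of the
// DynamoDB client surface that getData relies on is mimicked.

const inMemoryTable = new Map();

const inMemoryProvider = {
  getItem: (params) => ({
    promise: async () => ({ Item: inMemoryTable.get(params.Key.my_data_id.S) }),
  }),
  putItem: (params) => ({
    promise: async () => {
      inMemoryTable.set(params.Item.my_data_id.S, params.Item);
      return {};
    },
  }),
};

// Provider selection: the domain logic calling getItem is untouched.
const providers = { "in-memory": inMemoryProvider /*, "DynamoDB": awsProvider */ };
const getClient = (name = process.env.STORAGE_PROVIDER || "in-memory") =>
  providers[name];
```

With this shape, moving off DynamoDB means registering one new provider; the getData function above does not change.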

As anticipated earlier, in the unlikely event that you need to move your service component to another infrastructure, changes to your code are localized in the input adapter (the processPath function) and the output port (getItem), as per the new provider's SDK. In short, try to exploit modularization, interface-oriented programming, dependency injection, standard industry APIs, and decorator services at all layers of your service implementation (and even of your enterprise architecture). In other words: ages-old programming practice.

A serverless-first approach with AWS

AWS has enabler technology that gives you the option of designing your service without following the above patterns while still catering to an unlikely re-platform event. This technology is AWS ECS Anywhere86, which the cloud provider introduced in December 2020. This managed cloud service allows you to deploy and run container tasks in virtually any environment. According to AWS's press release, "this will include the traditional AWS managed infrastructure, as well as customer-managed infrastructure. All this without compromising on the value of leveraging a fully AWS managed, easy to use, control plane that's running in the cloud, and always up to date."

AWS allows the execution of cloud workloads on the edge, which helps when custom requirements force you to build and deploy the same business service on different platforms. These new capabilities enable the provision of computing resources from regions, Availability Zones, edge locations, and now even from custom premises. As AWS states in their note, "Cloud connectivity is required to update or scale the tasks, or to connect to other in-region AWS services at runtime." This tells us that the industry is moving toward a decentralized provisioning model with centralized control.

What does this mean for the development of software on the cloud? It means that you can now safely adopt a serverless-first approach to implement business services that may need to be deployed both on custom premises and on the AWS cloud. As a result, while the provider pattern and hexagonal architectures are elegant ways of preparing your code for change, your default application development strategy can leverage serverless by default, without defensively designing your applications with portability requirements in mind. You can now rely on the cloud provider to supply the application primitives you need for your services anywhere. There are a couple of options:

1. For small workloads, you can build your services using AWS Lambda functions. It just happens that AWS also announced that they have open-sourced all their base images for AWS Lambda runtimes87. This means you can take your AWS Lambda functions, package them into containers using those base images, and run them on AWS ECS Anywhere. You invest in serverless once, and you get portability for free.
2. For bigger workloads, you may already be using Fargate tasks on Amazon ECS, which means you can now move those containers to ECS Anywhere and you are ready to go.

Of course, there are many other considerations, and the effectiveness of this approach also depends on how your architecture leverages other managed services such as DynamoDB, SNS/SQS, EventBridge, API Gateway, or Step Functions. This is a trend, and AWS is investing in the technology to make it happen. Consequently, decentralized resource provision with centralized control is here to stay, and this business model will surely keep evolving over the following years. For now, you can start moving your applications to virtually any premises while leveraging your initial serverless investment, without wrapping them in complex fleet managers and without rewriting a single line of code.
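For the first option, packaging an existing function on one of those open-sourced base images is a short Dockerfile; here is a sketch, assuming a Node.js function exported as handler from app.js:

```dockerfile
# Package an AWS Lambda function as a container image.
FROM public.ecr.aws/lambda/nodejs:12
COPY app.js ${LAMBDA_TASK_ROOT}/
CMD ["app.handler"]
```

Note that running such an image outside the Lambda service typically also involves the AWS Lambda Runtime Interface Emulator to accept invocations.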


Notes

Preface

1 https://medium.com/wardleymaps
2 https://www.forbes.com/sites/bernhardschroeder/2019/09/23/what-is-the-most-important-element-of-a-successful-startuphint-its-not-the-idea-team-business-model-or-funding-dollars/

Chapter 1: Transitioning to SaaS

3 https://www.laverdad.es/sociedad/despegue-gastronomico20180705004051-ntvo.html
4 https://www.forbes.com/sites/techonomy/2011/11/30/nowevery-company-is-a-software-company/
5 https://www.askyourdeveloper.com/
6 https://www.mckinsey.com/industries/financial-services/ourinsights/transforming-life-insurance-with-design-thinking
7 https://www.celent.com/insights/375184678
8 https://twitter.com/jezhumble/status/1422924763647778821
9 https://blog.thestateofme.com/2020/03/23/industry-bestpractice-as-expressed-in-software/
10 https://www.twilio.com/docs/flex
11 https://blog.gardeviance.org/2008/04/here-comes-farmer.html
12 https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/saas-open-source-and-serverless-awinning-combination-to-build-and-scale-new-businesses

Chapter 2: Internal Software Platforms

13 https://www.thoughtworks.com/radar/techniques/platformengineering-product-teams
14 https://serverlessland.com/patterns
15 https://martinfowler.com/articles/platform-prerequisites.html
16 https://www.linkedin.com/pulse/time-say-goodbye-nicolas-mchaillan/
17 https://p1.dso.mil/#/
18 https://software.af.mil/dsop/documents/
19 https://repo1.dso.mil/users/sign_in
20 https://blog.developer.atlassian.com/why-atlassian-uses-aninternal-paas-to-regulate-aws-access/

Chapter 3: Platform Services

21 https://www.gov.uk/service-manual
22 https://www.gov.uk/service-manual/service-standard
23 https://www.gov.uk/service-toolkit
24 https://martinfowler.com/articles/microservices.html
25 https://blogs.gartner.com/marco-meinardi/2020/09/04/adoptingkubernetes-application-portability-not-good-idea/


26 https://dxc.com/us/en/insights/perspectives/article/it-s-time-todo-cloud-right
27 https://twitter.com/rmedranollamas/status/1270394551270850560?s=20
28 https://martinfowler.com/bliki/HarvestedFramework.html
29 https://martinfowler.com/bliki/FoundationFramework.html
30 https://github.com/aws-amplify/amplify-js/issues/3123#issuecomment-621398068
31 https://en.wikipedia.org/wiki/Jevons_paradox
32 https://reactjs.org/blog/2020/12/21/data-fetching-with-reactserver-components.html
33 https://hotwire.dev/
34 https://buttondown.email/hillelwayne/archive/a21f0eab-404c472b-b35d-e7d9c58e13fc
35 https://dev.to/peibolsang/angular-framework-spa-xor-ssr-339o
36 https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html
37 https://dotnet.microsoft.com/apps/aspnet/web-apps/blazor
38 https://m.signalvnoise.com/html-over-the-wire/
39 https://medium.com/javascript-scene/the-missing-introductionto-react-62837cb2fd76
40 https://twitter.com/isotopp/status/1287277306441011201?s=20
41 https://en.wikipedia.org/wiki/Content_negotiation


42 https://martinfowler.com/articles/micro-frontends.html
43 https://blog.rowanudell.com/the-serverless-computemanifesto/
44 https://acloudguru.com/blog/engineering/aws-lambda-iswinning-but-first-it-had-to-die

Chapter 4: Platform Teams

45 https://opensource.guide/leadership-and-governance/
46 https://skamille.medium.com/product-for-internal-platforms9205c3a08142
47 https://architectelevator.com/architecture/organizingarchitecture/
48 https://swardley.medium.com/how-organisations-arechanging-cf80f3e2300
49 https://thenewstack.io/the-evolution-of-the-serverless-firstengineer/

Chapter 5: Platform Adoption

50 https://obamawhitehouse.archives.gov/blog/2010/11/07/usindia-partnership-open-government
51 https://www.viima.com/blog/types-of-innovation
52 https://martinfowler.com/bliki/BranchByAbstraction.html#footnote-coin
53 https://martinfowler.com/articles/patterns-legacydisplacement/


54 https://docs.aws.amazon.com/whitepapers/latest/microserviceson-aws/microservices-on-aws.html
55 https://martinfowler.com/articles/microservices.html#SmartEndpointsAndDumbPipes
56 https://www.bbva.com/en/economics-of-serverless/
57 https://sparkbox.com/

Chapter 6: Technical Architecture Principles

58 https://medium.com/javascript-scene/composing-software-thebook-f31c77fc3ddc
59 https://c4model.com/
60 https://martinfowler.com/bliki/ParallelChange.html
61 https://steve-yegge.medium.com/dear-google-cloud-yourdeprecation-policy-is-killing-you-ee7525dc05dc
62 https://basecamp.com/about/policies/until-the-end-of-theinternet#:~:text=We’re%20dedicated%20to%20supporting,Basecamp%20is%20built%20to%20last.
63 https://stripe.com/docs/upgrades
64 https://acloudguru.com/blog/engineering/aws-lambda-iswinning-but-first-it-had-to-die
65 https://aws.amazon.com/message/11201/
66 https://www.youtube.com/watch?v=t705r8ICkRw&t=814s
67 https://basecamp.com/gettingreal/10.1-less-software


68 https://aws.amazon.com/blogs/industries/an-inside-look-atthe-amazon-culture-experimentation-failure-and-customerobsession/
69 https://aws.amazon.com/executive-insights/content/theimperatives-of-customer-centric-innovation/
70 https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing#:~:text=The%20fallacies%20of%20distributed%20computing,to%20distributed%20applications%20invariably%20make.
71 https://martinfowler.com/bliki/FoundationFramework.html
72 http://www.iwritewordsgood.com/apl/patterns/apl120.htm
73 https://martinfowler.com/bliki/HarvestedFramework.html
74 https://www.lastweekinaws.com/blog/the-17-ways-to-runcontainers-on-aws/
75 https://www.lastweekinaws.com/blog/how-aws-dumps-themental-burden-of-inconsistent-apis-on-developers/

Chapter 7: Technology Principles

76 https://martinfowler.com/articles/richardsonMaturityModel.html
77 https://awslabs.github.io/smithy/
78 https://aws.amazon.com/blogs/developer/modular-packages-inaws-sdk-for-javascript/
79 https://dev.to/garethmcc/why-local-development-forserverless-is-an-anti-pattern-1d9b


80 https://www.allthingsdistributed.com/2016/03/10-lessons-from10-years-of-aws.html
81 https://dropbox.tech/mobile/the-not-so-hidden-cost-ofsharing-code-between-ios-and-android
82 https://world.hey.com/jorge/complexity-inducing-narratives0808911c
83 https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing
84 https://blog.developer.atlassian.com/why-atlassian-uses-aninternal-paas-to-regulate-aws-access/
85 https://blog.gardeviance.org/2008/04/here-comes-farmer.html
86 https://aws.amazon.com/blogs/containers/introducing-amazonecs-anywhere/
87 https://github.com/aws/aws-lambda-base-images/blob/nodejs12.x/Dockerfile.nodejs12.x