Part 11. The Future of DevOps and Software Delivery


Update, June 25, 2024: This blog post series is now also available as a book called Fundamentals of DevOps and Software Delivery: A hands-on guide to deploying and managing production software, published by O’Reilly Media!

It’s difficult to make predictions, especially about the future.

— Niels Bohr

This is Part 11 of the Fundamentals of DevOps and Software Delivery series. In all the previous parts of this blog post series, you learned about the current state of DevOps and software delivery. In this blog post, I’m going to share my thoughts on what the future of this industry might look like.

In general, trying to predict the future is a bad idea. If you get it wrong—and you almost always get it wrong—you look like a fool. If you get it right, everyone says that it was obvious, and any fool could’ve seen it coming. That said, there’s no doubt DevOps and software delivery will continue to change and evolve, and I thought it would be fun to conclude this blog post series with a short post that calls out some interesting trends that are emerging now and that could be influential in the future.

Here are the topics this blog post will cover:

  • Infrastructureless

  • Generative AI

  • Secure by default

  • Platform engineering

  • The future of infrastructure code

Let’s start with infrastructureless.

Infrastructureless

Much of the history of software is one of the industry gradually moving to higher and higher levels of abstraction. For example, here’s a brief, very incomplete, and mostly wrong glimpse at how programming languages have evolved:[41]

Machine code → Assembly → FORTRAN → C → C++ → Java → Scala → Flix → ???

At each step in this evolutionary process, programming languages have generally gotten further and further away from the details of the underlying computer architecture, which has meant (a) giving up some control in exchange for (b) not having to worry about entire classes of problems at all. For every C programmer complaining that Java doesn’t let them do memory management the way they want to (e.g., with pointer arithmetic), there are ten Java programmers happily working away without having to think about memory allocation, deallocation, buffer overflows, and other memory management issues you have to deal with in C. And they are both right. When you need low-level control, a lower-level language is great; when you don’t, a higher-level language is great.

The key insight is that (a) you don’t need the low-level control for the majority of use cases and (b) the higher-level constructs tend to make languages easier to use and accessible to a larger audience. Therefore, each time a new generation of higher-level languages comes along, some percentage of programmers stay with the lower-level languages (after all, there is still plenty of code being written in C), but gradually, many more programmers move over to the higher-level languages.

I believe something similar is happening in the DevOps and software delivery space:

Servers → VMs → Containers → Serverless → ???

At each stage, you give up control in exchange for not having to worry about entire classes of problems at all. For example, with serverless, you give up control over the hardware and runtime environment, but in exchange, you don’t have to think about racking servers, patching the OS, scaling up or down, replication, and so on. Instead, as you move up the abstraction stack, you get to focus more and more on your particular problem domain.
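
To make the serverless end of that progression concrete, here’s a minimal sketch of a serverless function, following AWS Lambda’s Node.js handler conventions (the event fields shown assume an API Gateway HTTP trigger):

```javascript
// handler.js
//
// With serverless, this function is the entire deliverable: there are no
// servers to rack, no OS to patch, and no scaling policies to tune. The
// platform runs as many copies as incoming traffic requires.
exports.handler = async (event) => {
  const name = event.queryStringParameters?.name ?? "World";
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: `Hello, ${name}!` }),
  };
};
```

Everything underneath that function—provisioning, patching, scaling, replication—is the platform’s problem, which is exactly the trade described above.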

To be fair, it’s a simplification to say that programming languages and software delivery evolve in a straight line; the reality is messier, full of zigzags and back-and-forths. Sometimes, we go backwards, such as Basecamp exiting the cloud, migrating from AWS back to their own servers. Sometimes, something comes along that is ahead of its time, before the industry is really ready for it, such as Lisp, which first appeared in the late 1950s, but seems to only be getting industry adoption now in the form of modern dialects (e.g., Clojure). Sometimes, there is an abstraction penalty, where the extra layers of abstraction hurt performance: for example, lower-level languages like C often outperform higher-level languages like Java.[42]

Despite the occasional step backwards, over the long term, the industry moves forwards; or rather, it moves up the abstraction ladder. For every one company like Basecamp that finds running their own servers to be a better fit, there will be ten new companies that find IaaS, PaaS, and serverless a better fit; for every one use case where performance matters enough that you need to use a lower-level language like C, there will be ten use cases where productivity and safety matter far more, so you’ll pick a higher-level language like Python or Go.

Which brings me to my prediction for the future: most of what we think of as infrastructure today will be abstracted away, and the focus will shift more and more to apps. You’ll effectively be saying, "here’s a piece of code I want to run" and the rest will be taken care of without you having to think about it. This is similar to the serverless model, so you could say my money is on serverless—not VMs or Kubernetes—as where the industry is headed in the future, but I think it goes beyond that.

Even if you’re using serverless today, you still have to think quite a bit about infrastructure. For example, if you use AWS Lambda, you still have to think about AWS data centers (i.e., regions, availability zones), endpoints (e.g., API Gateway), caching (e.g., CloudFront), databases (e.g., RDS), database connectivity (e.g., RDS Proxy), networking (e.g., VPCs), and so on. This is not a knock against Lambda: it’s an amazing technology, but we’re still in the early days of serverless.

When serverless matures, we’ll think of it, for lack of a better term, as infrastructureless, where you don’t have to think about any of the underlying infrastructure (data centers, servers, networks, data storage) at all, and can instead focus entirely on your apps. There’s of course still infrastructure there, but for most use cases, it will be handled entirely for you—a lower-level problem that you can ignore, while focusing on the higher-level problems related to your apps.

This focus on higher-level abstractions and apps will be a common theme throughout this blog post. This includes generative AI, which may become one of the higher-level abstractions we use, as discussed in the next section.

Generative AI

Generative artificial intelligence (GenAI) refers to software that can create text, code, images, and video, typically in response to prompts, at a level that sometimes matches or exceeds human abilities. Under the hood, GenAI uses large language models (LLMs), which are able to extract patterns, structure, and statistical relationships from massive data sets. Seeing GenAI answer your questions on almost any topic, write code that actually works, and create stunning images and videos can be equal parts remarkable and terrifying—when it works. Seeing all the things it gets wrong can be pretty entertaining, too.

This is an exceptionally fast-changing space, so by the time you read this, it’ll most likely be different, but as of 2024, some of the major players include ChatGPT, DALL-E, Gemini, Llama, Microsoft Copilot, GitHub Copilot, Amazon Q Developer, Stable Diffusion, Midjourney, and Cursor. Almost every major company is getting involved in creating new GenAI models, and everyone else is busy trying to find ways to incorporate GenAI into their software.

There are many wild claims that GenAI will replace programmers entirely, but if anything significant changes as a result, I think it’s far more likely that GenAI will become the next higher-level abstraction for programming. Perhaps some day, instead of writing code in a programming language, you will use something that looks more like natural language, which is processed by GenAI. However, for the time being, GenAI will act more like a powerful coding assistant that helps you understand and write code. As always, you’ll give up some control, but in exchange, programming will become accessible to a larger pool of developers, and you will see gains in productivity.

There are already some claims out there that GenAI can increase productivity, such as a 2022 study by GitHub of 95 developers that claims the developers who used GitHub Copilot got tasks done 55% faster than those who didn’t. That said, studies from a company that makes money from GenAI products should always be taken with a grain of salt. Other research has been less promising. For example, a 2024 study by Uplevel of over 800 developers found GitHub Copilot did not produce any significant gains in productivity—but it did result in 41% more bugs; a 2024 survey by Upwork of over 2,500 contractors found that 77% of them believed that GenAI didn’t make them more productive—it simply added to their workload. I think the reality is that these tools are brand new, we’re still learning how to use them, and we don’t have nearly enough experience or research to know what the impact will be.

That said, GenAI has improved rapidly over the last few years, and will only get better, so I’d fully expect some productivity gains—even if only as a better version of autocomplete. If it keeps getting better, that raises an important question: what role will GenAI play in DevOps and software delivery? The GenAI DevOps tools I’ve seen so far are coding assistants that can generate infrastructure code: e.g., general-purpose coding assistants such as GitHub Copilot, or assistants designed specifically for infrastructure code, such as Pulumi AI. My prediction: these sorts of tools will be of limited use for DevOps and software delivery. That’s because with DevOps and software delivery, the most important priorities are typically the following:

  • Security

  • Reliability

  • Repeatability

  • Resiliency

Unfortunately, these are precisely the areas where GenAI tends to be weak. In particular, GenAI and LLMs typically struggle with the following:

Hallucinations

GenAI is known for hallucinations, where it sometimes makes up answers that look convincing, but are actually wrong or entirely fake. As Ian Coldwater put it, GenAI is sometimes just "mansplaining as a service." I’ve seen GenAI not only make up answers, but also make up fake citations (i.e., pointing to publications that don’t exist), and generate code that uses APIs that don’t exist.

Inconsistency

With GenAI, very similar prompts can produce different responses. Toss in a slightly different keyword here or there, and instead of getting back working, secure code, you might get broken code with major vulnerabilities. To get good results, you have to know how to prompt it just right, and a prompt that works today may not work tomorrow when the model is updated.

Not citing sources

GenAI builds models from vast quantities of data, but none of the models I’ve seen today are able to retain references back to where the knowledge came from. So you get a single, definitive response, but you have no idea of the source. Was this generated code based on something written by an expert, or a novice? Was it designed for a secure, compliant environment, or just something someone threw together for a side project? Was it battle-tested in production over many years, or something completely new and unproven? This is markedly different from using code from open source or Stack Overflow, where you can compare many different options using a variety of signals of quality and trustworthiness (e.g., who the author is, who else is using this, stars, upvotes, comments, etc.).

Many of these weaknesses may be inherent to the design of LLMs, so it’s not clear if these can be fixed. As a result, I’d be nervous about letting GenAI "generate" my infrastructure for me. It may be helpful to have it generate some of the boilerplate, which may save time, but there’s no getting away from ramping up on DevOps and software delivery knowledge (e.g., by reading this blog post series!), and doing a lot of hard work yourself to ensure that the result is secure, reliable, repeatable, and resilient.

That said, one type of GenAI that may have a significant impact on DevOps and software delivery is retrieval-augmented generation (RAG), where you provide the GenAI model with additional data beyond its initial training set so that it can generate more relevant, up-to-date responses. In particular, if you provide the GenAI model with the state of your infrastructure, your metrics, logs, events, and so on, then RAG may help in the following ways:

Understanding your infrastructure

You may be able to use RAG to answer questions such as how is this Kubernetes cluster configured? Where do we store our Docker images? Are we meeting all SOC 2 requirements? This would be especially useful for ramping up new hires and for navigating large, complicated architectures.

Debugging

You may be able to use RAG to help investigate incidents, asking questions such as: for which requests is latency unusually high over the last 24 hours? What deployments did we do over that time period? Which backend services are affected? As you saw in Part 10, observability tools allow you to ask these questions, so the main changes would be to (a) switch to a natural-language interface, so you can "chat" with your observability tool, instead of having to learn to use a query language, and (b) train the GenAI model specifically on DevOps and software delivery data, so it can intelligently answer questions and provide suggestions, rather than you having to figure everything out yourself.

Detecting patterns

GenAI may also be useful in proactively detecting patterns in your infrastructure. For example, if you feed it your network traffic data (e.g., access logs, VPC flow logs), it might be able to automatically detect a DDoS attack or a hacker trying to access your systems. If you feed it your logs and metrics, it might be able to detect problems early on, and alert you to them before they cause an outage.
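
To make the mechanics behind these three use cases a bit more concrete, here’s a minimal sketch of the RAG pattern. The embed and complete functions below are hypothetical stand-ins for whatever embedding and LLM APIs you actually use, and the brute-force similarity search would be a vector database in a real system:

```javascript
// rag-sketch.js
//
// Hypothetical stand-ins for your embedding and LLM APIs: swap in a real
// client (OpenAI, Bedrock, a local model, etc.).
async function embed(text) {
  throw new Error("swap in a real embedding API");
}
async function complete(prompt) {
  throw new Error("swap in a real LLM API");
}

// Cosine similarity between two vectors: the standard way to measure how
// "close" two pieces of embedded text are.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The RAG pattern: (1) embed your DevOps data (logs, configs, runbooks),
// (2) retrieve the entries most relevant to the question, and (3) hand
// those entries to the model as context, so its answer reflects *your*
// infrastructure rather than just its training data.
async function answer(question, documents) {
  const index = await Promise.all(
    documents.map(async (doc) => ({ doc, vector: await embed(doc) }))
  );
  const qVector = await embed(question);
  const context = index
    .map(({ doc, vector }) => ({ doc, score: cosine(qVector, vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5)
    .map(({ doc }) => doc);
  return complete(
    `Answer using only this context:\n${context.join("\n---\n")}\n\n` +
    `Question: ${question}`
  );
}
```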

We’re already starting to see GenAI integration across many security and monitoring tools: e.g., Honeycomb’s Query Assistant, Datadog Bits AI, Datadog Watchdog, New Relic AI, Splunk AI, Dynatrace Davis AI, and Snyk DeepCode AI. But it still feels like early days to me.

I think all of these RAG features become much more powerful if the GenAI model has access to not only your company’s data, but also that of thousands of other companies. Think of how many companies hit nearly identical problems—e.g., expiring TLS certificates, security vulnerabilities in popular libraries, and so on—and how much faster an AI model could solve these problems if it had access to what actually worked (or didn’t) in thousands of similar situations. However, this requires that the model can distinguish between what information should be kept private to each company (i.e., you don’t want to accidentally leak proprietary data), and what can be shared between companies. It’s not clear if this can be reliably achieved with an LLM, but if it can, it could dramatically reduce the time it takes to resolve incidents, and possibly mitigate many security problems.

One of the other new trends that may mitigate security problems is to build systems that are secure by default, as discussed in the next section.

Secure by Default

In 1854, at an exhibition at the Crystal Palace in New York, Elisha Otis stood on an elevator platform as it was hoisted to a height of more than 5 stories within an open elevator shaft. Then, suddenly, his assistant cut the elevator cable, sending the platform plunging downwards—only to stop before it could drop more than a few inches. "All safe, gentlemen, all safe," Otis announced to the astounded audience. What Otis was demonstrating was the world’s first safe elevator, which was based on a clever design: the elevator shaft was lined with metal teeth, and on each side of the elevator, there were latches that stuck out into the shaft, catching onto the teeth, and preventing the elevator from moving. To allow the elevator to move, you had to pull the latches into the elevator, which you could only do by applying enough pressure via the elevator cable. If the cable snapped, the latches would pop right back out into the shaft and prevent the elevator from moving.

Otis' clever design made elevators safe enough for daily use, opening the way to tall buildings, skyscrapers, and modern cities. The design was not only clever, but also exhibited a key engineering principle: it was safe by default. The default state of the elevator was that it couldn’t move, and the only thing that would allow it to move was an intact cable. Unfortunately, the current state of many DevOps and software delivery technologies is not safe by default. For example, in many of the tools and systems we use, the defaults are as follows:

  • Network communication and data storage are unencrypted.

  • All inbound and outbound network access is allowed.

  • 3rd party dependencies are not validated, scanned, or kept updated and patched.

  • Secure passwords and MFA are not required, and SSO requires extra setup work.

  • There’s no monitoring or audit logging built-in.

Doing things securely almost always requires considerable extra effort. Even worse, many vendors charge extra for security features, relegating SSO, RBAC, audit logs, and other key functionality to expensive enterprise plans. Due to the extra time and resources required, many smaller companies can’t afford to create secure systems, and if they are lucky enough to grow into large companies, they then have to try to paper over an otherwise insecure foundation with thin layers of security.

Fortunately, this is starting to change. The following are a few of the positive trends:

Shift left

The idea behind shift left is to move security testing earlier in the development cycle. This includes both static application security testing (SAST), where you perform static analysis on the code to look for vulnerabilities, and dynamic application security testing (DAST), where you look for vulnerabilities by simulating attacks on running code. There are many SAST tools designed for a single language or framework, such as FindBugs for Java, Bandit for Python, Brakeman for Ruby on Rails, and Kubesec for Kubernetes. There are also some SAST tools that work with multiple languages and frameworks, such as Snyk, SonarQube, Wiz, Open Policy Agent, Codacy, Coverity Scan, Veracode, Trivy, and Mend. Some of the early tools in the DAST space include Zed Attack Proxy, Invicti, Veracode Dynamic Analysis, PortSwigger, and Nuclei.
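
To show the kind of issue these tools catch, here’s an illustrative example using Express and node-postgres (the route names are invented, and the exact rule names and messages vary by tool): string concatenation into SQL is the classic finding nearly every SAST tool flags, and a parameterized query is the standard fix.

```javascript
const express = require("express");
const { Pool } = require("pg"); // node-postgres; connects via PG* env vars

const app = express();
const db = new Pool();

// BAD: user input concatenated into SQL -- the classic injection bug
// (e.g., /users-unsafe/1;DROP TABLE users) that SAST tools flag on scan.
app.get("/users-unsafe/:id", async (req, res) => {
  const { rows } = await db.query(
    "SELECT * FROM users WHERE id = " + req.params.id
  );
  res.json(rows);
});

// GOOD: a parameterized query treats user input strictly as data, never
// as SQL, which is the fix SAST tools typically suggest.
app.get("/users/:id", async (req, res) => {
  const { rows } = await db.query("SELECT * FROM users WHERE id = $1", [
    req.params.id,
  ]);
  res.json(rows);
});

app.listen(8080);
```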

Supply chain security

A supply chain attack is one where a malicious actor manages to compromise a part of your supply chain, which in the software world refers to all the 3rd party software you rely on: e.g., open source libraries, SaaS tools, etc. In the modern world, the vast majority of code your company uses is not written by your own developers. As you may remember from Part 4, approximately 70% of the code a typical company deploys originates in open source. Hackers know this, so supply chain attacks are becoming more common. One example is the XZ Utils backdoor: a hacker managed to sneak a backdoor into the xz library, which is present on almost all Linux systems, and it would’ve allowed them to remotely access nearly any Linux system. Some of the early attempts to improve supply chain security include using security scanning tools (e.g., the ones from the shift left section), code signing, SBOMs (software bill of materials), automatic updating and patching for all dependencies, and secret detection. Some of the early players in this space include Chainguard, Snyk, Mend, Dependabot, Patcher, ReversingLabs, Ox Security, GitGuardian, Anchore, Syft, Sonatype, and RapidFort.

Memory-safe languages

In the last few years, there has been a push to move away from older languages where you manage memory manually (e.g., C, C++) to more modern languages that provide memory safety (e.g., Rust or Go). This is because memory safety problems account for roughly 70% of all security vulnerabilities, so switching to memory-safe languages would make most of these vulnerabilities impossible. The challenge, of course, is that this may require rewriting a tremendous amount of software, including entire operating systems and their toolsets, in new languages.

Zero-trust networking

We are gradually seeing a shift from the castle-and-moat model to the zero-trust networking model, where every connection is authenticated, authorized, and encrypted. See Part 7 for the details.

US government initiatives

The US government has a number of initiatives underway to improve cybersecurity. This includes publications from the White House on the national cybersecurity strategy, how to build secure and measurable software, and a secure by design initiative from the Cybersecurity and Infrastructure Security Agency (CISA), which encourages companies to pledge to design products with better security built in, including SSO, MFA, patching, vulnerability disclosure policies, and more.

Principle of least privilege in app frameworks

In the past, most app frameworks gave apps immediate access to all the capabilities of the underlying platform, which was convenient, but meant that any vulnerability could do significant damage. Many newer app frameworks follow the principle of least privilege, granting no permissions by default, and requiring developers to explicitly request access to the exact functionality they need, such as access to the file system, networking, camera, microphone, location, and so on. This is the model used by most mobile platforms (e.g., Android, iOS), newer frameworks for desktop apps (e.g., Tauri), and sandboxed runtimes (e.g., Deno, WebAssembly System Interface).
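
To see what this looks like in practice, here’s a small example using Deno, one of the runtimes just mentioned (the config file and URL are placeholders). By default, Deno blocks the file and network access this code needs; it only runs cleanly once you grant those exact permissions on the command line:

```javascript
// fetch-status.js
//
// Deno denies file system and network access by default, so you must opt
// in to exactly what this code needs:
//
//   deno run --allow-read=./config.json --allow-net=example.com fetch-status.js
//
// Without those flags, Deno blocks the read and the request (prompting or
// erroring rather than silently allowing them).
const config = JSON.parse(await Deno.readTextFile("./config.json"));
const response = await fetch(`https://example.com/api/status?env=${config.env}`);
console.log(await response.text());
```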

There are a lot of positive trends, but we still have a long way to go until we are secure by default. In fact, we not only need the default path to be secure, but just as importantly, the secure path must be the easy path. Security is not only a technical question, but also a question of economics and ergonomics. A technical design that’s secure in theory, but is too time-consuming to set up, or so complicated no one can understand it, is not actually secure in practice.

Every time a company forces its employees to rotate passwords too often, with rules that are too complicated (your password must include 1 letter, 1 number, 1 special character, 1 upper case character, 1 lower case character, 1 Greek letter, 1 emoji, 1 dingbat, 1 interpretive dance, a pinch of salt, a sprig of thyme), those employees just end up writing their passwords down on post-it notes, which ultimately makes everyone less secure. Therefore, we need tools that make the secure option not only the default, but also the easy option. Doing something insecure should be hard.

One way to end up with secure defaults is to build them into your company’s internal developer platform, which is the focus of the next section.

Platform Engineering

One thing I’ve noticed from having worked with hundreds of companies on their DevOps and software delivery practices is that virtually every single company ends up creating its own internal developer platform (IDP), which consists of a set of workflows, tools, and interfaces customized to meet that company’s needs. These are often layered on top of existing tools, and the bigger the company, the more layers they have. For example, at most large companies, it’s rare for developers to use the cloud (e.g., AWS), orchestration tool (e.g., Kubernetes), or web framework (e.g., Spring Boot) directly; instead, they access them through a layer of custom webpages, scripts, CLI tools, and libraries. This layer is the IDP.

Usually, IDPs are designed for use by app developers, giving them a way to quickly stamp out new apps and start iterating on top of the company’s preferred toolset and workflows. Think of it this way: every company wants to use a PaaS; it’s just that they want it to be their PaaS, customized to their specific needs. That’s the role IDPs are trying to fill. In the last few years, platform engineering has become a popular term for the discipline of building IDPs, and it’s sometimes pitched as the successor to DevOps, but the reality is that companies have been building IDPs for decades, long before even the term DevOps, let alone platform engineering, appeared on the scene. The new thing is the emergence of reusable open source and commercially-available IDP tools. Some of the early players in this space include Backstage, Humanitec, OpsLevel, Cycloid, Cortex, Roadie, Port, Compass, and Qovery.

The dream for an IDP is to provide your developers a single, central place where they can create and manage an app and everything it needs. What does it need? The answer: just about everything you’ve learned about in this blog post series! You can essentially go blog post-by-blog post for what the IDP should provide, out-of-the-box:

Hosting

The IDP should configure whatever infrastructure your app needs in your company’s chosen hosting provider (e.g., on prem, IaaS, PaaS).

Infrastructure as code

All the infrastructure should be managed as code using your company’s chosen IaC tools (e.g., Docker, Ansible, OpenTofu). That code should have reasonable defaults, so you don’t have to write any code when initially creating the app, but if you need to customize something, you can dig into the code to make changes.

Orchestration

Your app should be configured for your company’s chosen orchestration tool (e.g., Kubernetes, Lambda).

Version, build, test

The app should be set up for you in your company’s chosen version control system, with your company’s chosen build system, with a set of built-in automated tests that you update and evolve.

CI / CD

The app should be integrated into your company’s CI / CD pipeline, so you get automatic builds, tests, and deployments.

Environments

The app should be configured to deploy into your company’s environments (e.g., dev, stage, prod).

Networking

The app should be configured for both public networking (e.g., DNS) and private networking (e.g., VPC, service mesh).

Secure communication and storage

The app should be configured to encrypt all data at rest and in transit by default.

Data storage

The app should be automatically configured with the data stores it needs (e.g., relational database, document store), along with schema migrations and backup and recovery.

Monitoring

The app should be automatically instrumented with your company’s chosen tools for logs, metrics, events, and alerts.

An IDP should be able to give you all of this with a few clicks, thereby ensuring that your app meets all of your company’s requirements in terms of security, compliance, scalability, availability, and so on. This allows the developer to focus on their app, while the IDP handles all the infrastructure details. Perhaps IDPs are how we get to an infrastructureless world (at least from the perspective of the app developers).
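
As a thought experiment, here’s what the developer-facing surface of such an IDP might look like: an entirely hypothetical app manifest, where every field name is invented for illustration, and everything you don’t specify falls back to the company’s secure, compliant defaults.

```javascript
// app.config.js -- a hypothetical IDP manifest (all field names invented).
// Everything not specified here falls back to the company's approved
// defaults: CI/CD wiring, TLS, VPC placement, monitoring, and so on.
module.exports = {
  name: "checkout-service",
  runtime: "kubernetes", // or "lambda", per the company's orchestration choices
  environments: ["dev", "stage", "prod"],
  datastores: [{ type: "postgres", backups: "daily" }],
  networking: { dns: "checkout.example.com", private: true },
  alerts: { oncall: "team-payments", latencyP99Ms: 500 },
};
```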

One of the big challenges with the IDP space is that an IDP isn’t a single tool, but a distillation of your company’s culture, requirements, preferences, and processes, which are different for every single company. I saw something similar years ago with applicant tracking systems (ATS), which are the tools companies use for managing hiring and recruitment, including for tasks such as posting jobs, searching for candidates, accepting applications, processing the candidate through an interview pipeline, doing reference checks, making offers, and so on, all while ensuring compliance with various labor laws and regulations. Almost every single company uses some sort of ATS, and there have been countless ATS tools created over many decades, but there is still no single, dominant player or approach.

That’s at least in part because recruiting and hiring are also a distillation of your company’s culture, requirements, preferences, and processes, which are different for every single company. As a result, any tool you build to try to capture all of this tends to end up in one of two extremes:

Highly opinionated

The tool is highly opinionated about the workflows and processes you should use, and if those fit your needs, it’s a great experience. However, if they don’t, you can’t use the tool at all.

Highly customizable

The tool is highly customizable, so it can support the workflows and processes of almost any company, but it’s so complicated to use that everyone hates it.

Finding the right balance between these two is hard. Most of the complaints I hear about the current IDP tools are that they are either too complicated or they don’t do what you need. Perhaps one way to solve this dilemma is to provide an IDP that is opinionated by default, but customizable through code, as per the next section.

The Future of Infrastructure Code

In Part 2, you saw several different kinds of infrastructure as code tools, including configuration management tools (e.g., Ansible), server templating tools (e.g., Packer), and provisioning tools (e.g., OpenTofu). There are two interesting new types of tools that are starting to emerge that are worth paying attention to: interactive playbooks and infrastructure from code.

Interactive playbooks are playbooks that allow you to execute code, fetch data, render graphs, and so on, directly in the playbook, a bit like Jupyter Notebooks for DevOps. The advantage is that you stop thinking of a playbook as a bunch of static, often out-of-date instructions in a wiki that you have to execute manually, and you start thinking of it as a live, custom, interactive UI that is designed for debugging and introspecting specific systems and problems. Some of the early players in this space include RunDeck, Runme, Stateful, SmartPlaybooks, and Doctor Droid.

If you squint at it, you could even imagine a collection of playbooks as a way to implement an IDP. The combination of reusable "widgets" and the ability to assemble them into just the UI that meets your company’s needs may be a way to strike the right balance between being opinionated and customizable. Moreover, you can create interactive playbooks to manage your infrastructure code (e.g., a playbook that lets you interactively run OpenTofu or Ansible), and the playbooks themselves are sometimes defined as code, so this might be a glimpse into the future of IaC as well.

Infrastructure from code (IfC) is based on the idea that instead of writing infrastructure as code (IaC) to define the infrastructure you need, you can automatically infer the necessary infrastructure from your application code. For example, if you write a JavaScript app that responds to HTTP requests by looking up data in a relational database, an IfC tool can parse this code, and automatically figure out that to deploy this app in AWS, it will need to provision a Lambda function, an API Gateway, and an RDS database. Some of the early players in this space include Ampt, Nitric, Modal, Shuttle, Klotho, and Encore.
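
Here’s a sketch of what that JavaScript app might look like to an IfC tool. The some-ifc-sdk import and its api and database helpers are invented for illustration (each IfC tool has its own SDK and conventions), but the inference step is the real idea: the infrastructure requirements fall out of how the code is written, rather than being declared separately.

```javascript
// A hypothetical IfC-style app (SDK and helper names invented).
// From this code alone, an IfC tool could infer the infrastructure to
// provision on AWS: an HTTP endpoint implies API Gateway plus a Lambda
// function, and the SQL query implies an RDS database (plus connectivity).
import { api, database } from "some-ifc-sdk";

const db = database("orders"); // inferred: a relational database

// inferred: an HTTP endpoint in front of a serverless function
export const getOrder = api.get("/orders/:id", async (req) => {
  const [order] = await db.query("SELECT * FROM orders WHERE id = $1", [
    req.params.id,
  ]);
  return order ?? { error: "not found" };
});
```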

One of the advantages of IfC is portability: for any given app, the IfC tool can figure out how to provision the right infrastructure for any cloud (e.g., AWS, GCP, Azure). Another advantage is that you don’t have to think about your infrastructure at all; you just focus on your app, and the IfC tool takes care of all the infrastructure details for you. Perhaps IfC is another route to an infrastructureless world.

Conclusion

If you’ve made it to this point in the book, you now know—and have tried out—all the basics of DevOps and software delivery. You’ve seen how to deploy apps using PaaS and IaaS; how to manage your infrastructure as code with Ansible, Docker, and OpenTofu; how to do application orchestration with Kubernetes and Lambda; how to use GitHub for version control, NPM as a build system, and Jest to write automated tests; how to set up CI / CD pipelines in GitHub Actions; how to break up your deployments across multiple environments, libraries, and services; how to set up DNS in Route 53, a VPC in AWS, and a service mesh with Istio; how to secure your communications with TLS and your data with AES; how to deploy PostgreSQL in RDS, manage schema migrations with Knex.js, and do backup & recovery; and how to monitor your systems by writing structured logs with Winston and setting up metrics, dashboards, and alerts in CloudWatch.

Phew! That’s a lot. Give yourself a pat on the back. You’re done!

That said, I hope that one of the things you learned along the way is that software isn’t done when the code is working on your computer, or when someone gives you a "ship it" on a code review, or when you move a Jira ticket to the "done" column. Software is never done. It’s a living, breathing thing, and DevOps and software delivery are all about how to keep that thing alive and growing. So when I said you’re done, I wasn’t entirely honest with you. The truth is that you’re just getting started!

So this is the end of the blog post series, but the beginning of your journey. To learn more about DevOps and software delivery, see [recommended_reading] for a list of recommended resources that go deeper on the topics covered in each blog post. If you’ve got feedback or questions, I’d love to hear from you at jim@ybrikman.com. Thank you for reading!
