The Strategic Reasons for DevSecOps
I have scars from spending decades in the trenches delivering enterprise systems, mostly with dramatic installations and sleepless nights. You've not lived until you've experienced phantom pager syndrome, where you feel it vibrating even when you don't have it. Or the horror of forgetting it was your week to carry the pager until a teammate handed it to you. I've also experienced the benefits of fixing the causes of that pain by moving to continuous delivery. Along the way, I had several "ah-ha" moments and rediscovered the joy of delivering software.
I learned that relentlessly shrinking batch size and accelerating delivery frequency was the most effective way to expose problems so we could fix them.
The big "ah-ha" for me was when I understood that CD isn't about delivering features faster; it's about mission readiness and the ability to respond to real-world events.
A team that uses CD will have the muscle memory and refined processes to react safely to the unexpected. They can take advantage of opportunities that less evolved teams cannot. Having teams that operate this way is a strategic advantage for any competitive organization. However, those teams can only function this way within the broader operating model of DevSecOps, because that model provides the cultural and mission alignment required for multiple teams to move in harmony.
Note: DevOps and DevSecOps are the same thing, and I use them interchangeably. DevSecOps was coined to emphasize that security is important to DevOps, but that's always been true because it's about the efficient delivery of value, and insecure systems aren't valuable.
That's Not DevOps
No conversation about DevOps is helpful until we define our terms. Unfortunately, when something comes along that makes things better but also requires skill and effort to implement, someone will try to package and sell it as something that takes less effort and skill while also neutering it. DevOps is no different.
It's not:
- a "DevOps team" staffed with "DevOps Engineers" who configure pipelines for other teams. That creates another handoff with wait times and delays and is the antithesis of DevOps.
- an application support team that eats the pain of crappy development. DevOps advocates "you build it, you run it."
- a release management team. The team doing the work should be designing for release.
- a platform team. Self-service platforms are critical but only a small part of DevOps.
So, what is it? It depends on whom you ask. If you want a deep dive, you can check out this free course by the Linux Foundation. Comparing it to the Toyota Production System isn't inaccurate. In The Phoenix Project, Gene Kim describes three core behaviors: systems thinking, amplifying feedback loops, and continuous improvement. Those are also very important.
In 2015, Donovan Brown from Microsoft coined the most comprehensive description I've found:
DevOps is the union of people, process, and products to enable the continuous delivery of value to our end users.
DevOps is how we organize workflow and team structure, measure effectiveness, improve efficiency, amplify feedback, and approach our improvement process.
CD is Not For Cat Videos
One of my favorite pushbacks is, "We are doing something important, not social media or e-commerce." They don't understand risk. The more critical something is, the more important it is that we have a safe, fast process to respond to any incident at any time.
CD Saving Lives
In February 2022, Starlink was activated in Ukraine to maintain battlefield communications. Immediately afterward, Russia began efforts to jam Starlink in the area. Within hours of being jammed, Starlink deployed a software update to bypass the jamming. This didn't require heroics and hacky workarounds. It only needed SpaceX to use their standard way of working.
Fighting the Current War, Not the Last One
Historically, the US Navy has been the most conservative branch of the US military. Ships are complex systems, and the ocean is not a friendly operating environment, even outside a war zone. Hardware and software improvements to Navy ships have traditionally been handled with the same conservatism: every three years, a ship enters a 6- to 12-month maintenance cycle, and all changes are made during that window.
Today, with programs like Aegis Speed to Capability, ship systems can be updated over the air in a combat zone. This has become a critical capability when faced with evolving threats such as the missile attacks in the Persian Gulf.
Today, the Navy can analyze the data from new threats to define improvements, develop the updates, and release them within days or weeks instead of years. Amplifying feedback and building the ability to respond is saving lives.
You Can't Buy DevSecOps
It's tempting to believe anyone with deep enough pockets can buy the ability to respond at the speed of need, and many people try. However, remember that "products" account for only 33% of the DevOps definition and an even smaller percentage of the problem.
It's How You Use The Tools
Military history can teach us quite a bit, not least that people often matter more than the tools at their disposal.
In 1941, Japan declared war on the US. The US Navy entered the war with its state-of-the-art carrier-based fighter, the F4F-3 Wildcat, to counter Japan's newest naval fighter, the A6M Zero. On paper, the Japanese had all the advantages in air superiority at sea.
| | IJN - A6M | USN - F4F |
| --- | --- | --- |
| Range | ✅ | |
| Speed | ✅ | |
| Climb Rate | ✅ | |
| Turn Rate | ✅ | |
| Acceleration | ✅ | |
| Dive Rate | | ✅ |
The Wildcat was faster in a dive, so if it had altitude and enough distance, it could run away.
In addition, the pre-war IJN's pilot training was longer and more intense than the USN's, and Japan had been at war since 1937, so it had more experienced pilots.
The Japanese advantages resulted in a brutal kill ratio of three US planes lost for every Japanese plane.
Lieutenant Commander Jimmy Thach, commander of the USS Saratoga's fighter squadron, found a way to work around the weaknesses of his tools: the Thach Weave.
With the Thach Weave, US pilots used disciplined teamwork to counter the Japanese aircraft's advantages and even the odds from 3:1 to 1:1. When more advanced US fighters made it to the fleet, the better tools only enhanced the skills they'd learned, ultimately pushing the ratio to more than 6:1 in the US's favor.
Achieving that level of discipline was possible because every squadron was a cohesive unit with a defined mission that practiced for that mission constantly. They were not a jumble of pilots thrown together just before a fight.
A proper development team isn't a group of coders thrown together to deliver a project. It's a highly cohesive team of mission specialists who know each other's strengths and weaknesses and how to work together to maximize the former and help each other with the latter. They own the problems, the solutions, and the outcomes. They use the daily work of delivering features to refine their development and quality processes. This establishes the habits that allow them to respond when needed without drama.
Responding to Failure
A few years ago, I had the opportunity to visit the Thunderbirds at Nellis AFB to photograph a training sortie.
I'd heard that the Thunderbirds trained like every day was show day, and I was not disappointed. Even though they were only launching two planes for training, they performed the same duties they would for an airshow. Since only a subset of the team was required to launch two planes, why were they all there? Teamwork and muscle memory.
Notice the names on the plane. Those are the enlisted airmen who are responsible for that plane. It's their plane, not the pilot's. While performing pre-flight on Thunderbird 1, something went wrong. There was no panicking, just troubleshooting followed by the decision to down-check the plane and use the backup.
They warmed up Thunderbird 6 and continued the mission. Their Mean Time To Restore (MTTR) was about 10 minutes because they practice for the unexpected.
Building the Strategic Capability
We must build the people and the process to leverage the products properly. There's a well-known quote from Gene Kim in the DevOps community: "Improving daily work is more important than doing daily work." That's true. When we treat daily work as an opportunity to improve our ability to respond, not just to get things done, we build systems and teams that are resilient to the unexpected. We don't rely on theory or hope but on discipline and practice. The most effective way I've found to build those teams is to solve the problems that prevent continuous delivery.
Architect for Teamwork
First, we need the right team structures. It's not uncommon for teams to be organized around delivering features, with multiple teams sharing a backlog of features and working on the same code. This is tempting as a way to maximize team utilization, but it destroys any sense of ownership, harms quality, and degrades the system architecture.
Teams should be organized around business capabilities, with each team having sole ownership of a portion of the system while being loosely coupled to other portions of the system. This may require evolving the system architecture along with realigning teams. I've been on that journey, and it's not something a legacy system can achieve overnight. However, if we have a critical system that must respond quickly, we cannot do it if we are organized around large, slow changes or teams that are tightly coupled.
Use CI to Teach CI
Once we have our teams in place, we can build the habits.
CI is defined as developers merging code changes very frequently into a shared repository, typically several times a day per developer. Each integration is automatically verified by building the application and running automated tests to detect integration errors as quickly as possible.
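To make the "automatically verified" half of that definition concrete, here is a minimal sketch of the verification step, assuming a project whose build and tests can be driven from the command line; the specific commands are placeholders for whatever your toolchain actually uses.

```python
"""Minimal CI verification sketch, run on every merge to the shared branch.

Assumes a project whose build and tests can be driven from the command line;
the exact commands below are placeholders for your toolchain.
"""
import subprocess
import sys

# Every integration is verified the same way, every time: build, then test.
STEPS = [
    ("build", ["python", "-m", "build"]),              # placeholder build command
    ("unit tests", ["python", "-m", "pytest", "-q"]),  # placeholder test command
]


def verify_integration() -> int:
    """Run each step in order and fail fast on the first error."""
    for name, cmd in STEPS:
        print(f"--- {name}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED at {name}; fix it before merging more work.")
            return result.returncode
    print("Integration verified.")
    return 0


if __name__ == "__main__":
    sys.exit(verify_integration())
```

The script itself isn't the point; what matters is that the same verification runs automatically on every merge, several times a day, so integration problems surface in minutes instead of weeks.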
Executing a CI workflow requires both individual skill and teamwork. The fastest way to grow both is to fail at CI and fix what doesn't work. There are no valid "reasons" a team cannot use CI; there are only excuses or problems to be fixed. Focusing our improvement efforts on "why can't we CI?" effectively exposes issues with process and training. The most common issues are:
- Vague acceptance criteria with no testable outcomes
- Lack of testing knowledge
- Lack of experience with decomposing work
- Lack of understanding of evolutionary development
- Work habits and incentives that focus on individual output rather than team goals.
As we find and resolve these problems, we improve the efficiency and effectiveness of our work and grow as individuals and as a team. The next step is fixing organizational speed bumps by asking, "Why can't we deliver today's work today?"
Fix the Surrounding Environment
Many people view software development as a linear process: define requirements, write code, test, accredit, and deploy. Classically, each of these steps is performed by a different functional silo, and the entire process can take weeks or months. Meanwhile, the people who need a solution must wait, and feedback on fitness for purpose is nonexistent, since it can only come from the end users.
Reality is far messier than this clean, linear flow of work. The first issue is that we aren't "done" when we deliver; there are still operations to consider. We are only "done" when we delete the code. In addition, the flow of work through those silos results in misunderstandings, long wait times, misaligned goals, rework, frustration, and sometimes abandonment of the effort because the solution is no longer relevant.
Optimize For Reaction, Not Projects
Instead of designing our processes for the eventual delivery of large batches of features, we need to design them to respond safely to an emergency. Being able to deploy quickly isn't enough. We need exactly one way to deliver any kind of change, so we never bypass our controls. By applying the offsetting forces of "respond quickly" and "do it safely," we are forced to confront the problems in our current validation processes that may not have felt like problems before.
Do we require external approvals, manual testing, security reviews, or < SHUDDER > a release train to deploy a feature? Will we do these while responding to a critical incident? If not, we need to fix our quality process.
- If a process exists only to make people feel better, eliminate it. The Change Advisory Board (CAB) is a good example: it gives people the illusion of control and risk reduction while encouraging larger batches of work, which in turn increases risk.
- Are there compliance rules or checklists? Automate them. My favorite example is "no one person should be able to change production." We can automate this control and prevent the violation from happening instead of reporting it in an audit later; prevention is always better than detection. (A minimal sketch of such a gate follows this list.)
- Some tests, such as exploratory and usability tests, require humans to execute. Move them off the delivery critical path and run them continuously. Identify every validation that must happen to certify that an artifact is deliverable and automate it in the pipeline. Then, instead of validating artifacts, validate that the pipeline emits compliant artifacts.
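To illustrate what "automate the control" can look like, here is a hedged sketch of a pipeline gate in Python. The `ChangeRecord` fields and the checks are assumptions for the sake of the example, not the API of any particular CI product; a real pipeline would populate them from its own metadata (commit author, review approvals, artifact signature, test results).

```python
"""Hypothetical compliance gate: enforce controls instead of auditing them later.

The ChangeRecord fields are illustrative; a real pipeline would populate them
from its own metadata (commit author, review approvals, artifact signature).
"""
from dataclasses import dataclass, field


@dataclass
class ChangeRecord:
    author: str
    approvers: list[str] = field(default_factory=list)
    signed_by_pipeline: bool = False
    tests_passed: bool = False


def gate(change: ChangeRecord) -> list[str]:
    """Return the violated controls; an empty list means the change may deploy."""
    violations = []
    # "No one person should be able to change production": the author
    # cannot be the sole approver of their own change.
    if not any(approver != change.author for approver in change.approvers):
        violations.append("separation of duties: needs an approver other than the author")
    # Only artifacts built and signed by the pipeline are deployable.
    if not change.signed_by_pipeline:
        violations.append("provenance: artifact was not produced by the pipeline")
    # Every required validation is automated and must have passed.
    if not change.tests_passed:
        violations.append("verification: automated test suite did not pass")
    return violations


if __name__ == "__main__":
    change = ChangeRecord(author="alice", approvers=["alice"],
                          signed_by_pipeline=True, tests_passed=True)
    problems = gate(change)
    if problems:
        print("Blocked:", *problems, sep="\n  - ")
    else:
        print("Cleared for deployment.")
```

Because the gate runs on every change, the control is preventive rather than something an auditor discovers after the fact.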
Production-like Test Environments
We should always test under conditions that closely mirror production. To do this effectively, we need test environments that are repeatable, ephemeral, and on-demand, while still adhering to our security and compliance standards. Ideally, these environments are built from immutable, version-controlled artifacts, ensuring consistency and traceability across deployments.
Tools like Defense Unicorns' Unicorn Delivery Service (UDS) make this process seamless by automating the creation of such environments. By leveraging solutions like UDS, teams can spin up production-like environments at will, minimizing configuration drift and eliminating surprises during deployment. This approach not only enhances test fidelity but also supports faster, more confident releases by ensuring that "it works on my machine" becomes "it works in production."
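For readers who want the shape of this in code, here is a minimal sketch of an ephemeral, on-demand environment. It assumes a Kubernetes cluster reachable via `kubectl` and a version-pinned manifest checked into the repository (the `deploy/test-env.yaml` path is hypothetical); it is a generic illustration, not UDS's interface.

```python
"""Sketch of an ephemeral, production-like test environment.

Assumes a Kubernetes cluster reachable via kubectl and a version-pinned
manifest in the repository; the manifest path is hypothetical, and this is a
generic illustration rather than the interface of any specific product.
"""
import contextlib
import subprocess
import uuid

MANIFEST = "deploy/test-env.yaml"  # hypothetical, version-controlled manifest


@contextlib.contextmanager
def ephemeral_environment(manifest: str = MANIFEST):
    """Create an isolated namespace, deploy the pinned manifest, then tear it down."""
    namespace = f"test-{uuid.uuid4().hex[:8]}"
    subprocess.run(["kubectl", "create", "namespace", namespace], check=True)
    try:
        # The same immutable artifacts described by the manifest are deployed
        # every time, so every test run sees the same production-like stack.
        subprocess.run(["kubectl", "apply", "-n", namespace, "-f", manifest], check=True)
        yield namespace
    finally:
        # Disposable by design: nothing lingers long enough to drift.
        subprocess.run(["kubectl", "delete", "namespace", namespace, "--wait=false"], check=True)


if __name__ == "__main__":
    with ephemeral_environment() as ns:
        print(f"Running the integration test suite against namespace {ns} ...")
        # e.g. subprocess.run(["python", "-m", "pytest", "tests/integration"], check=True)
```

Tearing the namespace down at the end is what keeps the environment disposable; nothing lives long enough to drift away from production.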
Continuous Feedback and Improvement
Things will go wrong. Instead of pretending we can eliminate all defects before delivery, we focus on building confidence through rapid, incremental changes and developing a resilient system that can detect, respond to, and learn from failure quickly.
CD practices emphasize frequent, small, and reversible changes. When something breaks, it does so in a controlled and understandable manner. This minimizes the blast radius and makes root cause analysis faster and less disruptive. The goal isn't zero defects; it's to reduce the time and cost of addressing them.
Every failure is an opportunity to improve. We treat incidents as signals, using them to strengthen our pipelines, infrastructure, and processes. These are not just technical fixes but systemic reinforcements that increase delivery confidence over time.
It's critical that we do this without compromising how quickly we can deliver. CD relies on tight feedback loops, whether through automated tests, canary releases, or production monitoring. Instead of slowing down to chase illusory perfection, we embrace rapid iteration as the path to both speed and stability. In this way, Continuous Delivery isn't just about shipping software frequently; it's about creating a culture of continuous learning, adaptation, and resilience. That's the essence of DevOps.
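As one small illustration of such a feedback loop, here is a hedged sketch of a canary gate: the error rates would come from your monitoring system, and the tolerance value is illustrative, not a recommendation.

```python
"""Sketch of a canary gate.

The error rates would come from your monitoring system; the values and the
tolerance below are illustrative, not recommendations.
"""


def canary_decision(canary_error_rate: float, baseline_error_rate: float,
                    tolerance: float = 0.01) -> str:
    """Promote the canary only if it is not meaningfully worse than the baseline."""
    if canary_error_rate > baseline_error_rate + tolerance:
        # Small, reversible change: rolling back is cheap and well rehearsed.
        return "rollback"
    return "promote"


if __name__ == "__main__":
    # Hypothetical observed values, standing in for a monitoring query.
    print(canary_decision(canary_error_rate=0.004, baseline_error_rate=0.003))
```

Because each change is small and reversible, "rollback" is a cheap, well-rehearsed decision rather than an emergency.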
It's Too Late To Start After It's Needed
DevOps isn't some trendy buzzword or a new role. It's how we optimize our work to align everyone required to deliver our mission goals. It's how we improve the flow of communication and put guardrails in place to reduce error. It's the operational environment needed to reduce risk and respond at the speed of threat.
Continuous delivery is the daily practice we use to train and validate that ability when it's not an emergency, so it's there when it matters.
We don't rise to the level of our expectations; we fall to the level of our training.
— Archilochus, 650 BC