If one of the buzzwords of 2016 is microservices then the other is DevOps, but from what I see, read and hear, many teams are doing it wrong. To really ‘get’ DevOps you need to understand the problem it was designed to solve, and this requires knowledge of Lean thinking.

Lean is the name of the methodology behind the Toyota Production System and is synonymous with the eradication of waste. With regard to software development, waste manifests in the following ways…

  • Partially Complete Work
  • Superfluous Features
  • Relearning
  • Handover
  • Waiting
  • Task Switching
  • Rework / Defects

Now consider a typical approach to deploying and supporting software. This may sound crazy but it is the reality in many organisations…

Alex has built some software and needs it to be deployed to the staging environment. Because staging is identical to production, Alex must raise a ticket (handover) for a system admin to release it. This usually happens within thirty minutes (waiting), so rather than starting something new, Alex makes a coffee and catches up on email (task switching).

Unfortunately, Alex gets side-tracked and checks the deployment after forty-five minutes, only to find that it still hasn’t been deployed. By leaning on a friend in operations the deployment gets fast-tracked, but something in the configuration is wrong and it fails (defect). Alex, however, is at lunch, and it’s an hour before the configuration is fixed and the software is ready for redeployment (waiting, rework). This time, everything goes smoothly and Alex advises the QA team that the software is in staging, ready to be tested (handover).

The above process repeats until the QA team are satisfied there are no issues. Meanwhile, Alex works on the release notes (handover). Two days later the QA team give Alex the green light and he creates another ticket (handover) requesting the software be released to production. Production releases must be approved during the weekly Change Advisory Board (CAB) meeting (waiting), and can only be performed out of hours (waiting).

Five days later the software is finally released to production, but it contains another error and the release is rolled back (rework). Alex fixes the issue quickly and his manager negotiates a dispensation from the CAB, but they still need to wait until after hours before the software can be released to live again (waiting).

This is all too common. Over a week just to get a minor software update deployed to live, without considering the impact if another developer had released another change to production in the meantime, requiring Alex to merge, rebuild and retest.
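To make the scale of the waste concrete, the labels in parentheses in Alex's story can simply be counted. The sketch below is only an illustration: the list is transcribed by hand from the story, and the names `STORY_TAGS` and `tally_waste` are made up for this example.

```python
from collections import Counter

# Waste labels in the order they appear in parentheses in the story above.
# Transcribed by hand; "defect" belongs to the "Rework / Defects" category.
STORY_TAGS = [
    "handover", "waiting", "task switching",       # raising the staging ticket
    "defect", "waiting", "rework", "handover",     # failed deployment, then handing to QA
    "handover", "handover", "waiting", "waiting",  # release notes, production ticket, CAB, out of hours
    "rework", "waiting",                           # rollback and second production release
]


def tally_waste(tags):
    """Count how many times each waste label occurs."""
    return Counter(tags)


if __name__ == "__main__":
    # Print the labels from most to least frequent.
    for category, count in tally_waste(STORY_TAGS).most_common():
        print(f"{category}: {count}")
```

Waiting and handover account for nine of the thirteen labels, which is the point of the story: most of the week is spent queuing and passing work between people, not building anything.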

The root cause of all this waste is fear of something going wrong, but the processes put in place to mitigate risk often have the opposite effect. By introducing delays and additional steps, and by placing control in the hands of people distanced from the work, we increase the probability of error.

Development teams are incentivised to change systems; operations teams are incentivised to keep them stable. The separation of responsibilities encourages developers to hand over software without considering how easy it is to support and maintain. Being paged at 3am and taking the brunt of the pressure when critical applications fall over encourages operations to introduce overly burdensome processes.

The solution is to make development, deployment and support the responsibility of a single team. Any member of the team who is capable, responsible and communicative (they discuss the changes they are going to make before they make them and notify the team after doing so), should be permitted access to any system that doesn’t contain sensitive data. To start with, not everyone will be capable, so part of the job entails training the engineers lacking experience and automating the processes to reduce the barrier to entry.

In practice you still need expert system administrators. As a developer who has been automating deployment pipelines for almost ten years, I still lean heavily on them for most things hardware, operating system or network related. However, I am wary of engineers who specialise in DevOps. As the saying goes, when all you have is a hammer, everything looks like a nail. DevOps engineers are prone to build systems that only other DevOps engineers can maintain. We saw the same with build experts when continuous integration first became popular, and again when configuration management tools went mainstream. As with any other complex system, a deployment pipeline should be composed from simple, observable parts.

At one of our clients, TES, we’ve helped automate and simplify the deployment pipeline to the extent that business users release minor changes to production without assistance. DevOps is more than a buzzword, but to do it well, it requires fundamental change to an organisation’s structure and deep understanding of the principles, as well as the tools.