DORA metrics – DevOps Research and Assessment

The first DevOps assessments started back in 2012, just under nine years ago. Over the years, the available research has made it clear that certain things about DevOps are as true today as they were in the early days of the movement. DevOps still gives organizations a serious competitive advantage. Automation, collaboration and sharing are as important as ever. And organizations that do DevOps well don’t have to trade off moving fast against keeping things stable and secure.

This intention, moving fast while keeping things stable and secure, can be misunderstood. It does not mean pushing your developers to commit more times per day, work overtime, or risk-accept every open security issue. Automation, collaboration and sharing, as mentioned, are the key factors here. Understanding them the right way is the true spirit of DevOps (hiring DevOps engineers is not). The people who worked closely with DevOps and wanted to document the right approach created an organization.

DevOps Research and Assessment (DORA) has, over the years, surveyed more than 50,000 technical professionals worldwide to better understand how the technical practices, cultural norms, and lean management practices associated with DevOps affect IT performance and organisational performance.

What, then, are the DORA metrics that the science behind State of DevOps gave birth to? I have listed them below. Before you read through them, note that:

Metrics tell the story. Don’t think about how to make the metrics better; think about which practice you might be missing.

  • Production Deploy Frequency – how big is your batch (the theory is that the less often you deploy, the bigger your batch, aka the amount of stuff you release at once; the bigger the batch, the more potential problems it carries and, most likely, the more time it takes to get to production). Defined as: how often your organization deploys code to production.
  • Lead Time – how many blockers stand between your teams and production (the longer the lead time, the bigger the merge requests and the more process issues, tech debt, conflicting requirements, rework and unexpected work). Defined as: how long it takes for a code commit to be deployed to production.
  • Mean Time To Recover – how fast you can recover (it also tells you how much time it takes you to notice there is an issue in the first place). Defined as: how long it takes to restore the service after a service incident occurred.
  • Change Fail Rate – what is your quality focus (the higher the rate, the more questionable your quality practices). Defined as: what percentage of changes either result in degraded service or subsequently require remediation (e.g. lead to impairment or outage, or require a hotfix, rollback or fix forward).
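To make the four definitions above concrete, here is a minimal Python sketch. The `Deployment` and `Incident` records and all function names are my own hypothetical choices, not something DORA prescribes. Using the median for lead time and a per-day rate for deploy frequency are assumptions as well; pick whatever aggregation fits your reporting.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median


@dataclass
class Deployment:
    commit_time: datetime   # when the change was committed
    deploy_time: datetime   # when that change reached production
    failed: bool = False    # True if it degraded service or needed remediation


@dataclass
class Incident:
    started: datetime       # when the service incident began
    restored: datetime      # when the service was restored


def deploy_frequency(deployments, period_days):
    """Production deployments per day over the observed period."""
    return len(deployments) / period_days


def lead_time(deployments):
    """Median time from code commit to production deployment."""
    seconds = [(d.deploy_time - d.commit_time).total_seconds() for d in deployments]
    return timedelta(seconds=median(seconds))


def mean_time_to_recover(incidents):
    """Average time from the start of an incident to restored service."""
    seconds = [(i.restored - i.started).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


def change_fail_rate(deployments):
    """Percentage of deployments that failed or required remediation."""
    failures = sum(1 for d in deployments if d.failed)
    return 100 * failures / len(deployments)
```

All four functions expect a non-empty list, so guard against empty periods before calling them. Keeping the raw records and deriving the numbers from them means you can still look at which change failed or which deployment was slow, which is where the missing practice usually shows up.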

In the end, the metrics follow the rules of lean:

  • Value is only delivered once it leaves the factory floor, that is, once it is released to production (production deploy frequency)
  • Pay attention and improve daily to have as little unexpected work and rework as possible (change fail rate)
  • Optimize your bottlenecks to have the right amount of Work In Progress (lead time)
  • Invest in automation, scalability, observability, security and other practices that help to keep your services running seamlessly (mean time to recover)

As for how you can implement the measurements, here are some ideas I have found useful:

  • Production Deploy Frequency – best automated via CI/CD tools like Jenkins or CircleCI. However, I have also seen integrations with ServiceNow or Jira. All that matters is that each production deployment gets recorded.
  • Lead Time – any version control tool in conjunction with a CI/CD tool. You can configure Jenkins, GitLab, CircleCI and others with GitHub, SVN, etc. Then you just measure the time between the moment a commit was created and the moment the same commit went live via CI/CD. In the worst case, you can even use Excel and measure these things manually.
  • Mean Time To Recover – I really liked Blameless for integrating all recovery procedures. However, multiple other tools like ServiceNow and Jira can track and record the duration of outages.
  • Change Fail Rate – the best ideas I have seen correlated it with automatic tracking of Google’s four golden signals after a change was deployed. However, the number of redeploys can be retrieved from CI/CD tooling or even recorded manually in tools like Jira or ServiceNow.
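If none of the tools above is available yet, recording each production deployment can be as simple as appending a line to a file from the last step of your pipeline. The sketch below is one possible way to do that in Python. The JSON Lines format, the file layout and the names `record_deployment` and `load_lead_times` are assumptions of mine for illustration; they are not part of any of the tools mentioned.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_deployment(log_path, commit_sha, committed_at, deployed_at=None):
    """Append one production deployment to a JSON Lines file.

    Pass timezone-aware timestamps so that commit and deploy times
    can be subtracted from each other later.
    """
    deployed_at = deployed_at or datetime.now(timezone.utc)
    entry = {
        "commit": commit_sha,
        "committed_at": committed_at.isoformat(),
        "deployed_at": deployed_at.isoformat(),
    }
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def load_lead_times(log_path):
    """Return the commit-to-production time of every recorded deployment."""
    lead_times = []
    with Path(log_path).open(encoding="utf-8") as log:
        for line in log:
            if not line.strip():
                continue  # tolerate blank lines in a hand-edited file
            entry = json.loads(line)
            committed = datetime.fromisoformat(entry["committed_at"])
            deployed = datetime.fromisoformat(entry["deployed_at"])
            lead_times.append(deployed - committed)
    return lead_times
```

A pipeline step would call `record_deployment` once per successful production release, with the commit hash and commit timestamp taken from your version control tool. The number of lines in the file per week then gives you deploy frequency, and `load_lead_times` gives you the raw data for lead time.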

Whether you decide to automate early or just start collecting the metrics manually, remember to make the metrics really visible. Find a senior sponsor who will keep reminding others that what the metrics say should be continuously discussed.

This is all I have prepared for today. I hope you will find it helpful and interesting enough to start discussing this approach within your organization.

The key message you need to remember from this article is: if you can’t measure it, you won’t be able to improve it.

Think about the metrics by which your result will be judged, and make a resolution to live by them every day so that in the end, your result will be judged a success.

To summarize:

My personal choice is the DORA metrics. They can help you make important decisions to improve your software development and delivery processes. Further investigation can help you find bottlenecks in the process. With these findings, organizations can make informed adjustments to their process workflows, automation, team composition, tools, and more.
