The hardest part when implementing DevOps is a shift in conversations with management. Management is used to asking the following questions:
- How much will it cost?
- How much will we earn from it?
From a management perspective, these are reasonable questions. But in a DevOps world, they can be toxic and can lead to a large amount of planning upfront if they are answered at the wrong time and in the wrong way. In this chapter, I'll show you metrics that can shift discussions with management away from efforts toward general engineering velocity and developer productivity.
I'll explain how to measure engineering velocity and developer productivity and how to make your DevOps acceleration measurable.
The following topics will be covered in this chapter:
- Why accelerate?
- Engineering velocity
- High-performance companies
- Measuring metrics that matter
- The SPACE framework for developer productivity
- Objectives and key results
Why accelerate?

The expected lifespan of companies is decreasing rapidly. According to Richard Foster from the Yale School of Management, the average lifespan of a Standard & Poor's (S&P) 500-listed company 100 years ago was 67 years. Today, it is 15 years. Every 2 weeks, a company drops off the S&P 500 list, and by 2027, it is expected that 75% of the top 500 companies will have been replaced by new companies. Another study from the Santa Fe Institute (The Mortality of Companies) concludes that the average lifespan of a United States (US) company across all industries is about 10 years.
To remain competitive, companies must not only solve a customer problem; they also need to deliver products and services that delight their customers, and they must be able to engage with the market and respond quickly to changing demands. Time to market is the most important driver for business agility.
Software is at the heart of every product and service in every industry, not only because the digital experience has become as important as (or maybe even more important than) the physical experience. Software touches every part of a product life cycle, for example:
- Supply chain management
- Cost optimization/predictive maintenance/robotics
- Product individualization (lot size 1)
- Sales, after-sales, and service:
  - Customer service and support
  - Social media
  - Digital assistant
- Digital product:
  - Companion app
  - Mobile experience
  - New business models (pay-by-use, rent, and so on)
These are just examples to illustrate that most interactions your customers have with your company are digital. You do not just buy a car today—you are already aware of the brand from social media and the press. You buy and configure a car on a website or in a store with a salesperson, but also by looking at the screen of a tablet. The price of the car is influenced by the optimization of your assembly line by robotics and artificial intelligence (AI). The first thing you do with the car is to connect your phone. While driving, you listen to music, make a phone call, or respond to a text message using your voice. The driving assistant keeps you safe by braking for you if something is in your way and by making sure you stay in your lane; and soon, cars will do most of the driving autonomously. If you have a problem with the car or an app, the chances are high that you'll use the app or email to contact after-sales, especially among the younger generations. A car is mainly a digital product: not only are there millions of lines of code running in the car itself, but there are also millions of lines of code powering its apps, websites, and the assembly line (see Figure 1.1).
The good thing is that software can be changed much faster than hardware can. To accelerate your time to market and your business agility, software is the key driver. It is much more flexible than hardware components and can be changed in days or weeks, not months or years. It also allows a much better connection to your customers. A customer that is using your app is more likely to respond to a survey than one in a physical shop. Also, hardware does not provide you with telemetry of how your products are being used.
To be one of the companies that stay in business for longer than 10 years, your company must leverage the power of software to accelerate its market response and delight customers with a great digital experience.
Engineering velocity

How does your company measure developer velocity? The most common approach is effort. There used to be companies that used metrics such as lines of code or code test coverage, but those are obviously bad choices, and I'm not aware of any company today that still does this. If you can solve a problem in one line of code or in 100 lines of code, one line is obviously preferable, since every line comes with a maintenance cost. The same goes for code test coverage: the coverage itself says nothing about the quality of the tests, and bad tests also introduce additional maintenance costs.
I try to keep the wording agnostic to the development method. I've seen teams adopt DevOps practices that use Agile, Scrum, Scaled Agile Framework (SAFe), and Kanban, but also Waterfall. But every system has its own terminology, and I try to keep it as neutral as possible. I talk about requirements and not user stories or product backlog items, for example, but most of the examples I use are based upon Scrum.
Measuring velocity with effort

The most common approach to measuring developer velocity is estimating requirements. You break down your requirements into small items—such as user stories—and the product owner assigns a business value to each. The development team then estimates each story and assigns a value for its effort. It doesn't matter whether you use story points, hours, days, or any other number—it is basically a representation of the effort required to deliver the requirement.
Measuring velocity with estimated effort and business value can have side effects if you report the numbers to management. There is some kind of observer effect: people try to improve the numbers. In the case of effort and business value, that's easy—you can just assign bigger numbers to the stories. And this is what normally happens, especially if you compare the numbers across teams: developers will assign bigger numbers to the stories, and product owners will assign bigger business value.
While this is not optimal for measuring developer velocity, it does no great harm if the estimation is done in the normal conversation between the team and the product owner. But if the estimation is done outside your normal development process, estimates can become toxic and have very negative side effects.
The search for the answer to the question How much will it cost? for a bigger feature or initiative normally leads to an estimation outside the normal development process and before the decision to implement it. But how do we estimate a complex feature or initiative?
Everything we do in software development is new. If you had done it already, you could reuse the existing software instead of writing it anew—even a complete rewrite of an existing module is still new, as it uses a new architecture or new frameworks. Something that has never been done before can only be estimated with limited certainty. It's guessing, and the greater the complexity, the bigger the cone of uncertainty (see Figure 1.2).
The cone of uncertainty is used in project management. Its premise is that at the beginning of a project, cost estimation has a certain degree of uncertainty that is then reduced through rolling planning until it reaches zero at the end of the project. The x-axis is normally time, but it can also relate to complexity and abstraction: the more abstract and complex a requirement is, the bigger the uncertainty in the estimation.
To better estimate complex features or initiatives, these are broken down into smaller parts that can be estimated more reliably. You also need to come up with a solution architecture as part of the work breakdown. Since this is done upfront, outside the normal development process and its context, it has some unwanted side effects, as outlined here:
- Normally, the entire team is not present. This leads to less diversity, less communication, and therefore less creativity when it comes to problem-solving.
- The focus is on finding problems. The more problems you can detect beforehand, the more accurate your estimates probably are. In particular, if you later use the estimates to measure performance, people quickly learn that they can buy more time by finding more problems and can therefore attach higher estimates to the requirements.
- If in doubt, the engineers assigned the task of estimation choose the more complex solution. If, for example, they are not sure whether they can solve a problem with an existing framework, they might plan to write their own solution to be on the safe side.
If these numbers were only used by management to decide upon the implementation of a feature, this would not do much harm. But normally, the requirements—including the estimates and the solution architecture—are not thrown away and are later used to implement the features. In this case, a less creative solution—one optimized for anticipating problems rather than for solving them—is already on the table. This inevitably leads to less creativity and outside-the-box thinking when implementing features.
Estimates are not bad per se. They can be valuable if they take place at the right time. If the development team and the product owner discuss the next stories, estimates can help to drive the conversation. If, for example, the team plays planning poker to estimate user stories and the estimates differ, that is an indication that people have different ideas on how to implement them. This can lead to valuable discussion and may even save time, as stories with a common understanding can be skipped. The same is true for the business value. If the team does not understand why the product owner assigns a very high or very low number, this can also lead to important discussions. Maybe the team already knows a way to achieve a successful outcome, or there are discrepancies in how different personas perceive the feature.
But many teams feel more comfortable without estimating the requirements at all. This is often referred to under the hashtag #noestimates. Especially in highly experimental environments, estimation is often considered a waste of time. Remote and distributed teams also often prefer not to estimate. They often take discussions from in-person meetings to discussions on issues and pull requests (PRs). This also helps when documenting the discussions and helps teams to work in a more asynchronous way, which can help to bridge different time zones.
With developer velocity off the table, teams should be allowed to decide on their own if they want to estimate or not. This also might change over time. Some teams gain value from this, while some do not. Let teams decide what works for them and what doesn't work.
The correct way to estimate high-level initiatives
So, what is the best way to estimate more complex features or initiatives so that the product owner can decide whether they are worth implementing? Get the entire team together and ask the following question: Can this be delivered in days, weeks, or months? Another option is to use an analogy estimation and compare the initiative to something that has already been delivered. The question then is: Is this initiative smaller than, equal to, or more complex than the one delivered previously?
The most important thing is not to break the requirements down or to lay out a solution architecture already—what matters is just the gut feeling of all engineers. Then, have everyone assign a minimum and a maximum number for the unit. For the analogy estimation, use percentages relative to the original initiative and calculate the results using historical data.
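As a sketch of the analogy approach (all numbers here are illustrative assumptions, not from the text): each engineer sizes the new initiative relative to a delivered one whose actual duration is known, and that historical actual converts the relative sizes into a time range.

```python
# Analogy estimation sketch. Assumed numbers: a previously delivered
# initiative took 6 months, and each engineer sizes the new initiative
# relative to it (1.0 = same size, 1.5 = 50% more complex).
historical_actual_months = 6
relative_sizes = [0.8, 1.2, 1.0, 1.5, 0.9]

estimates = [historical_actual_months * size for size in relative_sizes]
print(f"range: {min(estimates):.1f} to {max(estimates):.1f} months")  # 4.8 to 9.0
```

The historical anchor is what distinguishes this from plain guessing: the percentages are gut feeling, but the baseline is real delivery data.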
The easiest way to report this would look like this:
Given the current team,
if we prioritize the initiative <initiative name>,
the team is confident to deliver the feature in between <smallest minimum> and <highest maximum>
Taking the smallest minimum and the highest maximum value is the safest way, but it can also lead to distorted numbers if the pessimistic and optimistic estimates are far apart. In this case, the average might be the better number to take, as illustrated here:
Given the current team,
if we prioritize the initiative <initiative name>,
the team is confident to deliver the feature in between <average minimum> and <average maximum>
But taking the average (the arithmetic mean; =AVERAGE() in Excel) means having a higher or lower deviation, depending on the distribution of the individual estimates. The higher the deviation, the less confident you can really be that you will deliver the feature in that period. To get an idea of how your estimates are distributed, you can calculate the standard deviation (=STDEV.P() in Excel). You can look at the deviation for the minimum and the maximum, but also at the estimates of each member. The smaller the deviation, the closer the values are to the average. Since standard deviations are absolute values, they cannot be compared across estimations. To get a relative number, you can use the coefficient of variation (CV): the standard deviation divided by the average, typically represented as a percentage (=STDEV.P() / AVERAGE() in Excel). The higher the value, the more widely the values are spread around the average; the lower the value, the more confident each team member is with their estimate, or the entire team is with regard to the minimum and maximum. See the example in the following table:
To express uncertainty in the deviation of the values, you can add a confidence level to the estimation. This can be text (such as high) or a percentage level, as illustrated here:
Given the current team,
if we prioritize the initiative <initiative name>,
the team is <confident level> confident to deliver the feature in <arithmetic mean>
I don't use a fixed formula here because this would involve knowing the team. If you look at the data in the example (Table 1.1), you can see that the averages of the minimum (2.7) and the maximum (6.3) are not that far apart. If you look at the individual team members, you can see that there are more pessimistic and more optimistic members. If past estimations confirm this, it gives you very high confidence that the average is realistic, even if the minimum and maximum values have a pretty high CV. Your estimate could look like this:
Given the current team,
if we prioritize the initiative fancy-new-thing,
the team is 85% confident to deliver the feature in 4.5 months
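The statistics used above (arithmetic mean, population standard deviation, and CV) can be sketched in Python instead of Excel. The per-engineer estimates below are assumed values, chosen so that the averages match the 2.7 and 6.3 months discussed in the example:

```python
from statistics import mean, pstdev

# Assumed per-engineer estimates in months (minimum and maximum each)
minimums = [2, 3, 2, 4, 2.5]
maximums = [5, 8, 5, 9, 4.5]

def coefficient_of_variation(values):
    """Population standard deviation (=STDEV.P in Excel) divided by the mean."""
    return pstdev(values) / mean(values)

print(f"min: avg={mean(minimums):.1f} months, CV={coefficient_of_variation(minimums):.0%}")
print(f"max: avg={mean(maximums):.1f} months, CV={coefficient_of_variation(maximums):.0%}")
```

For these values, the averages are 2.7 and 6.3 months with CVs of roughly 28% and 29%—spread out enough that reporting only the raw minimum and maximum would overstate your certainty.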
This kind of estimation is not rocket science. It has nothing to do with complex estimation and forecasting systems such as the three-point estimation technique (https://en.wikipedia.org/wiki/Three-point_estimation), the PERT distribution (https://en.wikipedia.org/wiki/PERT_distribution), or the Monte Carlo simulation method (https://en.wikipedia.org/wiki/Monte_Carlo_method), all of which depend upon a detailed breakdown of the requirements and an estimation at the task (work) level. The idea is to avoid upfront planning and requirement breakdown and to rely more on the gut feeling of your engineering team. The technique here just gives you some insight into the data points you collect across your team. It's still just guessing.
From developer to engineering velocity
Effort is not a good metric for measuring developer velocity, especially if it is based upon estimates—and in cross-functional teams, velocity does not depend upon developers alone. So, how do you shift from developer velocity to engineering velocity?
High-performance companies

The Developer Velocity Index
In April 2020, McKinsey published its research on the Developer Velocity Index (DVI) (Srivastava S., Trehan K., Wagle D. & Wang J. (2020)). The drivers behind the index are not only engineering capabilities—they also include working practices and organizational enablement such as the company culture. The study shows that companies in the top quartile of the DVI outperform other companies in their market by four to five times—and not only in overall business performance. Companies in the top quartile also score between 40 and 60% higher in the following areas:
- Customer satisfaction
- Brand perception
- Talent management
The study is based on interviews with more than 100 senior engineering leaders at 440 large organizations across 12 industries. The interviews covered 46 drivers across 13 capabilities in 3 categories, outlined as follows:
- Technology: Architecture; infrastructure and cloud adoption; testing; tools
- Working practices: Engineering practices; security and compliance; open-source adoption; agile team practices
- Organizational enablement: Team characteristics; product management; organizational agility; culture; talent management
The DVI, therefore, goes way beyond pure developer velocity. It analyzes the engineering velocity and all the factors that influence it and relates them to business outcomes such as revenue, shareholder returns, operating margin, and nonfinancial performance indicators such as innovation, customer satisfaction, and brand perception.
The state of DevOps
The findings align with the results of the DevOps Research and Assessment (DORA) State of DevOps reports (https://www.devops-research.com/research.html#reports) but take them one step further by adding the business outcomes. The State of DevOps Report 2019 shows how elite performers compare against low performers (Forsgren N., Smith D., Humble J. & Frazelle J. (2019)), as outlined here:
- Faster value delivery: They have a 106-times faster lead time (LT) from commit to deploy.
- Advanced stability and quality: They recover 2,604 times faster from incidents and have a 7-times lower change failure rate (CFR).
- Higher throughput: They do 208 times more frequent code deployments.
High-performance companies not only excel in throughput and stability but are also more innovative, have higher customer satisfaction, and show greater business performance (see Figure 1.3).
Focusing on the measures that highlight the capabilities that set apart high-performance companies from medium and low performers, you can make your transformation visible and provide management with metrics that hopefully matter more to them than lines of code or estimation-based velocity.
Measuring metrics that matter

The DORA research is based upon the following four key metrics:
- Delivery performance metrics:
  - Delivery lead time
  - Deployment frequency
- Stability metrics:
  - Mean time to restore
  - Change fail rate
Delivery lead time
The delivery lead time (DLT) is the time from when your engineers start working on a feature until the feature is available to the end users. You could say from code commit to production—but you normally start the clock when the team starts working on a requirement and changes its state to doing or something similar.
It is not easy to get this metric automatically out of your systems. I will show you in Chapter 7, Running Your Workflows, how you can use GitHub Actions and Projects together to automate the metric. If you can't get the metric out of the system, you can set up a survey with the following options:
- Less than 1 hour
- Less than 1 day
- Less than 1 week
- Less than 1 month
- Less than 6 months
- More than 6 months
Depending on where you are on the scale, you conduct the survey more or less often. Of course, system-generated values would be preferable, but if you are still at the upper end of that scale (months), precision doesn't matter much. It gets more interesting once you measure in hours or days.
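If you can export the relevant timestamps from your work item tracker, the DLT can be computed directly. This sketch assumes two hypothetical fields: started (the item moved to doing) and deployed (the change reached production); the field names and dates are made up:

```python
from datetime import datetime
from statistics import median

# Hypothetical export from a project board: "started" is when the item moved
# to "doing", "deployed" is when the change reached production.
work_items = [
    {"started": datetime(2024, 5, 2, 9, 0), "deployed": datetime(2024, 5, 6, 9, 0)},
    {"started": datetime(2024, 5, 3, 10, 0), "deployed": datetime(2024, 5, 4, 10, 0)},
    {"started": datetime(2024, 5, 6, 8, 0), "deployed": datetime(2024, 5, 13, 8, 0)},
]

lead_times_days = [
    (item["deployed"] - item["started"]).total_seconds() / 86400
    for item in work_items
]
print(f"median DLT: {median(lead_times_days):.1f} days")  # 4.0 for this sample
```

The median is often preferable to the mean here, as a single long-running item would otherwise dominate the metric.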
Why not lead time?
From a Lean management perspective, the LT would be the better metric: how long does it take for a learning from customer feedback to flow through the entire system? But requirements in software engineering are difficult. Normally, a lot of steps are involved before the actual engineering work begins, the outcome can vary a lot, and the metric is hard to estimate if you must rely on survey data. Some requirements stay in the queue for months—some, only a few hours. From an engineering perspective, it's much better to focus on the DLT. You will learn more about LT in Chapter 18, Lean Product Development and Lean Startup.
Deployment frequency

The DLT focuses on speed: how long does it take to deliver your changes? A metric that focuses more on throughput is the deployment frequency (DF): how often do you deploy your changes to production? The DF indicates your batch size. In Lean manufacturing, it is desirable to reduce the batch size, so a higher DF indicates a smaller batch size.
At first glance, it looks easy to measure the DF in your system. But on closer inspection: how many of your deployments really make it to production? In Chapter 7, Running Your Workflows, I will explain how you can capture this metric using GitHub Actions.
If you can't measure the metric yet, you can also use a survey. Use the following options:
- On-demand (multiple times per day)
- Between once per hour and once per day
- Between once per day and once per week
- Between once per week and once per month
- Between once per month and once every 6 months
- Less than every 6 months
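Given a list of production deployment dates from your CI/CD system, the DF is simply deployments per unit of time. A minimal sketch (the dates are made up):

```python
from datetime import date

# Made-up production deployment dates from a 14-day observation window
deployments = [
    date(2024, 5, 1), date(2024, 5, 1), date(2024, 5, 2), date(2024, 5, 6),
    date(2024, 5, 7), date(2024, 5, 8), date(2024, 5, 13),
]
window_days = 14

frequency = len(deployments) / window_days  # deployments per day
print(f"DF: {frequency:.2f} deployments per day")  # 0.50 for this sample
```

Make sure you only count deployments that actually reached production—filtering out failed or test-environment deployments is usually the hard part.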
Mean time to restore
A good measure for stability is the mean time to restore (MTTR). This measures how long it takes to restore your product or service after an outage. If you measure your uptime, it is basically the time span in which your service is not available. To measure your uptime, you can use a smoke test—for example, in Application Insights (see https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability). If your application is installed on client machines and not accessible, it's more complicated. Often, you can fall back on the time for a specific ticket type in your helpdesk system. If all of that fails, you can again use a survey with the following options:
- Less than 1 hour
- Less than 1 day
- Less than 1 week
- Less than 1 month
- Less than 6 months
- More than 6 months
But this should only be the last resort. The MTTR should be a metric you can easily get out of your systems.
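If your monitoring or helpdesk system gives you outage start and restore timestamps, the MTTR is the average duration of the restore intervals. A sketch with made-up incidents:

```python
from datetime import datetime, timedelta

# Made-up incidents: (outage detected, service restored)
incidents = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 45)),
    (datetime(2024, 5, 9, 22, 0), datetime(2024, 5, 10, 1, 0)),
]

durations = [restored - start for start, restored in incidents]
mttr = sum(durations, timedelta()) / len(durations)
print(f"MTTR: {mttr.total_seconds() / 3600:.2f} hours")  # 1.88 hours here
```
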
Change fail rate
As DLT measures time for delivery performance, MTTR measures time for stability. The counterpart of the DF that focuses on throughput is the change fail rate (CFR): the answer to the question How many of your deployments cause a failure in production?, specified as a percentage. To decide which of your deployments count toward this metric, you should use the same definition as for the DF.
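The CFR is then just failed deployments divided by total deployments, over the same set of deployments you count for the DF. A minimal sketch with made-up data:

```python
# Made-up deployment log: True marks a deployment that caused a failure
# in production (incident, rollback, or hotfix required)
deployment_failed = [False, False, True, False, False, False, True, False, False, False]

cfr = sum(deployment_failed) / len(deployment_failed)
print(f"CFR: {cfr:.0%}")  # 2 failures out of 10 deployments -> 20%
```
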
The Four Keys dashboard
These four metrics based upon the DORA research are a great way to measure where you are on your DevOps journey. They are a good starting point to change your conversations with management. Put them on a dashboard and be proud of them. And don't worry if you're not yet an elite performer—the important thing is to be on the journey and to improve continuously.
It's very simple to start with survey-based values. But if you want to use automatically generated system data, you can use the Four Keys Project to display the data in a nice dashboard (see Figure 1.4).
The project is open source and based upon Google Cloud (see https://github.com/GoogleCloudPlatform/fourkeys), but it depends on webhooks to get the data from your tools. You will learn in Chapter 7, Running Your Workflows, how to use webhooks to send your data to the dashboard.
What you shouldn't do
It is important that these metrics are not used to compare teams with each other. You can aggregate them to get an organizational overview, but don't compare individual teams! Every team has different circumstances. It's only important that the metrics evolve in the right direction.
Also, the metrics should not become the goal. It is not desirable to just get better metrics. The focus should always be on the capabilities that lead to these metrics and that we discuss in this book. Focus on these capabilities and the metrics will follow.
The SPACE framework for developer productivity
The DORA metrics are a perfect starting point. They are easy to implement and there is lots of data to compare. If you want to take it one step further and add more metrics, you can use the SPACE framework for developer productivity (Forsgren N., Storey M.A., Maddila C., Zimmermann T., Houck B. & Butler J. (2021)).
Developer productivity is the key ingredient for achieving a high engineering velocity and a high DVI. Developer productivity is highly correlated with the overall well-being and satisfaction of developers and is, therefore, one of the most important ingredients for thriving in the war for talent and attracting good engineers.
But developer productivity is not just about activity. The opposite is often the case: in times of firefighting and meeting deadlines when activity is normally high, productivity decreases through frequent task switching and less creativity. That's why metrics that measure developer productivity should never be used in isolation, and never to penalize or reward developers.
Also, developer productivity is not solely about individual performance. As in team sports, individual performance is important, but only the team as a whole wins. Balancing measures of individual and team performance is crucial.
SPACE is a multidimensional framework that categorizes metrics for developer productivity into the following dimensions:
- Satisfaction and well-being
- Performance
- Activity
- Communication and collaboration
- Efficiency and flow
All the dimensions work for individuals, teams, and the system as a whole.
Satisfaction and well-being

Metrics in this dimension include the following:
- Developer satisfaction
- Net promoter score (NPS) for a team (how likely it is that someone would recommend their team to others)
- Satisfaction with the engineering system
Performance

Performance is the outcome of a system or process. The performance of individual developers is hard to measure, but at the team or system level, we can use measures such as LT, DLT, or MTTR. Other examples are uptime or service health. Customer satisfaction or an NPS for the product (how likely it is that someone would recommend the product to others) are also good metrics.
Activity

Activity can provide valuable insights into productivity, but it is hard to measure correctly. A good measure of individual activity is focus time: how much time does a developer spend outside of meetings and communication? Other example metrics are the number of completed work items, issues, PRs, commits, or bugs.
Communication and collaboration
Communication and collaboration are key ingredients of developer productivity. Measuring them is hard, but looking at PRs and issues gives you a good impression of how communication is going. Metrics in this dimension should focus on PR engagement, the quality of meetings, and knowledge sharing. Code reviews across teams (cross-team or X-team reviews) are also a good way to see where the boundaries between teams are.
Efficiency and flow
Efficiency and flow measure how many handoffs and delays increase your overall LT. Good metrics are the number of handoffs, blocked work items, and interruptions. For work items, you can measure total time, value-added time, and wait time.
How to use the SPACE framework
It is important to look not only at the dimensions but also at the scope. Some metrics are valid in multiple dimensions.
It is also very important to select carefully which metrics are being measured. Metrics shape behavior and certain metrics can have side effects you did not consider in the first place. The goal is to use only a few metrics but with the maximum positive impact.
You should select at least three metrics from three dimensions. You can mix the metrics for individual, team, and system scope. Be cautious with the individual metrics—they can have the most side effects that are hard to foresee.
To respect the privacy of the developers, the data should be anonymized, and you should only report aggregated results at a team or group level.
Objectives and key results
Many companies that are practicing DevOps are using objectives and key results (OKRs)—among them Google, Microsoft, Twitter, and Uber.
The OKR method dates back to the 1970s when Andrew Grove, the father of OKRs, introduced the method to Intel. The method was called iMBO, which stands for Intel Management by Objectives. He described the method in his book High Output Management (Grove, A. S. (1983)).
In 1999, John Doerr introduced OKR to Google. He had worked for Intel when Andrew Grove introduced iMBO there. OKR quickly became a central part of Google's culture. John Doerr published his book Measure What Matters (Doerr, J. (2018)), which made OKR famous. If you want to learn more about OKR, I highly recommend reading this book.
What are OKRs?
OKR is a framework that helps organizations to achieve a high alignment on strategic goals while keeping a maximum level of autonomy for teams and individuals. Objectives are qualitative goals that give direction and inspire and motivate people. Each objective is associated with unambiguously measurable quantitative metrics—the key results. The key results should focus on outcomes and not on activities, as illustrated in the following table:
OKRs should in no way be associated with the performance management system of the company or bonuses for its employees! The goal is not to achieve a 100% success rate for OKRs—this would mean the OKRs are not aggressive enough.
OKRs are written in the following format:
We will [objective]
As measured by [set of key results]
It is important that OKRs focus on outcomes and not on activities. A good example is an objective that was set by Google's chief executive officer (CEO) Sundar Pichai in 2008 when Google launched their Chrome browser. This was the OKR:
We will build the best browser
As measured by 20 million users by the end of 2008
The goal was bold for a new browser and Google failed to achieve this in 2008, getting fewer than 10 million users. In 2009, the key result was increased to 50 million users, and again, Google failed to achieve this, with about 37 million users. But instead of giving up, the key result was again increased in 2010—this time, to 100 million users! And this time, Google overachieved their goal, with 111 million users!
How do OKRs work?
For OKRs to work, a company needs a good vision and mission that define the WHY: Why are we working for this company? The vision is then broken down into mid-term goals (called MOALS). The MOALS are themselves OKRs. They are broken down into OKRs for an OKR cycle, typically 3 to 4 months long. In OKR planning and alignment, OKRs are broken down through the organization so that every individual and every team has their own OKRs that contribute to the bigger goal. The OKRs are then continuously monitored, normally on a weekly basis. At the end of the OKR cycle, the OKRs are reviewed, and the achievements (hopefully) celebrated. With the learnings from the cycle, the MOALS get updated and a new cycle begins (see Figure 1.6).
OKR in theory is simple, but implementing it is not. Writing good OKRs is especially hard and needs a lot of practice. There are also strong dependencies on the corporate culture and existing metrics and key performance indicators (KPIs) that are measured.
OKRs and DevOps
Once implemented correctly, OKRs can give you strong alignment between your teams while preserving their autonomy to decide on their own what they are building, and not only how they build it (see Figure 1.7). This is important when we talk about experimentation in Chapter 19, Experimentation and A/B Testing with GitHub. Your teams can define their own experiments and measure the output. Based on this, they decide which code stays in the product and which doesn't.
Let's look at an example. Suppose your company has the following MOAL:

We will build the best visual project management tool
As measured by a 75% market share by the end of 2025
Your product is built by two teams. One team focuses on the core of the product and builds the visuals for project management. They focus on the existing customers and on building a product that the customers love. They agree on the following OKR:
We will build the visual project management tool that is loved by our customers
As measured by a net promoter score (NPS) higher than 9
The NPS is currently at 7.9, so the team must figure out on their own what they can do to delight the customers. After a few interviews with customers, they formulate the hypothesis that all project management tools are based on older project management techniques and are too complicated for a more agile-oriented project world. They decide to conduct an experiment with a subset of customers, using a completely new concept of how to visualize the project, to confirm or reject the hypothesis.
The second team is the shared services team. They focus on user management, enterprise integration, and billing. To achieve the MOAL, the product needs more new users, not only happy current ones. So, the focus in this OKR cycle is on bringing new customers to the product, as illustrated here:
We will build a project management tool that is easy to use for new customers
As measured by a 20% increase in monthly new registered users
Currently, the number of newly registered users has flattened, so the intent is to start growing again. The team looks at the numbers and finds that a lot of new customers abandon the registration process on the details page, where they must enter their address and banking details. They form the hypothesis that more customers would try the product, and hopefully stay on the platform, if the registration process were easier. They decide to conduct an experiment: reduce registration to the minimum required for authentication, grant new users a 30-day free trial, and request payment details only after that period.
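Evaluating such an experiment comes down to comparing registration completion rates between the old and the new flow. A minimal sketch of that comparison, using a standard two-proportion z-test, could look as follows (the function name and all the numbers are hypothetical):

```python
from math import sqrt, erf

def conversion_uplift(control_conversions, control_total,
                      variant_conversions, variant_total):
    """Two-proportion z-test: does the simplified registration convert better?"""
    p1 = control_conversions / control_total
    p2 = variant_conversions / variant_total
    # pooled proportion under the null hypothesis of "no difference"
    p = (control_conversions + variant_conversions) / (control_total + variant_total)
    se = sqrt(p * (1 - p) * (1 / control_total + 1 / variant_total))
    z = (p2 - p1) / se
    # one-sided p-value for "the variant is better"
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))
    return p2 - p1, p_value

# Hypothetical numbers: 12% complete the old flow, 15% the simplified one
uplift, p_value = conversion_uplift(120, 1000, 150, 1000)
print(f"uplift={uplift:.1%}, p={p_value:.4f}")
```

With these made-up numbers, a 3% uplift across 1,000 users per group is statistically significant at the 5% level, so the team would keep the simplified registration.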
I will explain in Chapter 18, Lean Product Development and Lean Startup, and Chapter 19, Experimentation and A/B Testing with GitHub, how hypothesis-driven development and experimentation work. They are independent of OKRs, but both work very well together.
If you are interested in real-world OKRs, GitLab shares its OKRs publicly (https://about.gitlab.com/company/okrs/). It also shares its entire process and how it links OKRs to epics and issues.
OKRs are not a prerequisite for DevOps. But, as with agile practices, they are a natural match. If you are not working in an agile way and start with DevOps, your way of working will become agile anyway, and you can benefit from frameworks such as Scrum so that you don't reinvent the wheel. The same is true for OKRs: they come naturally when you scale DevOps in big organizations and want to give teams great autonomy while maintaining alignment with global goals.
In this chapter, I explained how software is taking over the world, its impact on the lifespan of companies, and the need to accelerate software delivery if your company wants to stay in business. This helps you change the conversation with your management team by making your engineering velocity visible.
Measure metrics that matter for your company and focus on capabilities. Start with the four key metrics from DORA and add more metrics to the mix from different dimensions of the SPACE framework. But remember that metrics shape behavior, so be careful which metrics you choose.
By picking the right metrics, you make your DevOps transformation and acceleration measurable and transparent.
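The four key metrics can be derived from your deployment and incident records. The following sketch assumes a simple record shape of my own invention (real tooling, such as Google's Four Keys project, pulls the same data from CI/CD and incident systems):

```python
from datetime import datetime, timedelta

def four_keys(deployments, period_days):
    """Compute the four DORA key metrics from a list of deployment records.

    Each deployment is a dict with: commit_time and deploy_time (datetime),
    failed (bool), and restored_time (datetime or None, set when a failed
    deployment was fixed in production).
    """
    lead_times = [d["deploy_time"] - d["commit_time"] for d in deployments]
    failures = [d for d in deployments if d["failed"]]
    restores = [d["restored_time"] - d["deploy_time"]
                for d in failures if d["restored_time"]]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "lead_time_for_changes": sum(lead_times, timedelta()) / len(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore":
            sum(restores, timedelta()) / len(restores) if restores else None,
    }

# Hypothetical week: three deployments, one failed and was restored in 2 hours
t0 = datetime(2021, 6, 7, 9, 0)
deployments = [
    {"commit_time": t0, "deploy_time": t0 + timedelta(hours=4),
     "failed": False, "restored_time": None},
    {"commit_time": t0 + timedelta(days=2),
     "deploy_time": t0 + timedelta(days=2, hours=6),
     "failed": True, "restored_time": t0 + timedelta(days=2, hours=8)},
    {"commit_time": t0 + timedelta(days=4),
     "deploy_time": t0 + timedelta(days=4, hours=2),
     "failed": False, "restored_time": None},
]
metrics = four_keys(deployments, period_days=7)
print(metrics["change_failure_rate"])  # one of three deployments failed
```

Even this toy calculation shows why the four metrics complement each other: frequency and lead time measure speed, while failure rate and time to restore keep that speed honest.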
Most of this chapter focuses on efficiency: doing things right. Only OKRs also address effectiveness: doing the right things. OKRs are also relevant for lean product development and are touched on in Chapter 18, Lean Product Development and Lean Startup.
In the next chapter, you'll learn how to plan, track, and visualize your work.
As with development processes, the tools landscape is very heterogeneous. There are some old Team Foundation Server (TFS) installations on premises; some teams use Jira, Confluence, and Bitbucket; and some use GitHub and Jenkins. Some teams already have continuous integration/continuous deployment (CI/CD) practices in place, while other teams still build, package, and deploy manually. Some teams already work in a DevOps way and operate their own products, while other teams still hand over production releases to a separate operations team.
Tailwind Gears faces the following problems:
- No visibility for top management on how development is doing. Since all teams work differently, there is no common way to measure velocity.
- The divisions report slow release cycles (between months and years) and high failure rates.
- Every division has its own team to support its toolchain, so there is a lot of redundancy. Things such as templates and pipelines are not shared.
- It's difficult to allocate developers and teams to the products with the most business value. The toolchains and development practices are too different, and the onboarding time is too long.
- Developers feel unsatisfied with their work and unproductive. Some have already left the company, and it's hard to recruit new talent in the market.
To address these issues, the company decides to implement one common engineering platform, which is also intended to unify the development processes. These are the goals of the initiative:
- Accelerate software delivery in all divisions.
- Increase the quality of the software and reduce failure rates.
- Save time and money by creating synergies and having only one platform team responsible for a single engineering system.
- Increase the value of the software being built by allocating developers and teams to the products with a higher value proposition.
- Increase developer satisfaction to retain existing talent and to make it easier to hire new developers.
Since there is no unified platform yet, the metrics will be collected using surveys. The plan is to move one team after another to the new unified platform and use system metrics there.
Developer satisfaction is an important part of the transformation. Therefore, two more metrics are added, as follows:
- Developer satisfaction
- Satisfaction with the engineering system
This is a mix of six metrics from at least three SPACE dimensions. There is no metric for communication and collaboration yet; this will be added to the system as the transformation evolves.
Here are the references from this chapter that you can also use to get more information on the topics:
- Srivastava S., Trehan K., Wagle D. & Wang J. (April 2020). Developer Velocity: How software excellence fuels business performance: https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/developer-velocity-how-software-excellence-fuels-business-performance
- Forsgren N., Smith D., Humble J. & Frazelle J. (2019). DORA State of DevOps Report: https://www.devops-research.com/research.html#reports
- Brown A., Stahnke M. & Kersten N. (2020). 2020 State of DevOps Report: https://puppet.com/resources/report/2020-state-of-devops-report/
- Forsgren N., Humble, J. & Kim, G. (2018). Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations (1st ed.) [E-book]. IT Revolution Press.
- Portman D. G. (2020). Are you an Elite DevOps performer? Find out with the Four Keys Project: https://cloud.google.com/blog/products/devops-sre/using-the-four-keys-to-measure-your-devops-performance
- Forsgren N., Storey M.A., Maddila C., Zimmermann T., Houck B. & Butler J. (2021). The SPACE of Developer Productivity: https://queue.acm.org/detail.cfm?id=3454124
- Grove, A. S. (1983). High Output Management (1st ed.). Random House Inc.
- Grove, A. S. (1995). High Output Management (2nd ed.). Vintage.
- Doerr, J. (2018). Measure What Matters: OKRs: The Simple Idea that Drives 10x Growth. Portfolio Penguin.