
practice

DOI:10.1145/3159169
Article development led by queue.acm.org

DevOps Metrics

Your biggest mistake might be collecting the wrong data.

BY NICOLE FORSGREN AND MIK KERSTEN

COMMUNICATIONS OF THE ACM | APRIL 2018 | VOL. 61 | NO. 4

“Software is eating the world.” (Marc Andreessen)
“You can’t manage what you don’t measure.” (Peter Drucker)

Organizations from all industries are embracing software as a way of delivering value to their customers, and we are seeing software drive innovation and competitiveness from outside of the traditional tech sector.

For example, banks are no longer known for hiding gold bars in safes: instead, companies in the financial industry are harnessing software in a race to capture market share. Using innovative apps, banks are making it possible for their customers to do most of their daily banking in a few swipes, from depositing checks to transferring money securely between bank accounts. Moreover, the banks themselves can improve their service in a number of ways, such as using predictive analytics to detect fraudulent transactions. Other industries are seeing similar changes: cars are now computers on wheels, and even the U.S. Postal Service is in the middle of a massive DevOps transformation. Software is everywhere.

Leaders must embrace this new world or step aside. Gartner Inc. predicts that by 2020, half of the CIOs who have not transformed their teams’ capabilities will be displaced from their organizations’ leadership teams. And as every good leader knows, you cannot improve what you do not measure, so measuring the software development process and DevOps transformations is more important than ever.

Delivering value to the business through software requires processes and coordination that often span multiple teams across complex systems, and involves developing and delivering software with both quality and resiliency. As practitioners and professionals, we know that software development and delivery is an increasingly difficult art and practice, and that managing and improving any process or system requires insights into that system. Therefore, measurement is paramount to creating an effective software value stream. Yet accurate measurement is no easy feat.

Measuring DevOps. Collecting measurements that can provide insights across the software delivery pipeline is difficult. Data must be complete, comprehensive, and correct so that teams can correlate data to drive business decisions. For many organizations, adoption of the latest best-of-breed agile and DevOps tools has made the task even more difficult because of the proliferation of multiple systems of recordkeeping within the organization.

One of the leading sources of cross-organization software delivery data is the annual State of DevOps Report (found at https://devops-research.com/research.html) [2]. This industry-wide survey provides evidence that software delivery plays an important role in high-performing technology-driven organizations. The report outlines key capabilities in technology, process, and cultural areas that contribute to software-delivery performance and how this, in turn, contributes to key outcomes such as employee well-being, product quality, and organizational performance.

[Illustration: the DevOps loop, with the stages plan, code, build, test, release, deploy, operate, and monitor. Image by Andrij Borys Associates/Shutterstock.]

Bolstered by this survey-based research, organizations are starting to measure their own DevOps “readiness” or “maturity” using survey data. While this type of data can provide a useful view of the potential role that DevOps can play in teams and organizations, the danger is that organizations may blindly apply the results of surveys without understanding the limitations of this methodology.

On the flip side, some organizations criticize survey-based data wholesale and instead attempt to measure or assess their DevOps readiness or maturity using system data alone. These organizations, which are creating metrics based on the system data stored in their repositories, may not understand the limitations of that methodology, either.

By understanding these limitations, practitioners and leaders can better leverage the benefits of each methodology. This article summarizes the two separate but complementary approaches to measuring the software value stream and shares some pitfalls of conflating the two. The two approaches are defined as follows:
• Survey data. Using survey measures and techniques that provide a holistic and periodic view of the value stream.
• System data. Using tool-based data that provides a continuous view of the value stream and is limited to what is automatically collected and correlated.

A Complementary Approach
Neither system data nor survey data alone can measure the effectiveness of a modern software delivery pipeline. Both are needed. A complementary approach to measurement can arm organizations with a more complete picture of their development and operations environment, address the key gaps of each approach, and provide organizations with the information they need to develop and deliver software competitively.

As an analogy, consider how a manufacturer may track the effectiveness of a complex assembly line. Instrumentation at each step provides data on rates of flow and defects within each phase and across the end-to-end system. Augmenting that with survey data of the assembly line staff can prove invaluable: for example, discovering that a newly deployed cooperative robot is putting more physical strain on employees than was promised by the robot vendor.

Capturing that information before higher defect rates, lower employee survey scores, or even lawsuits arise can prove invaluable. In this example, the survey data provides leading indicators to system data, or provides insights that system data might not disclose at all. Whereas assembly line manufacturing is extremely mature in terms of metrics and data collection, there is a severe lack of industry consensus on how to measure software delivery. This implies that this practice is still in its infancy. (Note that this is likely related to the relative maturity of the fields themselves: the manufacturing discipline has been around for a long time, so those who study and measure it have had several decades to perfect their craft; in contrast, software engineering is a relatively young field, making its measurement study much less mature.) As such, it is critical for organizations to understand what they can and cannot measure with which approach, and what steps they must take to gain visibility into their software delivery value streams.

Using the authors’ collective decades of research and experience in collecting both survey and system
data (confirmed by in-depth discussions with hundreds of experts at dozens of global organizations who make software value-stream measurement a key part of their digital transformation), this article outlines the measures necessary for understanding your ability to develop and deliver software.

Start Building a Baseline Now
There are several reasons why both system and survey data should be used to measure the value streams that define your software-delivery processes. One of the most important is that most organizations seem to have almost no visibility or reliable measurement of their software-delivery practices.

The earlier an organization starts measurement, the earlier a baseline is established and can be used for gauging relative improvement. For a small organization, applying system metrics as the initial baseline can be easy. For example, a 20-person startup can measure MTTR (mean time to repair) using just an issue tracker such as Jira. A large organization, however, will need to include service desks and potentially other planning systems in order to identify that baseline and may not have implemented a tool that provides cross-system visibility. We recommend getting started with baseline collection immediately, and for many organizations that will mean collecting survey data while efforts to capture and correlate system data are under way.

In the absence of complete system measurements, comprehensive surveys can provide a holistic view of your system relatively quickly (for example, within several weeks). Contrast that with full visibility of your system provided by system-based metrics. Getting end-to-end system data can be a long journey as you first must deploy a measurement solution across systems, and then make sure that cross-system integration is in place so the data can be properly correlated. Modern value-stream metrics are making this easier, but for many organizations this has been a multiyear project.

While it is important to start as early as possible to get the benefits of system data, deploying survey data provides an almost immediate value and source of baseline information. This is valuable both for baselining current and future survey data, and for comparing survey with system data once in place. Therefore, it is best to capture a system baseline with survey measures now while continuing to build out system-based metrics.

What happens once you are fully instrumented with system-based metrics? You can continue using your survey-based metrics for both augmentation and capturing additional data that’s uniquely suited for survey methods.

There are still some measures that are important to software delivery, such as cultural measures, that survey-based measures will pick up and system-based metrics may miss. In addition, having both types of metrics provides opportunities for triangulation: if your survey measures provide data that is drastically different from the data coming from your systems, this can highlight gaps in the system.

Some might say such a gap is just an area where “people lie,” but if all of the people working closely with the system are lying, you might want to consider their experience as a true data point. If your engineers consistently report long build times and the system data reports short build times, could it be a configuration error in the API? Or could it be that the system-based measure is capturing only a portion of the data? Without consistently collecting insights from the professionals working with your systems, you will miss opportunities to see the full picture. The rest of this article outlines the pros and cons of each measurement type.

System-Based Metrics
System-based metrics generally refer to data that comes from the various systems of record that make up an end-to-end software delivery value stream. Important aspects of this data include:
• Completeness. Is the data captured from a particular system of record, such as an agile tool, complete enough to provide the kind of visibility, metrics, and reports that are the goal of the initiative? For example, if demonstrating faster time to market is the goal, is enough historical data captured to derive the trend line of how quickly
new products and features are delivered?
• Comprehensiveness. Is enough data captured across all systems of record? For example, to measure time to market for a customer request, you may need data from a customer/support tracking system, the roadmapping/requirements system, the agile tool, and the deployment tool chain.
• Correctness. Is the data sufficiently correlated to be correct? For example, if a support ticket and a defect are actually the same item but exist in two different systems, should the two systems be integrated in a way to indicate that these are the same item, or do you risk double-counting defects in this scenario?

System Data Advantages
• Precision. Only system-generated data can accurately show minute, second, and millisecond response times.
• Continuous visibility. System-generated data is particularly well suited for continuous/streaming data and real-time reporting. You can just point it to the data store and gather everything for targeted analysis later.
• Granularity. Data from systems can provide very granular data, allowing you to report on subsystems and components. This is useful for identifying trends and bottlenecks, but requires additional effort to create a higher-level picture of the full system. The more granular the data, the more work is required to paint a full picture.
• Scalability. Once the integration and visibility infrastructure is implemented, it can be pointed at all systems. This means that the solution can be scaled from getting visibility on a single project to dozens or hundreds of projects with large amounts of data.

To use an analogy to illustrate: when building a house, a contractor may use concrete for the foundation; wood/nails/screws/drywall for the walls; wiring and plumbing; brick for the exterior; paint/carpet for the finish; plus any materials for the kitchen and bath. In order to track and monitor progress, you build in monitoring to track each piece of the construction and install it as the house is built. Once installed, each and every piece of this infrastructure (specific data) can continually provide reporting and metrics (continuous data) at subsecond intervals (precision). You can then combine and correlate (volume and scale) these to create a full picture of what is happening in your house.

System Data Challenges
• Capturing behavior outside of the system. This may be the most important yet most overlooked limitation in system-based data. An example is version control: your system can tell you only what is inside of it. What portion of the work being done is not being checked into a version control system? Common culprits include system configuration and database configuration scripts.
• Gaining a holistic view. Eventually, system-level data can provide a relatively full view of your system, but this requires full instrumentation, plus correlation across measures and maturity in reporting and visualization techniques so that teams can understand system state. This is a nontrivial task, especially if undertaken without the right tooling and infrastructure in place. Additionally, the holistic view should include the human aspects of the process, such as the difficulty of deployments and software sprints, which are important for understanding the sustainability of the work.
• Capturing drifts in the system. If any part of your system stack changes and your data collectors are not updated, your view of the system will be inaccurate. Note that this is not a characteristic of a first-class data reporting solution, but it happens in some commercial systems and in many homegrown solutions, so it is worth mentioning as a condition to watch for.
• Cultural or perceptual measures. If you want to measure aspects of culture, these are perceptual and should be measured with surveys. Further, any measures that come from system databases (such as HR systems) are usually poor representations of the data you’re trying to collect and will be lagging indicators. That is, they will be able to measure something only after it has happened (such as someone leaving a team or an organization). In contrast, survey measures can let you measure perceptions of culture in time to act on the information.

System-based metrics are useful, but they cannot paint a complete picture of what is happening in your software-delivery work. Therefore, it is strongly recommended that you augment your metrics with complementary survey measures.

Survey-Based Metrics
Survey-based metrics generally refer to data about systems and people (such as culture) that comes from surveys. Ideally, these surveys are sent to the people who are working on the systems themselves and who are intimately familiar with the software-development and delivery system (that is, the doers). It is better for teams to avoid surveying management and executives, because, as a recent study by Forrester shows, executives tend to overestimate the maturity of their organizations [3].

Important aspects of this data include:
• Cohesiveness. Survey-based data is particularly good at providing a complete and holistic view of systems. This is because it can capture information about systems, processes, and culture. Measure your system periodically and at regular intervals: every four to six months.
• Correctness. Survey design and measurement is a well-understood discipline and can be leveraged to provide good data and insights about systems and culture. By using carefully designed surveys with statistically valid and reliable survey questions that have been rigorously developed and tested, organizations can have confidence in their survey data.

Survey Data Advantages
• Accuracy. When collected correctly, survey data can provide accurate insights into systems, processes, and culture. For example, you can measure system capabilities by asking teams how often key tasks are done in automated or manual ways. When designed correctly, this provides a fast and accurate measurement that can be used to baseline and guide improvement efforts.
• A holistic view of the system. Surveys are particularly good at capturing holistic pictures of systems, because the answers that respondents provide synthesize data related to automation, processes, and culture.
• Triangulation with system data. Survey data provides an alternate view of your system, allowing you to identify problems or errors when there are two contrasting views. Do not automatically discount your survey measures when this happens: there can often be cases where changes in configurations or system behavior alter the way that system data is collected, while survey measures remain true, and it is only the delta in these two measures that calls attention to changes in the underlying system.
• Capturing behavior outside of the system. In the discussion of system data, version control was used as an example of data that will be incomplete if it is collected only from your system. You can gain a more complete view of what is happening both within and around your system by using surveys. For example, are there situations where version control is being bypassed?
• Cultural or perceptual measures related to the system. Survey data provides insights into what it’s like to do the work: organizational culture, job satisfaction, and burnout are important as leading indicators of work tempo sustainability and hiring/retention. Research shows that good organizational cultures drive software delivery and organizational performance [2], and job satisfaction drives revenues [1]. Monitoring these proactively (through survey data) and not just reactively (through turnover metrics in HR databases) should be a priority for all technical managers and executives.

Let’s return to the house analogy. When using system data, you can get detailed information from each piece of the system that is reporting. This level of detail isn’t possible (or realistic) when asking people through survey questions, but you can very quickly and easily get a holistic understanding of what your system or its components are doing. For example, you can reliably ascertain if the house is in a good state: anyone can report if the house is on fire, if a room is dirty or smoky, or if an event has caused damage. This data can be gathered much faster than the time needed to instrument and then correlate and synthesize hundreds or thousands of data points. If your survey and system measures disagree, you have great cause to start debugging the system.

Survey Data Challenges
• Precision. While you can query practitioners about broad strokes, you should not rely on them for detailed or specific information. When you ask about deployment frequency, your survey options increase in log scale: people can generally tell you if they are deploying software on demand, weekly, monthly, quarterly, or yearly. Those frequencies are easy to confirm with system-based metrics (when available, though that is a nontrivial metric to get from systems, because it requires getting data from several systems along the deployment pipeline).
• Continuity of data. Asking people to fill out surveys at frequent intervals is exhausting, and survey fatigue is a real concern. It is better to limit the frequency of big data collection through surveys (say, every six months or so).
• Volume. The amount of data you collect is related to how often you collect it. Experience tells us that surveys should be kept to 20–25 minutes (or shorter) to maximize participation and completion rates. There are notable exceptions: Amazon’s famous developer survey was rolled out on an annual basis and took about an hour to complete, but the engineers were very interested and invested in the results, so they took the time to complete it.
• Measures in strained environments. If management has made it very clear that it isn’t safe to be honest, or that the results will be used to punish teams, then any survey responses will be suspect. To quote the late W. Edwards Deming: “Whenever there is fear, you will get wrong figures.” But, to be fair, system-based metrics are equally suspect in unsafe and fearful environments, and possibly more so. Why? Because it only takes a single person with root access to slip a rogue metric into the system and a tired person on peer review or a CAB (change approval board) to miss it (as those of us who have seen the cult classic movie Office Space can attest). In contrast, it takes several or dozens or hundreds of people to skew survey results en masse.

Conclusion
Software is driving value in organizations across all industries and around the world. To help deliver value, quality, and sustainability more quickly, companies are undergoing DevOps transformations. To help guide these difficult transformations, leaders must understand the technology process.

This process can be illuminated through a good measurement program, which allows team members, leaders, and executives to understand technology and process work, plan initiatives, and track progress so the organization can demonstrate the value of investments to key stakeholders. System-based metrics and survey-based metrics each have inherent limitations, but by leveraging both types of metrics in a complementary measurement program, organizations can gain a better view of their software-delivery value chain and DevOps transformation work.

Related articles on queue.acm.org

Adopting DevOps Practices in Quality Assurance
James Roche
http://queue.acm.org/detail.cfm?id=2540984

Statistics for Engineers
Heinrich Hartmann
http://queue.acm.org/detail.cfm?id=2903468

The Responsive Enterprise: Embracing the Hacker Way
Erik Meijer and Vikram Kapoor
http://queue.acm.org/detail.cfm?id=2685692

References
1. Azzarello, D., Debruyne, F. and Mottura, L. The chemistry of enthusiasm. Bain and Co., 2012; http://www.bain.com/publications/articles/the-chemistry-of-enthusiasm.aspx.
2. DevOps Research and Assessment. 2014, 2015, 2016, and 2017 State of DevOps Reports; https://devops-research.com/research.html.
3. Stroud, R., Klavens, E., Oehrlich, E., Kinch, A. and Lynch, D. A dangerous disconnect: executives overestimate DevOps maturity. Forrester, 2017; http://bit.ly/2Fs6Wjo

Nicole Forsgren is co-founder, CEO and Chief Scientist at DevOps Research and Assessment (DORA). She is best known for her work measuring the technology process and as the lead investigator on the largest DevOps studies to date.

Mik Kersten is the founder and CEO of Tasktop and drives the strategic direction of the company and a culture of customer-centric innovation. Previously, he launched a series of open source projects that changed how software developers collaborate.

Copyright held by owners/authors. Publication rights licensed to ACM.
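
As an illustration of the triangulation idea discussed under Survey Data Advantages, combined with the log-scale deployment-frequency options discussed under Survey Data Challenges, the following is a minimal sketch and not a method taken from the article. The bucket labels follow the article's list (on demand, weekly, monthly, quarterly, yearly); the day thresholds, the use of the median gap between deploys, and the function names `bucket_from_deploys` and `triangulate` are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median

# Survey answer options, ordered from most to least frequent.
# The day thresholds are hypothetical cut-offs chosen for this sketch.
BUCKETS = [
    ("on demand", 1),
    ("weekly", 7),
    ("monthly", 31),
    ("quarterly", 92),
    ("yearly", 366),
]


def bucket_from_deploys(timestamps):
    """Map system deploy timestamps to a survey-style bucket.

    Uses the median gap (in days) between consecutive deploys.
    Returns None when there are fewer than two deploys.
    """
    if len(timestamps) < 2:
        return None
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(ordered, ordered[1:])]
    typical_gap = median(gaps)
    for label, max_days in BUCKETS:
        if typical_gap <= max_days:
            return label
    return BUCKETS[-1][0]  # slower than yearly still maps to the last bucket


def triangulate(survey_answers, timestamps):
    """Compare the most common survey answer with the system-derived bucket.

    Returns (survey_bucket, system_bucket, distance), where distance is the
    number of buckets separating the two views (None without system data).
    """
    survey_bucket = Counter(survey_answers).most_common(1)[0][0]
    system_bucket = bucket_from_deploys(timestamps)
    if system_bucket is None:
        return survey_bucket, None, None
    labels = [label for label, _ in BUCKETS]
    distance = abs(labels.index(survey_bucket) - labels.index(system_bucket))
    return survey_bucket, system_bucket, distance


if __name__ == "__main__":
    start = datetime(2018, 1, 1)
    deploys = [start + timedelta(days=i) for i in range(10)]  # made-up data
    answers = ["monthly", "monthly", "weekly"]                # made-up data
    print(triangulate(answers, deploys))
```

A nonzero distance does not say which view is wrong. In the spirit of the article, it is a prompt to check whether the deployment data is complete or whether teams experience releases differently from what the tooling records.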
