
You're reading from Engineering Manager's Handbook

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781803235356
Edition: 1st Edition
Morgan Evans

Morgan Evans has been leading web and native app engineering teams since 2010. Having held senior engineering leadership roles at complex media and technology organizations, the author knows firsthand how to lead challenging projects at high scale with demanding stakeholders and vocal customers. Evans has an educational background in social psychology and information architecture, lending a unique perspective to the book. She has been working on development teams delivering consumer and B2B digital products for 18 years.

Supporting Production Systems

In software development and operations, production systems are the instances of applications and services that are actively in use by end users. Once we build and release a network-accessible software system, it is said to be running in a production environment rather than a development, staging, or test environment. The activities that engineering and operations teams engage in to keep production systems available and troubleshoot operational issues are collectively known as production support.

Production support is an area with vastly different practices and norms depending on the company, industry, and customer expectations. In large technology organizations, it may be highly formalized across the company as site reliability engineering. In software-as-a-service (SaaS) settings with business customers, there are typically contracts with specific performance metrics that must be adhered to. Consumer product companies may vary widely in their approach...

Creating a commitment to reliability

Supporting production systems can be some of the most stressful work we do as software engineers. It may involve late nights, spoiled weekends, or interrupted family occasions when engineers must dig into incidents for hours on end with half-asleep brains, all while knowing the company is losing money by the second until systems are back online. It is crucial work that can be incredibly unpleasant and disruptive by its very nature. We may strive to make incident and support scenarios easier to manage and resolve, but in most settings, we can never shield our teams from them completely. In most engineering teams, production support is inevitable.

The goal of production support is to find a balance between the inherent stress of the work and the reality that systems must remain online and available. Our objective is to support these systems in such a way that we avoid burnout while delivering the best possible level of service.

Because this...

Raising awareness of reliability

Entire industries have sprung up from the concept that awareness is an effective catalyst for action. A classic example is wearing a step tracker throughout your day to count how many steps you have taken. These wearable step-counting devices were created on the premise that if you see how many (or how few) steps you have taken in a day, you will be naturally motivated to improve that number. You may layer gamification on top of awareness to further incentivize goals, but it seems that awareness alone is enough to inspire action when the desire is already there.

So, when our goal is supporting production systems and improving their reliability, it follows that we may create the conditions for transformative action just by raising awareness of the performance of systems. To this end, anything at all we can do to raise awareness of the state and trends of our production systems will trigger a series of actions that will have a positive...

Reliability solutions

Reliability solutions include software platforms, configurations, integrations, automation, practices, and procedures. They provide insights, debugging tools, and timely information to engineering teams. Depending on your organization, you may already have access to a wealth of resources to increase reliability, or you may need to chart your own path.

Numerous volumes could be written on approaches and options for instrumenting systems, so here we will give an overview of the concepts for engineering managers to be aware of. These include service objectives, documentation, monitoring, alerting, and service interruption procedures.

Service objectives

If your company operates in a business-to-business context or provides SaaS, you may have specific service-level agreements (SLAs) and service-level objectives (SLOs). SLAs are contracts with customers that outline the performance expectations of a system. SLOs are the specific target ranges of different performance...
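
To make the idea of an objective as a measurable target more concrete, here is a minimal Python sketch. It assumes an availability objective expressed as the fraction of minutes in a fixed window during which the system must be up. The class name AvailabilitySLO, its fields, and the 99.9% over 30 days figures are illustrative assumptions, not values taken from any particular contract or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySLO:
    """Hypothetical availability objective over a fixed time window."""

    target: float        # required fraction of available minutes, e.g. 0.999
    window_minutes: int  # length of the measurement window in minutes

    def allowed_downtime_minutes(self) -> float:
        # Downtime budget: the share of the window not covered by the target.
        return (1.0 - self.target) * self.window_minutes

    def is_met(self, downtime_minutes: float) -> bool:
        # The objective holds while observed downtime stays within the budget.
        return downtime_minutes <= self.allowed_downtime_minutes()


# Assumed example: 99.9% availability over a 30-day window.
slo = AvailabilitySLO(target=0.999, window_minutes=30 * 24 * 60)
budget = round(slo.allowed_downtime_minutes(), 1)
print(f"Downtime budget: {budget} minutes")
print("Objective met with 30 minutes down:", slo.is_met(30))
print("Objective met with 60 minutes down:", slo.is_met(60))
```

Expressing the objective as a downtime budget is one possible design choice. It gives a team a single number to watch across the window, which fits the emphasis on raising awareness of system performance.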

Summary

In this chapter, you learned how to support production systems by creating a commitment to reliability on your engineering team, raising awareness of the performance of your systems, and utilizing specific conventions and tools.

First, we learned how instilling in our engineering teams a personal commitment to reliability is a powerful technique to reduce the stress and burden of difficult support work:

  • Give engineers an ownership mindset by getting them invested in the work, sharing decisions, and growing their understanding and confidence
  • Help engineers develop a sense of pride in their work by growing their reputation and providing recognition of achievements
  • Help engineers see how the work they do is making a difference in a community that they care about

Next, we learned how to activate that personal commitment by raising awareness of the performance of our production systems:

  • Use active communication periodically to raise...

