Encora SRE (Site Reliability Engineering)

Introduction

Site reliability engineering (SRE) and DevOps are two popular disciplines with considerable overlap. Both aim to define how success and failure are measured and to achieve continuous reliability across every application.

Reliability is not just about infrastructure; it matters at every step, from application quality to performance and security. Site Reliability Engineers care about every process from source code to deployment, which is why they are often described as a true bridge between development and operations.

History

While Site Reliability Engineers (SREs) work between development and operations, they don’t necessarily operate within DevOps. The concept of SRE dates back to 2003, which means it predates DevOps.

The term was made popular by Ben Treynor, who created Google’s Site Reliability Team. According to Treynor, SRE is “what happens when a software engineer is tasked with what used to be called operations.”

What is SRE?

Site Reliability Engineering is an engineering discipline devoted to helping an organization sustainably achieve the appropriate level of reliability in its systems, services, and products.

SRE Core Principles

  1. SRE focuses on reliability
  2. SRE lives in production
  3. SRE manages scale and complexity
  4. SRE requires engineering and architecture
  5. SRE uses technology and respects people

SRE Practices

  1. Service level indicators and service level objectives (SLIs and SLOs)
  2. Operational Balance
  3. Learning from Failure

How does an organization begin with SRE?

Mikey Dickerson’s Hierarchy of reliability


How Do You Start?

  1. Have a problem, an outage, or an epiphany
  2. Get management support lined up
  3. Read the available literature critically
  4. Spend time with other SREs
  5. Try out SLIs and SLOs

Site Reliability Engineers’ Day-to-Day Work

Site reliability engineers measure service level indicators (SLIs) and service level objectives (SLOs), while DevOps teams track failure and success rates over time. SREs share responsibility for the following DevOps pillars of infrastructure improvement:

  • Reduce organizational silos

Rather than merely counting the silos in the company, SREs encourage everyone to address them. They do this by sharing the same tools and techniques across the company, which spreads ownership among all employees.

  • Accept failure as normal

Failures are expected, but SREs need to make sure there aren’t too many of them. To do so, they compare SLI measurements against SLO targets. SLIs quantify how the service behaves, using measures such as request latency, throughput in requests per second, or the ratio of failed requests over a time window. An SLO combines a threshold with a target percentage and states how often the SLI must meet that threshold over a given period (for example, 99% of requests served in under 300 ms over 30 days).
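To make the SLI/SLO relationship concrete, here is a minimal Python sketch. It assumes a hypothetical `Request` record holding a latency and a failure flag, and computes two ratio-style SLIs over a window of requests: availability (the share of requests that did not fail) and latency (the share that succeeded within a threshold). The 300 ms threshold and the 99% and 95% targets are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Request:
    """One observed request (hypothetical record shape for illustration)."""
    latency_ms: float
    failed: bool


def availability_sli(requests: Sequence[Request]) -> float:
    """Ratio of requests that did not fail; 1.0 when there was no traffic."""
    if not requests:
        return 1.0
    good = sum(1 for r in requests if not r.failed)
    return good / len(requests)


def latency_sli(requests: Sequence[Request], threshold_ms: float) -> float:
    """Ratio of requests that succeeded AND finished within threshold_ms."""
    if not requests:
        return 1.0
    fast = sum(1 for r in requests if not r.failed and r.latency_ms <= threshold_ms)
    return fast / len(requests)


def meets_slo(sli: float, target: float) -> bool:
    """The SLO is met when the measured SLI is at or above the target ratio."""
    return sli >= target


if __name__ == "__main__":
    window = [Request(120, False), Request(340, False), Request(90, True), Request(200, False)]
    avail = availability_sli(window)               # 3 of 4 requests succeeded
    fast = latency_sli(window, threshold_ms=300)   # 2 of 4 succeeded within 300 ms
    print(f"availability={avail:.2f} meets 99%: {meets_slo(avail, 0.99)}")
    print(f"latency={fast:.2f} meets 95%: {meets_slo(fast, 0.95)}")
```

Expressing both SLIs as "good events divided by total events" keeps them comparable with a percentage-based SLO, whatever the underlying measurement is.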

  • Implement gradual change management

SREs favor slow, methodical changes. At the same time, companies want to move faster and demand frequent releases that continually update the product. DevOps and SRE teams must therefore respond quickly while keeping a steady, controlled pace.
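One common way to keep frequent releases at a controlled pace is a staged (canary-style) rollout: the new version receives a growing share of traffic and expansion stops as soon as the SLO is at risk. The article does not prescribe a specific mechanism, so the Python sketch below is only an illustration; the stage percentages, the `observe_sli` callback, and the 99% target are all hypothetical.

```python
from typing import Callable, Sequence, Tuple


def staged_rollout(
    stages: Sequence[int],
    observe_sli: Callable[[int], float],
    slo_target: float,
) -> Tuple[int, bool]:
    """Advance a release through traffic percentages, stopping on an SLO breach.

    Returns (last_safe_percentage, completed). `observe_sli` is a hypothetical
    hook that would measure the SLI while `percent`% of traffic runs the new version.
    """
    last_safe = 0
    for percent in stages:
        sli = observe_sli(percent)
        if sli < slo_target:
            # Breach: stop expanding; the caller should roll back to last_safe.
            return last_safe, False
        last_safe = percent
    return last_safe, True


if __name__ == "__main__":
    # Simulated measurements: the new version degrades once it takes 25% of traffic.
    simulated_sli = {1: 0.999, 5: 0.998, 25: 0.97, 100: 0.96}
    print(staged_rollout([1, 5, 25, 100], simulated_sli.__getitem__, slo_target=0.99))
```

With the simulated numbers above, the rollout halts at the 25% stage and reports 5% as the last safe exposure, so only a small share of users ever saw the degraded version.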

  • Leverage tooling and automation wisely

Automate a task when doing so provides value to developers and operations by removing manual work.

  • Measure everything in the daily work

SRE teams need to confirm that everything is moving in the right direction. They can do this by setting up alerts for various scenarios, embracing peer code review, and writing unit tests.
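As a rough illustration of "setting up alerts for various scenarios", the Python sketch below evaluates a few threshold rules against a snapshot of metrics. Real teams would normally use a monitoring system's own alerting rules; the `AlertRule` shape, the metric names, and the thresholds here are hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical threshold rule: fire when a metric crosses a limit."""
    name: str
    metric: str
    above: bool        # True: fire when value > threshold; False: when value < threshold
    threshold: float


def firing_alerts(rules: List[AlertRule], metrics: Dict[str, float]) -> List[str]:
    """Return a message for every rule that fires against the current metrics."""
    fired = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            # Missing data is itself a scenario worth alerting on.
            fired.append(f"{rule.name}: no data for {rule.metric}")
            continue
        breached = value > rule.threshold if rule.above else value < rule.threshold
        if breached:
            fired.append(f"{rule.name}: {rule.metric}={value} (threshold {rule.threshold})")
    return fired


if __name__ == "__main__":
    rules = [
        AlertRule("HighErrorRate", "error_ratio", above=True, threshold=0.01),
        AlertRule("LowAvailability", "availability", above=False, threshold=0.99),
        AlertRule("SlowP99", "p99_latency_ms", above=True, threshold=500.0),
    ]
    for message in firing_alerts(rules, {"error_ratio": 0.02, "availability": 0.995}):
        print(message)
```

Treating absent metrics as an alert condition is a deliberate choice in this sketch: a silent dashboard can hide an outage just as easily as a bad number can reveal one.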

Conclusion

Once you have a monitoring solution that meets your organization's needs, including complete coverage of your entire stack, unified views of hybrid environments, monitoring of ephemeral systems (containers and microservices), real-time models of your IT services, and massive scalability, you are set up for success. You can then feed integrated data and insights from monitoring into incident response, root-cause analysis, remediation procedures, capacity planning, and more, at any scale.

About Encora

Fast-growing tech companies partner with Encora to outsource product development and drive growth. As you evaluate IT monitoring solutions and their capabilities, check out how Encora can set your organization up to achieve the ultimate service reliability.
