Encora SRE (Site Reliability Engineering)

 

Introduction

Site reliability engineering (SRE) and DevOps are two widely adopted disciplines with considerable overlap. Both aim to understand how to measure success or failure and how to achieve continuous reliability across every application.

Reliability is not just about infrastructure; it matters at every step, from application quality to performance and security. Site reliability engineers care about every process from source code to deployment, which is how they have earned a reputation as a true bridge between development and operations.

History

While site reliability engineers (SREs) work between development and operations, they don’t necessarily operate within DevOps. The concept of SRE has been around since 2003, which means that it predates DevOps.

The term was made popular by Ben Treynor, who created Google’s Site Reliability Team. According to Treynor, SRE is “what happens when a software engineer is tasked with what used to be called operations.”

What is SRE?

Site reliability engineering is an engineering discipline devoted to helping an organization sustainably achieve the appropriate level of reliability in its systems, services, and products.

SRE Core Principles

  1. SRE focuses on reliability
  2. SRE lives in production
  3. SRE manages scale and complexity
  4. SRE requires engineering and architecture
  5. SRE uses tech and respects people

SRE Practices

  1. Service level indicators and service level objectives (SLIs and SLOs)
  2. Operational balance
  3. Learning from failure

How does an organization begin with SRE?

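Mikey Dickerson’s Hierarchy of Reliability

From the base up, the hierarchy runs: monitoring, incident response, postmortem and root-cause analysis, testing and release procedures, capacity planning, development, and product.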

How Do You Start?

  1. Have a problem, downtime, or an epiphany
  2. Get management support lined up
  3. Read the available literature critically
  4. Spend time with other SREs
  5. Try out SLIs and SLOs

Site Reliability Engineers’ Day-to-Day Work

Site reliability engineers track service level indicators (SLIs) against service level objectives (SLOs), while DevOps teams measure failure and success rates over time. SREs share responsibility for the following DevOps pillars of infrastructure improvement.

  • Reduce organizational silos

Instead of debating how many silos exist in the company, SREs encourage everyone to address the issue. They do this by using the same tools and techniques across the company, which helps spread ownership across all employees.

  • Accept failure as normal

SREs accept that some failures are inevitable, but they need to make sure that there aren’t too many of them. To do so, they rely on SLIs and SLOs. SLIs are measurements of service behavior, such as request latency, throughput in requests per second, or failures per request over a period of time. SLOs combine a threshold with a target percentage and state how often an SLI must meet that threshold over a certain amount of time.
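
As a minimal sketch of this idea, the Python below computes two SLIs from a list of request records and checks each against an SLO expressed as a threshold plus a target percentage. The record shape, the 300 ms latency threshold, the 99% and 95% targets, and the sample data are all illustrative assumptions, not values from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    latency_ms: float
    ok: bool  # True if the request succeeded


def availability_sli(requests: list[Request]) -> float:
    """Fraction of requests that succeeded, between 0.0 and 1.0."""
    if not requests:
        return 1.0  # no traffic means nothing failed
    return sum(r.ok for r in requests) / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float) -> float:
    """Fraction of requests served within threshold_ms."""
    if not requests:
        return 1.0
    return sum(r.latency_ms <= threshold_ms for r in requests) / len(requests)


def meets_slo(sli: float, target: float) -> bool:
    """An SLO is met when the measured SLI is at or above the target."""
    return sli >= target


# Assumed sample window: 1,000 requests, 5 failures, 30 slow responses.
window = (
    [Request(latency_ms=120.0, ok=True) for _ in range(965)]
    + [Request(latency_ms=450.0, ok=True) for _ in range(30)]
    + [Request(latency_ms=80.0, ok=False) for _ in range(5)]
)

availability = availability_sli(window)                 # 995 of 1,000 succeeded
fast_enough = latency_sli(window, threshold_ms=300.0)   # 970 of 1,000 within 300 ms

print(meets_slo(availability, target=0.99))  # availability SLO: 99% of requests succeed
print(meets_slo(fast_enough, target=0.95))   # latency SLO: 95% of requests under 300 ms
```

In practice the window would come from logs or a metrics store rather than a hard-coded list; the point of the sketch is only that an SLO is a target applied to an SLI measured over a period.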

  • Implement gradual change management

SREs favor small, methodical changes. Companies, however, want to move faster and demand frequent releases that continually update the product. DevOps teams and SREs must therefore respond quickly while maintaining a steady, controlled pace.

  • Leverage tooling and automation

Automate a task when doing so provides value to developers and operations by removing manual work.

  • Measure everything in the daily work

SRE teams need to know that everything is moving in the right direction. This can be accomplished by setting up alerts for various scenarios, embracing peer code review, and/or using unit tests.
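
One way to picture "alerts for various scenarios" is as a set of named conditions evaluated against a snapshot of metrics. The sketch below is a simplified illustration in Python; the `AlertRule` shape, the metric names, and the thresholds are hypothetical and would differ in any real monitoring system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    """A named condition over a metrics snapshot (hypothetical shape)."""
    name: str
    condition: Callable[[dict[str, float]], bool]


def firing_alerts(metrics: dict[str, float], rules: list[AlertRule]) -> list[str]:
    """Return the names of all rules whose condition holds for the snapshot."""
    return [rule.name for rule in rules if rule.condition(metrics)]


# Illustrative rules; metric names and thresholds are assumptions.
RULES = [
    AlertRule("high_error_rate", lambda m: m["error_rate"] > 0.01),
    AlertRule("slow_p99_latency", lambda m: m["p99_latency_ms"] > 500.0),
    AlertRule("disk_almost_full", lambda m: m["disk_used_fraction"] > 0.90),
]

snapshot = {"error_rate": 0.02, "p99_latency_ms": 310.0, "disk_used_fraction": 0.95}
print(firing_alerts(snapshot, RULES))  # ['high_error_rate', 'disk_almost_full']
```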

Conclusion

Once you have a monitoring solution that meets your organization's needs, including complete coverage of your entire stack, unified views of hybrid environments, monitoring for ephemeral systems (containers and microservices), real-time models of your IT services, and massive scalability, you are set up for success. You can then take integrated data and insights from monitoring into incident response, root-cause analysis, remediation procedures, capacity planning, and so on at any scale.

About Encora

Fast-growing tech companies partner with Encora to outsource product development and drive growth. As you evaluate IT monitoring solutions and their capabilities, check out how Encora can set your organization up to achieve the ultimate service reliability.

 

 

+1 (480) 991 3635

letstalk@encora.com

Innovation Acceleration
