
The introduction of Site Reliability Engineering (SRE)

Introducing Site Reliability Engineering (SRE) teams into an organization’s structure is becoming increasingly popular in the IT industry and in the DevOps domain.
In this article, let’s discover the reasons for SRE’s popularity, what DevOps and SRE have in common, and where they differ.


Over the last two decades, we have witnessed a huge transformation in the way software is built and delivered. The Agile culture first, and the DevOps revolution later, have transformed the structure of tech organizations and can now be seen as a de facto standard in the IT industry.

As everyone knows, IT is a constantly evolving sector, and recently the Site Reliability Engineering discipline has been growing in popularity, especially in the DevOps domain. But what is SRE? And what are the differences and the common points between SRE and DevOps?

DevOps

DevOps culture has helped tear down the wall between software development and software operations, providing a set of ideas and practices that has led to several benefits:

  • Better collaboration and communication between the parties involved;
  • Shorter release cycles;
  • Faster innovation;
  • Better systems reliability;
  • Reduced IT costs;
  • Higher software delivery performance.

Even though this sounds amazing, quite a lot of companies still struggle to bring the DevOps culture into their organization. The reason is that DevOps is an ideology, not a methodology or a technology, which means it doesn’t say anything about how to successfully implement a good DevOps strategy.

SRE 

Site Reliability Engineering (SRE) is a discipline that was born at Google in the early 2000s to reduce the gap between software development and operations, completely independently of the DevOps movement. SRE uses software engineering approaches to solve operational problems.
SRE teams focus mainly on:

  • Reliability;
  • Automation.

Let’s look at each of these aspects in more detail.

Reliability

One of the main goals of SRE is to get systems up and running and to keep them that way “no matter what”. To achieve this, it is important to keep in mind that failures and faults can happen. The SRE discipline embraces them by focusing on:

  • observability;
  • system performance;
  • high availability (HA);
  • emergency response and disaster recovery;
  • incident management;
  • learning from past problems;
  • disaster mitigation and prevention.
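
To make the high availability item concrete, here is a minimal Python sketch of the arithmetic usually behind an availability figure: the fraction of a period in which a service was up, and the downtime that a given target leaves room for. The function names and the 99.9% target are illustrative assumptions, not values taken from this article.

```python
def availability(total_seconds: float, downtime_seconds: float) -> float:
    """Return the fraction of the period during which the service was up."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= downtime_seconds <= total_seconds:
        raise ValueError("downtime_seconds must be between 0 and total_seconds")
    return (total_seconds - downtime_seconds) / total_seconds


def allowed_downtime(total_seconds: float, target: float) -> float:
    """Return how many seconds of downtime fit within an availability target."""
    if not 0 <= target <= 1:
        raise ValueError("target must be a fraction between 0 and 1")
    return total_seconds * (1 - target)


if __name__ == "__main__":
    thirty_days = 30 * 24 * 3600  # seconds in a 30-day period
    # Hypothetical target of 99.9% availability.
    print(f"{allowed_downtime(thirty_days, 0.999) / 60:.1f} minutes of downtime allowed")
    # Hypothetical period with one hour of downtime.
    print(f"{availability(thirty_days, 3600):.4%} measured availability")
```

With a 99.9% target, a 30-day period leaves roughly 43 minutes of downtime, which gives a sense of why the other items in the list matter.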

Automation

Automating the activities that are traditionally performed manually is another main goal of SRE.
Automation and software engineering are used to solve operational problems.

Automation plays a fundamental role in SRE: it allows us to remove the human errors that creep into the processes and activities that concern the system. One could argue that automation introduces bugs into the system anyway, and that is true, but there is one big difference: automated processes can be tested, whereas processes that depend on manual human activity cannot.
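
The testability argument can be illustrated with a small Python sketch. It assumes a hypothetical cleanup task, deleting backups older than a retention period, and isolates the decision logic in a pure function so that an automated test can check it before it ever touches a real system. The backup names, dates, and 7-day retention are made up for illustration.

```python
from datetime import datetime, timedelta


def select_expired(backups: dict[str, datetime], now: datetime,
                   max_age: timedelta) -> list[str]:
    """Return the names of backups strictly older than max_age, sorted by name."""
    return sorted(name for name, created in backups.items()
                  if now - created > max_age)


def test_select_expired() -> None:
    now = datetime(2024, 1, 31)
    backups = {
        "db-old": datetime(2024, 1, 1),      # 30 days old: expired
        "db-edge": datetime(2024, 1, 24),    # exactly 7 days old: kept
        "db-recent": datetime(2024, 1, 30),  # 1 day old: kept
    }
    assert select_expired(backups, now, timedelta(days=7)) == ["db-old"]


if __name__ == "__main__":
    test_select_expired()
    print("cleanup rule behaves as expected")
```

The same rule applied by hand would depend on whoever runs it reading the dates correctly every time; here the boundary case is written down once and checked on every change.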

DevOps vs. SRE

As we have seen, both the DevOps culture and the SRE discipline aim to reduce the gap between software development and operations. Below we compare them, describing first their common goal and then where they differ the most.

class SRE implements DevOps

As mentioned earlier, DevOps is an ideology, so it doesn’t say anything about how to successfully bring the culture into an organization. SRE, on the other hand, can be seen as an implementation of the DevOps philosophy.
In fact, even though the origins of SRE are completely independent of DevOps, and the discipline provides additional practices that are not part of DevOps, SRE implements DevOps ideas.
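
The metaphor in the heading can be spelled out in code. The sketch below is only an illustration, written in Python, which has no “implements” keyword, so an abstract base class stands in for the interface. The method names are invented here to mirror the ideas in the text.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """The philosophy: it states what should happen, not how."""

    @abstractmethod
    def reduce_gap_between_dev_and_ops(self) -> str:
        """Describe how development and operations are brought together."""


class SRE(DevOps):
    """One concrete way of putting the philosophy into practice."""

    def reduce_gap_between_dev_and_ops(self) -> str:
        return "use software engineering approaches to solve operational problems"

    # Additional practices that are not part of the DevOps interface.
    def main_focus(self) -> list[str]:
        return ["reliability", "automation"]
```

Trying to instantiate DevOps directly raises a TypeError, which matches the point above: the philosophy alone cannot be put to work until something implements it.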

Responsibility and Code ownership

SRE can be considered the next stage of DevOps because of its focus on code ownership: SRE engineers accept the responsibility of owning, in production, the code they develop. This is a bit different from DevOps, where responsibilities are shared to achieve shorter release cycles and to improve collaboration.

Conclusions

Introducing Site Reliability Engineering (SRE) teams into an organization’s structure is becoming increasingly popular in the IT industry and in the DevOps domain.
The reason for this popularity can be found in the benefits that the discipline brings:

  • Better collaboration and communication between the parties involved;
  • Shorter release cycles;
  • Faster innovation;
  • Better systems reliability;
  • Reduced IT costs;
  • Higher software delivery performance;
  • Fewer incidents in production;
  • Code ownership;
  • Automation of the processes.

As you may have noticed, some of these benefits are exactly the same ones you will experience by bringing DevOps into your organization.

SRE can be considered an implementation of DevOps culture that has the goal of making and keeping services reliable.