Resilience Testing: Definition, Examples and How to Do It

Resilience testing is a type of software testing that evaluates how an application performs under stress or in “chaotic” circumstances. Often the chaos in this type of testing is simply a range of ordinary but unfamiliar environments that the program hasn’t encountered before. Essentially, resilience testing shows how well a program keeps operating under stress and at what point it malfunctions. The key things to observe are how well the program keeps performing its core functions and preserves data integrity under these conditions.

Any downtime for an application can be detrimental to the organization behind it. Downtime can mean losing customers, so it’s essential to perform resilience testing to prepare for inevitable software failures.

What is Resilience Testing?

Resilience testing also demonstrates software’s ability to recover after momentary outages and other random stressors. For example, software engineers may want to test how an application handles a large influx of users while network interfaces are disabled. These tests show where the software can be improved to support the product’s resilience and, ultimately, the user experience.

Here are the two principles behind resilience testing.

1. Knowledge of the system.

Intimate knowledge of the system being tested is essential to do quality resilience testing.

2. Awareness that failure is inevitable. 

Not only does this idea direct how you run tests in practice, but it also marks an important philosophical distinction between trying to prevent errors and preparing a better response to them.

Examples of Resilience Testing

There is little value in spending days running perfectly designed tests on a flawless infrastructure. Doing so gives you little real data about how your software will perform in the real world of glitches and crashes. Here are some ways that software engineers can test their software’s resilience and its ability to function in the chaos of regular life.

  1. Take down nodes in the load balancers.

  2. Disable network interfaces.

  3. Turn off application processes.

  4. Unmount shared file systems.

These tests mimic the real-life conditions the software will encounter once deployed, so they show you more of its true functioning capacity. They reveal defects, for example components that can’t make it through momentary outages. This is valuable information and leads to better-functioning, more successful software.
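
As a minimal sketch of the “turn off application processes” idea, the Python below simulates a dependency that is unavailable for a few calls and checks whether a caller that retries makes it through the momentary outage. The names `FlakyService` and `call_with_retries` are hypothetical and exist only for this illustration; a real test would disable an actual process or network interface in a test environment.

```python
class ServiceDown(Exception):
    """Raised while the simulated dependency is unavailable."""


class FlakyService:
    """Hypothetical dependency that fails for its first `outage_calls` calls."""

    def __init__(self, outage_calls):
        self.outage_calls = outage_calls
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls <= self.outage_calls:
            raise ServiceDown("simulated outage")
        return "ok"


def call_with_retries(service, max_attempts):
    """Call service.fetch(), retrying on ServiceDown up to max_attempts times.

    Returns (result, attempts_used); result is None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return service.fetch(), attempt
        except ServiceDown:
            continue
    return None, max_attempts


# A two-call outage is survived with three attempts...
survived = call_with_retries(FlakyService(outage_calls=2), max_attempts=3)
# ...but a five-call outage is not, which is the kind of defect such a test reveals.
failed = call_with_retries(FlakyService(outage_calls=5), max_attempts=3)
print(survived, failed)
```

The retry loop here has no delay between attempts to keep the sketch short; production code would typically add a backoff.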

Failures are part of life and part of any software. Preparing with that fact acknowledged and accepted makes for better software. It is a waste of time trying to prevent every failure. Instead, focus that energy on helping your software respond better to the failures that are inevitably coming its way. Two more tips: focus on time to recovery, and ask, “Why did we allow that error to impact our users?” At the end of the day, resilience testing is about supporting the user experience.
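
One way to make “time to recovery” concrete is to compute it from periodic health checks. The sketch below assumes a hypothetical list of `(timestamp_seconds, is_healthy)` probe results recorded during a test run; the function name and the sample data are made up for illustration.

```python
def time_to_recovery(probes):
    """Return seconds from the first failed probe to the next healthy one.

    `probes` is a list of (timestamp_seconds, is_healthy) tuples in time order.
    Returns None if there was no failure or the system never recovered.
    """
    failure_start = None
    for timestamp, healthy in probes:
        if not healthy and failure_start is None:
            failure_start = timestamp  # outage begins
        elif healthy and failure_start is not None:
            return timestamp - failure_start  # first healthy probe after the outage
    return None


# Made-up probe results: healthy, then down from t=10 until the probe at t=25.
probes = [(0, True), (5, True), (10, False), (15, False),
          (20, False), (25, True), (30, True)]
recovery_seconds = time_to_recovery(probes)
print(recovery_seconds)
```

The result is only as precise as the probe interval, so a shorter interval gives a tighter estimate.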

How to Do Resilience Testing

Resilience testing is part of the software development life cycle (SDLC) and starts with setting up a test environment for the application to run in. Here are the steps of a resilience test.

1. Determine metrics.

Developers need to pick the metrics that will show how well the software is performing. Examples include input and output times, throughput, time to recovery, and latency. The relationships between metrics can also be tracked, such as how latency changes as throughput rises.

2. Identify baseline performance.

Now establish a baseline: the maximum load that the software can handle while still performing adequately. You need this baseline as a reference point for interpreting the other measurements taken during testing.

3. Introduce and evaluate disruptions.

Next, it’s time to try to break your application. There are a variety of ways to do this, including disrupting communication with external dependencies, inserting malicious input, and turning off interfacing systems. What matters is the information gathered during these disruptions.

4. Come to conclusions and decide how to respond to the test results.

It’s now time to use the data you just gathered to make informed improvements to your software. This data can also inform future testing. 
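
The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the sample data is made up, and the helper names (`summarize`, `compare`) and the thresholds (`max_slowdown`, `max_error_rate`) are hypothetical choices that a real team would set for its own system.

```python
import statistics


def summarize(samples):
    """Step 1: reduce raw (latency_ms, succeeded) samples to the chosen metrics."""
    latencies = [latency for latency, ok in samples if ok]
    errors = sum(1 for _, ok in samples if not ok)
    return {
        "median_latency_ms": statistics.median(latencies),
        "error_rate": errors / len(samples),
    }


def compare(baseline, disrupted, max_slowdown=2.0, max_error_rate=0.05):
    """Step 4: list the metrics that degraded beyond the assumed thresholds."""
    findings = []
    if disrupted["median_latency_ms"] > max_slowdown * baseline["median_latency_ms"]:
        findings.append("latency")
    if disrupted["error_rate"] > max_error_rate:
        findings.append("error_rate")
    return findings


# Step 2: baseline run with no disruption (made-up samples).
baseline_samples = [(100, True), (110, True), (90, True), (105, True), (95, True)]
# Step 3: the same workload while a dependency is disrupted (made-up samples;
# failed requests are recorded with a latency of 0 and succeeded=False).
disrupted_samples = [(250, True), (300, True), (0, False), (280, True), (0, False)]

baseline = summarize(baseline_samples)
disrupted = summarize(disrupted_samples)
findings = compare(baseline, disrupted)
print(baseline, disrupted, findings)
```

Note that `summarize` assumes at least one request succeeded; `statistics.median` raises an error on an empty list, so a run in which every request fails would need separate handling.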

Resilience Testing with Encora

Get support with the resilience testing part of your software’s development and lifecycle. Our team of expert software engineers is standing by to develop and execute resilience testing for your product. Based on this testing, we will determine ways to support the integrity of your product and improve your users’ experience overall. Reach out to us to get started today.
