Stress Testing 101: Evaluating System Robustness and Recoverability

Ongeziwe Bulana


What is stress testing?

Stress Testing is a software testing technique that evaluates the system or application under extreme workloads. It helps to identify the breaking points where the system starts to fail or crash.


The goal of Stress Testing is to measure the robustness and the error handling capabilities of software under heavy loads. It ensures the application does not hang or crash during peak usage or demanding situations. This technique tests beyond normal operating conditions to see how the software behaves under extreme scenarios. 


A common use of stress testing is to find the limit at which the system will stop working due to “stress”.
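One way to picture the search for that limit is a stepped ramp: raise the load a little at a time and stop at the first level where the error rate crosses a threshold. The sketch below is only an illustration. `simulated_error_rate`, its capacity of 500 users, and the 5% threshold are all assumptions standing in for measurements from a real system.

```python
def simulated_error_rate(concurrent_users: int, capacity: int = 500) -> float:
    """Hypothetical system: no errors up to `capacity`, then errors grow linearly."""
    if concurrent_users <= capacity:
        return 0.0
    return min(1.0, (concurrent_users - capacity) / capacity)


def find_breaking_point(measure, start=100, step=100, max_load=2000,
                        max_error_rate=0.05):
    """Raise the load step by step and return the first load level whose
    error rate exceeds `max_error_rate`, or None if none is found."""
    load = start
    while load <= max_load:
        if measure(load) > max_error_rate:
            return load
        load += step
    return None  # no breaking point within the tested range


if __name__ == "__main__":
    print(find_breaking_point(simulated_error_rate))  # prints 600
```

In a real test, `measure` would drive actual traffic at the given load and report the observed error rate instead of computing it from a formula.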


Stress Testing Example

A quick way to place an application under stress (let's use Notepad as an example) is to copy the contents of a 5 GB file into it. Notepad may then become unresponsive, either because of the application's own limits or because of the limits of the hardware it runs on.


Need for Stress Testing

Here are some real-world scenarios where stress testing is essential:

  • High-load sites in general: Sites such as Facebook and YouTube handle both heavy traffic and heavy file loads.

  • Sudden traffic surges: E-commerce sites during sales, or ticketing servers during high-demand events such as Taylor Swift ticket sales.

  • Preparation for launch: Sites that are expected to carry a high load at some point in the future should be stress tested before that load arrives.


Why Stress Testing is Important

  • Ensures the system functions correctly under abnormal conditions.

  • Validates whether the system can protect itself and continue functioning under those conditions.

  • Confirms that appropriate error messages are displayed during stress situations.

  • Prevents significant revenue loss caused by unexpected system failures.

  • Validates whether rate-limiting systems are functioning.


Types of Stress Testing

The main types of stress testing are:


  1. Distributed Stress Testing


    [Figure: Distributed stress testing]

    In distributed client-server systems, stress testing is performed across multiple clients from a central server. The server distributes a set of stress tests to connected clients and monitors their status. When a client connects, the server registers the client’s name and sends testing data.


    Throughout the test, client machines send a "heartbeat" signal to indicate they are still connected. If the server stops receiving heartbeats from a client, it flags that client for further investigation. For example, in the figure, the server is connected to Client1 and Client2 but is unable to communicate with Client3 and Client4.


    Running these stress tests during the night is a common practice for efficient testing. Large server farms need a streamlined method to identify which machines have experienced stress failures and need debugging.
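The heartbeat mechanism described above can be sketched on the server side as a table of "last seen" timestamps. This is a minimal illustration, not how any particular tool implements it. The class name `HeartbeatMonitor`, the 5-second timeout, and the injectable clock are assumptions made for the example.

```python
import time


class HeartbeatMonitor:
    """Tracks when each registered client last sent a heartbeat."""

    def __init__(self, timeout_seconds: float = 5.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock          # injectable so the example can be tested
        self._last_seen = {}         # client name -> time of last heartbeat

    def register(self, client_name: str) -> None:
        """Record a newly connected client."""
        self._last_seen[client_name] = self._clock()

    def heartbeat(self, client_name: str) -> None:
        """Record a heartbeat signal from a client."""
        self._last_seen[client_name] = self._clock()

    def unresponsive_clients(self) -> list:
        """Return the clients whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            name for name, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

A server loop would call `heartbeat` whenever a signal arrives and poll `unresponsive_clients` periodically to decide which machines need debugging.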


  2. Application Stress Testing

    Focuses on identifying issues related to data locking, data blocking, network congestion, and performance bottlenecks within an application.


  3. Transactional Stress Testing

    Tests one or more transactions between two or more applications. This helps fine-tune and optimize the system for better performance under load.


  4. Systemic Stress Testing

    Conducted across multiple systems running on the same server. It helps identify issues where data from one application might interfere with or block data from another application.


  5. Exploratory Stress Testing

    Tests the system with unusual parameters or scenarios that are unlikely to occur in real-world situations. It helps uncover defects in unexpected situations such as:

    1. A large number of users logging in simultaneously.

    2. All machines starting a virus scan at the same time.

    3. The database going offline while a website is accessing it.

    4. Inserting a large volume of data into the database simultaneously.
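The first scenario in the list, many users logging in at the same moment, can be approximated with threads that all wait at a barrier and are then released together. The sketch below is hedged accordingly: `FakeLoginService` and its fixed session limit are hypothetical stand-ins for a real login endpoint, and a real test would send actual requests.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeLoginService:
    """Hypothetical login endpoint that accepts a fixed number of sessions."""

    def __init__(self, max_sessions: int):
        self._max_sessions = max_sessions
        self._active = 0
        self._lock = threading.Lock()

    def login(self, user: str) -> bool:
        # Accept the login only while a session slot is still free.
        with self._lock:
            if self._active >= self._max_sessions:
                return False
            self._active += 1
            return True


def simultaneous_logins(service, user_count: int):
    """Release `user_count` login attempts at once; return (accepted, rejected)."""
    barrier = threading.Barrier(user_count)

    def attempt(index: int) -> bool:
        barrier.wait()  # hold every thread until all users are ready
        return service.login(f"user{index}")

    with ThreadPoolExecutor(max_workers=user_count) as pool:
        results = list(pool.map(attempt, range(user_count)))
    return results.count(True), results.count(False)


if __name__ == "__main__":
    print(simultaneous_logins(FakeLoginService(max_sessions=15), 20))  # prints (15, 5)
```

The point of the check is that the rejected logins fail cleanly: the service refuses them instead of hanging or corrupting its session count.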


How to do Stress Testing?

The stress testing process involves five main steps:


Step 1: Planning the Stress Test

  • Gather system data, analyse the current performance, and define the goals for the stress test.


Step 2: Create Automation Scripts

  • Develop automation scripts for the stress scenarios and generate the necessary test data.


Step 3: Script Execution

  • Execute the stress testing scripts and record the results.


Step 4: Results Analysis

  • Review the test results to identify performance bottlenecks and areas of improvement.


Step 5: Tweaking and Optimization

  • Optimize the system by adjusting configurations, refining the code, and making necessary changes to meet the desired performance benchmarks.
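The first four steps can be lined up in a small script, with step 5 left to the people tuning the system. This is only a skeleton under stated assumptions: `StressPlan`, `scenario`, and the latency goal are hypothetical names and values, and `scenario` does trivial arithmetic where a real script would call the system under test.

```python
import time
from dataclasses import dataclass


@dataclass
class StressPlan:
    """Step 1: the goals of the test (values here are illustrative)."""
    requests: int
    max_avg_latency_s: float


def scenario(payload_size: int) -> int:
    """Step 2: a scripted scenario; a stand-in for real work."""
    return sum(range(payload_size))


def execute(plan: StressPlan, payload_size: int) -> list:
    """Step 3: run the scenario repeatedly and record each latency."""
    latencies = []
    for _ in range(plan.requests):
        started = time.perf_counter()
        scenario(payload_size)
        latencies.append(time.perf_counter() - started)
    return latencies


def analyse(plan: StressPlan, latencies: list) -> dict:
    """Step 4: compare the recorded results against the plan's goal."""
    average = sum(latencies) / len(latencies)
    return {
        "avg_latency_s": average,
        "meets_goal": average <= plan.max_avg_latency_s,
    }


if __name__ == "__main__":
    plan = StressPlan(requests=100, max_avg_latency_s=0.01)
    report = analyse(plan, execute(plan, payload_size=10_000))
    # Step 5: if the goal is missed, tune the system and run again.
    print(report)
```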


Metrics for Stress Testing

Metrics help evaluate a system’s performance and are typically analysed at the end of a stress test. Some commonly used metrics include:

1. Scalability & Performance Metrics

  • Pages per Second: Measures how many pages are requested per second.

  • Throughput: Indicates the amount of data (in bytes) processed per second. It helps understand the system’s capacity.

  • Rounds: Compares the number of planned test scenarios to the number of times a client has executed them.
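As a worked illustration of these three metrics, the function below derives them from raw counters collected during a run. The function and parameter names are assumptions chosen for the example, and the numbers in the usage line are made up.

```python
def scalability_metrics(pages_requested: int, bytes_processed: int,
                        duration_seconds: float, rounds_planned: int,
                        rounds_executed: int) -> dict:
    """Derive pages per second, throughput, and round completion from raw counts."""
    return {
        "pages_per_second": pages_requested / duration_seconds,
        "throughput_bytes_per_second": bytes_processed / duration_seconds,
        # Share of the planned rounds that the client actually executed.
        "rounds_completion": rounds_executed / rounds_planned,
    }


if __name__ == "__main__":
    # 1,200 pages and 6,000,000 bytes in 60 seconds, 8 of 10 planned rounds run.
    print(scalability_metrics(1200, 6_000_000, 60, 10, 8))
```

With those inputs the run works out to 20 pages per second, 100,000 bytes per second, and 80% of the planned rounds executed.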


2. Application Response Metrics

  • Hit Time: The average time it takes to retrieve a single image or page.

  • Time to First Byte (TTFB): The time taken to receive the first byte of data from the server.

  • Page Load Time: The total time taken to retrieve and display all elements on a webpage.
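A rough way to observe the response metrics is to time a single request against a local test server. The sketch below makes several simplifying assumptions: `DemoHandler` and its 100,000-byte body are invented for the example, the time until the response headers arrive is used as an approximation of TTFB (and includes connection setup), and the total time covers one resource only, whereas a browser's page load time also includes sub-resources and rendering.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical page that returns a fixed 100,000-byte body."""

    def do_GET(self):
        body = b"x" * 100_000
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def measure(host: str, port: int, path: str = "/") -> dict:
    """Time one GET request: approximate TTFB and total download time."""
    connection = http.client.HTTPConnection(host, port, timeout=5)
    started = time.perf_counter()
    connection.request("GET", path)
    response = connection.getresponse()  # returns once the headers have arrived
    ttfb = time.perf_counter() - started
    body = response.read()               # download the rest of the response
    total = time.perf_counter() - started
    connection.close()
    return {"ttfb_s": ttfb, "total_s": total, "bytes": len(body)}


def run_demo() -> dict:
    """Start the demo server on a free local port, measure it, then stop it."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return measure("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

Averaging `total_s` over many requests gives a figure comparable to hit time.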


3. Failure Metrics

  • Failed Connections: The number of connection attempts that were refused or dropped (e.g., due to weak signals or server overload).

  • Failed Rounds: The number of test rounds that failed to complete successfully.

  • Failed Hits: The number of requests that failed (e.g., due to broken links, missing images, or server errors).
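The failure metrics can be tallied from a log of per-round outcomes. The representation below is an assumption made for illustration: each round is a list of outcome strings ("ok", "connection_failed", or "hit_failed"), and a round counts as failed if any outcome in it is not "ok".

```python
def failure_metrics(rounds: list) -> dict:
    """Count failed connections, failed hits, and failed rounds.

    `rounds` is a list of rounds; each round is a list of outcome strings.
    """
    failed_connections = sum(r.count("connection_failed") for r in rounds)
    failed_hits = sum(r.count("hit_failed") for r in rounds)
    # A round fails if at least one of its outcomes is not "ok".
    failed_rounds = sum(1 for r in rounds if any(o != "ok" for o in r))
    return {
        "failed_connections": failed_connections,
        "failed_hits": failed_hits,
        "failed_rounds": failed_rounds,
    }


if __name__ == "__main__":
    log = [
        ["ok", "ok"],
        ["ok", "hit_failed"],
        ["connection_failed"],
        ["ok", "hit_failed", "hit_failed"],
    ]
    print(failure_metrics(log))
    # prints {'failed_connections': 1, 'failed_hits': 3, 'failed_rounds': 3}
```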


Conclusion

The objective of stress testing is to evaluate how a system performs under extreme conditions. It monitors critical system resources such as memory, CPU, and network, and assesses the system's ability to recover to normal operation. It also checks that the system behaves as expected and, under failure conditions, displays appropriate error messages.


Ultimately, a good system or web app should handle extreme stress gracefully: it should keep functioning, but also protect itself when the load exceeds the limits it was designed for.


This article references information from Guru99 on stress testing techniques and best practices.

