Introduction
Every software testing professional who has worked with test automation has likely encountered flaky tests. You know the scenario: your tests pass perfectly on your local machine but fail randomly in the CI pipeline, only to pass again when rerun without any code changes. These inconsistent tests, commonly known as “flaky tests,” have become one of the most challenging aspects of maintaining reliable test automation suites.
In this practical guide, we’ll dive deep into flaky tests in test automation. Drawing from real-world experience with Selenium, Rest Assured, and Appium, we’ll explore not just what makes tests flaky, but more importantly, how to detect, manage, and fix them.
Index Terms—Test automation, flaky tests, software testing, Selenium, Rest Assured, Appium, test reliability, XPath, test maintenance, continuous integration.
What is a Flaky Test?
A flaky test is an automated test that sometimes passes and sometimes fails, even when nothing has changed. This makes it hard to know if there’s a real bug or if the test is just unreliable.
Characteristics of Flaky Tests
- Inconsistency: The test passes in some runs and fails in others without code changes.
- Unpredictability: The outcome is not reliably reproducible, making debugging difficult.
- Environment Sensitivity: Results may vary across different machines, networks, or test runs.
- False Signals: Flaky tests can mask genuine issues or falsely indicate problems, wasting developer time.
Example of a Flaky Test (Selenium)
Consider a Selenium test that checks if a login button is clickable:
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.Test;

public class LoginTest {

    @Test
    public void testLoginButton() {
        WebDriver driver = new ChromeDriver();
        driver.get("https://example.com/login");
        // No wait: this fails whenever the button is not yet clickable
        driver.findElement(By.id("login-btn")).click();
        Assert.assertTrue(driver.findElement(By.id("welcome-message")).isDisplayed());
        driver.quit();
    }
}
This test may pass when the page loads quickly but fails if the button isn’t yet clickable due to network delays, demonstrating flakiness.
What Causes Flaky Tests?
Flaky tests stem from a variety of sources, particularly in automation frameworks like Selenium, Rest Assured, and Appium. Common causes include:
- Asynchronous Operations: Tests that don’t properly wait for asynchronous operations (e.g., page loads, API calls) will fail at random.
- External Dependencies: Relying on external systems like web servers, APIs, or mobile devices creates variability.
- Timing Issues: Race conditions or insufficient wait times lead to inconsistency in test execution.
- Shared State: Tests sharing resources (e.g., browser sessions, database records) can interfere with each other.
- Environmental Variability: Network speed fluctuations, device state, or test environment variations may cause failures.
- Non-Deterministic Inputs: Tests using random data or system time can produce unpredictable outcomes.
- Fragile Locators: In Selenium, locators such as XPath or CSS selectors that are tightly coupled to the DOM structure can break when the UI changes even slightly.
Example of Flakiness Due to Misuse of XPath (Selenium)
A test using a brittle XPath locator may fail when the DOM structure changes:
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.Test;

public class FlakyXPathTest {

    @Test
    public void testAddToCartButton() {
        WebDriver driver = new ChromeDriver();
        driver.get("https://example.com/shop");
        // Brittle XPath tied to the exact DOM hierarchy
        driver.findElement(By.xpath("//div[2]/div[1]/button[@class='add-to-cart']")).click();
        Assert.assertEquals(driver.findElement(By.id("cart-count")).getText(), "1");
        driver.quit();
    }
}
This test is flaky because the XPath //div[2]/div[1]/button[@class='add-to-cart'] depends on the exact DOM hierarchy. If a developer adds or removes a div, or the page conditionally renders an extra container, the locator breaks even though the button is still present, so the test fails intermittently.
Why is Flaky Test Detection Important?
Flaky tests hurt both the velocity and the quality of software development. When tests pass unpredictably, developers lose trust in the suite and start ignoring real failures. Debugging flaky tests consumes time better spent bringing new features into production. Worst of all, flaky tests can hide genuine bugs that reach production unnoticed.
To keep your test suite trustworthy, flaky tests need to be caught early. Detection usually means running the same test multiple times or analyzing historical results for inconsistent outcomes. Tools such as TestNG’s retry analyzer or CI plugins can automatically detect and manage flaky tests.
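One simple way to run a test repeatedly is TestNG’s invocationCount attribute, which executes the same test many times within a single suite run. A minimal sketch (testCheckoutFlow is a hypothetical test name):
import org.testng.annotations.Test;

public class RepeatedRunTest {

    // Runs the test 20 times in one execution; any intermittent
    // failure across the runs is a strong flakiness signal.
    @Test(invocationCount = 20)
    public void testCheckoutFlow() {
        // Test logic suspected of being flaky
    }
}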
Best Practices for Identifying and Reducing Flaky Tests
To effectively manage flaky tests, adopt these best practices:
- Run Tests Multiple Times: Execute tests repeatedly to identify inconsistent results. Use tools like TestNG’s IRetryAnalyzer to automate retries.
- Isolate Tests: Ensure each test runs independently, with its own setup and teardown, to avoid shared state issues.
- Use Explicit Waits: In Selenium and Appium, use explicit waits to handle dynamic content, reducing timing-related flakiness.
- Mock External Dependencies: Simulate APIs or services using libraries like WireMock for Rest Assured tests.
- Use Robust Locators: In Selenium, prefer stable locators like IDs or data attributes over fragile XPath/CSS selectors.
- Monitor Test Results: Use CI tools (e.g., Jenkins, GitHub Actions) to track test stability and flag flaky tests.
- Log Extensively: Add detailed logging to pinpoint the source of failures during debugging (see the listener sketch after this list).
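For the last two practices, a TestNG listener is a simple starting point. A minimal sketch, assuming TestNG 7+ (where the other ITestListener methods have default implementations), that logs every failure with its stack trace:
import org.testng.ITestListener;
import org.testng.ITestResult;

public class FailureLogger implements ITestListener {

    // Invoked by TestNG whenever a test fails; prints the test name
    // and the underlying exception to help pinpoint the cause.
    @Override
    public void onTestFailure(ITestResult result) {
        System.out.println("FAILED: " + result.getName());
        if (result.getThrowable() != null) {
            result.getThrowable().printStackTrace();
        }
    }
}
Register it with @Listeners(FailureLogger.class) on a test class, or as a listener in testng.xml, so every failure is logged consistently.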
Example (Identifying Flaky Tests with TestNG Retry)
Implement a retry mechanism to detect flakiness:
import org.testng.IRetryAnalyzer;
import org.testng.ITestResult;

public class RetryAnalyzer implements IRetryAnalyzer {

    private int retryCount = 0;
    private static final int MAX_RETRY_COUNT = 3;

    // Rerun a failed test up to MAX_RETRY_COUNT times before
    // reporting it as a genuine failure.
    @Override
    public boolean retry(ITestResult result) {
        if (retryCount < MAX_RETRY_COUNT) {
            retryCount++;
            System.out.println("Retrying test: " + result.getName() + ", Attempt: " + retryCount);
            return true;
        }
        return false;
    }
}
Apply it to a test:
import org.testng.annotations.Test;

public class FlakyTest {

    @Test(retryAnalyzer = RetryAnalyzer.class)
    public void testPotentiallyFlaky() {
        // Test logic
    }
}
This setup retries failing tests and logs attempts, helping identify flakiness.
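Annotating every test with retryAnalyzer gets tedious in a large suite. If you want retries applied globally, TestNG’s IAnnotationTransformer can set the analyzer on every @Test method automatically. A minimal sketch (note that annotation transformers must be registered in testng.xml rather than via @Listeners):
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import org.testng.IAnnotationTransformer;
import org.testng.annotations.ITestAnnotation;

public class RetryTransformer implements IAnnotationTransformer {

    // Sets the retry analyzer on every @Test method at runtime,
    // so individual tests no longer need the retryAnalyzer attribute.
    @Override
    public void transform(ITestAnnotation annotation, Class testClass,
                          Constructor testConstructor, Method testMethod) {
        annotation.setRetryAnalyzer(RetryAnalyzer.class);
    }
}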
Actionable Strategies to Fix Flaky Tests
Once flaky tests are identified, use these strategies to fix them, with framework-specific examples.
1. Handle Asynchronous Operations (Selenium)
Use WebDriver’s WebDriverWait to wait for elements to be clickable or visible.
Fixed Selenium Example:
import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import org.testng.Assert;
import org.testng.annotations.Test;

public class LoginTest {

    @Test
    public void testLoginButton() {
        WebDriver driver = new ChromeDriver();
        driver.get("https://example.com/login");
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        // Wait until the button is actually clickable before interacting
        wait.until(ExpectedConditions.elementToBeClickable(By.id("login-btn"))).click();
        Assert.assertTrue(wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("welcome-message"))).isDisplayed());
        driver.quit();
    }
}
The explicit wait ensures the button is clickable before interaction, reducing flakiness.
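If you need finer control over how often Selenium polls for an element, or which exceptions to tolerate while waiting, FluentWait is worth knowing. A sketch assuming an existing driver instance:
import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.FluentWait;
import org.openqa.selenium.support.ui.Wait;

// Poll every 500 ms for up to 10 seconds, tolerating the element
// not existing yet instead of failing on the first lookup.
Wait<WebDriver> wait = new FluentWait<>(driver)
        .withTimeout(Duration.ofSeconds(10))
        .pollingEvery(Duration.ofMillis(500))
        .ignoring(NoSuchElementException.class);
WebElement button = wait.until(d -> d.findElement(By.id("login-btn")));
button.click();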
2. Fix Fragile Locators (Selenium)
Replace brittle XPath locators with robust ones, such as IDs or data attributes, and use explicit waits.
Fixed Selenium XPath Example:
import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import org.testng.Assert;
import org.testng.annotations.Test;

public class StableLocatorTest {

    @Test
    public void testAddToCartButton() {
        WebDriver driver = new ChromeDriver();
        driver.get("https://example.com/shop");
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        // Stable locator: a dedicated data attribute instead of positional XPath
        wait.until(ExpectedConditions.elementToBeClickable(By.cssSelector("[data-testid='add-to-cart']"))).click();
        Assert.assertEquals(wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("cart-count"))).getText(), "1");
        driver.quit();
    }
}
Using a CSS selector with a data-testid attribute (or an ID if available) keeps the locator stable even when the surrounding DOM structure changes, removing this source of flakiness.
3. Mock External Dependencies (Rest Assured)
Use WireMock to simulate API responses, eliminating reliance on live servers.
Fixed Rest Assured Example:
import static com.github.tomakehurst.wiremock.client.WireMock.*;
import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

import com.github.tomakehurst.wiremock.WireMockServer;
import io.restassured.RestAssured;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class ApiTest {

    private WireMockServer wireMockServer;

    @BeforeClass
    public void setup() {
        wireMockServer = new WireMockServer(8080);
        wireMockServer.start();
        // Stub the endpoint so the test never depends on a live server
        stubFor(get(urlEqualTo("/data"))
                .willReturn(aResponse().withStatus(200).withBody("{\"status\": \"success\"}")));
        RestAssured.baseURI = "http://localhost:8080";
    }

    @Test
    public void testApiResponse() {
        given().get("/data").then().statusCode(200).body("status", equalTo("success"));
    }

    @AfterClass
    public void teardown() {
        wireMockServer.stop();
    }
}
WireMock ensures consistent API behavior, eliminating flakiness from server variability.
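One caveat: a hard-coded port such as 8080 can itself cause flakiness on shared CI agents if another process already occupies it. WireMock can pick a free port instead; a sketch of the changed setup:
import static com.github.tomakehurst.wiremock.core.WireMockConfiguration.options;

// Let WireMock choose any free port, then point Rest Assured at it.
wireMockServer = new WireMockServer(options().dynamicPort());
wireMockServer.start();
// With a dynamic port, stub through the server instance
// (wireMockServer.stubFor(...)) so the stub targets the right port.
RestAssured.baseURI = "http://localhost:" + wireMockServer.port();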
4. Stabilize Mobile Tests (Appium)
In Appium, use explicit waits and reset app state to avoid device-related flakiness.
Fixed Appium Example:
import java.net.URL;
import java.time.Duration;
import io.appium.java_client.android.AndroidDriver;
import org.openqa.selenium.By;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import org.testng.Assert;
import org.testng.annotations.Test;

public class MobileTest {

    @Test
    public void testMobileLogin() throws Exception {
        // capabilities() is assumed to build the desired capabilities for the device
        AndroidDriver driver = new AndroidDriver(new URL("http://localhost:4723/wd/hub"), capabilities());
        driver.resetApp(); // Reset app state (available in older Appium Java clients)
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        wait.until(ExpectedConditions.elementToBeClickable(By.id("login-btn"))).click();
        Assert.assertTrue(wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("welcome-message"))).isDisplayed());
        driver.quit();
    }
}
Resetting the app and using waits ensures consistent test execution on mobile devices.
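Note that resetApp() was deprecated and later removed in newer Appium Java clients. Assuming java-client 8+ and your app’s package name (com.example.app below is a placeholder), terminateApp and activateApp achieve a similar clean state:
// Close the app if it is running, then relaunch it fresh.
// "com.example.app" is a placeholder for your app's package or bundle id.
driver.terminateApp("com.example.app");
driver.activateApp("com.example.app");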
5. Isolate Tests
Use TestNG’s @BeforeMethod and @AfterMethod to set up and tear down test environments, preventing shared state issues.
Example:
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;

public class IsolatedTest {

    private WebDriver driver;

    @BeforeMethod
    public void setup() {
        driver = new ChromeDriver(); // fresh browser for every test
    }

    @Test
    public void testFeature() {
        driver.get("https://example.com");
        // Test logic
    }

    @AfterMethod
    public void teardown() {
        driver.quit(); // always release the browser, pass or fail
    }
}
Each test gets a fresh browser instance, avoiding interference.
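If the suite runs tests in parallel, even a per-test driver field can be clobbered, because TestNG runs parallel methods on a shared class instance. A common pattern, sketched here, is to hold the driver in a ThreadLocal:
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;

public class ParallelSafeTest {

    // Each thread gets its own WebDriver instance.
    private static final ThreadLocal<WebDriver> driver = new ThreadLocal<>();

    @BeforeMethod
    public void setup() {
        driver.set(new ChromeDriver());
    }

    @Test
    public void testFeature() {
        driver.get().get("https://example.com");
        // Test logic
    }

    @AfterMethod
    public void teardown() {
        driver.get().quit();
        driver.remove(); // avoid leaking the reference between tests
    }
}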
6. Eliminate Non-Determinism
Avoid using system time or random inputs. If needed, use seeded random generators.
Example:
import java.util.Random;
import org.testng.Assert;
import org.testng.annotations.Test;

public class DeterministicTest {

    @Test
    public void testRandomInput() {
        // Fixed seed: the same sequence of values on every run
        Random random = new Random(42);
        int value = random.nextInt(100);
        Assert.assertTrue(value >= 0 && value < 100);
    }
}
Seeding ensures consistent random values.
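The same idea applies to time-dependent logic: instead of reading the system clock directly, pass a java.time.Clock into the code under test so it can be pinned to a fixed instant (the date below is arbitrary):
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import org.testng.Assert;
import org.testng.annotations.Test;

public class FixedClockTest {

    @Test
    public void testDateLogic() {
        // A fixed clock makes "now" identical on every run.
        Clock clock = Clock.fixed(Instant.parse("2024-01-15T10:00:00Z"), ZoneOffset.UTC);
        LocalDate today = LocalDate.now(clock);
        Assert.assertEquals(today.getYear(), 2024);
    }
}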
Conclusion
Flaky tests are a common problem in test automation, but they can be managed effectively with the right approach. They are tests that sometimes pass and sometimes fail even though the code has not changed, usually because of timing dependencies, asynchronous behavior, network latency, or fragile locators such as brittle XPath expressions.
To avoid flakiness, testers must write clean, stable test scripts and trace each failure to its root cause, whether it lies in the test or in the application. Used correctly, tools like Selenium, Rest Assured, and Appium, combined with smart waits and robust locators, make tests stable and far less flaky.