Google’s software development process relies heavily on a vast array of tests to validate code submissions and ensure system readiness for deployment. These tests, both pre-submit and post-submit, play a crucial role in maintaining code quality and facilitating informed decision-making for developers and project managers alike. However, the tech giant faces a persistent challenge in the form of flaky tests, which can significantly impact productivity and reliability.
Understanding Flaky Tests
Flaky tests are defined as tests that produce both passing and failing results with the same code. At Google, approximately 1.5% of all test runs exhibit flaky behavior, affecting nearly 16% of their tests. This prevalence means that more than one in seven tests written by Google’s engineers occasionally fail due to reasons unrelated to code or test changes.
Impact on Development Processes
The presence of flaky tests has far-reaching consequences for Google’s development workflow:
- Post-submit testing: About 84% of observed transitions from pass to fail involve a flaky test, leading to additional work in distinguishing between flaky results and legitimate failures.
- Project releases: With an average project containing around 1,000 tests, the 1.5% flakiness rate means roughly 15 tests can be expected to fail flakily during a release run, each requiring a time-consuming investigation.
- Pre-submit testing: Flaky tests can block code submissions and introduce costly delays in the core development process.
- False negatives: Developers may dismiss legitimate failures as flaky, potentially allowing broken code to be submitted or released.
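The release-level arithmetic above can be made concrete with a back-of-the-envelope model. This sketch assumes each test run flakes independently at the fixed 1.5% rate quoted in the article, which is a simplification rather than Google's actual model:

```python
# Rough model of flaky failures during a release run.
# Assumes independent, identically distributed flake events per test run,
# using the figures quoted in the article.
FLAKE_RATE = 0.015  # ~1.5% of test runs report a flaky result
NUM_TESTS = 1000    # average project size cited above

# Expected number of flaky failures in one full run of the suite.
expected_flaky = FLAKE_RATE * NUM_TESTS

# Probability that at least one test in the run fails flakily.
prob_at_least_one = 1 - (1 - FLAKE_RATE) ** NUM_TESTS

print(f"Expected flaky failures per full run: {expected_flaky:.0f}")
print(f"P(at least one flaky failure): {prob_at_least_one:.6f}")
```

Under these assumptions a full run is virtually guaranteed to contain at least one flaky failure, which is why every release investigation has to budget time for triaging them.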
Mitigation Strategies
To address the challenges posed by flaky tests, Google has implemented several mitigation strategies:
- Re-running options: During pre-submit testing, developers can re-run only failing tests or set tests to automatically re-run upon failure.
- Flaky test designation: Tests can be marked as flaky, requiring three consecutive failures to report a failure. However, this approach has its drawbacks, potentially masking real issues.
- Automated quarantine: A tool monitors test flakiness and automatically quarantines tests with excessively high flakiness rates, removing them from the critical path and filing bugs for developers to address.
- Flakiness change detection: Another tool identifies changes in test flakiness levels and works to pinpoint the cause of the change.
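The re-run and quarantine policies described above can be sketched in a few lines. The function names, thresholds, and flakiness metric below are illustrative assumptions for exposition, not Google's actual implementation:

```python
def run_with_retries(test_fn, max_attempts=3):
    """Re-run a failing test; a single pass counts as a pass.

    Mirrors the 'three consecutive failures' policy for tests marked
    flaky: only max_attempts consecutive failures report a real failure.
    """
    for _ in range(max_attempts):
        if test_fn():
            return True  # any passing attempt masks earlier failures
    return False  # failed every attempt: report as a genuine failure


def should_quarantine(history, threshold=0.10):
    """Quarantine a test whose pass/fail flip rate is excessively high.

    `history` is a list of booleans (True = pass) from recent runs; the
    flip rate is one plausible stand-in for a 'flakiness rate' metric.
    """
    flips = sum(1 for a, b in zip(history, history[1:]) if a != b)
    flip_rate = flips / max(len(history) - 1, 1)
    return flip_rate > threshold
```

A quarantined test would then be pulled off the critical path and a bug filed for its owners, as the article describes. Note the drawback mentioned above: `run_with_retries` also hides a real regression that happens to pass on a retry.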
Ongoing Efforts and Future Directions
Google continues to invest in detecting, mitigating, tracking, and fixing test flakiness throughout its codebase. Some of the ongoing efforts include:
- Dedicated team: A new team has been established to provide accurate and timely information about test flakiness, aiming to help developers and build monitors understand and address flakiness-related issues.
- Data analysis: Promising correlations have been observed in flaky test execution data, potentially enabling accurate identification of flaky results without the need for test re-runs.
- Continuous improvement: Google aims to remove the friction caused by test flakiness from core developer workflows by advancing the state of the art for its teams.
By addressing the challenges posed by flaky tests, Google strives to enhance the reliability of its testing processes, improve developer productivity, and maintain the high quality of its software products. As the company continues to innovate in this area, it sets an example for the broader software development community in tackling the persistent issue of test flakiness.