Inside Netflix’s Approach to Large-Scale Android App Testing


Netflix’s Android app, now over 14 years in development, has undergone significant evolution in both its architecture and testing methodologies. Originally built as a hybrid application, it transitioned to a fully native app to address performance limitations and deliver a seamless user experience. Today, the codebase spans approximately one million lines of Java and Kotlin, distributed across more than 400 modules, and is actively being modernized with Jetpack Compose. A dedicated team of around 50 engineers manages the app’s development and maintenance.

Transition in Testing Ownership

Historically, Netflix relied on a specialized SDET (Software Development Engineer in Test) team to design and execute device tests in collaboration with developers and product managers. These engineers focused on creating automated tests and building custom testing frameworks, while feature developers were responsible for unit and integration tests. Over time, the SDET team was dissolved, shifting the responsibility for automation to individual feature teams, with a couple of SDETs remaining as support. Manual QA still plays a role in final release validation.

Addressing Device Diversity

One of Netflix’s core challenges is supporting a vast array of playback devices, from low-end Android Go phones to high-end foldables. The app maintains compatibility as far back as Android 7.0, though the minimum supported version is moving to Android 9. Device-specific quirks, manufacturer variations, and unique hardware features like foldable hinges necessitate extensive physical device testing to ensure consistent performance and user experience across the ecosystem.

Netflix’s Multi-Layered Testing Pyramid

Netflix employs a layered approach to testing, though the distribution of tests can resemble an hourglass due to legacy code and the emphasis on device testing. New features typically follow a classic test pyramid, with a strong foundation of unit tests, followed by screenshot and end-to-end (E2E) automation, and topped off with smoke tests.

Unit Testing Frameworks

  • Strikt: For fluent assertions tailored to Kotlin.
  • Turbine: Simplifies testing of Kotlin Flows.
  • Mockito: Used for mocking dependencies, though inline mocks are minimized to avoid build slowdowns.
  • Hilt: Facilitates dependency injection for test environments.
  • Robolectric: Runs tests that exercise Android framework APIs on the JVM, bridging business logic and Android services without a device for more complex test scenarios.
  • A/B Testing Framework: Enables feature flagging and test overrides for experimental features.

Developers are encouraged to use the simplest form of unit tests first, as test execution time increases dramatically with more complex frameworks and device-based tests.
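As an illustration of why the simplest tests can cover a lot, the sketch below injects a hypothetical experiment provider through the constructor. A test can then force any A/B variant without Robolectric or a device. `ExperimentProvider`, `OverridableExperiments`, `RowLayoutPolicy`, the `compact_rows` experiment and the "cell 1 is control" convention are all assumptions made for this example. They are not Netflix's actual A/B testing API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over an A/B testing framework: production code asks
// for a cell (variant) by experiment name and never reads global state.
interface ExperimentProvider {
    int cellFor(String experiment);
}

// Test override: returns forced cells and falls back to cell 1 (assumed control).
final class OverridableExperiments implements ExperimentProvider {
    private final Map<String, Integer> overrides = new HashMap<>();

    OverridableExperiments force(String experiment, int cell) {
        overrides.put(experiment, cell);
        return this;
    }

    @Override
    public int cellFor(String experiment) {
        return overrides.getOrDefault(experiment, 1);
    }
}

// Hypothetical feature logic under test. Because the provider arrives through
// the constructor, a plain JVM unit test can exercise every variant.
final class RowLayoutPolicy {
    private final ExperimentProvider experiments;

    RowLayoutPolicy(ExperimentProvider experiments) {
        this.experiments = experiments;
    }

    int tilesPerRow() {
        return experiments.cellFor("compact_rows") == 2 ? 6 : 4;
    }
}
```

A test can build `new RowLayoutPolicy(new OverridableExperiments().force("compact_rows", 2))` and check the treatment layout in milliseconds. It never starts an Android runtime.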

Managing Flaky Unit Tests

Flakiness in unit tests, especially those running in CI pipelines, is a significant concern. Common causes include residual global state and asynchronous code execution. Netflix mitigates these issues by enforcing dependency injection, recreating altered states, and using test schedulers to control timing in asynchronous operations, ensuring deterministic outcomes.
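The test-scheduler idea can be sketched with a hand-rolled virtual clock. Queued work runs only when the test advances time, so the outcome does not depend on CI machine speed. It is similar in spirit to RxJava's `TestScheduler` or kotlinx-coroutines' `TestCoroutineScheduler`, but it is a minimal stand-in, not either library. `DelayScheduler`, `VirtualTimeScheduler` and the 300 ms `SearchDebouncer` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Injected dependency: production code would pass a real, clock-backed scheduler.
interface DelayScheduler {
    void schedule(long delayMs, Runnable action);
}

// Virtual-time scheduler: tasks run only when the test advances the clock.
final class VirtualTimeScheduler implements DelayScheduler {
    private static final class Task {
        final long dueAtMs;
        final long sequence; // tie-breaker: equal due times run in FIFO order
        final Runnable action;

        Task(long dueAtMs, long sequence, Runnable action) {
            this.dueAtMs = dueAtMs;
            this.sequence = sequence;
            this.action = action;
        }
    }

    private final PriorityQueue<Task> queue = new PriorityQueue<>(
            Comparator.comparingLong((Task t) -> t.dueAtMs).thenComparingLong(t -> t.sequence));
    private long nowMs = 0;
    private long nextSequence = 0;

    long now() {
        return nowMs;
    }

    @Override
    public void schedule(long delayMs, Runnable action) {
        queue.add(new Task(nowMs + delayMs, nextSequence++, action));
    }

    // Runs every task due within the window in due-time order, then moves the clock.
    void advanceTimeBy(long deltaMs) {
        long target = nowMs + deltaMs;
        while (!queue.isEmpty() && queue.peek().dueAtMs <= target) {
            Task task = queue.poll();
            nowMs = task.dueAtMs;
            task.action.run();
        }
        nowMs = target;
    }
}

// Hypothetical component under test: emits a query only after 300 ms without typing.
final class SearchDebouncer {
    private final DelayScheduler scheduler;
    private final List<String> emitted = new ArrayList<>();
    private long generation = 0;

    SearchDebouncer(DelayScheduler scheduler) {
        this.scheduler = scheduler;
    }

    void onQueryChanged(String query) {
        long myGeneration = ++generation;
        scheduler.schedule(300, () -> {
            if (myGeneration == generation) { // a newer query supersedes this one
                emitted.add(query);
            }
        });
    }

    List<String> emitted() {
        return emitted;
    }
}
```

Because the debouncer receives its scheduler instead of reading a global one, each test gets a fresh clock. No timing state leaks between tests.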

Screenshot Testing

Screenshot tests validate the visual output of the app and serve as a robust alternative to manual QA for static screens. Netflix utilizes:

  • Paparazzi: Renders Compose UI and screen layouts on the JVM without a device, using static resources to avoid network dependencies.
  • Localization Testing: Captures screenshots across all supported locales for UX validation.
  • Device Screenshot Testing: Assesses visual behavior on real devices.
  • Accessibility Testing with Espresso: Ensures compliance with touch target size standards, though variations exist between WCAG and Android guidelines.
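The touch-target discrepancy is concrete. Android's accessibility guidance recommends targets of at least 48 x 48 dp. WCAG 2.2 sets 24 x 24 CSS pixels at Level AA (SC 2.5.8, which also allows spacing exceptions) and 44 x 44 at Level AAA (SC 2.5.5). Espresso's accessibility checks, built on the Accessibility Test Framework, apply a comparable size rule. The sketch below is not that framework. It is a minimal check that assumes the view's pixel size and the screen density are already known. It also assumes that a CSS pixel can be treated as one dp and ignores the WCAG spacing exceptions.

```java
// Minimal touch-target size check (illustrative, not the ATF implementation).
// Android defines dp so that 1 dp equals 1 px at 160 dpi.
final class TouchTargetCheck {
    static final double ANDROID_MIN_DP = 48.0; // Android accessibility guideline
    static final double WCAG_AAA_MIN = 44.0;   // WCAG SC 2.5.5, treated as dp (assumption)
    static final double WCAG_AA_MIN = 24.0;    // WCAG 2.2 SC 2.5.8, treated as dp (assumption)

    static double pxToDp(int px, int densityDpi) {
        return px * 160.0 / densityDpi;
    }

    // Both dimensions must reach the minimum for the target to pass.
    static boolean meets(int widthPx, int heightPx, int densityDpi, double minDp) {
        return pxToDp(widthPx, densityDpi) >= minDp && pxToDp(heightPx, densityDpi) >= minDp;
    }
}
```

On a 420 dpi screen, a 120 x 120 px button measures about 45.7 dp. It therefore passes the WCAG AAA threshold but fails Android's 48 dp guideline. A screenshot or accessibility suite has to choose which standard it enforces.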

Device Testing Frameworks

Device tests, while slower, are critical for verifying overall app functionality. Netflix’s device testing arsenal includes:

  • Espresso: The primary framework for UI instrumentation.
  • PageObject Pattern: Abstracts test logic from screen implementation, simplifying migration to new UI technologies.
  • UIAutomator: Used for smoke testing release candidates.
  • Performance Testing: Monitors screen load times for regressions.
  • Network Capture/Playback: Reduces test instability by simulating API responses.
  • Backend Mocking: Ensures deterministic test results by controlling backend data.
  • Analytics Testing: Verifies correct event sequencing, crucial for monitoring user interactions.
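The PageObject pattern can be sketched as follows. Tests call user-level actions on page classes, and only a driver layer knows how to find views. Migrating screens from XML views to Compose then means swapping the driver or the page internals, not rewriting tests. `UiDriver`, `SearchPage`, `DetailsPage`, `RecordingDriver` and the target names are hypothetical. A real driver would delegate to Espresso or Compose test APIs on a device.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver interface: the only layer that knows how views are located.
interface UiDriver {
    void tap(String target);

    void typeText(String target, String text);

    boolean isVisible(String target);
}

// Page objects expose user-level actions and return the page the user lands on.
final class SearchPage {
    private final UiDriver driver;

    SearchPage(UiDriver driver) {
        this.driver = driver;
    }

    SearchPage searchFor(String title) {
        driver.tap("search_box");
        driver.typeText("search_box", title);
        return this;
    }

    DetailsPage openFirstResult() {
        driver.tap("result_0");
        return new DetailsPage(driver);
    }
}

final class DetailsPage {
    private final UiDriver driver;

    DetailsPage(UiDriver driver) {
        this.driver = driver;
    }

    boolean isPlayButtonShown() {
        return driver.isVisible("play_button");
    }
}

// Fake driver that records interactions, useful for checking page objects themselves.
final class RecordingDriver implements UiDriver {
    final List<String> log = new ArrayList<>();

    @Override
    public void tap(String target) {
        log.add("tap:" + target);
    }

    @Override
    public void typeText(String target, String text) {
        log.add("type:" + target + "=" + text);
    }

    @Override
    public boolean isVisible(String target) {
        log.add("check:" + target);
        return true;
    }
}
```

A test then reads as a user journey, for example `new SearchPage(driver).searchFor("Dark").openFirstResult().isPlayButtonShown()`. It does not change when screen internals move to a new UI toolkit.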

Innovative Device Lab Infrastructure

Netflix pioneered dedicated device labs before cloud-based solutions like Firebase Test Lab existed. Their infrastructure supports:

  • Targeted device selection and configuration
  • Video and screenshot capture during tests
  • Log aggregation
  • Unique features such as an in-lab cellular tower for network simulation, network throttling, and system update control
  • Automated test execution using raw adb commands
  • Hardware and software qualification for partner devices, ensuring support for features like HDR
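To illustrate test execution through raw adb commands, the sketch below builds a standard `adb -s <serial> shell am instrument` invocation for one device. It uses `-w` to wait for completion, `-r` for raw output a harness can parse, and `-e class` to select a test class. The serial, package and runner values in the usage are placeholders. This is a generic example, not Netflix's lab harness.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Builds and launches an instrumentation run on one specific lab device.
final class AdbInstrumentCommand {
    static List<String> build(String serial, String testClass, String testPackage, String runner) {
        List<String> cmd = new ArrayList<>();
        cmd.add("adb");
        cmd.add("-s");
        cmd.add(serial);                   // target one device by serial
        cmd.add("shell");
        cmd.add("am");
        cmd.add("instrument");
        cmd.add("-w");                     // block until instrumentation finishes
        cmd.add("-r");                     // raw output for the harness to parse
        cmd.add("-e");
        cmd.add("class");
        cmd.add(testClass);                // run only this test class
        cmd.add(testPackage + "/" + runner);
        return cmd;
    }

    // Starts the process with stderr merged into stdout so logs can be aggregated.
    static Process start(List<String> cmd) throws IOException {
        return new ProcessBuilder(cmd).redirectErrorStream(true).start();
    }
}
```

Keeping command construction separate from process launch lets the harness itself be unit tested without any device attached.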

Combating Test Flakiness

Given the inherent instability of device-based tests, Netflix has developed robust tooling to:

  • Detect and minimize flakiness
  • Identify root causes and notify responsible teams
  • Automatically classify tests as stable, unstable, or disabled based on real-time results
  • Promote tests to stable status through continuous evaluation
  • Employ automated rules for retries, failure handling, and device repairs
  • Generate detailed failure reports segmented by device, OS, and environment for targeted troubleshooting
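The stable/unstable/disabled classification can be sketched as a sliding window over a test's recent pass/fail results. The window size and both pass-rate thresholds are hypothetical tuning knobs, not Netflix's values. The rule that new tests stay unstable until a full window of evidence exists is likewise an assumption, chosen so that promotion to stable is earned through continuous evaluation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

enum TestStatus { STABLE, UNSTABLE, DISABLED }

// Classifies one test from its most recent results (illustrative thresholds).
final class FlakinessClassifier {
    private final int windowSize;
    private final double stableThreshold;  // pass rate required for STABLE
    private final double disableThreshold; // pass rate below which the test is DISABLED
    private final Deque<Boolean> recent = new ArrayDeque<>();

    FlakinessClassifier(int windowSize, double stableThreshold, double disableThreshold) {
        this.windowSize = windowSize;
        this.stableThreshold = stableThreshold;
        this.disableThreshold = disableThreshold;
    }

    // Keeps only the last windowSize results.
    void record(boolean passed) {
        recent.addLast(passed);
        if (recent.size() > windowSize) {
            recent.removeFirst();
        }
    }

    double passRate() {
        if (recent.isEmpty()) {
            return 0.0;
        }
        long passes = recent.stream().filter(p -> p).count();
        return (double) passes / recent.size();
    }

    TestStatus status() {
        if (recent.size() < windowSize) {
            return TestStatus.UNSTABLE; // not enough evidence to promote yet
        }
        double rate = passRate();
        if (rate >= stableThreshold) {
            return TestStatus.STABLE;
        }
        if (rate < disableThreshold) {
            return TestStatus.DISABLED;
        }
        return TestStatus.UNSTABLE;
    }
}
```

A CI gate might block merges only on STABLE failures and route UNSTABLE and DISABLED tests to the owning team's report. That routing policy is also an assumption.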

Continuous Integration and Test Pipelines

Netflix’s CI pipeline runs an extensive battery of unit, lint, and device tests on every pull request. A subset of smoke tests is executed against fully obfuscated builds, while additional automation is reserved for post-merge, daily, and weekly suites. This approach balances test coverage with execution time, ensuring rapid feedback without overburdening developers.
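One way to model this tiering is a cumulative mapping from pipeline trigger to test suites. Each trigger runs its own additions plus everything from the cheaper tiers before it. The suite names, the cumulative rule, and placing obfuscated smoke tests in the pull-request tier are assumptions for illustration only.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Triggers are declared from cheapest (most frequent) to most expensive.
enum Trigger { PULL_REQUEST, POST_MERGE, DAILY, WEEKLY }

enum Suite { UNIT, LINT, DEVICE_PR, SMOKE_OBFUSCATED, DEVICE_POST_MERGE, DEVICE_DAILY, DEVICE_WEEKLY }

// Hypothetical tier plan: what each trigger adds on top of the tiers below it.
final class SuitePlan {
    private static final Map<Trigger, Set<Suite>> ADDED = new EnumMap<>(Trigger.class);

    static {
        ADDED.put(Trigger.PULL_REQUEST, EnumSet.of(Suite.UNIT, Suite.LINT, Suite.DEVICE_PR, Suite.SMOKE_OBFUSCATED));
        ADDED.put(Trigger.POST_MERGE, EnumSet.of(Suite.DEVICE_POST_MERGE));
        ADDED.put(Trigger.DAILY, EnumSet.of(Suite.DEVICE_DAILY));
        ADDED.put(Trigger.WEEKLY, EnumSet.of(Suite.DEVICE_WEEKLY));
    }

    // Accumulates suites from the cheapest tier up to and including the given trigger.
    static Set<Suite> suitesFor(Trigger trigger) {
        Set<Suite> result = EnumSet.noneOf(Suite.class);
        for (Trigger t : Trigger.values()) {
            result.addAll(ADDED.get(t));
            if (t == trigger) {
                break;
            }
        }
        return result;
    }
}
```

Keeping the plan in one table makes the coverage versus feedback-time trade-off explicit. Moving a suite to a less frequent trigger is then a one-line change.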

Test Coverage Strategies

Test coverage is managed through a matrix of OS versions and device types. Pull requests run on a “narrow grid” to optimize speed, while post-merge and scheduled suites utilize a “full grid” for comprehensive coverage. Special attention is given to layout issues, especially on tablets and foldables, with ongoing efforts to enhance automation for these form factors.
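The narrow and full grids can be sketched as two ways of selecting cells from an OS-version by form-factor matrix. The full grid is the Cartesian product. The narrow grid used here, the oldest and newest API levels on the first form factor, is a hypothetical rule. It is meant to catch minimum-SDK and latest-SDK regressions cheaply and is not Netflix's actual selection. API levels are assumed to be sorted ascending, and the first form factor is assumed to be the most common.

```java
import java.util.ArrayList;
import java.util.List;

final class DeviceGrid {
    // One grid cell: an Android API level paired with a form factor.
    static final class Cell {
        final int apiLevel;
        final String formFactor;

        Cell(int apiLevel, String formFactor) {
            this.apiLevel = apiLevel;
            this.formFactor = formFactor;
        }

        @Override
        public String toString() {
            return "api" + apiLevel + "/" + formFactor;
        }
    }

    // Full grid: every API level crossed with every form factor.
    static List<Cell> full(List<Integer> apiLevels, List<String> formFactors) {
        List<Cell> cells = new ArrayList<>();
        for (int api : apiLevels) {
            for (String formFactor : formFactors) {
                cells.add(new Cell(api, formFactor));
            }
        }
        return cells;
    }

    // Narrow grid (hypothetical rule): oldest and newest API level on the primary form factor.
    static List<Cell> narrow(List<Integer> apiLevels, List<String> formFactors) {
        List<Cell> cells = new ArrayList<>();
        String primary = formFactors.get(0);
        cells.add(new Cell(apiLevels.get(0), primary));
        if (apiLevels.size() > 1) {
            cells.add(new Cell(apiLevels.get(apiLevels.size() - 1), primary));
        }
        return cells;
    }
}
```

With API levels 28, 31 and 34 (Android 9, 12 and 14) and phone, tablet and foldable form factors, the full grid has nine cells. The narrow grid has two, which is why it suits per-PR runs while tablets and foldables are covered post-merge.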

Looking Ahead: The Future of Testing at Netflix

Netflix continues to refine its testing strategy by exploring emulator-based testing for faster feedback, adopting new frameworks like Roborazzi for interactive screenshot testing, and developing modular demo apps for feature-level validation. The commitment to continuous improvement ensures that testing at scale remains robust, efficient, and adaptable to emerging technologies.
