Netflix’s Scalable App Testing: Strategies & Infrastructure

Inside Netflix’s Approach to Large-Scale Android App Testing

Netflix’s Android app, now over 14 years in development, has undergone significant evolution in both its architecture and testing methodologies. Originally built as a hybrid application, it transitioned to a fully native app to address performance limitations and deliver a seamless user experience. Today, the codebase spans approximately one million lines of Java and Kotlin, distributed across more than 400 modules, and is actively being modernized with Jetpack Compose. A dedicated team of around 50 engineers manages the app’s development and maintenance.

Transition in Testing Ownership

Historically, Netflix relied on a specialized SDET (Software Development Engineer in Test) team to design and execute device tests in collaboration with developers and product managers. These engineers focused on creating automated tests and building custom testing frameworks, while feature developers were responsible for unit and integration tests. Over time, the SDET team was dissolved, shifting the responsibility for automation to individual feature teams, with a couple of SDETs remaining as support. Manual QA still plays a role in final release validation.

Addressing Device Diversity

One of Netflix’s core challenges is supporting a vast array of playback devices, from low-end Android Go phones to high-end foldables. The app maintains compatibility as far back as Android 7.0, though the minimum supported version is moving to Android 9. Device-specific quirks, manufacturer variations, and unique hardware features like foldable hinges necessitate extensive physical device testing to ensure consistent performance and user experience across the ecosystem.

Netflix’s Multi-Layered Testing Pyramid

Netflix employs a layered approach to testing, though the distribution of tests can resemble an hourglass due to legacy code and the emphasis on device testing. New features typically follow a classic test pyramid, with a strong foundation of unit tests, followed by screenshot and end-to-end (E2E) automation, and topped off with smoke tests.

Unit Testing Frameworks

Strikt: For fluent assertions tailored to Kotlin.
Turbine: For testing Kotlin Flows by asserting on the items they emit.
Mockito: Used for mocking dependencies, though inline mocks are minimized to avoid build slowdowns.
Hilt: Facilitates dependency injection for test environments.
Robolectric: Runs tests that touch Android framework classes on the JVM, for business logic that interacts with Android services.
A/B Testing Framework: Enables feature flagging and test overrides for experimental features.

Developers are encouraged to use the simplest form of unit tests first, as test execution time increases dramatically with more complex frameworks and device-based tests.

Managing Flaky Unit Tests

Flakiness in unit tests, especially those running in CI pipelines, is a significant concern. Common causes include global state left behind by earlier tests and asynchronous code whose timing varies between runs. Netflix mitigates these issues by enforcing dependency injection, resetting any state a test changes, and using test schedulers to control timing in asynchronous operations so that outcomes are deterministic.
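The test-scheduler idea can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Netflix's tooling: VirtualScheduler and its methods are hypothetical names. Delayed tasks are queued against a virtual clock that moves only when the test advances it, so the order of execution never depends on wall-clock timing or thread scheduling.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical virtual-time scheduler: tasks run only when the test advances the clock.
final class VirtualScheduler {
    private static final class Task {
        final long dueAtMillis;
        final long sequence;   // tie-breaker so equal due times run in submission order
        final Runnable action;

        Task(long dueAtMillis, long sequence, Runnable action) {
            this.dueAtMillis = dueAtMillis;
            this.sequence = sequence;
            this.action = action;
        }
    }

    private final PriorityQueue<Task> queue = new PriorityQueue<>(
            Comparator.comparingLong((Task t) -> t.dueAtMillis)
                    .thenComparingLong(t -> t.sequence));
    private long nowMillis = 0;
    private long nextSequence = 0;

    // Queue an action to run after delayMillis of virtual time.
    void schedule(long delayMillis, Runnable action) {
        queue.add(new Task(nowMillis + delayMillis, nextSequence++, action));
    }

    // Move the virtual clock forward and run every task that becomes due, in order.
    void advanceBy(long millis) {
        long target = nowMillis + millis;
        while (!queue.isEmpty() && queue.peek().dueAtMillis <= target) {
            Task task = queue.poll();
            nowMillis = task.dueAtMillis; // tasks observe the time at which they were due
            task.action.run();
        }
        nowMillis = target;
    }

    long now() {
        return nowMillis;
    }
}
```

A test injects this scheduler in place of a real one, schedules work, and then calls advanceBy to decide exactly when each task runs. Kotlin coroutine tests get the same effect from the virtual-time schedulers in kotlinx-coroutines-test.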

Screenshot Testing

Screenshot tests validate the visual output of the app and serve as a robust alternative to manual QA for static screens. Netflix utilizes:

Paparazzi: For Compose UI and screen layouts, using static resources to avoid network dependencies.
Localization Testing: Captures screenshots across all supported locales for UX validation.
Device Screenshot Testing: Assesses visual behavior on real devices.
Accessibility Testing with Espresso: Checks that touch targets meet minimum size requirements; the required minimum differs between WCAG and Android guidelines.
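To make the difference between the two touch target guidelines concrete, here is a small sketch. It assumes the commonly cited minimums of 48 x 48 dp from Android's accessibility guidance and 44 x 44 CSS pixels from WCAG 2.1 success criterion 2.5.5, and it treats one CSS pixel as roughly one dp purely for illustration. TouchTargetCheck is a hypothetical helper, not part of Espresso.

```java
// Hypothetical helper: checks a view's measured size against a minimum touch target.
final class TouchTargetCheck {
    static final int ANDROID_MIN_DP = 48;      // Android accessibility guidance
    static final int WCAG_MIN_CSS_PX = 44;     // WCAG 2.1 SC 2.5.5, treated here as dp (assumption)

    // Standard Android conversion: px = dp * (densityDpi / 160), solved for dp.
    static double pxToDp(int px, int densityDpi) {
        return px * 160.0 / densityDpi;
    }

    // True when both dimensions reach the given minimum, expressed in dp.
    static boolean meetsMinimum(int widthPx, int heightPx, int densityDpi, int minDp) {
        return pxToDp(widthPx, densityDpi) >= minDp
                && pxToDp(heightPx, densityDpi) >= minDp;
    }
}
```

On a 480 dpi screen, a 132 x 132 px button is 44 dp on each side, so it satisfies the WCAG figure but falls short of the Android one. A test suite therefore has to state which guideline it enforces.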

Device Testing Frameworks

Device tests, while slower, are critical for verifying overall app functionality. Netflix’s device testing arsenal includes:

Espresso: The primary framework for UI instrumentation.
PageObject Pattern: Abstracts test logic from screen implementation, simplifying migration to new UI technologies.
UIAutomator: Used for smoke testing release candidates.
Performance Testing: Monitors screen load times for regressions.
Network Capture/Playback: Reduces test instability by simulating API responses.
Backend Mocking: Ensures deterministic test results by controlling backend data.
Analytics Testing: Verifies correct event sequencing, crucial for monitoring user interactions.
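The PageObject pattern from the list above can be sketched as follows. Everything here is hypothetical and for illustration only: the UiDriver interface, the page classes, and the element ids are invented, and a real suite would back the driver with Espresso or UIAutomator calls instead of the in-memory fake.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over whatever drives the UI (Espresso, UIAutomator, a fake).
interface UiDriver {
    void typeInto(String elementId, String text);
    void tap(String elementId);
    boolean isVisible(String elementId);
}

// Page object: the only place that knows the login screen's element ids.
final class LoginPage {
    private final UiDriver driver;

    LoginPage(UiDriver driver) {
        this.driver = driver;
    }

    // Performs the sign-in flow and returns the page the user lands on.
    HomePage signIn(String email, String password) {
        driver.typeInto("email_field", email);
        driver.typeInto("password_field", password);
        driver.tap("sign_in_button");
        return new HomePage(driver);
    }
}

final class HomePage {
    private final UiDriver driver;

    HomePage(UiDriver driver) {
        this.driver = driver;
    }

    boolean isDisplayed() {
        return driver.isVisible("home_screen");
    }
}

// In-memory fake so the sketch runs without a device. It records each action and
// pretends any element is visible once the sign-in button has been tapped.
final class RecordingDriver implements UiDriver {
    final List<String> actions = new ArrayList<>();

    public void typeInto(String elementId, String text) {
        actions.add("type:" + elementId);
    }

    public void tap(String elementId) {
        actions.add("tap:" + elementId);
    }

    public boolean isVisible(String elementId) {
        return actions.contains("tap:sign_in_button");
    }
}
```

A test then reads as new LoginPage(driver).signIn(email, password).isDisplayed(). If the login screen is rebuilt in Jetpack Compose, only LoginPage and the driver behind it change, while the tests stay the same.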

Innovative Device Lab Infrastructure

Netflix built its own dedicated device lab before cloud-based solutions like Firebase Test Lab existed. Its infrastructure supports:

Targeted device selection and configuration
Video and screenshot capture during tests
Log aggregation
Lab-specific capabilities: an in-lab cellular tower for network simulation, network throttling, and control over system updates
Automated test execution using raw adb commands
Hardware and software qualification for partner devices, ensuring support for features like HDR
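As an illustration of what driving tests with raw adb commands can look like, the sketch below assembles argument lists for three standard adb invocations: running an instrumentation test, capturing a screenshot, and dumping the log. The AdbCommands class, the device serial, and the package and class names in the usage note are hypothetical. Only the adb subcommands themselves are standard, and nothing here reflects Netflix's actual lab scripts.

```java
import java.util.List;

// Hypothetical helper that builds adb argument lists; it does not execute them.
final class AdbCommands {
    // adb -s <serial> shell am instrument -w -e class <testClass> <testPackage>/<runner>
    static List<String> instrument(String serial, String testClass,
                                   String testPackage, String runner) {
        return List.of("adb", "-s", serial, "shell", "am", "instrument", "-w",
                "-e", "class", testClass, testPackage + "/" + runner);
    }

    // adb -s <serial> exec-out screencap -p  (writes PNG bytes to stdout)
    static List<String> screenshot(String serial) {
        return List.of("adb", "-s", serial, "exec-out", "screencap", "-p");
    }

    // adb -s <serial> logcat -d  (dumps the current log buffer and exits)
    static List<String> dumpLog(String serial) {
        return List.of("adb", "-s", serial, "logcat", "-d");
    }
}
```

A runner could pass one of these lists to new ProcessBuilder(args).start(), for example AdbCommands.instrument("emulator-5554", "com.example.app.LoginTest", "com.example.app.test", "androidx.test.runner.AndroidJUnitRunner"). Targeting a device by serial with -s is what makes device selection in a shared lab explicit.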

Combating Test Flakiness

Given the inherent instability of device-based tests, Netflix has developed robust tooling to:

Detect and minimize flakiness
Identify root causes and notify responsible teams
Automatically classify tests as stable, unstable, or disabled based on real-time results
Promote tests to stable status through continuous evaluation
Employ automated rules for retries, failure handling, and device repairs
Generate detailed failure reports segmented by device, OS, and environment for targeted troubleshooting
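The stable, unstable, and disabled classification above could be driven by a rule like the one sketched here. The thresholds and the minimum run count are invented for illustration, and StabilityClassifier is a hypothetical name. The article does not describe the actual rules Netflix uses.

```java
import java.util.List;

enum Stability { STABLE, UNSTABLE, DISABLED }

// Hypothetical classifier based on the pass rate over a window of recent runs.
final class StabilityClassifier {
    static final int MIN_RUNS = 20;                // assumed: runs needed before promotion
    static final double STABLE_PASS_RATE = 0.98;   // assumed threshold
    static final double DISABLE_PASS_RATE = 0.50;  // assumed threshold

    // recentResults holds one entry per run: true for a pass, false for a failure.
    static Stability classify(List<Boolean> recentResults) {
        if (recentResults.size() < MIN_RUNS) {
            return Stability.UNSTABLE; // not enough evidence to promote or disable
        }
        long passes = recentResults.stream().filter(passed -> passed).count();
        double passRate = (double) passes / recentResults.size();
        if (passRate >= STABLE_PASS_RATE) {
            return Stability.STABLE;
        }
        if (passRate < DISABLE_PASS_RATE) {
            return Stability.DISABLED;
        }
        return Stability.UNSTABLE;
    }
}
```

Re-running the classifier as new results arrive is what lets a test move between categories: a new test starts as unstable, is promoted once it has enough consecutive evidence, and is demoted or disabled when its pass rate drops.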

Continuous Integration and Test Pipelines

Netflix’s CI pipeline runs an extensive battery of unit, lint, and device tests on every pull request. A subset of smoke tests is executed against fully obfuscated builds, while additional automation is reserved for post-merge, daily, and weekly suites. This approach balances test coverage with execution time, ensuring rapid feedback without overburdening developers.

Test Coverage Strategies

Test coverage is managed through a matrix of OS versions and device types. Pull requests run on a “narrow grid” to optimize speed, while post-merge and scheduled suites utilize a “full grid” for comprehensive coverage. Special attention is given to layout issues, especially on tablets and foldables, with ongoing efforts to enhance automation for these form factors.
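The narrow and full grids can be pictured as two cross products over the same axes. In the sketch below the device types and API levels are example values chosen for illustration (API 24 is Android 7.0, API 28 is Android 9), and TestGrid and DeviceConfig are hypothetical names. The real grids are not listed in the article.

```java
import java.util.ArrayList;
import java.util.List;

// One cell of the matrix: a device type paired with an Android API level.
final class DeviceConfig {
    final String deviceType;
    final int apiLevel;

    DeviceConfig(String deviceType, int apiLevel) {
        this.deviceType = deviceType;
        this.apiLevel = apiLevel;
    }

    @Override
    public String toString() {
        return deviceType + "/API " + apiLevel;
    }
}

final class TestGrid {
    // Cross product of device types and API levels, device type varying slowest.
    static List<DeviceConfig> build(List<String> deviceTypes, List<Integer> apiLevels) {
        List<DeviceConfig> grid = new ArrayList<>();
        for (String deviceType : deviceTypes) {
            for (int apiLevel : apiLevels) {
                grid.add(new DeviceConfig(deviceType, apiLevel));
            }
        }
        return grid;
    }

    // Assumed narrow grid for pull requests: few cells, fast feedback.
    static List<DeviceConfig> narrowGrid() {
        return build(List.of("phone"), List.of(28, 34));
    }

    // Assumed full grid for post-merge and scheduled suites: every combination.
    static List<DeviceConfig> fullGrid() {
        return build(List.of("phone", "tablet", "foldable"), List.of(24, 28, 34));
    }
}
```

With these example values the narrow grid has 2 cells and the full grid has 9, which shows why the full matrix is kept out of the pull request path: every added device type or OS version multiplies the number of runs.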

Looking Ahead: The Future of Testing at Netflix

Netflix continues to refine its testing strategy by exploring emulator-based testing for faster feedback, adopting new frameworks like Roborazzi for interactive screenshot testing, and developing modular demo apps for feature-level validation. The commitment to continuous improvement ensures that testing at scale remains robust, efficient, and adaptable to emerging technologies.

Prachi Kothiyal

