Code migrations are notorious for being slow and resource-intensive, especially when they involve legacy dependencies and evolving frameworks. Airbnb faced this challenge head-on when thousands of its React test files still depended on Enzyme, a library no longer aligned with modern React practices. The company’s objective was to move every test to React Testing Library (RTL), an effort initially projected to take over a year. By leveraging AI and automation, however, Airbnb completed the migration in just six weeks.
The Need for Migration
Enzyme, introduced in 2015, was designed for deep access to React component internals—a method suitable for earlier React versions. By 2020, Airbnb had shifted new test development to RTL, which emphasizes testing components based on user interactions and rendered output, rather than internal implementation details. This approach aligns with current best practices, promoting maintainability and resilience to refactoring.
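To make the contrast concrete, here is a minimal before-and-after for a hypothetical `Greeting` component (illustrative only, not code from Airbnb’s repository): the Enzyme test reaches into implementation details such as CSS class names, while the RTL test asserts only on what the user actually sees.

```tsx
// Hypothetical Greeting component test, for illustration only (not Airbnb code).
import React from "react";
import { shallow } from "enzyme";
import { render, screen } from "@testing-library/react";
import "@testing-library/jest-dom";
import Greeting from "./Greeting";

// Enzyme style: coupled to implementation details (shallow render, CSS class).
it("renders the greeting (Enzyme)", () => {
  const wrapper = shallow(<Greeting name="Ada" />);
  expect(wrapper.find(".greeting-text").text()).toBe("Hello, Ada");
});

// RTL style: asserts on the rendered output a user would see in the DOM.
it("renders the greeting (RTL)", () => {
  render(<Greeting name="Ada" />);
  expect(screen.getByText("Hello, Ada")).toBeInTheDocument();
});
```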
Despite this shift, thousands of legacy tests still used Enzyme. Migrating them posed significant challenges:
- Divergent Testing Models: Enzyme’s focus on internals didn’t translate directly to RTL’s DOM-centric approach, requiring structural rewrites rather than simple code substitutions.
- Risk of Coverage Loss: Removing Enzyme tests outright would leave gaps, especially for older components.
- Manual Effort: Early estimates suggested over a year’s worth of engineering time for a manual migration, making the task impractical without automation.
Standardizing on RTL was essential for future-proofing Airbnb’s codebase and supporting upcoming React versions, but automation was the only feasible path forward.
Migration Strategy and Proof of Concept
The breakthrough came during a 2023 internal hackathon, where a small team demonstrated that Large Language Models (LLMs) could accurately convert Enzyme-based tests to RTL. Encouraged by the prototype’s speed and accuracy, Airbnb’s engineers formalized the process in 2024, creating a scalable migration pipeline.
The migration was broken down into independent, per-file steps, each handling a specific transformation—such as replacing Enzyme syntax, updating Jest assertions, or resolving lint and TypeScript errors. If a step failed, an LLM would refactor the file using contextual feedback, ensuring minimal manual intervention and preserving test intent and coverage.
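A minimal sketch of that per-file decomposition might look like the following. The step names follow the article’s description; the types and ambient step declarations are assumptions rather than Airbnb’s actual tooling, and the LLM-driven retry on failure is shown in a later sketch.

```typescript
// Illustrative decomposition into independent, per-file migration steps.
interface MigrationStep {
  name: string;
  // Applies one specific transformation and reports any validation errors.
  run(source: string): Promise<{ ok: boolean; source: string; errors: string[] }>;
}

// One concern per step (declared here as ambient stubs for the sketch).
declare const enzymeToRtl: MigrationStep;    // replace Enzyme syntax with RTL
declare const jestAssertions: MigrationStep; // update Jest assertions and setup
declare const lintAndTypes: MigrationStep;   // resolve ESLint and TypeScript errors

async function migrateFile(
  source: string,
): Promise<
  | { status: "migrated"; source: string }
  | { status: "failed"; step: string; errors: string[] }
> {
  let current = source;
  for (const step of [enzymeToRtl, jestAssertions, lintAndTypes]) {
    const result = await step.run(current);
    if (!result.ok) {
      // The failing step and its errors become the contextual feedback
      // handed to the LLM in the retry loop described below.
      return { status: "failed", step: step.name, errors: result.errors };
    }
    current = result.source;
  }
  return { status: "migrated", source: current };
}
```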
Pipeline Design and Automation Techniques
Step-Based Workflow
Each test file moved through a structured, step-based state machine. This approach provided:
- Clear Progress Tracking: Every file’s status and history were transparent throughout the pipeline.
- Isolated Failures: Issues in one step didn’t halt the entire process, allowing for targeted fixes.
- Parallel Execution: Hundreds of files could be processed simultaneously, maximizing throughput.
- Step-Specific Retries: Errors at a particular stage could be addressed without affecting others.
Key stages included Enzyme refactoring, Jest fixes, lint and TypeScript checks, and final validation to confirm the migrated test’s correctness.
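Sketched in code, such a state machine might track each file’s stage history and drive many files through the stages concurrently. The stage names mirror the list above; the runner callback and concurrency mechanics are assumptions made for illustration.

```typescript
// Hypothetical per-file state machine; stage names mirror the article.
const STAGES = ["enzyme-refactor", "jest-fixes", "lint-and-typescript", "validate"] as const;
type Stage = (typeof STAGES)[number];

interface FileState {
  path: string;
  completed: Stage[]; // transparent progress history per file
  failedAt?: Stage;   // isolated, step-specific failure to retry later
}

// Runs one stage for one file; injected so the sketch stays self-contained.
type StageRunner = (path: string, stage: Stage) => Promise<boolean>;

// Drive many files through the stages in parallel: a failure in one file
// (or one stage) never blocks the rest, and each stage can be retried alone.
async function runPipeline(
  paths: string[],
  runStage: StageRunner,
  concurrency = 50,
): Promise<FileState[]> {
  const states: FileState[] = paths.map((path) => ({ path, completed: [] }));
  const queue = [...states];

  const worker = async () => {
    for (let file = queue.shift(); file; file = queue.shift()) {
      for (const stage of STAGES) {
        if (await runStage(file.path, stage)) {
          file.completed.push(stage);
        } else {
          file.failedAt = stage; // record exactly where to resume
          break;
        }
      }
    }
  };

  await Promise.all(Array.from({ length: concurrency }, worker));
  return states;
}
```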
Automated Retry Loops and Dynamic Prompting
Rather than perfecting prompts upfront, Airbnb’s engineers implemented automated retry loops. If a migration step failed, the system provided the LLM with:
- The latest file version
- Validation errors from the failed attempt
This allowed the model to iteratively refine its output based on concrete feedback. Most files succeeded after one or two retries, and the process ran automatically at scale, minimizing manual oversight.
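A simplified version of such a retry loop might look like the sketch below, assuming a generic validation step and chat-model call rather than any particular API; the prompt wording is invented for illustration.

```typescript
// Sketch of the automated retry loop with dynamic, feedback-driven prompts.
interface Validation {
  ok: boolean;
  errors: string[]; // Jest failures, ESLint messages, tsc diagnostics, ...
}

type Validate = (source: string) => Promise<Validation>;
type AskModel = (prompt: string) => Promise<string>; // returns a rewritten file

async function retryWithFeedback(
  initialSource: string,
  validate: Validate,
  askModel: AskModel,
  maxAttempts = 3, // most files converged within one or two retries
): Promise<{ source: string; ok: boolean }> {
  let source = initialSource;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await validate(source);
    if (result.ok) return { source, ok: true };

    // Dynamic prompting: always send the *latest* file plus the concrete
    // validation errors from the failed attempt, instead of hand-tuning
    // a single perfect prompt upfront.
    const prompt = [
      "Migrate this test file from Enzyme to React Testing Library.",
      "Current file:\n" + source,
      "It currently fails with these errors:\n" + result.errors.join("\n"),
      "Return only the corrected file.",
    ].join("\n\n");

    source = await askModel(prompt);
  }
  return { source, ok: (await validate(source)).ok };
}
```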
Rich Prompt Context for Complex Cases
While retry loops resolved most files, more complex tests—those with custom utilities or intricate setups—required richer context. For these, the LLM received:
- The component’s source code
- The target test file
- Validation errors
- Sibling test files to capture team-specific patterns
- High-quality RTL examples from the project
- Relevant imports and utility modules
- Migration guidelines
By providing targeted, meaningful context, the LLM could replicate nuanced testing styles and handle edge cases effectively.
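One way to assemble that richer prompt might look like the following; the field names are invented here purely to mirror the list above, not taken from Airbnb’s system.

```typescript
// Sketch of bundling rich context into a single prompt for complex files.
interface RichContext {
  componentSource: string;     // source code of the component under test
  testFile: string;            // the Enzyme test being migrated
  validationErrors: string[];  // errors from the last failed attempt
  siblingTests: string[];      // nearby tests showing team-specific patterns
  rtlExamples: string[];       // high-quality RTL tests from the same project
  relatedImports: string[];    // utilities and helper modules the test uses
  migrationGuidelines: string; // written Enzyme-to-RTL migration rules
}

// Flatten everything into one prompt; ordering and labels are illustrative.
function buildRichPrompt(ctx: RichContext): string {
  return [
    "Migrate the test below from Enzyme to React Testing Library.",
    "## Migration guidelines\n" + ctx.migrationGuidelines,
    "## Component under test\n" + ctx.componentSource,
    "## Test file to migrate\n" + ctx.testFile,
    "## Validation errors from the previous attempt\n" + ctx.validationErrors.join("\n"),
    "## Related imports and utilities\n" + ctx.relatedImports.join("\n\n"),
    "## Sibling tests (match these conventions)\n" + ctx.siblingTests.join("\n\n"),
    "## Good RTL examples from this codebase\n" + ctx.rtlExamples.join("\n\n"),
    "Return only the migrated test file.",
  ].join("\n\n");
}
```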
Systematic Cleanup: From 75% to 97% Completion
The initial automated pass migrated 75% of files in under four hours, but around 900 files remained. To tackle these, Airbnb used:
- Migration Status Annotations: Each file was tagged with its migration progress, making it easy to identify and address failures.
- Step-Specific File Reruns: Engineers could reprocess subsets of files based on failure type, streamlining targeted fixes.
- Structured Feedback Loops: By sampling failing files, tuning prompts, and sweeping across similar cases, the team pushed completion from 75% to 97% in just four days.
The remaining 3%—about 100 files—were finalized manually, using LLM-generated outputs as strong starting points rather than rewriting from scratch.
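A sketch of how annotation-driven, step-specific reruns might work is shown below; the annotation format, regular expression, and helpers are hypothetical, intended only to illustrate selecting a cohort of files stuck on the same step.

```typescript
// Sketch of selecting files for a step-specific rerun via status annotations.
import { readFileSync } from "node:fs";

// Assumed annotation written into each file, e.g.:
//   // migration-status: failed step=enzyme-refactor
const STATUS_RE = /migration-status:\s*(\w+)(?:\s+step=([\w-]+))?/;

interface MigrationStatus {
  path: string;
  state: "migrated" | "failed" | "pending";
  failedStep?: string;
}

function readStatus(path: string): MigrationStatus {
  const match = readFileSync(path, "utf8").match(STATUS_RE);
  if (!match) return { path, state: "pending" };
  return { path, state: match[1] as MigrationStatus["state"], failedStep: match[2] };
}

// Select only the files stuck on a particular step so a tuned prompt can be
// swept across that cohort without touching already-migrated files.
function filesFailedAt(paths: string[], step: string): string[] {
  return paths
    .map(readStatus)
    .filter((s) => s.state === "failed" && s.failedStep === step)
    .map((s) => s.path);
}
```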
Results and Takeaways
The migration validated both Airbnb’s tooling and strategy:
- Efficiency: 75% of files migrated in under four hours; 97% completed after four days of targeted iteration.
- Minimal Manual Work: Only the final 3% required manual intervention, and even then, LLMs provided valuable baselines.
- Preserved Coverage: Test intent and code coverage were maintained throughout, with migrated tests passing validation and aligning with RTL’s best practices.
- Resource Savings: Six engineers completed the migration in six weeks, compared to an 18-month manual estimate.
The project demonstrated that LLMs excel in repetitive, context-driven transformations and can be paired with human oversight for complex edge cases. Airbnb now plans to extend this AI-powered framework to other large-scale code transformations, such as library upgrades and language migrations.
Conclusion
Airbnb’s AI-driven migration showcases how structured automation, powered by LLMs, can dramatically reduce engineering toil, accelerate modernization, and enhance consistency. By combining modular pipelines, dynamic feedback loops, and rich contextual inputs, the team transformed a daunting manual task into a rapid, scalable process—setting a new standard for codebase modernization in the age of AI.