Twitter aims to surface the most relevant content from everything posted on its platform. With approximately 500 million Tweets posted daily, the challenge is to filter this volume down to the handful that appear on a user’s For You timeline. This article explains how Twitter’s recommendation algorithm selects Tweets for individual timelines.
The recommendation system is a set of interconnected services and processes. Tweets are recommended in several parts of the app, such as Search, Explore, and Ads, but this discussion focuses specifically on the home timeline’s For You feed.
How Tweets Are Selected
The backbone of Twitter’s recommendation system consists of core models and features designed to extract valuable insights from Tweet data, user behavior, and engagement patterns. These models aim to answer critical questions like “What is the likelihood of future interactions between users?” and “What communities exist within Twitter, and which Tweets are trending in those communities?” By accurately addressing these queries, Twitter enhances its ability to provide relevant recommendations.
The recommendation pipeline operates through three primary stages:
- Candidate Sourcing: The first step involves fetching potential Tweets from various sources.
- Ranking: Each Tweet is then evaluated using a machine learning model.
- Application of Heuristics and Filters: Finally, heuristics and filters remove Tweets from blocked users, inappropriate content, and Tweets the user has already seen.
The service responsible for assembling and delivering the For You timeline is known as Home Mixer. Built on the custom Scala framework called Product Mixer, Home Mixer serves as the software backbone that integrates different candidate sources, scoring functions, heuristics, and filters.
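The three stages can be sketched as a small pipeline. This is an illustrative Python sketch only, not Home Mixer or Product Mixer code (those are written in Scala); the names Candidate, Source, Scorer, Filter, and build_timeline are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    tweet_id: int
    author_id: int
    score: float = 0.0


# Hypothetical type aliases for the three kinds of pluggable components.
Source = Callable[[int], Iterable[Candidate]]                # user_id -> candidates
Scorer = Callable[[int, Candidate], float]                   # (user_id, candidate) -> score
Filter = Callable[[int, List[Candidate]], List[Candidate]]   # (user_id, list) -> list


def build_timeline(user_id: int, sources: List[Source], scorer: Scorer,
                   filters: List[Filter], limit: int = 10) -> List[Candidate]:
    # Stage 1: candidate sourcing, deduplicated by Tweet id across sources.
    seen_ids = set()
    candidates: List[Candidate] = []
    for source in sources:
        for candidate in source(user_id):
            if candidate.tweet_id not in seen_ids:
                seen_ids.add(candidate.tweet_id)
                candidates.append(candidate)

    # Stage 2: ranking, every candidate is scored the same way regardless of source.
    for candidate in candidates:
        candidate.score = scorer(user_id, candidate)
    candidates.sort(key=lambda c: c.score, reverse=True)

    # Stage 3: heuristics and filters, applied in order to the ranked list.
    for apply_filter in filters:
        candidates = apply_filter(user_id, candidates)

    return candidates[:limit]
```

Keeping sources, the scorer, and filters as plain callables mirrors the idea of a framework that integrates interchangeable components, without claiming anything about how the real service is structured.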
Candidate Sources
Twitter employs multiple Candidate Sources to gather recent and pertinent Tweets for users. Each request aims to extract around 1,500 Tweets from a pool of hundreds of millions. The For You timeline typically comprises an equal mix of In-Network (Tweets from followed users) and Out-of-Network (Tweets from non-followed users) content.
In-Network Source
The In-Network source is the primary candidate source, focusing on delivering timely and relevant Tweets from users that a person follows. It ranks these Tweets by relevance using a logistic regression model. The most important input to that ranking is Real Graph, a model that predicts the likelihood of engagement between two users. The higher the Real Graph score between a user and a Tweet’s author, the more of that author’s Tweets are included in the user’s feed.
Twitter recently phased out the Fanout Service, which had been used for over a decade to cache In-Network Tweets. Efforts are also underway to redesign the logistic regression ranking model, which has not been updated in several years.
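To make the idea of a logistic regression over user-to-user signals concrete, here is a minimal sketch. The feature names, weights, and bias below are invented for illustration; the article does not describe the actual Real Graph features or coefficients.

```python
import math

# Hypothetical feature weights and bias, NOT Twitter's real values.
WEIGHTS = {
    "likes_last_30d": 0.08,
    "replies_last_30d": 0.25,
    "profile_visits_last_30d": 0.05,
}
BIAS = -3.0


def engagement_probability(features: dict) -> float:
    """Logistic regression: sigmoid of a weighted sum of interaction counts."""
    z = BIAS + sum(weight * features.get(name, 0.0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# A pair of users with frequent past interactions gets a higher predicted score
# than a pair with no interaction history.
frequent = engagement_probability(
    {"likes_last_30d": 20, "replies_last_30d": 8, "profile_visits_last_30d": 10}
)
none_yet = engagement_probability({})
```

The output is a probability between 0 and 1, which is what makes it usable as a per-author signal when deciding how many of that author's Tweets to consider.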
Out-of-Network Sources
Identifying relevant Tweets outside a user’s network poses unique challenges. To determine if a Tweet will resonate with a user who does not follow its author, Twitter employs two main strategies:
- Social Graph Analysis: This method estimates relevance by examining engagements among users that one follows or those with similar interests. By traversing engagement graphs, Twitter can identify which Tweets have been recently engaged with by followed accounts and what other similar content has garnered attention.
- Embedding Spaces: This approach focuses on content similarity by generating numerical representations of user interests and Tweet content. By calculating similarities within this embedding space, Twitter can assess relevance effectively. One significant embedding space utilized is SimClusters, which identifies communities led by influential users through a specialized matrix factorization algorithm.
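Both strategies can be illustrated with a few lines of code. This is a simplified sketch under stated assumptions: the follow and engagement data are plain dictionaries, and the embeddings are short hand-written vectors. It does not implement SimClusters or its matrix factorization; the function names are hypothetical.

```python
import math
from collections import Counter


def tweets_engaged_by_followed(user, follows, engagements):
    """Social graph analysis: count, per Tweet, how many followed accounts engaged with it.

    follows maps a user to the set of accounts they follow;
    engagements maps an account to the set of Tweet ids it recently engaged with.
    """
    counts = Counter()
    for followed in follows.get(user, ()):
        for tweet_id in engagements.get(followed, ()):
            counts[tweet_id] += 1
    return counts


def cosine_similarity(a, b):
    """Embedding spaces: similarity between a user-interest vector and a Tweet vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


follows = {"alice": {"bob", "carol"}}
engagements = {"bob": {101, 102}, "carol": {102}}
counts = tweets_engaged_by_followed("alice", follows, engagements)
# Tweet 102 was engaged with by two accounts alice follows, so it ranks first here.
best_tweet, _ = counts.most_common(1)[0]
```

In the graph approach the signal is who engaged; in the embedding approach the signal is how close two vectors are, which works even when no followed account has touched the Tweet.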
Ranking Mechanism
The ultimate objective of the For You timeline is to present relevant Tweets to users. At this stage in the pipeline, approximately 1,500 candidates are available for evaluation. Scoring determines each candidate Tweet’s relevance and serves as the primary ranking signal. All candidates are treated equally at this point without regard for their source.
Ranking is accomplished using a neural network with around 48 million parameters that continuously learns from Tweet interactions to optimize for positive engagements such as Likes, Retweets, and Replies. This mechanism considers thousands of features to assign scores to each Tweet based on engagement probability.
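As a rough illustration of scoring by engagement probability, the sketch below assumes a model has already produced per-Tweet probabilities for Likes, Retweets, and Replies, and collapses them into one score with a weighted sum. The weights and the weighted-sum choice are assumptions made for this example; the article does not say how the probabilities are combined.

```python
# Hypothetical weights per engagement type, NOT Twitter's real values.
ENGAGEMENT_WEIGHTS = {"like": 1.0, "retweet": 2.0, "reply": 3.0}


def relevance_score(predicted: dict) -> float:
    """Combine predicted engagement probabilities into a single ranking score."""
    return sum(weight * predicted.get(kind, 0.0)
               for kind, weight in ENGAGEMENT_WEIGHTS.items())


def rank(candidates: dict) -> list:
    """candidates maps tweet_id -> predicted probabilities; returns ids, best first."""
    return sorted(candidates, key=lambda tid: relevance_score(candidates[tid]), reverse=True)


predictions = {
    1: {"like": 0.5},
    2: {"like": 0.1, "reply": 0.3},
    3: {"retweet": 0.1},
}
ordered = rank(predictions)
```

Note that the source of each candidate never enters the score, which matches the statement that all candidates are treated equally at this stage.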
Heuristics and Filters
After ranking Tweets, heuristics and filters are applied to ensure a balanced and diverse feed. Examples include:
- Visibility Filtering: Excludes Tweets based on their content and the user’s preferences, for example Tweets from blocked or muted accounts.
- Author Diversity: Prevents excessive consecutive Tweets from a single author.
- Content Balance: Maintains an equitable mix of In-Network and Out-of-Network Tweets.
- Feedback-based Fatigue: Lowers the score of Tweets the viewer has given negative feedback on.
- Social Proof: Excludes Out-of-Network Tweets that have no second-degree connection to the user, as a quality safeguard.
- Conversations: Threads replies with original Tweets for context.
- Edited Tweets: Updates stale content with revised versions when necessary.
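Two of the heuristics above, visibility filtering and author diversity, can be sketched as post-ranking passes over the list. This is a minimal illustration with hypothetical function names and a made-up limit of two consecutive Tweets per author; it is not Twitter's implementation.

```python
def apply_visibility_filter(ranked, blocked_authors, seen_tweet_ids):
    """Drop Tweets from blocked authors and Tweets the user has already seen.

    ranked is a list of (tweet_id, author_id) pairs, best first.
    """
    return [(tweet_id, author) for tweet_id, author in ranked
            if author not in blocked_authors and tweet_id not in seen_tweet_ids]


def _trailing_run(result, author):
    """Number of consecutive Tweets by `author` at the end of `result`."""
    run = 0
    for _, previous_author in reversed(result):
        if previous_author != author:
            break
        run += 1
    return run


def limit_consecutive_authors(ranked, max_run=2):
    """Defer a Tweet when its author already fills the last `max_run` slots."""
    result, deferred = [], []
    for item in ranked:
        # Try the new item first, then retry deferred Tweets: the run may now be broken.
        queue, deferred = [item] + deferred, []
        for tweet in queue:
            if _trailing_run(result, tweet[1]) >= max_run:
                deferred.append(tweet)
            else:
                result.append(tweet)
    # Tweets still deferred are appended at the end rather than dropped,
    # so the limit can be exceeded at the very tail of the list.
    return result + deferred


ranked = [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "a"), (6, "c")]
visible = apply_visibility_filter(ranked, blocked_authors={"c"}, seen_tweet_ids={5})
diverse = limit_consecutive_authors(visible, max_run=2)
```

Deferring rather than dropping is a design choice for this sketch: it keeps every surviving Tweet in the timeline while still breaking up long runs from one author.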
Mixing and Serving Content
Once Home Mixer has compiled a selection of Tweets ready for delivery, it blends them with other non-Tweet content such as advertisements and follow recommendations before sending them to user devices.
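The blending step can be pictured as interleaving two lists. The sketch below inserts one non-Tweet item after every few Tweets; the fixed interval and the function name mix_timeline are assumptions for illustration, since the article does not describe how placement is actually decided.

```python
def mix_timeline(tweets, extras, every=3):
    """Insert one non-Tweet item (ad, follow recommendation) after every `every` Tweets."""
    mixed = []
    remaining_extras = iter(extras)
    for position, tweet in enumerate(tweets, start=1):
        mixed.append(tweet)
        if position % every == 0:
            extra = next(remaining_extras, None)  # None once the extras run out
            if extra is not None:
                mixed.append(extra)
    return mixed


timeline = mix_timeline(
    ["t1", "t2", "t3", "t4", "t5", "t6", "t7"],
    ["ad1", "follow1", "ad2"],
    every=3,
)
```

Extras that do not fit (here "ad2") are simply left out, so the Tweet ordering produced by ranking and filtering is preserved.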
This entire pipeline runs approximately five billion times per day and completes in under 1.5 seconds on average, although each execution requires significant CPU time.
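To put those figures in perspective, a quick back-of-the-envelope calculation converts the daily count into a per-second rate. It assumes load is spread evenly over the day, which real traffic is not, and uses the 1.5 second figure as an upper bound on average latency.

```python
SECONDS_PER_DAY = 24 * 60 * 60

executions_per_day = 5_000_000_000
average_latency_upper_bound_s = 1.5

# Average request rate if load were perfectly uniform (an assumption).
executions_per_second = executions_per_day / SECONDS_PER_DAY

# Little's law (requests in flight = arrival rate x time in system) gives an
# upper bound on the average number of pipeline executions running at once.
in_flight_upper_bound = executions_per_second * average_latency_upper_bound_s

print(f"{executions_per_second:,.0f} executions per second on average")
print(f"at most about {in_flight_upper_bound:,.0f} executions in flight on average")
```

That works out to roughly 58,000 pipeline executions per second on average.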
Future Developments
Twitter remains committed to transparency regarding its recommendation systems. The company has made strides toward openness by releasing code related to its algorithms while planning new features aimed at enhancing user understanding of why certain content appears in their timelines. Upcoming initiatives include:
- An improved analytics platform for creators offering deeper insights into reach and engagement.
- Enhanced transparency regarding safety labels applied to accounts or Tweets.
- Increased visibility into the factors that influence why Tweets appear on timelines.
Twitter serves over 150 billion Tweets to people’s devices every day, so the challenge remains to deliver the best possible content while continuing to improve its recommendation systems.