How does a tweet get from the author into the home timelines of all their followers? Three strategies. None is obviously right.
Pull (fanout-on-read)
When you load your home timeline, the Timeline Service:
- Looks up everyone you follow (say 500 people).
- Queries each of their tweet stores for the latest N tweets.
- Merges, sorts by time, returns the top 20.
Pros: Trivial writes. Posting a tweet is one DB insert. No precomputed state.
Cons: Reads are expensive. 500K timeline reads/s × 500 follows = 250M downstream reads/s. Latency is unpredictable, because every request is only as fast as its slowest shard. Falls apart at scale.
Good for: a system where users follow few people, or read timelines rarely. Not Twitter.
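A minimal sketch of the pull path, assuming in-memory dictionaries in place of the real follow graph and tweet stores. The names FOLLOWS, USER_TIMELINES and pull_home_timeline are hypothetical, chosen only for illustration. It shows why the read is expensive: one store lookup per followed account, then a merge.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for the follow graph and per-user tweet stores.
# Each store holds (timestamp, tweet_id) pairs, newest first.
FOLLOWS = {"me": ["alice", "bob"]}
USER_TIMELINES = {
    "alice": [(105, "a3"), (101, "a2"), (90, "a1")],
    "bob": [(103, "b2"), (95, "b1")],
}


def pull_home_timeline(user, page_size=20, per_author=20):
    """Fanout-on-read: one query per followed account, then merge by time."""
    # Step 1: look up everyone the user follows.
    authors = FOLLOWS.get(user, [])
    # Step 2: fetch the latest N tweets from each author's store.
    recent = [USER_TIMELINES.get(a, [])[:per_author] for a in authors]
    # Step 3: k-way merge of the already-sorted lists, newest first, keep one page.
    merged = heapq.merge(*recent, key=lambda t: t[0], reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, page_size)]


timeline = pull_home_timeline("me")

# The read-amplification arithmetic from the text: timeline reads/s x follows.
downstream_reads_per_sec = 500_000 * 500
```

In a real deployment step 2 would be a scatter-gather across shards, which is where the unpredictable latency comes from.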
Push (fanout-on-write)
When you post a tweet, a worker pushes the tweet ID into a precomputed list — your home timeline cache — for each of your followers.
- TweetCreated event hits the fanout worker.
- Worker looks up your followers (say 200).
- Worker does 200 writes, one per follower’s home_timeline:{user_id} Redis list.
Reads become trivial: LRANGE home_timeline:{me} 0 19 → done in 1ms.
Pros: Read latency is constant and tiny. The 100:1 read:write ratio means doing more work on the rare write to save it on the common read is the right trade.
Cons: Write amplification. The celebrity problem — Taylor Swift has 95M followers. One tweet = 95M cache writes. Storage waste — most followers won’t read most tweets you push to them.
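A rough sketch of the push path, with a bounded deque standing in for each Redis list so the example runs without a server. FOLLOWERS, TIMELINE_LIMIT, on_tweet_created and read_home_timeline are hypothetical names, and the per-user cap is an assumption rather than something stated above. The point is the shape of the cost: the write loops over followers, the read is a single lookup.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # assumed cap on cached entries per user, not from the text

# Hypothetical in-memory stand-ins: follower graph and per-user timeline caches.
FOLLOWERS = {"alice": ["bob", "carol"]}
home_timeline = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def on_tweet_created(author, tweet_id):
    """Fanout-on-write: one cache write per follower. Returns the write count."""
    followers = FOLLOWERS.get(author, [])
    for follower in followers:
        # Mirrors LPUSH followed by LTRIM: newest first, oldest entries dropped.
        home_timeline[follower].appendleft(tweet_id)
    return len(followers)


def read_home_timeline(user, page_size=20):
    """Mirrors LRANGE home_timeline:{user} 0 19: a single cache read."""
    return list(home_timeline[user])[:page_size]


writes = on_tweet_created("alice", "t1")
on_tweet_created("alice", "t2")
```

The return value of on_tweet_created is the write amplification for that tweet: 2 here, 95M for the celebrity case.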
Hybrid (the actual answer)
Push for normal users; pull for celebrities.
- Define a celebrity threshold (e.g. > 1M followers).
- Normal users: tweets are fanned out to followers’ timeline caches at write time.
- Celebrities: tweets are not fanned out. Their tweets stay in their user timeline only.
- At read time, the Timeline Service merges (a) your precomputed timeline cache with (b) live-pulled recent tweets from any celebrities you follow.
home_timeline = merge(
    redis.lrange("home_timeline:" + me, 0, 19),  // pushed
    union(c.recent() for c in my_celebs)  // pulled
)
This bounds both sides:
- Write fanout cost is bounded — no user has >1M push targets.
- Read merge cost is bounded — most users follow few celebrities.
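One way the read-time merge might look, again using in-memory dictionaries as stand-ins. All names here (FOLLOWER_COUNT, HOME_TIMELINE_CACHE, USER_TIMELINES, is_celebrity, hybrid_home_timeline) are hypothetical; only the 1M threshold comes from the text. The sketch merges the pushed cache with live pulls from followed celebrities.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 1_000_000  # followers; the example threshold from the text

# Hypothetical in-memory data. Entries are (timestamp, tweet_id), newest first.
FOLLOWER_COUNT = {"alice": 200, "star": 95_000_000}
FOLLOWS = {"me": ["alice", "star"]}
HOME_TIMELINE_CACHE = {"me": [(104, "a2"), (100, "a1")]}  # pushed at write time
USER_TIMELINES = {"star": [(106, "s2"), (102, "s1")]}  # celebrities' own timelines


def is_celebrity(user):
    """Celebrities are skipped at fanout time and pulled at read time."""
    return FOLLOWER_COUNT.get(user, 0) > CELEBRITY_THRESHOLD


def hybrid_home_timeline(user, page_size=20):
    # (a) Precomputed cache, filled by fanout-on-write for normal authors.
    pushed = HOME_TIMELINE_CACHE.get(user, [])[:page_size]
    # (b) Live pull, only from the celebrities this user follows.
    pulled = [
        USER_TIMELINES.get(author, [])[:page_size]
        for author in FOLLOWS.get(user, [])
        if is_celebrity(author)
    ]
    # Merge the sorted sources newest first and keep one page.
    merged = heapq.merge(pushed, *pulled, key=lambda t: t[0], reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, page_size)]


timeline = hybrid_home_timeline("me")
```

The number of pulls per read equals the number of celebrities the reader follows, which is the bound the second bullet relies on.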
The numbers that justify it
Without the hybrid, fanout writes from celebrities dominate everything:
1000 celebs × 10 tweets/day × 50M avg followers
= 500B fanout writes/day
≈ 5.8M writes/s
That’s more than your entire write QPS budget, just from celebrities. The hybrid cuts this to zero for the celebrity tier and pushes the cost to read time, where caching the celebrity’s own user timeline absorbs it.
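The estimate is easy to check. The inputs below are the figures from the text; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

celebrities = 1_000
tweets_per_day = 10
avg_followers = 50_000_000

# Every celebrity tweet turns into one cache write per follower.
fanout_writes_per_day = celebrities * tweets_per_day * avg_followers
fanout_writes_per_sec = fanout_writes_per_day / SECONDS_PER_DAY

print(f"{fanout_writes_per_day:,} writes/day")
print(f"{fanout_writes_per_sec / 1e6:.2f}M writes/s")
```

This is a daily average. Bursts around a single celebrity tweet would be far spikier than the per-second figure suggests.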
What to say in the interview
State the three options. Reject pure pull on read latency. Reject pure push on the celebrity problem with one number (“Taylor Swift = 95M writes per tweet”). Land on hybrid. Define the threshold. Show the merge happening at read time.
This is the conversation the interviewer is waiting for.