Chapter 8 set up the user_counts:{user_id} cache with an is_celebrity flag. This chapter is the write side: what the fanout worker does once it has that flag.
Architecture
flowchart TD
Author(["Author"])
TS["Tweet Service<br/>(Fargate)"]
DB[("DynamoDB<br/>(tweets)")]
K["Kafka<br/>(TweetCreated)"]
FW["Fanout Worker<br/>(Fargate)"]
UC[("Redis<br/>user_counts:{user_id}<br/>{follower_count, is_celebrity}")]
RCEL[("Redis<br/>user:{user_id}:tweets<br/>(celebs)")]
RPUSH[("Redis<br/>home_timeline:{user_id}<br/>(non-celebs)")]
Author -->|"POST /tweets"| TS
TS --> DB
DB -.->|"DynamoDB Stream"| K
K --> FW
FW -->|"read is_celebrity"| UC
FW -->|"add to celeb feed if celeb"| RCEL
FW -->|"prepend per follower if not celeb"| RPUSH
Why pure push collapses
Chapter 6 lands on push as the baseline — at write time, the fanout worker prepends the tweet ID onto each follower’s home_timeline:{user_id} Redis list. That’s bounded for normal users. For a celebrity, it isn’t:
1000 celebs × 10 tweets/day × 50M avg followers
= 500B fanout writes/day
≈ 5.8M writes/s
That’s more than the entire system’s write QPS budget, just from celebrities. Pure push doesn’t survive contact with the data.
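The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the function name `celebrity_fanout_rate` is illustrative, and the inputs are the same round numbers used in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def celebrity_fanout_rate(celebs: int, tweets_per_day: int, avg_followers: int):
    """Return (fanout writes per day, fanout writes per second) under pure push."""
    writes_per_day = celebs * tweets_per_day * avg_followers
    return writes_per_day, writes_per_day / SECONDS_PER_DAY


# Same inputs as the estimate in the text.
writes_per_day, writes_per_second = celebrity_fanout_rate(1_000, 10, 50_000_000)
print(f"{writes_per_day:,} fanout writes/day")
print(f"{writes_per_second / 1e6:.1f}M fanout writes/s")
```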
DynamoDB partitions by partition key; Redis stores a value per key. Both scale horizontally only when work spreads across keys. A celebrity collapses that — their fanout writes touch tens of millions of follower timelines for a single tweet, and adaptive capacity inside DynamoDB only isolates a hot partition from its neighbors; it doesn’t lift the per-second cap on the partition itself. The job is to keep traffic off the hot key, not to make the hot key faster.
Hybrid push/pull
Split the user base by the is_celebrity flag (chapter 8) and pick a different strategy on each side:
- Normal users: tweets are fanned out to followers’ timeline caches at write time (chapter 6’s push path).
- Celebrities: tweets are not fanned out to follower timelines. Instead, the fanout worker adds the tweet ID to a per-author sorted set in Redis (user:{user_id}:tweets, scored by Snowflake ID), trimming to the last ~800 entries. Followers pull from that set at read time (chapter 10).
The fanout worker reads is_celebrity from user_counts:{author_id} once per TweetCreated event — a single field read on a key that’s almost always cache-warm — and branches on the flag.
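The branch described above might look roughly like the following sketch. It assumes `redis_client` exposes the redis-py methods `hget`, `zadd`, `zremrangebyrank`, `lpush`, and `ltrim`, that `user_counts:{author_id}` is a hash whose `is_celebrity` field holds `"1"` or `"true"`, and that a hypothetical `get_follower_ids` callable supplies the follower list. The home timeline cap and the cache-miss behaviour are assumptions as well; the text does not specify either.

```python
from typing import Callable, Iterable

CELEB_FEED_CAP = 800      # "last ~800 entries" from the text
HOME_TIMELINE_CAP = 800   # assumption: the text does not give this cap


def _is_truthy(raw) -> bool:
    """Interpret the cached flag; the '1'/'true' encoding is an assumption."""
    if isinstance(raw, bytes):
        raw = raw.decode()
    return str(raw).lower() in {"1", "true"}


def handle_tweet_created(
    redis_client,
    author_id: int,
    tweet_id: int,
    get_follower_ids: Callable[[int], Iterable[int]],
) -> str:
    """Route one TweetCreated event to the pull path or the push path."""
    # One field read per event, as described in the text.
    flag = redis_client.hget(f"user_counts:{author_id}", "is_celebrity")

    # Assumption: a cache miss (None) is treated as "not a celebrity".
    if flag is not None and _is_truthy(flag):
        key = f"user:{author_id}:tweets"
        # Score by Snowflake ID. Redis scores are doubles, so IDs above 2**53
        # lose their lowest bits and very close IDs can tie on score.
        redis_client.zadd(key, {str(tweet_id): tweet_id})
        # Drop everything except the CELEB_FEED_CAP highest-scored entries.
        redis_client.zremrangebyrank(key, 0, -(CELEB_FEED_CAP + 1))
        return "pull"

    for follower_id in get_follower_ids(author_id):
        key = f"home_timeline:{follower_id}"
        redis_client.lpush(key, tweet_id)                   # prepend newest first
        redis_client.ltrim(key, 0, HOME_TIMELINE_CAP - 1)   # keep list bounded
    return "push"
```

A real worker would likely batch the per-follower writes (for example with a redis-py pipeline) instead of issuing them one at a time; the loop is kept naive here so the branch itself stays visible.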
This bounds both sides:
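- Write fanout cost is bounded: accounts above the threshold (about 1M followers) are never pushed, so no tweet has more than ~1M push targets.
- Read merge cost is bounded: most users follow only a few celebrities, so the read path has only a few per-author sets to merge.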
The hybrid cuts per-follower fanout writes from celebrities to zero (each celebrity tweet costs a single sorted-set write) and pushes the cost to read time, where a per-celebrity cache absorbs it.
Where the threshold actually sits
A million is a round number, not a derivation. The real threshold is “wherever fanout cost stops being amortizable across the user’s tweet rate.” A user with 2M followers who tweets twice a year is cheaper to push than a user with 200K followers tweeting 50 times a day. Production systems tier this dynamically — recompute per-user when follower count changes, and special-case the long tail of accounts that flip between tiers.
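One way to make that comparison concrete is to estimate fanout writes per day for each account and compare against a budget. The sketch below is illustrative only: `choose_strategy` and the `daily_write_budget` of one million writes per day are hypothetical, not values from this system.

```python
def daily_fanout_writes(follower_count: int, tweets_per_day: float) -> float:
    """Pushed timeline writes this account would generate per day."""
    return follower_count * tweets_per_day


def choose_strategy(
    follower_count: int,
    tweets_per_day: float,
    daily_write_budget: float = 1_000_000,  # hypothetical per-account budget
) -> str:
    """Return 'pull' when pushing would exceed the budget, else 'push'."""
    cost = daily_fanout_writes(follower_count, tweets_per_day)
    return "pull" if cost > daily_write_budget else "push"


# The two accounts from the text: 2M followers tweeting twice a year,
# and 200K followers tweeting 50 times a day.
quiet_giant = choose_strategy(2_000_000, 2 / 365)
busy_midsize = choose_strategy(200_000, 50)
print(quiet_giant, busy_midsize)
```

Under this budget the 2M-follower account stays on the push path (about 11K writes/day) while the 200K-follower account moves to pull (10M writes/day), which is the inversion that a pure follower-count threshold misses.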
Recently-promoted celebrities
If a user crosses the threshold, their old tweets are still sitting in followers’ pushed timelines (and their new ones aren’t). That’s fine — the pushed entries age out as the list trims to its cap. The pull side (chapter 10) picks up new tweets immediately.