Before diving into fanout and caching, sketch the boxes. The interviewer wants to see that you can split the system into services with clear responsibilities.
The components
[Client] -> [CDN] -> [API Gateway / Load Balancer]
                                |
              +-----------------+-----------------+
              |                 |                 |
              v                 v                 v
       Tweet Service         Timeline        User / Graph
         (writes)             Service           Service
              |                 |                 |
              v                 v                 v
          Tweet DB        Timeline Cache       Graph DB
         (DynamoDB)          (Redis)          (DynamoDB)
              |
              v
       Fanout Worker -> Kafka -> Timeline Cache
              |
              v
       Search Indexer -> Search Index
Five services, three datastores, one message bus, one cache, one CDN. That’s the whole picture.
The write path (post a tweet)
1. Client sends `POST /v1/tweets` → API Gateway → Tweet Service.
2. Tweet Service generates a Snowflake ID, writes the tweet to the Tweet DB, and publishes a `TweetCreated` event to Kafka.
3. The Fanout Worker consumes the event, looks up the author's followers via the Graph Service, and pushes the tweet ID into each follower's home timeline cache.
4. The Search Indexer also consumes the event and writes to the search index.
5. The Tweet Service returns `201` to the client.

Steps 3–4 are async: the user sees their tweet immediately because the user timeline reads from the DB, not the cache.
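Here's a minimal sketch of that write handler in Python, assuming boto3 for the Tweet DB and kafka-python for the event bus. The table name, topic name, broker host, and the simplified Snowflake generator are illustrative assumptions, not the real implementation.

```python
import itertools
import json
import time

import boto3
from kafka import KafkaProducer

# Assumed names and hosts -- purely illustrative.
dynamodb = boto3.resource("dynamodb")
tweets_table = dynamodb.Table("tweets")
producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

_EPOCH_MS = 1288834974657   # Twitter's Snowflake epoch
_WORKER_ID = 1              # single worker assumed for this sketch
_sequence = itertools.count()

def next_snowflake_id() -> int:
    # Simplified Snowflake layout: 41-bit timestamp | 10-bit worker | 12-bit sequence.
    now_ms = int(time.time() * 1000) - _EPOCH_MS
    return (now_ms << 22) | (_WORKER_ID << 12) | (next(_sequence) & 0xFFF)

def post_tweet(author_id: int, text: str) -> dict:
    tweet_id = next_snowflake_id()

    # Durably persist the tweet first -- this is what makes the write "count".
    tweet = {"tweet_id": tweet_id, "author_id": author_id,
             "text": text, "created_at": int(time.time() * 1000)}
    tweets_table.put_item(Item=tweet)

    # Publish the event; fanout and search indexing happen asynchronously.
    producer.send("tweet-created", {"tweet_id": tweet_id, "author_id": author_id})

    # Return immediately -- the client gets its 201 without waiting on fanout.
    return tweet
```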
The read path (load home timeline)
1. Client sends `GET /v1/timeline/home` → API Gateway → Timeline Service.
2. Timeline Service reads a list of tweet IDs from the Timeline Cache (Redis), typically the most recent 800.
3. It hydrates those IDs by reading tweet bodies from the Tweet DB (or a tweet cache).
4. It returns the rendered timeline to the client.
The whole hot path is Redis → DynamoDB by key → response. No joins, no scans. That’s how you hit < 200ms p99.
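A minimal sketch of that hot path, assuming the home timeline is a Redis list of tweet IDs per user and the bodies live in the same illustrative `tweets` table as above (key and host names are assumptions):

```python
import boto3
import redis

r = redis.Redis(host="timeline-cache", port=6379)  # assumed host
dynamodb = boto3.resource("dynamodb")

def get_home_timeline(user_id: int, page_size: int = 50) -> list[dict]:
    # 1. Key-value read: newest tweet IDs for this user's home timeline.
    tweet_ids = r.lrange(f"home:{user_id}", 0, page_size - 1)
    if not tweet_ids:
        return []

    # 2. Hydrate bodies with a batch get by primary key -- no joins, no scans.
    keys = [{"tweet_id": int(tid)} for tid in tweet_ids]
    resp = dynamodb.batch_get_item(RequestItems={"tweets": {"Keys": keys}})
    tweets = {t["tweet_id"]: t for t in resp["Responses"]["tweets"]}

    # 3. Preserve cache order (newest first) when rendering.
    return [tweets[int(tid)] for tid in tweet_ids if int(tid) in tweets]
```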
Why split Tweet and Timeline services
Two reasons.
Different load profiles. Tweet Service handles 5K writes/s; Timeline Service handles 500K reads/s. They scale on different curves. Co-locating them means scaling for the worst of both.
Different failure tolerances. A timeline read failure should fall back to “show last cached page.” A tweet write failure must propagate — the user needs to know their tweet didn’t post. Different services let you set different SLOs and circuit breakers.
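A sketch of that asymmetry, reusing the hypothetical `get_home_timeline` and `post_tweet` from above plus an in-process cache of last-known-good pages (all assumptions for illustration):

```python
last_good_pages: dict[int, list[dict]] = {}

def read_timeline_lenient(user_id: int) -> list[dict]:
    # Timeline reads degrade gracefully: on failure, serve the last cached page.
    try:
        page = get_home_timeline(user_id)
        last_good_pages[user_id] = page
        return page
    except Exception:
        return last_good_pages.get(user_id, [])

def write_tweet_strict(author_id: int, text: str) -> dict:
    # Tweet writes never swallow errors: the client must learn the post failed.
    return post_tweet(author_id, text)  # any exception propagates to a 5xx
```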
Why Kafka in the middle
The fanout step is the slowest, riskiest part of a write. A celebrity with 100M followers triggers 100M cache writes. You don't want the user's `POST /v1/tweets` call blocked on that.
Kafka decouples write durability (the tweet is saved) from write propagation (it appears in followers’ timelines). If the fanout worker is slow, tweets queue up; the write API stays fast. If the fanout worker crashes, Kafka retains the events and replay catches up.
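A minimal sketch of that consumer, assuming kafka-python, the illustrative `tweet-created` topic from above, and home timelines stored as Redis lists capped at 800 IDs; the Graph Service lookup is stubbed out:

```python
import json

import redis
from kafka import KafkaConsumer

r = redis.Redis(host="timeline-cache", port=6379)  # assumed host
consumer = KafkaConsumer(
    "tweet-created",                                # assumed topic name
    bootstrap_servers=["kafka:9092"],
    group_id="fanout-workers",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

HOME_TIMELINE_LEN = 800  # keep only the most recent 800 IDs per follower

def get_follower_ids(author_id: int) -> list[int]:
    # Placeholder: in the real system this is a Graph Service lookup.
    return []

for event in consumer:
    tweet_id = event.value["tweet_id"]
    author_id = event.value["author_id"]

    # Push the new tweet ID onto every follower's home timeline, then trim it.
    for follower_id in get_follower_ids(author_id):
        pipe = r.pipeline()
        pipe.lpush(f"home:{follower_id}", tweet_id)
        pipe.ltrim(f"home:{follower_id}", 0, HOME_TIMELINE_LEN - 1)
        pipe.execute()
```

If this loop falls behind or crashes, the events simply accumulate in Kafka and are replayed when it recovers, which is exactly the decoupling described above.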
What this diagram hides
It hides the celebrity problem, the cold-cache problem, and replication. We’ll get to those. But every later optimization slots into one of these boxes — which is why drawing the boxes first matters.