2. Capacity estimation

Anchor numbers

Start with one concrete user count. Pick something defensible and derive everything from it:

300M daily active users (DAU)

Everything else falls out of that one number.

Write QPS

300M users × 0.5 tweets/day = 150M tweets/day
150M / 86,400s ≈ 1,700 tweets/s average
Peak ≈ 3× average ≈ 5,000 tweets/s

Five thousand writes per second is small for a modern datastore: a managed DynamoDB table or a single well-tuned Postgres instance can typically sustain it. The interesting problems are reads and storage, not writes.
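The write arithmetic above can be reproduced in a few lines. This is a minimal sketch: the `qps` helper and the variable names are illustrative, and the 3× peak factor is the same rule of thumb used in the estimate, not a measured value.

```python
SECONDS_PER_DAY = 86_400


def qps(events_per_day: float, peak_factor: float = 3.0) -> tuple[float, float]:
    """Return (average, peak) events per second for a daily event count."""
    average = events_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


dau = 300_000_000              # anchor number
tweets_per_user_per_day = 0.5  # assumed posting rate from the estimate

tweets_per_day = dau * tweets_per_user_per_day
avg_writes, peak_writes = qps(tweets_per_day)

print(f"{tweets_per_day:,.0f} tweets/day")     # 150,000,000 tweets/day
print(f"{avg_writes:,.0f} writes/s average")   # 1,736 writes/s average
print(f"{peak_writes:,.0f} writes/s peak")     # 5,208 writes/s peak
```

The exact values (1,736 and 5,208) round to the 1,700 and 5,000 quoted above. At this stage only the order of magnitude matters.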

Read QPS

300M × 50 reads/day = 15B reads/day
15B / 86,400 ≈ 175,000 reads/s average
Peak ≈ 3× average ≈ 500,000 reads/s

Half a million reads per second is the number that drives everything: caching, fanout, read replicas.

Read:write ≈ 100:1 (50 reads vs. 0.5 tweets per user per day). Keep this in your head.
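A quick check of the read numbers and the ratio, as a sketch. The variable names are illustrative, and the 3× peak multiplier is assumed to carry over from the write estimate.

```python
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3  # assumption: same peak multiplier as for writes

dau = 300_000_000
reads_per_user_per_day = 50     # timeline loads per user, from the estimate
tweets_per_user_per_day = 0.5

reads_per_day = dau * reads_per_user_per_day
avg_reads = reads_per_day / SECONDS_PER_DAY
peak_reads = avg_reads * PEAK_FACTOR

# The ratio depends only on per-user behaviour, not on the user count.
read_write_ratio = reads_per_user_per_day / tweets_per_user_per_day

print(f"{reads_per_day:,} reads/day")              # 15,000,000,000 reads/day
print(f"{avg_reads:,.0f} reads/s average")         # 173,611 reads/s average
print(f"{peak_reads:,.0f} reads/s peak")           # 520,833 reads/s peak
print(f"read:write = {read_write_ratio:.0f}:1")    # read:write = 100:1
```

Because the ratio is independent of DAU, it stays at 100:1 whether you size for 30M or 3B users. Only a change in per-user behaviour moves it.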

Storage

Per tweet, assume ~1 KB:

150M tweets/day × 1 KB = 150 GB/day of tweet text
× 365 = ~55 TB/year

Manageable. But media dominates:

~10% of tweets have media, average 200 KB
15M media tweets/day × 200 KB = 3 TB/day
× 365 = ~1 PB/year

Media dwarfs text by ~20×. It needs a different storage path than tweet records.
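The storage figures follow from the same inputs. A small sketch, assuming decimal units (1 KB = 1,000 bytes) and ignoring replication, indexes, and compression, none of which the estimate above accounts for either. All names are illustrative.

```python
KB, GB, TB, PB = 1_000, 1_000**3, 1_000**4, 1_000**5  # decimal units

tweets_per_day = 150_000_000
text_bytes_per_tweet = 1 * KB
media_percent = 10                  # ~10% of tweets carry media
media_bytes_per_item = 200 * KB     # assumed average media size

text_per_day = tweets_per_day * text_bytes_per_tweet
media_tweets_per_day = tweets_per_day * media_percent // 100
media_per_day = media_tweets_per_day * media_bytes_per_item

text_per_year = text_per_day * 365
media_per_year = media_per_day * 365

print(f"text:  {text_per_day / GB:.0f} GB/day, {text_per_year / TB:.2f} TB/year")
print(f"media: {media_per_day / TB:.0f} TB/day, {media_per_year / PB:.3f} PB/year")
print(f"media/text ratio: {media_per_day / text_per_day:.0f}x")
```

This gives 54.75 TB/year of text and 1.095 PB/year of media, a 20× gap. Multiply by your replication factor before quoting a raw disk number.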

Bandwidth

Outbound timeline traffic:

500K reads/s × 20 tweets/page × 1 KB ≈ 10 GB/s of text

Plus media served by the CDN: going by the ~20× storage ratio, at least an order of magnitude more bytes, but that’s the CDN’s problem, not your origin’s.
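The outbound text figure, worked through as a sketch. The conversion to bits per second is an addition here, included only because network links are usually rated in Gbit/s. Variable names are illustrative.

```python
peak_reads_per_s = 500_000   # peak timeline reads from the read estimate
tweets_per_page = 20         # assumed page size
bytes_per_tweet = 1_000      # ~1 KB per tweet, decimal units

bytes_per_s = peak_reads_per_s * tweets_per_page * bytes_per_tweet

gigabytes_per_s = bytes_per_s / 1_000**3
gigabits_per_s = bytes_per_s * 8 / 1_000**3

print(f"{gigabytes_per_s:.0f} GB/s outbound text")    # 10 GB/s outbound text
print(f"{gigabits_per_s:.0f} Gbit/s outbound text")   # 80 Gbit/s outbound text
```

This is uncompressed payload only. Response compression and protocol overhead push the real number in opposite directions, so treat 10 GB/s as a ballpark.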

Numbers at a glance

flowchart TD
    DAU["300M DAU"]
    W["~5K writes/s<br/>tweet creation"]
    R["~500K reads/s<br/>home timeline"]
    S1["~55 TB/yr<br/>text storage"]
    S2["~1 PB/yr<br/>media storage"]
    BW["~10 GB/s<br/>outbound text"]

    DAU --> W & R
    W --> S1 & S2
    R --> BW