2. Capacity estimation

Capacity estimation is not a math test. It’s a way to justify your later choices: “we need a CDN because media bandwidth is X GB/s,” “we need to shard because storage is Y PB/year.”

Round aggressively. Nobody cares whether it’s 4,200 or 5,000 QPS.

Anchor numbers

Start with one concrete user count. Pick something defensible and derive everything from it:

300M daily active users
0.5 tweets posted per user per day
50 timeline reads per user per day
~1 KB of text per tweet
Peak ≈ 3× average

Everything else falls out.

Write QPS

300M users × 0.5 tweets/day = 150M tweets/day
150M / 86,400s ≈ 1,700 tweets/s average
Peak ≈ 3× average ≈ 5,000 tweets/s

Five thousand writes per second is small for any modern datastore — a managed DynamoDB table or a single well-tuned Postgres handles it without breaking a sweat. The interesting problems are reads and storage, not writes.
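The write-side arithmetic is mechanical enough to script. A quick sketch (variable names are mine; the 3x peak factor is the assumption stated above):

```python
SECONDS_PER_DAY = 86_400

# Anchor numbers from the estimate above.
users = 300_000_000
tweets_per_user_per_day = 0.5
peak_factor = 3  # assumed peak-to-average multiplier

tweets_per_day = users * tweets_per_user_per_day    # 150M tweets/day
avg_write_qps = tweets_per_day / SECONDS_PER_DAY    # ~1,700 tweets/s
peak_write_qps = avg_write_qps * peak_factor        # ~5,000 tweets/s

print(f"avg ~ {avg_write_qps:,.0f}/s, peak ~ {peak_write_qps:,.0f}/s")
```

The exact answer is ~1,736/s average and ~5,208/s peak; per the advice above, round to 1,700 and 5,000.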

Read QPS

300M × 50 reads/day = 15B reads/day
15B / 86,400 ≈ 175,000 reads/s average
Peak ≈ 500,000 reads/s

Half a million reads per second is the number that drives everything: caching, fanout, read replicas.

Read:write ≈ 100:1. Keep this in your head.
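The read side, sketched the same way (constants are the anchor numbers from this section):

```python
SECONDS_PER_DAY = 86_400

users = 300_000_000
reads_per_user_per_day = 50
peak_factor = 3  # assumed peak-to-average multiplier

reads_per_day = users * reads_per_user_per_day    # 15B reads/day
avg_read_qps = reads_per_day / SECONDS_PER_DAY    # ~175,000 reads/s
peak_read_qps = avg_read_qps * peak_factor        # ~500,000 reads/s

# 0.5 tweets/user/day from the write-side estimate.
avg_write_qps = users * 0.5 / SECONDS_PER_DAY
ratio = avg_read_qps / avg_write_qps              # 50 / 0.5 = 100

print(f"read:write ~ {ratio:.0f}:1")  # prints "read:write ~ 100:1"
```

Note the ratio depends only on per-user behavior (50 reads vs 0.5 writes), not on the user count, which is why it survives aggressive rounding.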

Storage

At ~1 KB of text per tweet:

150M tweets/day × 1 KB = 150 GB/day of tweet text
× 365 = ~55 TB/year

Manageable. But media dominates:

~10% of tweets have media, average 200 KB
15M media tweets/day × 200 KB = 3 TB/day
× 365 = ~1 PB/year

This is why media goes to blob storage (S3) behind a CDN, never inline in your tweet records.
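Both storage estimates in one sketch (decimal units throughout; the 10% media share and 200 KB average are the assumptions from above):

```python
tweets_per_day = 150_000_000
tweet_bytes = 1_000        # ~1 KB of text per tweet
media_fraction = 0.10      # assumed share of tweets carrying media
media_bytes = 200_000      # assumed 200 KB average media size

text_per_day = tweets_per_day * tweet_bytes                    # 150 GB/day
text_per_year = text_per_day * 365 / 1e12                      # ~55 TB/year
media_per_day = tweets_per_day * media_fraction * media_bytes  # 3 TB/day
media_per_year = media_per_day * 365 / 1e15                    # ~1.1 PB/year

print(f"text ~ {text_per_year:.0f} TB/yr, media ~ {media_per_year:.1f} PB/yr")
```

Media outweighs text by roughly 20:1, which is the quantitative case for blob storage plus a CDN.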

Bandwidth

Outbound timeline traffic:

500K reads/s × 20 tweets/page × 1 KB ≈ 10 GB/s of text

Plus media served by CDN — orders of magnitude more, but that’s the CDN’s problem, not your origin’s.
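The origin bandwidth figure, for completeness (page size of 20 tweets is the assumption from above):

```python
peak_read_qps = 500_000
tweets_per_page = 20
tweet_bytes = 1_000  # ~1 KB of text per tweet

# Text-only outbound traffic from the origin, in GB/s.
origin_gb_per_s = peak_read_qps * tweets_per_page * tweet_bytes / 1e9

print(f"origin text bandwidth ~ {origin_gb_per_s:.0f} GB/s")
```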

What this buys you

When you later say “we need Redis to cache home timelines,” you can point at 500K reads/s and say “the database can’t take that.” When you say “media goes to S3 + CloudFront,” you can point at 1 PB/year. The numbers turn architectural choices from opinions into consequences.