Capacity estimation is not a math test. It’s a way to justify your later choices: “we need a CDN because media bandwidth is X GB/s,” “we need to shard because storage is Y PB/year.”
Round aggressively. Nobody cares whether it’s 4,200 or 5,000 QPS.
Anchor numbers
Start with one concrete user count. Pick something defensible and derive everything from it:
- DAU: 300M
- Tweets per user per day: 0.5 (most people lurk)
- Timeline reads per user per day: 50
Everything else falls out of these three numbers.
Write QPS
300M users × 0.5 tweets/day = 150M tweets/day
150M / 86,400s ≈ 1,700 tweets/s average
Peak ≈ 3× average ≈ 5,000 tweets/s
Five thousand writes per second is small for any modern datastore — a managed DynamoDB table or a single well-tuned Postgres handles it without breaking a sweat. The interesting problems are reads and storage, not writes.
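The write-side arithmetic above can be sanity-checked in a few lines. The 3× peak factor is an assumption, not a law — real traffic shapes vary:

```python
# Back-of-envelope write QPS from the anchor numbers.
DAU = 300_000_000
TWEETS_PER_USER_PER_DAY = 0.5
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3  # assumed peak-to-average ratio

tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY    # 150M tweets/day
avg_write_qps = tweets_per_day / SECONDS_PER_DAY  # ~1,700 tweets/s
peak_write_qps = avg_write_qps * PEAK_FACTOR      # ~5,000 tweets/s
print(f"{avg_write_qps:,.0f} avg, {peak_write_qps:,.0f} peak")
```

Note that the exact result (1,736 avg) rounds to 1,700 — which is the whole point of rounding aggressively.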
Read QPS
300M × 50 reads/day = 15B reads/day
15B / 86,400 ≈ 175,000 reads/s average
Peak ≈ 3× average ≈ 500,000 reads/s
Half a million reads per second is the number that drives everything: caching, fanout, read replicas.
Read:write ≈ 100:1. Keep this in your head.
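The same sketch for the read side, reusing the 3× peak assumption and the ~1,700 write QPS from the previous section:

```python
# Back-of-envelope read QPS and read:write ratio.
DAU = 300_000_000
READS_PER_USER_PER_DAY = 50
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3  # same assumed peak-to-average ratio as for writes

avg_read_qps = DAU * READS_PER_USER_PER_DAY / SECONDS_PER_DAY  # ~175,000 reads/s
peak_read_qps = avg_read_qps * PEAK_FACTOR                     # ~500,000 reads/s
read_write_ratio = avg_read_qps / 1_700                        # ~100:1
```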
Storage
Per tweet:
- Text: 280 chars ≈ 280 bytes (UTF-8 can take up to 4 bytes per char, but the rounding below absorbs it)
- Metadata (id, user_id, timestamps, likes/retweets/views counters): ~300 bytes
- Total: ~600 bytes, round to 1 KB
150M tweets/day × 1 KB = 150 GB/day of tweet data
× 365 = ~55 TB/year
Manageable. But media dominates:
~10% of tweets have media, average 200 KB
15M media tweets/day × 200 KB = 3 TB/day
× 365 = ~1 PB/year
This is why media goes to blob storage (S3) behind a CDN, never inline in your tweet records.
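The storage arithmetic, with the media fraction and average size called out as the assumptions they are:

```python
# Storage growth: text vs. media, using the rounded estimates above.
TWEETS_PER_DAY = 150_000_000
BYTES_PER_TWEET = 1_000    # ~600 B of text + metadata, rounded up to 1 KB
MEDIA_FRACTION = 0.10      # assumed share of tweets carrying media
BYTES_PER_MEDIA = 200_000  # assumed average media size

text_per_year = TWEETS_PER_DAY * BYTES_PER_TWEET * 365  # ~55 TB/year
media_per_year = (TWEETS_PER_DAY * MEDIA_FRACTION
                  * BYTES_PER_MEDIA * 365)               # ~1.1 PB/year
```

A roughly 20:1 gap between media and text storage is what justifies treating them as two different systems.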
Bandwidth
Outbound timeline traffic:
500K reads/s × 20 tweets/page × 1 KB ≈ 10 GB/s of text
Plus media served by CDN — orders of magnitude more, but that’s the CDN’s problem, not your origin’s.
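The origin bandwidth figure follows directly, assuming 20 tweets per timeline page at the rounded 1 KB each:

```python
# Peak origin bandwidth for timeline text (media is the CDN's job).
PEAK_READ_QPS = 500_000
TWEETS_PER_PAGE = 20   # assumed page size
BYTES_PER_TWEET = 1_000

origin_gb_per_s = PEAK_READ_QPS * TWEETS_PER_PAGE * BYTES_PER_TWEET / 1e9  # ~10 GB/s
```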
What this buys you
When you later say “we need Redis to cache home timelines,” you can point at 500K reads/s and say “the database can’t take that.” When you say “media goes to S3 + CloudFront,” you can point at 1 PB/year. The numbers turn architectural choices from opinions into consequences.