Anchor numbers
Start with one defensible user count, add two usage assumptions, and derive everything from them:
- DAU: 2B
- Messages sent per user per day: 40
- Average group size for group messages: 10
Everything else falls out.
Message QPS
2B users × 40 messages/day = 80B messages/day
80B / 86,400s ≈ 1M messages/s average
Peak ≈ 3× average ≈ 3M messages/s
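The arithmetic above can be reproduced in a few lines. This is a minimal sketch: the function name `message_rates` and the `peak_factor` parameter are illustrative, and the 3× peak multiplier is the rule of thumb used above, not a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def message_rates(dau: int, msgs_per_user_per_day: int, peak_factor: float = 3.0):
    """Return (messages/day, average messages/s, peak messages/s)."""
    per_day = dau * msgs_per_user_per_day
    avg_per_s = per_day / SECONDS_PER_DAY
    peak_per_s = avg_per_s * peak_factor  # peak_factor is an assumed rule of thumb
    return per_day, avg_per_s, peak_per_s


per_day, avg_qps, peak_qps = message_rates(dau=2_000_000_000, msgs_per_user_per_day=40)
print(f"{per_day:,} messages/day")
print(f"{avg_qps:,.0f} messages/s average")
print(f"{peak_qps:,.0f} messages/s peak")
```

The unrounded average is a little under 1M messages/s, so rounding up to 1M average and 3M peak errs on the conservative side.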
Three million writes per second is the headline write number, but most of those writes are tiny rows. The real weight comes from fanout: every group message is written to N recipient queues, so the effective write rate is several times higher than the message rate.
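How much higher depends on what fraction of messages go to groups, which the anchor numbers do not fix. The sketch below treats that fraction as a hypothetical input (`group_share`) and assumes each group message is written once per member other than the sender; both are assumptions for illustration.

```python
def fanout_multiplier(group_share: float, avg_group_size: int) -> float:
    """Average queue writes per message sent.

    group_share is the assumed fraction of messages sent to groups.
    A 1:1 message costs one write; a group message is assumed to cost
    one write per member other than the sender.
    """
    one_to_one_writes = (1.0 - group_share) * 1
    group_writes = group_share * (avg_group_size - 1)
    return one_to_one_writes + group_writes


PEAK_MESSAGES_PER_S = 3_000_000

# Sweep a few hypothetical group shares to see how sensitive the result is.
for share in (0.1, 0.3, 0.5):
    multiplier = fanout_multiplier(share, avg_group_size=10)
    writes = PEAK_MESSAGES_PER_S * multiplier
    print(f"group share {share:.0%}: {multiplier:.1f}x, {writes:,.0f} queue writes/s at peak")
```

Even a modest group share pushes the effective write rate well above the raw message rate, which is why fanout, not message count, sizes the write path.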
Connections
This is the number that makes WhatsApp WhatsApp:
~2B clients hold a persistent connection open
Long-lived TCP/WebSocket connections, mostly idle, kept alive with cheap heartbeats. A general-purpose web server tops out somewhere in the tens of thousands of concurrent connections per box. Two billion connections means roughly 2,000 edge gateways before any headroom, each tuned to hold ~1M idle sockets. This is why the gateway tier is its own thing in chapter 4: you can’t do this on the same boxes that serve REST.
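A quick way to size the gateway tier is to divide the connection count by the sockets each box can hold. The sketch below does that for both kinds of server; the 50,000 figure is an assumed stand-in for "tens of thousands", and the result ignores headroom for failover or uneven regional load.

```python
def gateways_needed(connections: int, sockets_per_gateway: int) -> int:
    """Smallest number of boxes that can hold all connections (ceiling division)."""
    return (connections + sockets_per_gateway - 1) // sockets_per_gateway


CONNECTIONS = 2_000_000_000
GENERAL_PURPOSE_SOCKETS = 50_000   # assumed value for "tens of thousands" per box
TUNED_GATEWAY_SOCKETS = 1_000_000  # ~1M idle sockets per tuned gateway

general_boxes = gateways_needed(CONNECTIONS, GENERAL_PURPOSE_SOCKETS)
tuned_boxes = gateways_needed(CONNECTIONS, TUNED_GATEWAY_SOCKETS)

print(f"general-purpose servers: {general_boxes:,} boxes")
print(f"tuned gateways:          {tuned_boxes:,} boxes")
```

Under these assumptions, tuning for idle sockets shrinks the fleet from tens of thousands of boxes to a couple of thousand, which is the economic case for a dedicated gateway tier.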
Storage
Per text message:
- Body: ~100 bytes (most messages are short)
- Metadata (id, sender, conversation, timestamps): ~200 bytes
- Total: ~300 bytes, round to 0.5 KB
80B messages/day × 0.5 KB = 40 TB/day of message text
× 365 = ~15 PB/year
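The same estimate as a small script, using decimal units (1 TB = 10^12 bytes). The constant names are illustrative; the sizes are the ones listed above.

```python
TB = 10**12
PB = 10**15

MESSAGES_PER_DAY = 80_000_000_000
BODY_BYTES = 100          # short text body
METADATA_BYTES = 200      # id, sender, conversation, timestamps
ROUNDED_ROW_BYTES = 500   # ~300 bytes rounded up to 0.5 KB

raw_row_bytes = BODY_BYTES + METADATA_BYTES
raw_daily_bytes = MESSAGES_PER_DAY * raw_row_bytes      # without rounding
daily_bytes = MESSAGES_PER_DAY * ROUNDED_ROW_BYTES      # with rounding
yearly_bytes = daily_bytes * 365

print(f"raw:     {raw_daily_bytes / TB:.0f} TB/day")
print(f"rounded: {daily_bytes / TB:.0f} TB/day")
print(f"yearly:  {yearly_bytes / PB:.1f} PB/year")
```

Rounding 300 bytes up to 0.5 KB leaves room for indexes and per-row overhead, which the raw figure does not account for.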
Messages are kept until the recipient picks them up; once delivered they can usually be evicted from the hot pending store, but most products retain history server-side for a window. Either way, at this volume text lives on a sharded datastore, not a single relational database.
Media dominates by a wide margin:
~10% of messages carry media, average 200 KB
8B media messages/day × 200 KB ≈ 1.6 PB/day
Media bytes go to blob storage on a separate path (chapter 8). Treat them as an entirely different system than the message rows.
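To put "a wide margin" in numbers, the sketch below compares daily media bytes with daily text bytes using the figures from this section. Integer arithmetic and decimal units are used throughout; the constant names are illustrative.

```python
TB = 10**12
PB = 10**15

MESSAGES_PER_DAY = 80_000_000_000
TEXT_ROW_BYTES = 500            # 0.5 KB per message row
MEDIA_EVERY_NTH_MESSAGE = 10    # ~10% of messages carry media
MEDIA_BYTES = 200_000           # 200 KB average attachment

text_daily_bytes = MESSAGES_PER_DAY * TEXT_ROW_BYTES
media_messages_per_day = MESSAGES_PER_DAY // MEDIA_EVERY_NTH_MESSAGE
media_daily_bytes = media_messages_per_day * MEDIA_BYTES
ratio = media_daily_bytes // text_daily_bytes

print(f"text:  {text_daily_bytes / TB:.0f} TB/day")
print(f"media: {media_daily_bytes / PB:.1f} PB/day")
print(f"media is about {ratio}x the text volume")
```

With only one message in ten carrying an attachment, media still outweighs text by well over an order of magnitude, which supports keeping it on a separate storage path.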
Read pattern
Unlike Twitter, there’s almost no read fanout. Each message is read by a small number of people — one for 1:1, ~10 for the average group. The hard part is delivery push, not query QPS.
Numbers at a glance
flowchart TD
    DAU["2B DAU"]
    Conn["~2B long-lived connections"]
    W["~3M messages/s peak"]
    S1["~15 PB/yr message text"]
    S2["~600 PB/yr media"]
    DAU --> Conn
    DAU --> W
    W --> S1 & S2