11. Push notifications

Architecture

flowchart TD
Client(["Client B<br/>(app backgrounded)"])
GW["Edge Gateway<br/>(EC2, WebSocket)"]
MS["Message Service<br/>(Fargate)"]
Sess[("Session Store<br/>(Redis)")]
PQ[("Pending Queue<br/>(DynamoDB)")]
PN["Push Service<br/>(Fargate)"]
Tokens[("Device Tokens<br/>(DynamoDB)<br/>PK user_id<br/>SK device_id")]
APNs(["APNs / FCM"])

MS -->|"① recipient offline"| Sess
MS --> PQ
MS -->|"② trigger push"| PN
PN --> Tokens
PN -->|"③ send"| APNs
APNs -.->|"④ wake device"| Client
Client -->|"⑤ open socket"| GW

Why push exists at all

The chapter 5 design assumes the recipient eventually reconnects. That works when the app is foregrounded, but a phone in your pocket with the app closed isn’t going to spontaneously dial the gateway. The OS won’t let it — modern mobile OSes aggressively kill background sockets to save battery.

The OS does, however, listen on its own socket to APNs (Apple) or FCM (Google). When a notification arrives, the OS wakes the app long enough to do a small amount of work — fetch the latest messages, post a banner, then go back to sleep. The Push Service is how the server gets a message into APNs/FCM.

This is purely a wake mechanism, not a delivery mechanism. The pending queue from chapter 5 is still the source of truth.

Token registration

When the app first registers for notifications (typically on first launch after install), the OS issues a per-(app, device) token. The client registers it with the Push Service, which stores it in DynamoDB (partition key user_id, sort key device_id):

device_tokens
  user_id, device_id, platform, token, updated_at

The key choice follows from the access pattern: the only read is “give me this user’s tokens”, which is a single-partition query returning a handful of rows, with no joins. A relational table would handle the same shape, but at the scale of every WhatsApp user with one to a few devices each, you’d be sharding it by user_id anyway and getting nothing from the relational features in return.
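A minimal sketch of that access pattern, using an in-memory dictionary as a stand-in for the table. The class and method names (DeviceToken, InMemoryTokenStore, register, tokens_for) are illustrative assumptions, not part of the design above; a real implementation would issue a DynamoDB query on the partition key instead.

```python
from dataclasses import dataclass


@dataclass
class DeviceToken:
    user_id: str       # partition key
    device_id: str     # sort key
    platform: str      # "apns" or "fcm"
    token: str
    updated_at: float  # Unix seconds


class InMemoryTokenStore:
    """Hypothetical stand-in for the device_tokens table."""

    def __init__(self) -> None:
        # Outer key models the partition (user_id), inner key the sort key (device_id).
        self._partitions: dict[str, dict[str, DeviceToken]] = {}

    def register(self, row: DeviceToken) -> None:
        # Upsert: re-registering the same device replaces its previous token.
        self._partitions.setdefault(row.user_id, {})[row.device_id] = row

    def tokens_for(self, user_id: str) -> list[DeviceToken]:
        # The only read path: every row in one user's partition.
        return list(self._partitions.get(user_id, {}).values())
```

Keying the inner dictionary by device_id means a re-registration overwrites the old row rather than accumulating stale tokens for the same device.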

Tokens rotate. The OS may issue a new one after an app update or a device migration; the old token returns “not registered” when used. The Push Service records the failure and the client resends the current token on next launch.
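One way to act on a “not registered” response is to delete the stale row so later pushes skip it, and rely on the client to re-register. The sketch below assumes that policy; the send callable and the NOT_REGISTERED status string are hypothetical normalizations of whatever the provider actually returns.

```python
from typing import Callable

NOT_REGISTERED = "not_registered"  # hypothetical normalized provider status


def send_and_prune(tokens: dict[str, str], send: Callable[[str], str]) -> list[str]:
    """Send to every device token and delete the ones the provider rejects.

    `tokens` maps device_id -> token and is modified in place.
    Returns the device_ids whose tokens were pruned.
    """
    stale = [
        device_id
        for device_id, token in tokens.items()
        if send(token) == NOT_REGISTERED
    ]
    for device_id in stale:
        del tokens[device_id]
    return stale
```

When the client next launches and resends its current token, the row reappears and pushes to that device resume.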

When the Message Service triggers a push

After writing the message and enqueueing it (chapter 5), the Message Service checks the session store. If the recipient has no live socket, it asks the Push Service to wake them. The Push Service:

  1. Reads the recipient’s device tokens.
  2. Builds a small payload (sender name, a snippet of the message body, the conversation id, and the message id).
  3. Hands the payload to APNs/FCM, which delivers it to the device.
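The ordering matters: the message is enqueued before anyone decides whether to push. A rough sketch of that decision follows, with the session store modeled as a set of connected user ids and the pending queue as a dictionary. All names here are illustrative assumptions.

```python
from typing import Callable


def on_message_stored(
    recipient_id: str,
    message_id: str,
    live_sessions: set[str],
    pending: dict[str, list[str]],
    trigger_push: Callable[[str, str], None],
) -> bool:
    """Enqueue the message, then push only if the recipient has no live socket.

    Returns True when a push was triggered.
    """
    # Durable delivery first: the queue entry exists whether or not a push follows.
    pending.setdefault(recipient_id, []).append(message_id)
    if recipient_id in live_sessions:
        return False  # the open socket will deliver it; no wake-up needed
    trigger_push(recipient_id, message_id)
    return True
```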

The payload is intentionally small: APNs and FCM both cap a notification payload at roughly 4 KB. Long messages and media don’t go in the push; they get pulled normally when the woken app opens its socket and drains the pending queue.
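A sketch of building such a payload, assuming a 4096-byte budget and an 80-character snippet; both constants and the field names are choices made for this example. The dictionary is a generic shape, not the exact APNs or FCM wire format, which wraps these fields differently.

```python
import json

MAX_PAYLOAD_BYTES = 4096  # assumed budget for this sketch
SNIPPET_CHARS = 80        # assumed snippet length


def build_push_payload(
    sender_name: str, body: str, conversation_id: str, message_id: str
) -> bytes:
    """Return a compact JSON payload carrying only what the banner needs."""
    # Truncate long bodies; the full message is fetched from the pending queue later.
    if len(body) <= SNIPPET_CHARS:
        snippet = body
    else:
        snippet = body[: SNIPPET_CHARS - 1] + "…"
    payload = {
        "sender": sender_name,
        "snippet": snippet,
        "conversation_id": conversation_id,
        "message_id": message_id,
    }
    encoded = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    if len(encoded) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(encoded)} bytes, over the budget")
    return encoded
```

Measuring the encoded bytes rather than the character count matters because non-ASCII sender names and snippets take more than one byte per character in UTF-8.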

Collapse keys

A user who’s been offline for an hour might have 50 messages waiting. Sending 50 push notifications creates 50 banners. Useless and annoying.

APNs and FCM both support collapse keys (the apns-collapse-id header on APNs, collapse_key on FCM): a tag that says “if a notification with this key is already pending on the device, replace it with this one.” Use the conversation id as the collapse key. New messages in the same chat overwrite earlier ones, so the user sees one banner per chat (“Alice sent 5 new messages”) instead of one per message.

For receipts, set a single collapse key per recipient — receipts shouldn’t surface as banners at all, they just need to wake the app so it can drain the queue.
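The two rules can be sketched as a small key-selection function, plus a toy model of the device's notification tray to show the effect. The key formats ("conv:…", "receipts:…") and function names are assumptions for illustration; the tray model only mimics the replace-on-same-key behavior, not either provider's API.

```python
def collapse_key(kind: str, recipient_id: str, conversation_id: str) -> str:
    """Pick a collapse key: one per conversation for messages, one per recipient for receipts."""
    if kind == "receipt":
        return f"receipts:{recipient_id}"
    return f"conv:{conversation_id}"


def pending_on_device(notifications: list[tuple[str, str]]) -> dict[str, str]:
    """Toy tray: a later notification with the same key replaces the earlier one."""
    tray: dict[str, str] = {}
    for key, text in notifications:
        tray[key] = text
    return tray
```

Fifty messages in one chat all map to the same key, so the tray ends up holding a single entry for that chat, and that entry is the most recent notification.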

Push is best-effort

Three failure modes to handle:

This is why the pending queue and push are separate: push is best-effort UX, the queue is durable delivery. A design that put messages in the push payload and skipped the queue would lose messages whenever a notification was throttled or dropped.
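To make the separation concrete, here is a sketch in which the push attempt fails outright and the message still arrives once the client reconnects and drains the queue. PendingQueue and notify_best_effort are hypothetical names, and the in-memory queue stands in for the durable store.

```python
from typing import Callable


class PendingQueue:
    """Hypothetical in-memory stand-in for the durable pending queue."""

    def __init__(self) -> None:
        self._items: dict[str, list[str]] = {}

    def enqueue(self, user_id: str, message_id: str) -> None:
        self._items.setdefault(user_id, []).append(message_id)

    def drain(self, user_id: str) -> list[str]:
        # Called when the client opens its socket, whatever prompted the reconnect.
        return self._items.pop(user_id, [])


def notify_best_effort(send: Callable[[], None]) -> bool:
    """Attempt a push and swallow failures; the queue still holds the message."""
    try:
        send()
    except Exception:
        return False
    return True
```

The return value of notify_best_effort is useful for metrics, but nothing about delivery depends on it.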

End of the line

That’s the system. From the persistent connection in chapter 4 through fanout in chapter 9, hot-group caps in chapter 10, and the wake-up of last resort here, every user-facing action has a path. The components introduced — edge gateway, session store, message store, pending queue, receipt service, presence service, group metadata, media service, push service — together make up the architecture this series set out to build.

The deferred pieces (end-to-end encryption, multi-device sync, multi-region, voice/video calls) each warrant their own series. Each one would change the model meaningfully: E2E moves trust to the clients, multi-device fans each message out to all of the recipient’s devices, and voice calls swap durable messaging for real-time media transport. None of them fits honestly into a 45-minute interview budget.