Presence is everything that lives for seconds: whether a user is online right now, when they were last seen, and whether they are currently typing in a chat. None of it belongs in the message store. None of it should fan out to a user’s entire contact list on every state change. Build it as its own thin system.
Online and last seen
Architecture
flowchart TD
GW["Edge Gateway
(EC2, WebSocket)"]
PS["Presence Service
(Fargate)"]
Cache[("Presence Store
(Redis)
hash per user, TTL")]
Subs[("Subscriptions
(Redis)
set per target, TTL")]
GW2["Other Gateways
(EC2, WebSocket)"]
GW -->|"① connect / heartbeat"| PS
PS -->|"② set online, TTL"| Cache
PS -->|"③ lookup subscribers"| Subs
PS -->|"④ push to viewers"| GW2
When a client connects to a gateway, the gateway tells the Presence Service that the user is online. The Presence Service writes user_id → { state: online, ts: now } to Redis as a hash per user (presence:{user_id}) with a short TTL, say 60 seconds. The gateway sends a heartbeat every ~30 seconds to refresh the TTL. If the heartbeats stop (the socket dies or the network drops), the key expires and the user is implicitly offline; last_seen is the last heartbeat timestamp.
No durable write happens for going online or offline. Presence is ephemeral by definition: if the data centre cold-starts, every user is briefly “offline” until their next heartbeat, and that’s correct.
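The heartbeat-and-expiry behaviour can be sketched without a real Redis. The following is a minimal in-memory stand-in, not the service's actual code: the class name `InMemoryPresenceStore`, its methods, and the injectable `clock` are hypothetical, and expiry is emulated lazily on read. It only reports the heartbeat timestamp while the entry exists; where last_seen is kept after the key expires is not specified here, so the sketch leaves it out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

PRESENCE_TTL_S = 60  # key lifetime suggested in the text


@dataclass
class PresenceEntry:
    state: str
    ts: float          # time of the last heartbeat
    expires_at: float  # stand-in for the Redis key TTL


class InMemoryPresenceStore:
    """Hypothetical stand-in for the Redis presence store (one entry per user)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: Dict[str, PresenceEntry] = {}

    def heartbeat(self, user_id: str) -> None:
        # Connect and heartbeat are the same write: mark online and reset the TTL.
        now = self._clock()
        self._entries[f"presence:{user_id}"] = PresenceEntry(
            state="online", ts=now, expires_at=now + PRESENCE_TTL_S
        )

    def _live_entry(self, user_id: str) -> Optional[PresenceEntry]:
        key = f"presence:{user_id}"
        entry = self._entries.get(key)
        if entry is not None and self._clock() >= entry.expires_at:
            del self._entries[key]  # lazy expiry, emulating the TTL
            return None
        return entry

    def is_online(self, user_id: str) -> bool:
        # No explicit "offline" write exists: a missing key means offline.
        return self._live_entry(user_id) is not None

    def last_heartbeat(self, user_id: str) -> Optional[float]:
        entry = self._live_entry(user_id)
        return entry.ts if entry is not None else None
```

Passing a fake clock makes the expiry path easy to exercise in a test: heartbeat, advance the clock past 60 seconds, and the user reads as offline without any explicit write.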
The subscription model
Naive implementation: when user A goes online, push that fact to every contact who has A in their address book. For a heavy user with thousands of contacts, every connect/disconnect storms the network. Don’t do this.
The right model is pull-on-view, then subscribe. When user B opens A’s chat, B’s client tells the Presence Service “subscribe me to A.” The service:
- Returns A’s current presence on the spot.
- Records B → A in Redis as a set per target (subs:{A} → {B, …}) with a TTL refreshed while B has the chat open.
- Whenever A’s state changes, the service reads the set and pushes STATUS frames only to those members.
When B leaves A’s chat, the client sends an unsubscribe and the entry is removed. A user’s contact list is potentially huge; the set of people actively staring at their chat is small, usually one or two people.
This is the same logic that makes typing indicators tractable.
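The subscribe, push, and unsubscribe steps can be sketched as a small in-memory registry. This is an illustration under stated assumptions, not the service's implementation: `PresenceSubscriptions`, the `push` callback, the frame fields, and `SUB_TTL_S` are all hypothetical names and values. It also tracks expiry per viewer, which is a simplification of the sketch; a plain Redis set carries one TTL for the whole key.

```python
import time
from typing import Callable, Dict

SUB_TTL_S = 60  # assumed value; the text only says the TTL is refreshed while the chat is open

Frame = Dict[str, str]


class PresenceSubscriptions:
    """Hypothetical registry: target user -> viewers currently watching them."""

    def __init__(
        self,
        push: Callable[[str, Frame], None],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._push = push
        self._clock = clock
        self._subs: Dict[str, Dict[str, float]] = {}  # target -> {viewer: expires_at}
        self._state: Dict[str, str] = {}              # target -> last known state

    def subscribe(self, viewer: str, target: str) -> str:
        # Also serves as the refresh call while the chat stays open.
        self._subs.setdefault(target, {})[viewer] = self._clock() + SUB_TTL_S
        return self._state.get(target, "offline")  # current presence, returned on the spot

    def unsubscribe(self, viewer: str, target: str) -> None:
        viewers = self._subs.get(target)
        if viewers is not None:
            viewers.pop(viewer, None)
            if not viewers:
                del self._subs[target]

    def set_state(self, target: str, state: str) -> None:
        if self._state.get(target, "offline") == state:
            return  # no change, nothing to push
        self._state[target] = state
        now = self._clock()
        viewers = self._subs.get(target, {})
        for viewer, expires_at in list(viewers.items()):
            if expires_at <= now:
                del viewers[viewer]  # subscription lapsed without a refresh
                continue
            self._push(viewer, {"type": "STATUS", "user": target, "state": state})
```

The cost of a state change is proportional to the number of live viewers, not to the size of the contact list, which is the point of the model.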
Typing
Architecture
flowchart TD
Client_A(["Client A"])
GW_A["Edge Gateway A
(EC2, WebSocket)"]
PS["Presence Service
(Fargate)"]
Subs[("Subscriptions
(Redis)")]
GW_B["Edge Gateway B
(EC2, WebSocket)"]
Client_B(["Client B"])
Client_A -->|"① TYPING
(every ~5s)"| GW_A
GW_A --> PS
PS -->|"② lookup subs"| Subs
PS -->|"③ push"| GW_B
GW_B --> Client_B
The client sends a TYPING frame when the user starts typing in a chat, and refreshes it every few seconds while typing continues. The receiver shows “typing…” as long as the most recent TYPING frame is fresher than the timeout (~6s).
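Both halves of this protocol fit in a few lines. The sketch below is illustrative only: `TypingSender`, `TypingIndicator`, the frame shape, and the injectable `clock` are hypothetical, and the two constants take the approximate values quoted above (~5s refresh, ~6s timeout).

```python
import time
from typing import Callable, Dict, Tuple

TYPING_REFRESH_S = 5.0  # sender refresh interval (approximate)
TYPING_TIMEOUT_S = 6.0  # receiver timeout (approximate)


class TypingSender:
    """Client side: one TYPING frame on the first keystroke, then at most one per interval."""

    def __init__(
        self,
        send: Callable[[Dict[str, str]], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._clock = clock
        self._last_sent: Dict[str, float] = {}  # chat_id -> time of last TYPING frame

    def on_keystroke(self, chat_id: str) -> None:
        now = self._clock()
        last = self._last_sent.get(chat_id)
        if last is None or now - last >= TYPING_REFRESH_S:
            self._last_sent[chat_id] = now
            self._send({"type": "TYPING", "chat": chat_id})


class TypingIndicator:
    """Receiver side: the indicator is derived from the age of the last frame."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_frame: Dict[Tuple[str, str], float] = {}

    def on_typing_frame(self, chat_id: str, user_id: str) -> None:
        self._last_frame[(chat_id, user_id)] = self._clock()

    def is_typing(self, chat_id: str, user_id: str) -> bool:
        last = self._last_frame.get((chat_id, user_id))
        return last is not None and self._clock() - last < TYPING_TIMEOUT_S
```

Because the timeout is slightly longer than the refresh interval, a steadily typing user keeps the indicator lit without gaps, and the indicator clears on its own once frames stop arriving.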
There is no STOPPED_TYPING event. Two reasons:
- Networks lose packets. A stop event that doesn’t arrive leaves the indicator stuck on. A timeout based on the last seen TYPING event self-heals.
- Less wire traffic. No paired start/stop per keystroke burst.
Typing events are not stored anywhere. They go through the Presence Service, fan out to subscribers, and are forgotten.
Why this is its own system
Three things separate presence from messaging:
- Ephemerality. Nothing is durable. The store is in-memory with TTLs. No write-ahead log.
- Fanout shape. Messages go to a fixed recipient set (one person or a group). Presence goes to a dynamic, narrow set of current viewers.
- Loss tolerance. A dropped TYPING frame is invisible. A dropped message is a bug. Different SLAs allow a much cheaper transport: best-effort, no pending queue, no retries.
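To make the loss-tolerance point concrete, here is a minimal sketch of a best-effort push: each viewer gets exactly one send attempt, and a failure is counted and forgotten. The function name `push_best_effort`, the `send` callback, and the use of `ConnectionError` to signal a dead socket are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable

Frame = Dict[str, str]


def push_best_effort(
    viewers: Iterable[str],
    frame: Frame,
    send: Callable[[str, Frame], None],
) -> int:
    """Try each viewer once; return how many sends failed and were dropped."""
    dropped = 0
    for viewer in viewers:
        try:
            send(viewer, frame)
        except ConnectionError:
            # No retry and no pending queue: the next heartbeat or TYPING
            # frame supersedes this one anyway.
            dropped += 1
    return dropped
```

The message path would need acknowledgements, persistence, and redelivery at the same point in the code; presence needs none of them.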
Trying to share infrastructure with the message path drags presence into a durability and consistency model it doesn’t need, and inflates the cost of a feature that should be cheap.