8. Sending media

Architecture

flowchart TD
Client_A(["Client A"])
Media["Media Service (Fargate)"]
Blob[("S3 (object storage)")]
GW["Edge Gateway (EC2, WebSocket)"]
MS["Message Service (Fargate)"]
DB[("Message Store (DynamoDB)")]
Q[/"SQS Transcode Queue"/]
W["Transcode Worker (Fargate)"]

Client_A -->|"① ask for upload URL"| Media
Media -->|"② signed PUT URL"| Client_A
Client_A -->|"③ PUT bytes"| Blob
Client_A -->|"④ SEND msg with media_id"| GW
GW --> MS
MS --> DB
Blob -->|"⑤ enqueue"| Q
Q --> W
W -->|"⑥ thumbnails / transcode"| Blob

Two paths, not one

Media bytes do not travel through the message service. Pushing a 5 MB voice note through the same socket that carries 100-byte text frames would hold up every small frame queued behind it and tie up gateway bandwidth across the fleet. Instead, the client uploads media directly to S3 and sends a tiny message that references the uploaded object by media_id.

This is the same shape the Twitter series used for media, and for the same reason: a different size class, a different throughput pattern, and a different storage system.

The upload flow

  1. Client asks for an upload slot. It calls the Media Service over ordinary HTTPS with the file’s content type, size, and content hash.
  2. Service returns a media_id and a signed PUT URL to S3. The URL is time-limited (a few minutes) and scoped to the one object key the service generated. Because the signature carries the authorization, the client uploads directly to S3 without the service proxying bytes.
  3. Client PUTs the bytes straight to S3. This is where the bandwidth goes; the Media Service sees none of it.
  4. Client sends the message with the media_id the service handed out, plus a small precomputed thumbnail inline if it’s an image. The message itself stays at roughly 1 KB.
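
The signing step can be illustrated with a small sketch. It is not S3's actual signing scheme (a real deployment would obtain the URL from the AWS SDK's presigning support); it is a simplified HMAC stand-in that shows the two properties the flow relies on: the URL expires, and it is bound to one object key. The names SIGNING_KEY, BUCKET_URL, request_upload_slot, and verify_put, along with the originals/ key prefix, are assumptions made for illustration.

```python
import hashlib
import hmac
import time
import uuid
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SIGNING_KEY = b"demo-secret"                       # hypothetical; never hard-code real keys
BUCKET_URL = "https://media-bucket.example.com"    # hypothetical storage endpoint


@dataclass
class UploadSlot:
    media_id: str
    object_key: str
    put_url: str


def _sign(method: str, object_key: str, expires_at: int) -> str:
    # The signature covers the method, the key, and the expiry, so changing
    # any of them invalidates the URL.
    payload = f"{method}\n{object_key}\n{expires_at}".encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def request_upload_slot(ttl_seconds: int = 300, now: Optional[int] = None) -> UploadSlot:
    """Step 2: the service picks the object key and returns a signed PUT URL."""
    now = int(time.time()) if now is None else now
    media_id = uuid.uuid4().hex
    object_key = f"originals/{media_id}"
    expires_at = now + ttl_seconds
    query = urlencode({"expires": expires_at, "sig": _sign("PUT", object_key, expires_at)})
    return UploadSlot(media_id, object_key, f"{BUCKET_URL}/{object_key}?{query}")


def verify_put(url: str, now: Optional[int] = None) -> bool:
    """Step 3, storage side: accept the PUT only if the URL is unexpired and untampered."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    object_key = parsed.path.lstrip("/")
    expires_at = int(params["expires"][0])
    expected = _sign("PUT", object_key, expires_at)
    return now <= expires_at and hmac.compare_digest(expected, params["sig"][0])
```

In this sketch, a URL issued at time t verifies until t + 300 seconds and fails if the object key in the path is altered, which is what lets the service hand it to an untrusted client.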

The content hash matters for two reasons. First, deduplication: if the same image has been uploaded before (forwarded WhatsApp memes are a real thing), the service can return an existing media_id and skip the upload entirely. Second, integrity: the receiving client can verify the bytes it pulled from the blob store match what the sender meant to send.
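
Both uses of the hash can be sketched in a few lines. This assumes SHA-256 as the content hash (the text does not name an algorithm) and uses an in-memory dictionary as a hypothetical stand-in for the metadata table; resolve_media_id and verify_download are illustrative names.

```python
import hashlib
import uuid
from typing import Dict, Tuple

# Hypothetical stand-in for a metadata index keyed by content hash.
_media_by_hash: Dict[str, str] = {}


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def resolve_media_id(content_hash: str) -> Tuple[str, bool]:
    """Deduplication: return (media_id, needs_upload).

    Simplification: the hash is registered as soon as a slot is issued.
    A real service would likely record it only after the upload completes.
    """
    existing = _media_by_hash.get(content_hash)
    if existing is not None:
        return existing, False          # seen before: reuse it, skip the upload
    media_id = uuid.uuid4().hex
    _media_by_hash[content_hash] = media_id
    return media_id, True


def verify_download(data: bytes, expected_hash: str) -> bool:
    """Integrity: the receiver re-hashes the bytes it pulled from the blob store."""
    return sha256_hex(data) == expected_hash
```

A sender would call sha256_hex on the file before asking for a slot; a receiver would call verify_download with the hash carried alongside the media_id.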

Why signed URLs

The lazy alternative is “client uploads to the Media Service, service uploads to the blob store.” It’s one fewer round trip, but it puts every byte through your own boxes. At the volumes from chapter 2, over a petabyte a day, you’d be running a very expensive proxy.

Signed URLs let S3 handle the upload directly. The Media Service touches only metadata (a small DynamoDB row per media_id: object key, content hash, size, owner, created_at). S3 is built for bulk byte transfer; the Media Service isn’t.

What gets stored

S3 holds the original object plus, after async processing, derived versions: thumbnails for images, lower-bitrate variants for voice notes, poster frames for video. These derivatives are produced by transcode workers running on Fargate. The workers pull from an SQS queue that S3 publishes to, via S3 event notifications, whenever a new upload lands. Lambda would seem like the obvious fit for event-driven work, but transcoding a single video can run past Lambda's 15-minute maximum execution time. Fargate tasks have no such cap, and both are billed only for the compute that actually runs.
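
As a rough sketch of the worker's first step, the function below turns the body of one SQS message into transcode jobs. It assumes the standard S3 event notification JSON (a Records list with eventName and s3.bucket.name / s3.object.key) and a hypothetical originals/ key prefix for uploads. The prefix check matters in this design because the worker writes derivatives back to the same bucket, which would otherwise trigger new events for its own output.

```python
import json
from typing import List, NamedTuple
from urllib.parse import unquote_plus


class TranscodeJob(NamedTuple):
    bucket: str
    key: str
    size: int


def jobs_from_sqs_body(body: str, upload_prefix: str = "originals/") -> List[TranscodeJob]:
    """Extract transcode jobs from one SQS message carrying an S3 event."""
    event = json.loads(body)
    jobs: List[TranscodeJob] = []
    # S3 test events have no "Records" list; treat them as empty.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        if not key.startswith(upload_prefix):
            continue  # skip derivatives written by the worker itself
        jobs.append(TranscodeJob(s3["bucket"]["name"], key, s3["object"].get("size", 0)))
    return jobs
```

The worker would then download each job's object, produce the derivatives, and delete the SQS message only after the outputs are written, so a crashed task leaves the message to be retried.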

The message row in the message store (chapter 4) carries:

media_id        — pointer into the blob store
media_type      — image / voice / video
media_meta      — duration, dimensions, mime
thumbnail       — small inline blob for images, optional
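
One way to picture these fields in code is a small value type. This is a sketch, not the schema from chapter 4: the class name MediaAttachment, the 1 KB thumbnail cap, and the validation rules are assumptions layered on the four fields listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

MEDIA_TYPES = {"image", "voice", "video"}
MAX_INLINE_THUMBNAIL_BYTES = 1024  # hypothetical cap that keeps the row small


@dataclass(frozen=True)
class MediaAttachment:
    media_id: str                      # pointer into the blob store
    media_type: str                    # image / voice / video
    media_meta: Dict[str, Union[int, float, str]] = field(default_factory=dict)
    thumbnail: Optional[bytes] = None  # small inline blob, images only

    def __post_init__(self) -> None:
        if self.media_type not in MEDIA_TYPES:
            raise ValueError(f"unknown media_type: {self.media_type!r}")
        if self.thumbnail is not None:
            if self.media_type != "image":
                raise ValueError("inline thumbnails are only carried for images")
            if len(self.thumbnail) > MAX_INLINE_THUMBNAIL_BYTES:
                raise ValueError("thumbnail too large to inline")
```

For example, a voice note would carry media_meta such as a duration and a mime type and leave thumbnail unset.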

The recipient sees the message immediately on delivery (chapter 5), with the thumbnail or a placeholder. The full media is fetched lazily when they tap to view, via a signed GET URL the Media Service produces on demand.

Receiving media

The receive path is symmetric. The recipient’s client gets the message frame with media_id, asks the Media Service for a signed GET URL, and pulls bytes directly from S3. The Message Service never moves a media byte; the Media Service brokers permission but doesn’t proxy data.

For voice notes the client may stream-play while downloading. For video, the recipient fetches from a CloudFront edge in front of the S3 bucket rather than from the origin, because at WhatsApp scale the same forwarded video gets pulled millions of times.

Lifecycle

Media outlives the message in the pending queue but not necessarily forever. Common policy: keep the original for a window (months), keep the thumbnail longer, transition cold objects to S3 Glacier. The message row keeps the media_id regardless; if the object has since expired and been deleted, the client shows a “media expired” placeholder.
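
A policy like this maps onto an S3 lifecycle configuration. The sketch below writes one as a Python dict in the shape the S3 API accepts for lifecycle rules; the prefixes and day counts are illustrative assumptions, not values from this series, and check_rules is a hypothetical sanity check rather than anything AWS provides.

```python
import json

# Assumed layout: uploads under originals/, derived thumbnails under thumbnails/.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "originals-archive-then-expire",
            "Filter": {"Prefix": "originals/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],  # cold after a month
            "Expiration": {"Days": 180},                               # gone after ~6 months
        },
        {
            "ID": "thumbnails-expire-later",
            "Filter": {"Prefix": "thumbnails/"},
            "Status": "Enabled",
            "Expiration": {"Days": 730},                               # thumbnails kept longer
        },
    ]
}


def check_rules(config: dict) -> None:
    """Reject rules that would expire an object before transitioning it."""
    for rule in config["Rules"]:
        expire_days = rule.get("Expiration", {}).get("Days")
        for transition in rule.get("Transitions", []):
            if expire_days is not None and transition["Days"] >= expire_days:
                raise ValueError(f"rule {rule['ID']}: transition must precede expiration")


check_rules(LIFECYCLE)
lifecycle_json = json.dumps(LIFECYCLE, indent=2)
```

The resulting structure is what would be passed to the bucket's lifecycle configuration; S3 then applies the transitions and expirations on its own schedule, with no worker involved.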

Retention and legal-hold policies also live on this path, attached to the objects rather than to the message rows. Bytes are heavy; metadata is cheap.