Hardware‑Attested Ad Insertion for Encrypted Video
Traditional video monetization assumes the server can stitch ads into the stream. In an E2EE messenger like WhatsApp, the server never sees plaintext by design. This essay lays out a path to extend monetization to some person‑to‑person video inventory without moving decryption server‑side by combining:
- on‑device ad insertion (send‑time stitching or playback‑time interception),
- viral vs personal discrimination using privacy‑preserving diffusion signals, and
- a Trusted Execution Environment (TEE) trust layer that produces hardware‑verifiable brand‑safety attestations.
Two Scope Conditions
Monetization here likely needs two things. On the theory that people don't want to see ads on personal videos, we need a way to distinguish personal videos from public ones. And because brands are unwilling to gamble on brand safety, we need a way to identify videos that are safe to advertise against.
Personal vs. Public
One practical way to identify public videos is via perceptual hashing computed on-device, paired with server‑side diffusion analysis:
- Device computes a perceptual hash H(video), robust to small transcodes.
- Device reports only the hash (not the video) to a “viral scoring” service.
- Server computes a “viral score” based on how broadly H appears across disconnected clusters, regions, and time windows.
- Only once H crosses a threshold does the video become ad‑eligible.
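The flow above can be sketched end to end. All names here (`dhash`, `ViralScorer`, the threshold) are illustrative stand‑ins, not WhatsApp APIs; a production perceptual hash would be far more robust than this toy difference hash.

```python
def dhash(gray: list[list[int]]) -> int:
    """Toy difference hash over a grayscale frame: each bit records
    whether a pixel is brighter than its right neighbor, a property
    that tends to survive small transcodes and re-scales."""
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Distance between two hashes; small distance ≈ same video."""
    return bin(a ^ b).count("1")

class ViralScorer:
    """Server side: the device only ever sent H, never the video.
    A hash becomes ad-eligible once it has appeared in enough
    distinct (cluster, region) pairs."""
    def __init__(self, threshold_clusters: int = 3):
        self.seen: dict[int, set[tuple[str, str]]] = {}
        self.threshold = threshold_clusters

    def report(self, h: int, cluster_id: str, region: str) -> None:
        self.seen.setdefault(h, set()).add((cluster_id, region))

    def ad_eligible(self, h: int) -> bool:
        return len(self.seen.get(h, set())) >= self.threshold
```

A real deployment would also bucket near-duplicate hashes (small Hamming distance) before counting, so re-encodes of the same clip pool into one score.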
Brand Safety
Advertisers won’t pay premium video CPMs without credible brand safety guarantees. In an E2EE context, the core question is how can a server trust “this content is safe for ads” if it never sees the content? We propose a TEE‑backed trust layer that (a) runs isolation‑sensitive security logic on device and (b) produces a cryptographic attestation the server can verify.
TEEs are isolated execution areas designed to protect secrets from other software on the host (including privileged components like the OS/hypervisor/firmware). A TEE can support attestability—evidence of origin/state signed by hardware to support trust decisions and replay protection. On phones, the hardware primitives vary:
- ARM TrustZone provides hardware‑enforced isolation in the CPU used widely to protect high‑value code/data (ARM).
- On Apple platforms, Secure Enclave is explicitly positioned as a hardware‑based key manager, and Apple provides attestation services (e.g., DeviceCheck/App Attest) that allow an app to assert validity/integrity characteristics to a server (Apple Support; Apple Developer).
- On Android, the most available primitive is hardware‑backed key attestation. Google’s documentation describes key attestation as a way to gain confidence keys are stored in a device’s hardware‑backed keystore and to interpret certificate extension data; AOSP documents it as remotely verifiable evidence of key existence/configuration in secure hardware (Android Developers).
“TEE‑backed” does not mean every platform lets third‑party apps run arbitrary ML inside the enclave. In practice, you combine whatever isolated execution, hardware‑backed keys, and integrity signals each platform makes available to raise the cost of tampering.
Safety Classification and Attestation
A safety model evaluates sampled frames locally, and a hardware‑protected signing key produces an attestation that binds:
- model identity/version,
- result (e.g., SAFE/UNSAFE + coarse categories),
- freshness (timestamp/nonce),
- and device/app integrity signals (platform‑dependent),
- all signed in a way the server can verify.
Diagram: trust layer pipeline
On-device trust layer (TEE-backed primitives)
┌──────────────────────────────────────────────────────────────────────┐
│ Sample frames ──▶ Safety model ──▶ Result ──▶ Sign attestation │
│ (violence/adult/hate/etc.) (HW key) │
└──────────────────────────────────────────────────────────────────────┘
▲ │
│ plaintext stays local ▼
(no upload of frames) attestation blob only
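The signing step in the diagram can be sketched as follows. A real device would sign with a key that never leaves secure hardware (e.g., an Android Keystore or Secure Enclave key); here an in‑memory HMAC key stands in so the shape of the blob is runnable, and `sign_attestation` is a hypothetical name.

```python
import hashlib
import hmac
import json

def sign_attestation(device_key: bytes, model_id: str, model_hash: str,
                     verdict: str, categories: list[str],
                     nonce: str, ts: str) -> dict:
    """Build and sign the attestation payload. The signature covers a
    canonical (sorted-key) JSON encoding of every field except 'sig'."""
    payload = {
        "model_id": model_id,      # which safety model ran
        "model_hash": model_hash,  # exact weights/build
        "verdict": verdict,        # e.g. SAFE / UNSAFE
        "categories": categories,  # coarse categories only
        "nonce": nonce,            # server-supplied challenge (anti-replay)
        "ts": ts,                  # freshness
    }
    msg = json.dumps(payload, sort_keys=True).encode()
    payload["sig"] = hmac.new(device_key, msg, hashlib.sha256).hexdigest()
    return payload
```

Note that only this blob leaves the device; the sampled frames never do.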
Attestation doesn’t prove the model is perfect. It proves the server is not simply taking a client’s word for it: the server verifies that
- the statement was signed by a hardware‑protected key, and
- the key is bound to a device/app posture consistent with “not obviously tampered.”
Attestation shape
A minimal, privacy‑aware blob:
{
"model_id": "wa_safety_v3",
"model_hash": "sha256:…",
"verdict": "SAFE",
"categories": ["none"],
"nonce": "server_challenge_128bit",
"ts": "2026-01-16T12:34:56Z",
"sig": "device_attested_signature"
}
Server verification checks:
- signature validity / certificate chain,
- model allowlist,
- nonce freshness (anti‑replay),
- timestamp window,
- device/app integrity posture (platform‑specific).
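The checks above can be sketched as one server-side routine. The HMAC verification stands in for real certificate-chain validation, and names like `verify_attestation` and `ALLOWED_MODELS` are illustrative.

```python
import datetime
import hashlib
import hmac
import json

ALLOWED_MODELS = {"wa_safety_v3"}  # illustrative model allowlist

def verify_attestation(device_key: bytes, blob: dict,
                       issued_nonces: set[str],
                       max_age_s: int = 300) -> bool:
    """Apply the listed checks to a received attestation blob."""
    # 1. Signature validity (chain verification in a real deployment).
    body = {k: v for k, v in blob.items() if k != "sig"}
    msg = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(device_key, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, blob["sig"]):
        return False
    # 2. Model allowlist.
    if blob["model_id"] not in ALLOWED_MODELS:
        return False
    # 3. Nonce freshness: single-use, server-issued (anti-replay).
    if blob["nonce"] not in issued_nonces:
        return False
    issued_nonces.discard(blob["nonce"])
    # 4. Timestamp window.
    ts = datetime.datetime.fromisoformat(blob["ts"].replace("Z", "+00:00"))
    age = (datetime.datetime.now(datetime.timezone.utc) - ts).total_seconds()
    return 0 <= age <= max_age_s
```

The device/app integrity posture check is platform-specific (e.g., App Attest or Android key-attestation certificate extensions) and is omitted here.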
This is directly aligned with the “attestability + freshness” considerations NIST calls out for TEEs (NIST Publications).
Ad decisioning without content
Send a compact on-device embedding (semantic representation) only if:
- the video is public media, and
- the trust layer attests it is safe.
The ad server receives:
- embedding vector,
- attestation,
- coarse context (locale, ad prefs),
and returns:
- an ad asset (pre‑encoded variants).
WhatsApp doesn't consider communications with Meta services to be E2EE (here), so the embedding/ad‑request channel must be treated, and communicated to users, as a separate, consented service interaction, not as part of the E2EE chat channel.
Send‑time stitching vs. playback interception
For encrypted MP4s sent person‑to‑person, the cleanest strategy is: don’t modify the file. Modify playback. Flow:
- Recipient taps video.
- Player computes/looks up the hash’s viral score (often precomputed on receipt).
- Trust layer produces/refreshes safety attestation.
- If eligible → play Ad then Video; else → play Video.
Tap video
│
├─▶ viral? (hash diffusion)
├─▶ safe? (TEE-backed attestation)
└─▶ if yes: [Ad] → [Video]
if no: [Video]
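The decision diagram above reduces to a small playback plan computed at tap time; the stored encrypted file is never modified. `is_viral` and `attest` are hypothetical hooks into the hash-diffusion lookup and the trust layer.

```python
def playback_plan(video_hash: int, is_viral, attest) -> list[str]:
    """Decide the playback sequence when the recipient taps the video."""
    if is_viral(video_hash) and attest(video_hash) == "SAFE":
        return ["ad", "video"]   # eligible: pre-roll, then content
    return ["video"]             # personal or unsafe: content only
```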
Where the sender is acting as a publisher (Channels/Status) or explicitly opts into monetization, send‑time stitching is viable:
- Trust layer attests safety.
- Device requests ad (embedding + attestation).
- Device stitches [Ad][Content].
- Combined asset is encrypted and sent via the normal E2EE media path.
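The send‑time flow can be sketched as follows. `encrypt` and `send` are hypothetical hooks into the normal E2EE media path, and the byte concatenation is a placeholder: real stitching would re‑mux the pre‑encoded ad and content at the container level, not append raw bytes.

```python
def stitch_and_send(ad: bytes, content: bytes, encrypt, send) -> None:
    """Publisher/opt-in flow: combine the pre-encoded ad with the
    content, then hand the combined asset to the E2EE media path."""
    combined = ad + content    # placeholder for container-level re-muxing
    send(encrypt(combined))    # encrypted client-side, as for any media
```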