Perceptual hashing for the curious: pHash, dHash, and the verdict ladder
SHA-256 is binary: the bytes match or they don't. Real screenshots travel through WhatsApp, Twitter, Discord, and Telegram, each of which re-encodes the file on its CDN. The picture is the same, the bytes are different, and SHA-256 mismatches. To survive that travel, receipts.you stores two perceptual hashes alongside the SHA-256 digest, pHash (DCT-based) and dHash (gradient-based), and uses them in an AND-gated verdict ladder. This post walks through what each hash measures, why we AND-gate them, and the empirical thresholds we settled on.
The problem the verdict ladder solves
If you seal a screenshot, share it on Twitter, and then ask a verifier to drop the Twitter-served copy on /verify, the SHA-256 will mismatch — Twitter recompressed the image. A binary match/mismatch response would tell the verifier the screenshot is “not authentic,” which is misleading: the picture is the same, just re-encoded by an intermediary. A verdict ladder distinguishes byte-identical from same-picture-re-encoded from edited-but-similar from genuinely-different.
What pHash measures
Perceptual hash (pHash) is a DCT-based fingerprint of an image's low-frequency content. The pipeline:
- Resize the image to 32×32 grayscale.
- Apply a 2D DCT to the 32×32 matrix.
- Take the top-left 8×8 sub-block (the lowest-frequency coefficients).
- Compute the median of the 64 coefficients.
- Output a 64-bit hash: bit i = 1 if coefficient i > median, else 0.
Two images with the same low-frequency content produce similar hashes. Cropping a few pixels or applying mild JPEG compression typically changes only a few of the 64 bits. Drastically different images produce hashes that differ in many bits. The number of differing bits (the Hamming distance) is the "distance" used in the rest of this post.
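The steps above can be sketched in TypeScript. This is a minimal illustration, not the code in phash.ts: it assumes the caller has already resized the image to 32×32 grayscale (row-major, 1024 values), it uses a naive orthonormal DCT-II, and it models the 64-bit hash as a `Uint8Array` of 0/1 values for readability. The names `lowFrequencyDct` and `pHashBits` are made up for this example.

```typescript
const SIZE = 32; // side of the resized grayscale image
const LOW = 8;   // side of the low-frequency block kept from the DCT

// Naive orthonormal 2D DCT-II, computing only the top-left LOW x LOW coefficients.
function lowFrequencyDct(gray: Float64Array): Float64Array {
  if (gray.length !== SIZE * SIZE) throw new Error("expected 32x32 grayscale input");
  // basis[u * SIZE + x] = alpha(u) * cos((2x + 1) * u * PI / (2 * SIZE))
  const basis = new Float64Array(LOW * SIZE);
  for (let u = 0; u < LOW; u++) {
    const alpha = Math.sqrt((u === 0 ? 1 : 2) / SIZE);
    for (let x = 0; x < SIZE; x++) {
      basis[u * SIZE + x] = alpha * Math.cos(((2 * x + 1) * u * Math.PI) / (2 * SIZE));
    }
  }
  const coeffs = new Float64Array(LOW * LOW);
  for (let v = 0; v < LOW; v++) {
    for (let u = 0; u < LOW; u++) {
      let sum = 0;
      for (let y = 0; y < SIZE; y++) {
        for (let x = 0; x < SIZE; x++) {
          sum += gray[y * SIZE + x] * basis[u * SIZE + x] * basis[v * SIZE + y];
        }
      }
      coeffs[v * LOW + u] = sum;
    }
  }
  return coeffs;
}

// Returns 64 bits (as 0/1 entries): bit i is 1 if coefficient i exceeds the median.
function pHashBits(gray: Float64Array): Uint8Array {
  const coeffs = lowFrequencyDct(gray);
  const sorted = coeffs.slice().sort(); // typed arrays sort numerically by default
  const median = (sorted[31] + sorted[32]) / 2;
  const bits = new Uint8Array(64);
  for (let i = 0; i < 64; i++) bits[i] = coeffs[i] > median ? 1 : 0;
  return bits;
}
```

Because the threshold is the median of the 64 coefficients, roughly half of the bits come out set for any image with distinct coefficients. Decoding and resizing are left to whatever image library is in use.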
What dHash measures
Difference hash (dHash) is a gradient-based fingerprint. The pipeline:
- Resize to 9×8 grayscale (9 columns, 8 rows).
- For each row, compute the 8 gradients between horizontally adjacent pixels (pixel[i+1] - pixel[i]).
- Output a 64-bit hash: bit i = 1 if gradient i is positive, else 0.
dHash captures the direction of intensity changes across the image. It's sensitive to a different class of edits than pHash — gradient-direction is less affected by uniform brightness shifts but more affected by translation.
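A minimal TypeScript sketch of the dHash steps above, assuming the image has already been resized to 9 columns by 8 rows of grayscale values in row-major order. The function name `dHashBits` and the 0/1 `Uint8Array` representation are choices made for this example, not necessarily what phash.ts does.

```typescript
const D_COLS = 9; // pixels per row after resizing
const D_ROWS = 8; // number of rows after resizing

// Returns 64 bits (as 0/1 entries): one per horizontal gradient, row by row.
function dHashBits(gray: Float64Array): Uint8Array {
  if (gray.length !== D_COLS * D_ROWS) throw new Error("expected 9x8 grayscale input");
  const bits = new Uint8Array(64);
  for (let row = 0; row < D_ROWS; row++) {
    for (let col = 0; col < D_COLS - 1; col++) {
      // Gradient between a pixel and its right-hand neighbour.
      const gradient = gray[row * D_COLS + col + 1] - gray[row * D_COLS + col];
      bits[row * (D_COLS - 1) + col] = gradient > 0 ? 1 : 0;
    }
  }
  return bits;
}
```

A left-to-right brightness ramp yields all ones, and adding the same constant to every pixel leaves every gradient, and therefore the hash, unchanged. That is the brightness-shift robustness described above.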
Why AND-gate them
Either hash alone has known failure modes. pHash produces false positives on images with similar low-frequency content but different details (e.g., screenshots of two different posts with similar backgrounds). dHash produces false positives on images with similar gradient directions but different absolute pixel values (e.g., two dark-mode tweet screenshots with the same layout).
The AND-gate requires both hashes to be within a threshold of the stored reference for a verdict to advance up the ladder. A high-confidence match (small distance on both) means the picture is genuinely similar across two independent measurement axes. A close match on one but a wide miss on the other downgrades the verdict.
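As a rough sketch, the gate is a Hamming-distance check on each hash joined by a logical AND. The `HashPair` shape, the function names, and the 0/1 `Uint8Array` representation below are hypothetical, chosen only to make the idea concrete.

```typescript
// Counts the positions at which two equal-length bit arrays differ.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("hash lengths differ");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Hypothetical container for the two perceptual hashes of one image.
interface HashPair {
  pHash: Uint8Array;
  dHash: Uint8Array;
}

// AND gate: passes only when BOTH distances are within their thresholds.
function passesGate(ref: HashPair, candidate: HashPair, maxP: number, maxD: number): boolean {
  return (
    hammingDistance(ref.pHash, candidate.pHash) <= maxP &&
    hammingDistance(ref.dHash, candidate.dHash) <= maxD
  );
}
```

With thresholds of 6 and 9, a candidate that is 3 bits away on pHash but 20 bits away on dHash fails the gate, even though pHash alone would have accepted it.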
The thresholds
We use a four-tier ladder:
| Verdict | pHash distance | dHash distance | Interpretation |
|---|---|---|---|
| identical | 0 (byte match via SHA) | 0 (byte match via SHA) | The bytes match the original. |
| recompressed | ≤ 6 | ≤ 9 | Same picture, re-encoded by a platform. |
| similar | ≤ 14 | ≤ 16 | Cropped or mildly edited; still recognizable. |
| mismatch / qr_pasted | > 25 (either) | > 25 (either) | Different image; common case is real QR pasted onto fake. |
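
One way to read the table as code is sketched below. The function name `classify` and the `Verdict` type are hypothetical. Two assumptions are baked in. First, the table does not say what happens when a distance falls between the similar thresholds and 25, so this sketch treats anything that fails the similar gate as a mismatch. Second, it does not try to tell a plain mismatch from the qr_pasted case.

```typescript
type Verdict = "identical" | "recompressed" | "similar" | "mismatch";

// pDist and dDist are Hamming distances between the candidate and the stored reference.
function classify(shaMatches: boolean, pDist: number, dDist: number): Verdict {
  // Byte-identical files short-circuit the perceptual checks.
  if (shaMatches) return "identical";
  // Each tier is AND-gated: both distances must be inside the tier's thresholds.
  if (pDist <= 6 && dDist <= 9) return "recompressed";
  if (pDist <= 14 && dDist <= 16) return "similar";
  // Assumption: everything else is reported as a mismatch (the table only
  // spells out the > 25 case).
  return "mismatch";
}
```

For example, `classify(false, 4, 12)` returns `"similar"` rather than `"recompressed"`: pHash is well inside the tighter tier, but the dHash miss downgrades the verdict.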
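Thresholds were calibrated against a synthetic test corpus of screenshots run through canonical platform recompression pipelines. As an idealized estimate, if each hash behaved like 64 independent fair bits and the two hashes were independent of each other, a random image pair would land inside both recompressed thresholds with a probability of roughly 8e-21 (about 4.5e-12 for pHash ≤ 6 times about 1.8e-9 for dHash ≤ 9). Real hash bits are correlated, so treat that as an order-of-magnitude figure rather than a measured rate; it is still low enough that false positives at this tier are very unlikely without a deliberately crafted collision.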
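An idealized estimate of this kind can be reproduced with a short binomial-tail calculation. The sketch assumes each hash is 64 independent fair bits and that pHash and dHash are independent of each other. Neither assumption holds exactly for real images, so the result is a rough estimate rather than a measurement. The function name `tailProbability` is made up for this example.

```typescript
// Probability that two independent, uniformly random hashes of `bits` bits
// differ in at most `maxDistance` positions: sum of C(bits, k) / 2^bits for k = 0..maxDistance.
function tailProbability(maxDistance: number, bits: number = 64): number {
  let binom = 1; // C(bits, 0)
  let sum = 0;
  for (let k = 0; k <= maxDistance; k++) {
    sum += binom;
    binom = (binom * (bits - k)) / (k + 1); // C(bits, k + 1)
  }
  return sum / Math.pow(2, bits);
}

// AND gate under the independence assumption: multiply the two tail probabilities.
const recompressedCollision = tailProbability(6) * tailProbability(9);
console.log(recompressedCollision.toExponential(2));
```

Swapping in the similar thresholds (14 and 16) shows how much looser that tier is under the same assumptions.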
What this earns us
The verdict ladder is the difference between a tool that's useful only for direct file shares and a tool that's useful for the channels screenshots actually travel through. A receipt that returns recompressed for a Twitter-served copy of the original is doing exactly what users need — confirming “yes, this is the same picture you sealed, just re-encoded by Twitter.”
What the verdict ladder doesn't do
- It's heuristic, not cryptographic. A deliberately crafted collision is theoretically possible against perceptual hashes — though no one has demonstrated one against AND-gated pHash+dHash in the wild that we've seen.
- It doesn't survive rotation past a few degrees. pHash and dHash are both sensitive to geometric transformation. Rotation, mirroring, or large-scale resampling shifts both hashes substantially.
- It doesn't survive AI image-to-image rewrites. Models that rewrite the image at the pixel level (img2img, inpainting) defeat perceptual hashing. The SHA layer mismatches, the perceptual layer mismatches, the verdict is mismatch.
Try it live
Seal a screenshot on /seal, then run it through any of the simulated platform pipelines at /lab — JPEG q70, resize 0.85x, crop 20%, and so on. The verdicts update in your browser; you can see exactly where each pipeline lands on the ladder. The source for the algorithm is at receipts/marketing/lib/phash.ts on GitHub.