WPRC 2026#046

Large-Scale MPC: Scaling Private Iris Code Uniqueness Checks to Millions of Users

WPRC-046 · SG · 2025.11 · PRIVACY

Contributors

Rongxin, WPRC

The WhitePaper Reading Club Privacy Hub | Research Day [02]	20 Nov 2025
Large‑Scale MPC: Scaling Private Iris Code Uniqueness Checks to Millions of Users	[Rongxin]

Summary

A system that checks whether an iris has been seen before, without revealing any biometric data, while running fast enough for real-world deployments with millions of users.

Why This Is Important

It lets deployments such as World ID, as well as humanitarian organizations, enforce one-human-one-signup securely: duplicate registrations are blocked without exposing sensitive biometric datasets to abuse or breaches.

Key Innovation

A highly optimized MPC protocol that recasts iris matching as dot products over larger rings, avoiding expensive bitwise MPC, and runs the entire pipeline on GPUs with direct network access. It combines masked bit representations, Shamir sharing over Galois rings, and GPU-driven MPC to reach 4.29 billion comparisons/second, a >1000× speedup over prior work (Janus).
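
To make the dot-product reformulation concrete, here is a minimal plaintext sketch (no MPC, no secret sharing). It assumes the ternary encoding listed under Deep Dives (true → −1, false → +1, masked-out → 0). The function names, the toy 8-position codes, and the 0.375 threshold are illustrative assumptions, not values taken from the paper.

```python
def encode(bits, mask):
    """Ternary encoding: true -> -1, false -> +1, masked-out -> 0."""
    return [0 if not m else (-1 if b else 1) for b, m in zip(bits, mask)]


def masked_hamming(bits_a, mask_a, bits_b, mask_b):
    """Reference computation: XOR + popcount over positions valid in both masks."""
    valid = [ma & mb for ma, mb in zip(mask_a, mask_b)]
    ml = sum(valid)
    hd = sum((a ^ b) & v for a, b, v in zip(bits_a, bits_b, valid))
    return hd, ml


def is_match(enc_a, enc_b, ml, match_ratio):
    """Dot-product form of the check hd < match_ratio * ml."""
    dot = sum(x * y for x, y in zip(enc_a, enc_b))  # equals ml - 2*hd
    return dot > (1 - 2 * match_ratio) * ml


# Toy 8-position codes (hypothetical data); real iris codes are far longer.
bits_a, mask_a = [1, 0, 1, 1, 0, 0, 1, 0], [1, 1, 1, 0, 1, 1, 1, 1]
bits_b, mask_b = [1, 1, 1, 0, 0, 1, 1, 0], [1, 1, 1, 1, 1, 0, 1, 1]

hd, ml = masked_hamming(bits_a, mask_a, bits_b, mask_b)
enc_a, enc_b = encode(bits_a, mask_a), encode(bits_b, mask_b)
dot = sum(x * y for x, y in zip(enc_a, enc_b))
print(hd, ml, dot)                        # 1 6 4
print(is_match(enc_a, enc_b, ml, 0.375))  # True
```

Every position that is masked in either code contributes 0 to the dot product, agreeing positions contribute +1, and differing positions contribute −1, which is why the single sum equals ml − 2·hd.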

Overview

(i) Recasts iris matching by converting masked Hamming distance into dot products, so honest-majority MPC (ABY3/Shamir) can compute uniqueness with minimal communication and without exposing raw biometric data.
(ii) Galois-ring Shamir sharing packs bits tightly and halves share sizes from 32·s·l to 16·s·l, while staying compatible with MSB-based comparisons and avoiding the inefficiencies of traditional bitwise MPC.
(iii) The system reaches 690k comparisons/s per CPU core and scales to 4.29B comparisons/s on 24 H100 GPUs by running the entire MPC pipeline on GPUs with NCCL-driven network access, which eliminates CPU–GPU transfer overhead.
(iv) Compared to Janus (≈400 comparisons/s with somewhat homomorphic encryption, SHE), the protocol is ~1,725× faster on CPU alone and comfortably meets Worldcoin's production requirement of 10 queries/s against databases of more than 10 million iris codes.
(v) Unlike centralized biometric systems that keep sensitive templates on a single server, this approach distributes iris data across MPC parties, which prevents misuse, shrinks the attack surface, and enables privacy-preserving deduplication in real-world deployments.
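
The figures in (ii) and (iv) can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch under stated assumptions: s is read as the number of database entries, l as the number of positions per code, and the product as bits of code shares held per party. The values s = 10 million and l = 12,800 are taken from the numbers quoted in this brief; mask shares and any per-query overhead are ignored.

```python
JANUS_CMP_PER_S = 400      # Janus throughput, as quoted in (iv)
CPU_CMP_PER_S = 690_000    # per-CPU-core throughput, as quoted in (iii)

cpu_speedup = CPU_CMP_PER_S / JANUS_CMP_PER_S


def share_storage_gb(bits_per_position, s, l):
    """Storage for s codes of l positions each, in gigabytes (10^9 bytes)."""
    return bits_per_position * s * l / 8 / 1e9


S = 10_000_000   # assumed database size (entries)
L = 12_800       # assumed positions per iris code

aby3_gb = share_storage_gb(32, S, L)     # 32*s*l bits
galois_gb = share_storage_gb(16, S, L)   # 16*s*l bits
print(cpu_speedup, aby3_gb, galois_gb)   # 1725.0 512.0 256.0
```

Under these assumptions the halved share size is the difference between roughly 512 GB and 256 GB of code shares, which matters when the database has to sit in GPU memory.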

Background

(i) Iris codes (about 12,800 bits long) are generated via Daugman's algorithm; masks flag unstable regions (e.g., those covered by eyelashes) so they are excluded. Two codes match when their normalized Hamming distance falls below a threshold.
(ii) MPC splits data into shares held by independent parties; honest-majority protocols (ABY3, Shamir) allow communication-efficient dot products, which are crucial for large biometric databases.
(iii) Prior systems relied on garbled circuits (too much communication), homomorphic encryption (slow), or TEEs (repeatedly broken by side-channel attacks).
(iv) Galois rings Z_{2^k}[X]/(Q(X)) extend Shamir sharing to power-of-two rings, enabling efficient arithmetic and direct integration with binary circuits, so MSB extraction needs no switch to a different field.
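
The idea in (iv) can be sketched in plain Python for the concrete ring Z_{2^16}[X]/(X² − X − 1) named under Deep Dives. This is a toy, single-process simulation of three parties, not the authors' implementation: the helper names, the choice of 1, X, X+1 as party evaluation points, and the test vectors are assumptions, and the Lagrange coefficient is applied after the local product rather than precomputed. It needs Python 3.8+ for `pow(x, -1, m)`.

```python
import secrets

MOD = 1 << 16  # coefficients live in Z_{2^16}

# Elements of Z_{2^16}[X]/(X^2 - X - 1) are pairs (a0, a1) meaning a0 + a1*X.
def add(a, b):
    return ((a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD)

def sub(a, b):
    return ((a[0] - b[0]) % MOD, (a[1] - b[1]) % MOD)

def mul(a, b):
    hi = a[1] * b[1]  # coefficient of X^2, folded back in via X^2 = X + 1
    return ((a[0] * b[0] + hi) % MOD, (a[0] * b[1] + a[1] * b[0] + hi) % MOD)

def inv(a):
    # a * conj(a) is the integer norm a0^2 + a0*a1 - a1^2; a is a unit iff it is odd.
    norm = (a[0] * a[0] + a[0] * a[1] - a[1] * a[1]) % MOD
    conj = ((a[0] + a[1]) % MOD, (-a[1]) % MOD)
    return mul(conj, (pow(norm, -1, MOD), 0))

# Exceptional set {0, 1, X, X+1}: 0 carries the secret, the rest are party points.
POINTS = [(1, 0), (0, 1), (1, 1)]

def share(secret):
    """Degree-1 Shamir sharing: f(Z) = secret + r*Z evaluated at each party point."""
    r = (secrets.randbelow(MOD), secrets.randbelow(MOD))
    return [add(secret, mul(r, p)) for p in POINTS]

def lagrange_at_zero():
    """Coefficients interpolating a degree-2 polynomial at 0 from all three points."""
    coeffs = []
    for i, xi in enumerate(POINTS):
        lam = (1, 0)
        for j, xj in enumerate(POINTS):
            if j != i:
                lam = mul(lam, mul(xj, inv(sub(xj, xi))))
        coeffs.append(lam)
    return coeffs

def pack(vec):
    """Pack two entries per ring element: g_i = c_{2i} + c_{2i+1}*X (even length)."""
    return [(vec[i] % MOD, vec[i + 1] % MOD) for i in range(0, len(vec), 2)]

def shared_dot(query, entry):
    q_shares = [share(g) for g in pack(query)]  # q_shares[i][p] = share of party p
    e_shares = [share(g) for g in pack(entry)]
    lam = lagrange_at_zero()
    additive = []
    for p in range(3):
        local = (0, 0)
        for qs, es in zip(q_shares, e_shares):
            local = add(local, mul(qs[p], es[p]))   # purely local work
        additive.append(mul(lam[p], local)[0])      # keep the constant term only
    total = sum(additive) % MOD
    return total - MOD if total >= MOD // 2 else total  # read as signed 16-bit

query = [-1, 1, -1, 0, 1, 1, -1, 1]   # hypothetical ternary-encoded codes
entry = [-1, -1, -1, 1, 1, 0, -1, 1]
print(shared_dot(query, entry))        # 4
```

Each simulated party multiplies its shares locally and ends up with one additive share of the dot product, so a real deployment would only need to communicate a single ring element per comparison rather than one per bit.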

Team

(i) TACEO: Daniel Kales (TU Graz; prior: secure computation research, MPC frameworks) and Roman Walch (TU Graz; prior: MPC protocol engineering, privacy-preserving ML).
(ii) Worldcoin Foundation: Remco Bloemen (MSc Leiden; ex-Google, ex-Chainalysis; blockchain and cryptography engineering) and Philipp Sippl (TU Graz; prior: distributed systems, scalable biometric systems).
(iii) Inversed Tech: Bryan Gillespie (UC Berkeley; ex-Apple, ex-Cruise; GPU systems, high-performance compute design).

Deep Dives

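1. Masked Bitvector Representation	Raw XOR + popcount is extremely costly in MPC. (i) Encodes each iris bit together with its mask bit as one ternary value, {T = −1, U = 0, F = 1}, where U marks a masked-out position. (ii) The Hamming-distance check reduces to a single dot product: since ⟨c′, C′ᵢ⟩ = ml − 2·hd, the test hd < MATCH_RATIO·ml becomes ⟨c′, C′ᵢ⟩ > (1 − 2·MATCH_RATIO)·ml, where ml is the number of positions unmasked in both codes. (iii) Eliminates two full summations and all bitwise circuits. (iv) Enables vectorized CPU/GPU processing.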
2. Galois-Ring Shamir Sharing	Millions of dot products dominate the total cost, and naive ABY3 sharing doubles the share size. (i) Uses Z_{2^16}[X]/(X² − X − 1), whose 4 exceptional points allow efficient 3-party Shamir sharing. (ii) Packs two values per ring element as gᵢ = c₂ᵢ + c₂ᵢ₊₁X; the constant term of a product gives the dot-product contribution. (iii) Precomputing λ·[gᵢ], with λ the Lagrange coefficient, removes the need for full polynomial arithmetic. (iv) Halves the database size vs ABY3 (32·s·l → 16·s·l) and requires only 1 dot product instead of 2.
3. Lifting Secret-Shared Masks	When masks are private, ml is itself secret-shared, so it cannot be multiplied with the threshold directly. (i) Approximates the threshold as a/b with b = 2^m, so the check becomes a sign test on a·ml − b·hd. (ii) Lifts shares from Z_{2^n} to Z_{2^(n+m)} via bit extraction/injection; multiplying by b = 2^m is a free lift, because it is done on the shares without modular reduction. (iii) Ensures MSB(a·ml − b·hd) matches the true comparison while avoiding costly recomputation in larger rings.
4. MSB Extraction Circuit	The final decision ("is this iris similar?") must not leak hd or ml. (i) Uses a ripple-carry adder needing 29 AND gates in 15 rounds for 16-bit values. (ii) Works directly in power-of-two rings, avoiding prime-field modular reductions. (iii) Per-entry match bits are combined with an OR-tree, which hides which entry matched.
5. GPU-Native MPC Architecture	CPU-only MPC cannot handle multi-million-entry databases at the required speed. (i) The entire MPC runs on GPUs, using NCCL for direct NIC access with no CPU involvement. (ii) Dot products are mapped to 3 int8 GEMM calls (for 16-bit rings), maximizing tensor-core throughput. (iii) Comparison rounds stay on the GPU across 15+ communication rounds. (iv) Achieves 22.46B dot products/s and 4.29B full iris comparisons/s with 8×H100 GPUs per MPC party.
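
Why three 8-bit GEMMs suffice for a 16-bit ring (Deep Dive 5, item ii) can be seen from a limb decomposition. The sketch below is a pure-Python stand-in, with `matmul` playing the role of one GEMM call. It assumes unsigned 8-bit limbs and exact integer accumulation; how the real kernels handle signed int8 inputs and 32-bit accumulators is not covered here, and all names and test matrices are hypothetical.

```python
MOD = 1 << 16


def matmul(a, b):
    """Plain integer matrix product (stand-in for a single GEMM call)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_limbs(m):
    """Split each 16-bit entry v into 8-bit limbs so that v = lo + 256*hi."""
    lo = [[v & 0xFF for v in row] for row in m]
    hi = [[(v >> 8) & 0xFF for v in row] for row in m]
    return lo, hi


def matmul_mod_2_16(a, b):
    """Product of matrices with entries in [0, 2^16), reduced modulo 2^16."""
    a_lo, a_hi = split_limbs(a)
    b_lo, b_hi = split_limbs(b)
    ll = matmul(a_lo, b_lo)   # GEMM 1
    lh = matmul(a_lo, b_hi)   # GEMM 2
    hl = matmul(a_hi, b_lo)   # GEMM 3
    # The hi*hi term carries a factor 2^16 and vanishes modulo 2^16,
    # so a fourth GEMM is never needed.
    return [
        [(x + ((y + z) << 8)) % MOD for x, y, z in zip(r_ll, r_lh, r_hl)]
        for r_ll, r_lh, r_hl in zip(ll, lh, hl)
    ]


# Rows of `queries` are shared query codes; columns of `database` are shared entries.
queries = [[65535, 1, 300], [2, 40000, 7]]
database = [[1, 65535], [2, 3], [500, 12345]]

direct = [[v % MOD for v in row] for row in matmul(queries, database)]
print(matmul_mod_2_16(queries, database) == direct)  # True
```

Because ring arithmetic modulo 2^16 discards everything above bit 15, the costliest cross term drops out, which is what lets the dot products run on hardware built for 8-bit matrix multiplication.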

References

Large‑Scale MPC: Scaling Private Iris Code Uniqueness Checks to Millions of Users: https://arxiv.org/abs/2405.04463