Privacy-preserving Retrieval — Complete Guide (5)

When you build a RAG system over sensitive data (medical records, legal documents, financial reports), every query is a potential privacy leak. The user's question reveals intent; the retrieved chunks reveal data. Privacy-preserving retrieval addresses both threats while keeping the cost in utility as small as possible.

The Threat Model

Before choosing a technique, be precise about what you're protecting against:

Query privacy: The server shouldn't learn what the user searched for.
Content privacy: The model provider shouldn't see document contents.
Membership inference: An adversary shouldn't be able to confirm whether a specific record exists in the index.
Reconstruction attacks: An adversary shouldn't be able to reconstruct the original text from embeddings alone.

⚠️ Embeddings are NOT anonymous

Published inversion attacks have reconstructed a large share of sentences (figures around 80% are reported) from their embeddings alone; the exact rate depends on the embedding model and the length of the text. Never store raw embeddings of PII without additional protection.

Private Information Retrieval (PIR)

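PIR lets a client retrieve a record from a database without the server learning which record was requested. The guarantees come from cryptography rather than from trust in the server:

Computational PIR: Typically built on homomorphic encryption. The client encrypts the query, the server computes over the ciphertext and returns an encrypted result. The server sees only ciphertext and, under standard hardness assumptions, learns nothing about which record was requested.
ORAM (Oblivious RAM): A related primitive in which the client accesses an encrypted data store that is continually re-shuffled, so even the access pattern is hidden.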
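
To make the computational PIR flow concrete, here is a minimal sketch using a toy Paillier cryptosystem (an additively homomorphic scheme). The client sends an encrypted one-hot selection vector, and the server combines it with its records without ever decrypting. Everything here is an illustrative assumption: the function names are hypothetical, the key is built from two small, publicly known Mersenne primes, and the parameters are NOT secure. A real deployment would use a vetted library and far larger keys.

```python
import math
import secrets

# Toy Paillier key. INSECURE demo parameters: two well-known Mersenne primes.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                      # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)           # private: inverse of LAM mod N (Python 3.8+)


def encrypt(m: int) -> int:
    """Paillier encryption with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Paillier decryption: L(c^LAM mod N^2) * MU mod N, with L(x) = (x - 1) // N."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def client_query(index: int, db_size: int) -> list:
    """Client side: encrypt a one-hot vector selecting the wanted record."""
    return [encrypt(1 if i == index else 0) for i in range(db_size)]


def server_answer(query: list, records: list) -> int:
    """Server side: compute Enc(sum(sel_i * rec_i)) without seeing sel_i.

    Raising a ciphertext to the power rec_i multiplies the hidden plaintext
    by rec_i; multiplying ciphertexts adds the hidden plaintexts.
    """
    acc = 1
    for c, rec in zip(query, records):
        acc = (acc * pow(c, rec, N2)) % N2
    return acc


# Records are encoded as integers smaller than N (short byte strings here).
records = [int.from_bytes(s, "big")
           for s in (b"chunk-A", b"chunk-B", b"chunk-C", b"chunk-D")]
query = client_query(index=2, db_size=len(records))
answer = server_answer(query, records)
recovered = decrypt(answer).to_bytes(7, "big")
print(recovered)
```

Note that the query grows linearly with the number of records and the server must touch every record, which is one reason PIR is slow compared with plain lookup.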

Differential Privacy for Embeddings

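Add calibrated Gaussian noise to query embeddings before sending them to a retrieval service. The aim is for the noise to be small enough that semantically similar queries still retrieve similar results, while limiting what the service can infer about the exact query. The trade-off is controlled by epsilon: smaller values give stronger privacy but more noise, and at small epsilon the noise can swamp a unit-norm embedding, so measure recall at the epsilon you choose: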

Python
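import numpy as np

def privatise_embedding(embedding: np.ndarray, epsilon: float = 1.0,
                        delta: float = 1e-5) -> np.ndarray:
    """Add Gaussian noise calibrated to an (epsilon, delta)-DP guarantee."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Normalise first so the L2 sensitivity bound below actually holds.
    unit = embedding / np.linalg.norm(embedding)
    sensitivity = 2.0  # max L2 distance between two unit vectors
    # Classic Gaussian-mechanism calibration; the bound is proven for epsilon < 1.
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noise = np.random.default_rng().normal(0.0, sigma, unit.shape)
    noisy = unit + noise
    return noisy / np.linalg.norm(noisy)  # re-normalising is post-processing

# `encoder` and `vector_db` stand in for your embedding model and vector store.
query_emb = encoder.encode("Patient John Doe's last HbA1c result")
private_q = privatise_embedding(query_emb, epsilon=0.5)
results   = vector_db.search(private_q, top_k=5)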
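
Before picking an epsilon, it helps to see how large the calibrated noise actually is. The sketch below evaluates the Gaussian-mechanism formula sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon for a few epsilon values and compares the root-mean-square noise norm with the unit norm of the embedding. The helper names and the 768-dimensional embedding size are assumptions chosen for illustration.

```python
import math


def gaussian_sigma(epsilon: float, delta: float = 1e-5,
                   sensitivity: float = 2.0) -> float:
    """Per-coordinate noise scale from the classic Gaussian-mechanism bound."""
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def rms_noise_norm(sigma: float, dim: int) -> float:
    """Root-mean-square L2 norm of a dim-dimensional N(0, sigma^2 I) sample."""
    return sigma * math.sqrt(dim)


DIM = 768  # assumed embedding dimension; substitute your encoder's size

for eps in (0.1, 0.5, 0.9):
    sigma = gaussian_sigma(eps)
    print(f"epsilon={eps}: sigma={sigma:.2f}, "
          f"RMS noise norm={rms_noise_norm(sigma, DIM):.1f} (embedding norm = 1)")
```

With these settings the noise norm is far larger than the embedding norm, so any recall estimate should be validated empirically on your own data rather than assumed.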

Federated RAG

Instead of centralising documents in one vector DB, each data owner runs their own retriever locally. The orchestrator sends the query to all nodes, each node returns its anonymised top-k results with confidence scores, and the orchestrator merges them. No raw documents ever leave their origin.
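
The fan-out and merge step can be sketched as follows. This is a minimal illustration, not a reference design: `Hit`, `make_node`, and `federated_search` are hypothetical names, the nodes are in-process stand-ins for remote retrievers, and the merge assumes that scores from different nodes are comparable (for example, because all nodes use the same encoder and similarity metric).

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Hit:
    node_id: str    # which data owner produced the result
    score: float    # confidence score reported by that node
    snippet: str    # anonymised text; the raw document stays on the node


Retriever = Callable[[str, int], List[Hit]]


def make_node(node_id: str, scored: List[Tuple[float, str]]) -> Retriever:
    """Build a stand-in local retriever that ignores the query text
    and returns its best pre-scored, anonymised snippets."""
    def retrieve(query: str, k: int) -> List[Hit]:
        hits = [Hit(node_id, score, text) for score, text in scored]
        return heapq.nlargest(k, hits, key=lambda h: h.score)
    return retrieve


def federated_search(query: str, nodes: Dict[str, Retriever],
                     top_k: int = 5) -> List[Hit]:
    """Send the query to every node, then keep the global top_k by score."""
    candidates: List[Hit] = []
    for retrieve in nodes.values():
        candidates.extend(retrieve(query, top_k))
    return heapq.nlargest(top_k, candidates, key=lambda h: h.score)


nodes = {
    "hospital-a": make_node("hospital-a", [(0.91, "[PATIENT] HbA1c 7.2%"),
                                           (0.40, "[PATIENT] lipid panel")]),
    "hospital-b": make_node("hospital-b", [(0.75, "[PATIENT] HbA1c 6.8%")]),
}
for hit in federated_search("last HbA1c result", nodes, top_k=2):
    print(hit.node_id, hit.score, hit.snippet)
```

If nodes use different encoders, the raw scores are not directly comparable and the orchestrator would need some form of score normalisation or re-ranking before merging.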

Technique	Protection	Utility Cost	Complexity
DP Embeddings	Query privacy	~2–5% recall drop (depends on epsilon)	Low
Federated RAG	Content privacy	Latency overhead	Medium
Computational PIR	Full query privacy	10–100x slower	High
ORAM	Access pattern privacy	Significant overhead	High


