When you build a RAG system over sensitive data such as medical records, legal documents, or financial reports, every query is a potential privacy leak. The user's question reveals intent; the retrieved chunks reveal data. Privacy-preserving retrieval addresses both threats, at a cost in recall, latency, or complexity that varies by technique.
The Threat Model
Before choosing a technique, be precise about what you're protecting against:
- Query privacy: Server shouldn't learn what the user searched for.
- Content privacy: Model provider shouldn't see document contents.
- Membership inference: Adversary shouldn't confirm whether a specific record exists in the index.
- Reconstruction attacks: Attacker shouldn't be able to reconstruct the original text from embeddings alone.
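To make the membership-inference threat concrete, here is a minimal sketch of the attack against an unprotected index. It assumes the adversary can embed a candidate record with the same encoder and can see similarity scores. The random vectors, the helper names (`top1_score`, `looks_like_member`), and the 0.99 threshold are illustrative stand-ins, not part of any real system.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v: np.ndarray) -> np.ndarray:
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Stand-in for an index of 100 record embeddings (random, 64-dimensional).
index = unit(rng.normal(size=(100, 64)))

def top1_score(query: np.ndarray) -> float:
    """What a similarity-search API typically exposes: the best cosine score."""
    return float(np.max(index @ query))

def looks_like_member(candidate: np.ndarray, threshold: float = 0.99) -> bool:
    """Adversary's test: a near-perfect match suggests the record is indexed."""
    return top1_score(unit(candidate)) >= threshold

known_record = index[42]               # a record that is in the index
outsider = unit(rng.normal(size=64))   # a record that is not

print("indexed record flagged:", looks_like_member(known_record))
print("outside record flagged:", looks_like_member(outsider))
```

The attack needs nothing beyond the scores a retrieval API commonly returns, which is why hiding or coarsening scores is often one of the first mitigations considered.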
Embeddings are NOT anonymous
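Research on embedding inversion attacks reports that a large share of sentences (around 80% in some settings) can be reconstructed from their embeddings alone. The exact rate depends on the encoder and the text length. Never store raw embeddings of PII without additional protection.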
Private Information Retrieval (PIR)
PIR lets a client retrieve a record from a database without the server learning which record was requested. The guarantee is cryptographic rather than heuristic. Two approaches are common:
- Computational PIR: Uses homomorphic encryption. The client encrypts the query, the server computes on the ciphertext and returns an encrypted result. The server learns nothing about which record was requested.
- ORAM (Oblivious RAM): A related primitive rather than PIR proper. The client accesses an encrypted data store that is re-shuffled as it is used, so even access patterns are hidden.
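The computational PIR flow above can be sketched with a toy additively homomorphic scheme. The example below uses textbook Paillier encryption with two small Mersenne primes as the key, which is far too weak for real use. The helper names (`make_query`, `server_answer`, `retrieve`) are hypothetical, and the query grows linearly with the database, which production schemes avoid. Treat it as an illustration of the idea under those assumptions, not a usable implementation.

```python
import math
import secrets

# Toy key material: two known Mersenne primes. Illustrative only.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # private key
MU = pow(LAM, -1, N)           # valid because the generator is g = N + 1

def encrypt(m: int) -> int:
    """Paillier encryption of m (0 <= m < N) with fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    """Paillier decryption using the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N

def make_query(index: int, size: int) -> list[int]:
    """Client: encrypted one-hot selector. Every entry looks random to the server."""
    return [encrypt(1 if j == index else 0) for j in range(size)]

def server_answer(db: list[int], enc_query: list[int]) -> int:
    """Server: computes Enc(sum(selector[j] * db[j])) without decrypting anything."""
    acc = 1
    for record, c in zip(db, enc_query):
        acc = (acc * pow(c, record, N2)) % N2
    return acc

def retrieve(db: list[int], index: int) -> int:
    """Runs both roles in one process for demonstration."""
    return decrypt(server_answer(db, make_query(index, len(db))))

db = [17, 42, 99, 7]   # records must be non-negative integers below N
print(retrieve(db, 2))
```

The server touches every record on every query. That is what hides the index, and it is also the main source of the slowdown associated with PIR.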
Differential Privacy for Embeddings
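Add calibrated Gaussian noise to query embeddings before sending them to a retrieval service. The aim is noise small enough that semantically similar queries still retrieve similar results, yet large enough that the exact query cannot be recovered. Where that balance falls depends on the privacy budget epsilon and on the embedding dimension: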
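```python
import numpy as np

def privatise_embedding(embedding: np.ndarray, epsilon: float = 1.0) -> np.ndarray:
    """Add Gaussian noise calibrated to an (epsilon, delta)-DP guarantee."""
    # Normalise first: the sensitivity bound below only holds for unit vectors.
    embedding = embedding / np.linalg.norm(embedding)
    sensitivity = 2.0  # max L2 distance between two unit-norm embeddings
    delta = 1e-5
    # Classical Gaussian mechanism calibration (standard proof covers 0 < epsilon < 1).
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noise = np.random.normal(0, sigma, embedding.shape)
    noisy = embedding + noise
    # Re-normalising is post-processing, so it does not weaken the guarantee.
    return noisy / np.linalg.norm(noisy)

# `encoder` and `vector_db` stand for your embedding model and vector store.
query_emb = encoder.encode("Patient John Doe's last HbA1c result")
private_q = privatise_embedding(query_emb, epsilon=0.5)
results = vector_db.search(private_q, top_k=5)
```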
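Before settling on an epsilon, it helps to check how much of the query survives the noise. The sketch below reuses the calibration formula from the snippet above and measures the average cosine similarity between a unit vector and its noised copy. The 384-dimensional random vectors and the function names are assumptions chosen for illustration, so real encoders and corpora will behave differently. The larger epsilon values are there only to show scale, since the classical calibration is proven for epsilon below 1.

```python
import numpy as np

def gaussian_sigma(epsilon: float, delta: float = 1e-5, sensitivity: float = 2.0) -> float:
    """Per-coordinate noise std of the classical Gaussian mechanism."""
    return float(sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon)

def mean_cosine_after_noise(epsilon: float, dim: int = 384,
                            trials: int = 200, seed: int = 0) -> float:
    """Average cosine similarity between random unit vectors and their noised copies."""
    rng = np.random.default_rng(seed)
    sigma = gaussian_sigma(epsilon)
    total = 0.0
    for _ in range(trials):
        v = rng.normal(size=dim)
        v /= np.linalg.norm(v)
        noisy = v + rng.normal(0.0, sigma, size=dim)
        total += float(v @ noisy / np.linalg.norm(noisy))
    return total / trials

for eps in (0.5, 1.0, 10.0, 100.0, 1000.0):
    print(f"epsilon={eps:>6}: sigma={gaussian_sigma(eps):.4f}, "
          f"mean cosine={mean_cosine_after_noise(eps):.3f}")
```

With unit-norm embeddings the total noise norm grows roughly as sigma times the square root of the dimension, so small epsilon values can leave very little of the original direction. Measuring recall on your own data at the chosen epsilon is the safer guide.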
Federated RAG
Instead of centralising documents in one vector DB, each data owner runs their own retriever locally. The orchestrator sends the query to all nodes; each node returns its anonymised top-k results with confidence scores, and the orchestrator merges them. No raw documents ever leave their origin.
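The fan-out and merge step can be sketched as follows. Everything here is hypothetical scaffolding: `LocalNode`, `Hit`, `federated_search`, and the `redact_first_word` placeholder are invented for illustration. The sketch also assumes all nodes use the same encoder, so their cosine scores are comparable.

```python
import heapq
from dataclasses import dataclass

import numpy as np

@dataclass
class Hit:
    node: str
    doc_id: str
    score: float
    snippet: str  # anonymised text, never the raw document

class LocalNode:
    """Hypothetical data owner: documents and embeddings stay on-premises."""

    def __init__(self, name, docs, embeddings, anonymise):
        self.name = name
        self._docs = docs            # doc_id -> raw text (never returned as-is)
        self._ids = list(docs)
        self._emb = embeddings       # unit-norm rows aligned with self._ids
        self._anonymise = anonymise

    def search(self, query_emb, top_k):
        scores = self._emb @ query_emb
        best = np.argsort(scores)[::-1][:top_k]
        return [Hit(self.name, self._ids[i], float(scores[i]),
                    self._anonymise(self._docs[self._ids[i]])) for i in best]

def federated_search(nodes, query_emb, top_k=5):
    """Orchestrator: fan the query out, then keep the best hits overall."""
    hits = [h for node in nodes for h in node.search(query_emb, top_k)]
    return heapq.nlargest(top_k, hits, key=lambda h: h.score)

def redact_first_word(text: str) -> str:
    """Placeholder anonymiser that masks the leading name. Not real de-identification."""
    _, _, rest = text.partition(" ")
    return "[REDACTED] " + rest

node_a = LocalNode("clinic_a",
                   {"a1": "Alice has type 2 diabetes", "a2": "Bob fractured his arm"},
                   np.array([[1.0, 0.0], [0.0, 1.0]]), redact_first_word)
node_b = LocalNode("clinic_b",
                   {"b1": "Carol had an HbA1c of 7.1"},
                   np.array([[0.6, 0.8]]), redact_first_word)

results = federated_search([node_a, node_b], np.array([1.0, 0.0]), top_k=2)
for hit in results:
    print(hit.node, hit.doc_id, round(hit.score, 2), hit.snippet)
```

Note that every node still sees the query embedding in the clear, so this design protects document contents rather than query privacy. Combining it with a query-side technique is a separate decision.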
| Technique | Protection | Utility Cost | Complexity |
|---|---|---|---|
| DP Embeddings | Query privacy | ~2–5% recall drop (varies with epsilon) | Low |
| Federated RAG | Content privacy | Latency overhead | Medium |
| Computational PIR | Full query privacy | 10–100x slower | High |
| ORAM | Access-pattern privacy | Significant overhead | High |