Federated learning, differential privacy, secure computation, and the privacy-utility tradeoff
Privacy regulations (GDPR, CCPA, HIPAA) and user expectations increasingly demand that AI systems handle sensitive data responsibly. Yet training ML models often requires large datasets—and that data might contain personal information you'd rather not expose. Privacy computing techniques enable useful computation on sensitive data while providing mathematical guarantees about what cannot be learned.
This article covers the main privacy-preserving technologies: federated learning, differential privacy, secure multi-party computation, and homomorphic encryption.
Federated learning trains models across distributed datasets without centralizing the data. The key insight: move the model to the data, not the data to the model.
1. SERVER: Broadcast model to participating devices
2. DEVICES: Train the model on locally held data
3. DEVICES: Send model updates (not data) to server
4. SERVER: Aggregate updates (FedAvg or variants)
5. SERVER: Update shared model
6. Repeat until convergence
Google uses federated learning for keyboard prediction on Android—your phone trains locally on your typing patterns and only shares model updates, not your keystrokes.
The basic federated algorithm:
Server aggregation step (FedAvg):
w_{k+1} = Σ_i (n_i / n) × w_i
Where:
n_i = number of samples on device i
n = total samples across all devices
w_i = model update from device i
This weighted average combines updates in proportion to each device's dataset size, as in the sketch below.
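A minimal NumPy sketch of this aggregation step (function and variable names are illustrative, not from any particular framework):

```python
import numpy as np

def fedavg(updates, sample_counts):
    """Weighted average of client model updates (FedAvg).

    updates: list of parameter vectors w_i, one per device
    sample_counts: list of n_i, the number of samples on device i
    """
    n = sum(sample_counts)  # total samples across all devices
    # w_{k+1} = sum_i (n_i / n) * w_i
    return sum((n_i / n) * w_i for w_i, n_i in zip(updates, sample_counts))

# Three devices with different amounts of data
updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
counts = [10, 30, 60]
print(fedavg(updates, counts))  # devices with more data get more weight
```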
Model updates are not the same as no information. Research has shown that gradients can leak training data: gradient inversion attacks can reconstruct individual training examples from shared updates, and membership inference attacks can reveal whether a specific record was in a device's training set.
Solution: Combine federated learning with differential privacy or secure aggregation.
Differential privacy provides mathematical guarantees that individual data points cannot be identified—even with full access to the model and outputs.
A mechanism M provides (ε, δ)-differential privacy if for any
adjacent datasets D and D' (differing in one record) and any
set of outputs O:
Pr[M(D) ∈ O] ≤ e^ε × Pr[M(D') ∈ O] + δ
Where:
ε = privacy budget (smaller = more private)
δ = probability of violating ε guarantee
In simpler terms: removing or changing any single record doesn't significantly change the probability of any output.
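One concrete mechanism satisfying this definition is randomized response, a standard warm-up example (not covered above): each user reports the truth with probability 3/4 and the opposite with probability 1/4, giving ε = ln(3) ≈ 1.1. A quick demonstration:

```python
import math
import random

def randomized_response(true_answer: bool, p: float = 0.75) -> bool:
    """Report the true answer with probability p, the flipped answer otherwise."""
    return true_answer if random.random() < p else not true_answer

# Worst-case output probability ratio: p / (1 - p) = 3, so epsilon = ln(3)
epsilon = math.log(0.75 / 0.25)
print(f"epsilon = {epsilon:.3f}")  # ~1.099

# The analyst can still estimate the population rate from many noised answers
answers = [randomized_response(True) for _ in range(100_000)]
observed = sum(answers) / len(answers)  # ~0.75 when everyone's truth is "True"
estimated = (observed - 0.25) / 0.5     # debias: E[observed] = 0.25 + 0.5 * rate
print(f"estimated rate ~ {estimated:.2f}")
```

No individual answer can be trusted, yet aggregate statistics remain accurate, which is the privacy-utility tradeoff in miniature.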
Add noise to the output calibrated to the sensitivity of the query:
M(D) = f(D) + Laplace(Δf / ε)
Where:
f(D) = query result (e.g., count, mean)
Δf = sensitivity (max change from removing one record)
ε = privacy parameter
Laplace(b) = noise drawn from a Laplace distribution with scale b
For a count query, sensitivity is 1 (removing one person changes count by at most 1). For a sum, sensitivity is the maximum value any individual can contribute.
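A minimal sketch of the Laplace mechanism for a count query (function names are illustrative):

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """epsilon-DP count: true count plus Laplace(sensitivity / epsilon) noise.

    The sensitivity of a count is 1: one record changes it by at most 1.
    """
    true_count = sum(1 for x in data if predicate(x))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 64, 51, 38]
# How many people are over 40? True answer: 3; the released answer is noised.
print(laplace_count(ages, lambda a: a > 40, epsilon=0.5))
```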
For vector-valued queries or long sequences of queries, the Gaussian mechanism is often used instead:
M(D) = f(D) + N(0, σ²)
Where σ² ≥ 2 × ln(1.25/δ) × (Δf)² / ε²  (valid for ε < 1, with Δf measured in the L2 norm)
The Gaussian mechanism provides the relaxed (ε, δ)-DP guarantee rather than the pure ε-DP of the Laplace mechanism, but it composes more gracefully and calibrates noise to L2 rather than L1 sensitivity.
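The same calibration in code (an illustrative helper, not a hardened implementation):

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta):
    """(epsilon, delta)-DP release of a numeric query result (epsilon < 1)."""
    # sigma >= sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    return value + np.random.normal(0.0, sigma)

print(gaussian_mechanism(42.0, l2_sensitivity=1.0, epsilon=0.5, delta=1e-5))
```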
Privacy budget (ε) depletes with each query. Under sequential composition, answering k queries with budgets ε_1, ..., ε_k consumes ε_1 + ... + ε_k in total; once the budget is spent, further queries would weaken the guarantee.
Apple uses local differential privacy for device analytics: data such as emoji usage frequencies and QuickType suggestions is noised on the device before it is ever sent to Apple's servers.
DP-SGD (Differentially Private Stochastic Gradient Descent) trains neural networks with differential privacy guarantees:
For each batch:
1. Compute per-example gradients
2. Clip each gradient to a maximum L2 norm C
3. Add Gaussian noise with standard deviation σ × C (the noise multiplier σ is chosen via privacy accounting)
4. Average and apply the update
Result: The trained model satisfies an (ε, δ)-DP guarantee, with ε tracked across steps by the privacy accountant
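A toy NumPy sketch of one such step (hyperparameter values are illustrative; production implementations such as Opacus or TensorFlow Privacy also perform the privacy accounting):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.1, lr=0.1):
    """One DP-SGD step: clip per-example gradients, sum, add noise, average."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds C = clip_norm
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # Gaussian noise with std sigma * C; sigma drives the (epsilon, delta) spend
    total = total + np.random.normal(0.0, noise_multiplier * clip_norm,
                                     size=total.shape)
    return params - lr * total / len(per_example_grads)

params = np.zeros(3)
batch_grads = [np.random.randn(3) for _ in range(32)]  # one gradient per example
params = dp_sgd_step(params, batch_grads)
print(params)
```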
Google and Microsoft have published differentially private trained models, and Google has deployed DP federated training for Gboard's next-word prediction.
Secure Multi-Party Computation (MPC) enables multiple parties to compute a function on their combined inputs without revealing inputs to each other.
The classic illustration is Yao's Millionaires' Problem: two millionaires want to know who is richer without revealing their wealth.
Solution: Use garbled circuits and oblivious transfer
Result: Both learn only who is richer, and nothing else about the other's wealth
Split a secret into shares distributed to parties:
Shamir's Secret Sharing:
- Split secret into n shares
- Any t shares can reconstruct secret
- Fewer than t shares reveal nothing
Example: (2,3) sharing of the secret 42 using the line f(x) = 42 − 25x
Share 1: (1, 17)
Share 2: (2, -8)
Share 3: (3, -33)
Any 2 shares reconstruct the line; the secret is f(0) = 42. (Real implementations work over a finite field rather than the integers.)
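A compact implementation of (t, n) Shamir sharing over a prime field (the small prime is for readability; real systems use large primes):

```python
import random

P = 2**13 - 1  # small prime field for illustration

def share(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    poly = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        num, den = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i != j:
                num = (num * -x_j) % P
                den = (den * (x_i - x_j)) % P
        secret = (secret + y_i * num * pow(den, -1, P)) % P
    return secret

shares = share(42, t=2, n=3)
print(reconstruct(shares[:2]))  # any 2 of the 3 shares recover 42
```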
| Protocol | Setting | Security | Performance |
|---|---|---|---|
| Garbled Circuits | 2-party | Semi-honest or malicious | Moderate; constant rounds |
| Secret Sharing (e.g., BGW) | n-party | Varies by protocol | Good for arithmetic circuits |
| GMW (Goldreich-Micali-Wigderson) | n-party | Semi-honest | Boolean circuits; rounds grow with depth |
| SPDZ | n-party | Malicious | Slower, but actively secure |
Google open-sourced Private Join and Compute, a library for private set intersection with aggregation: two parties can compute statistics (such as a sum) over the intersection of their datasets without revealing the intersection itself.
Homomorphic encryption (HE) allows computation on encrypted data. You can encrypt data, perform computations, and decrypt results—without ever seeing the plaintext.
Classical encryption: to compute f(x), the server must first decrypt, so it sees the plaintext
HE: Dec(Eval_f(Enc(x))) = f(x), where Eval_f works entirely on ciphertexts
Order of operations:
1. Client encrypts data: Enc(x)
2. Server computes on the ciphertext, producing Enc(f(x))
3. Client decrypts: Dec(Enc(f(x))) = f(x)
Server never sees x or f(x)
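A toy demonstration of this encrypt-evaluate-decrypt flow using the Paillier cryptosystem, which is additively homomorphic (a partially homomorphic scheme, simpler than the fully homomorphic schemes in the table below; the primes here are insecurely small, for illustration only):

```python
import math
import random

# Toy Paillier keypair (insecurely small primes, for illustration only)
p, q = 1_000_003, 1_000_033
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # L(x) = (x - 1) // n; m = L(c^lam mod n^2) * mu mod n
    L = lambda x: (x - 1) // n
    mu = pow(L(pow(g, lam, n2)), -1, n)
    return (L(pow(c, lam, n2)) * mu) % n

# The server multiplies ciphertexts to add the plaintexts, never seeing them:
c = (encrypt(20) * encrypt(22)) % n2  # Enc(20) * Enc(22) = Enc(20 + 22)
print(decrypt(c))  # 42
```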
| Type | Operations | Performance | Use Case |
|---|---|---|---|
| Somewhat (SHE) | Limited (+, -, ×) | Moderate | Specific applications |
| Leveled FHE | Bounded circuit depth | Slow | Research |
| FHEW/TFHE | Arbitrary (bootstrapping) | Very slow | Research |
| CKKS (approximate) | Addition, multiplication | Moderate | Machine learning |
CKKS (Cheon-Kim-Kim-Song) is the scheme most practical for ML:
CKKS properties:
- Encrypted vectors of real numbers
- Addition and multiplication work
- Results are approximate (rounding errors)
- Rescaling needed after multiplications
ML operations possible:
- Matrix multiplication and convolution
- Activation functions (via polynomial approximation)
- Polynomial evaluation
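Since comparisons and exponentials are not natively available under CKKS, activations are replaced by low-degree polynomial fits. A quick sketch of approximating sigmoid with a degree-3 polynomial (pure NumPy, illustrative):

```python
import numpy as np

# Fit a degree-3 polynomial to sigmoid over the range we expect activations in
x = np.linspace(-5, 5, 1000)
sigmoid = 1 / (1 + np.exp(-x))
coeffs = np.polyfit(x, sigmoid, deg=3)

# The polynomial uses only + and x, so it can be evaluated on CKKS ciphertexts
approx = np.polyval(coeffs, x)
print(f"max error on [-5, 5]: {np.max(np.abs(approx - sigmoid)):.4f}")
```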
Microsoft SEAL, IBM HElib, and PALISADE are popular HE libraries.
HE is slow: typically 10,000-100,000x slower than the equivalent plaintext computation.
Practical systems therefore combine HE with other techniques: for example, evaluating a network's linear layers homomorphically while handling non-linear activations with MPC, or reserving HE for small, latency-tolerant subcomputations.
More privacy requires more noise or less information, which reduces model utility. This tradeoff is fundamental.
| Technique | Privacy Guarantee | Utility Impact | Computational Cost |
|---|---|---|---|
| Federated Learning | None alone | Minimal | Moderate |
| Federated + DP | Strong | Moderate | Moderate |
| Centralized DP | Strong | Moderate to High | Low |
| MPC | Strong | Minimal | High |
| HE | Strong | Minimal | Very High |
Production privacy systems often combine multiple approaches:
Federated Learning + Differential Privacy + Secure Aggregation
1. Federated: Model trained on distributed devices
2. DP: Each update clipped and noised
3. Secure Aggregation: Server sees only aggregate, not individual updates
Result: Privacy guarantees hold even against the server itself
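A toy sketch of the pairwise-masking idea behind secure aggregation (real protocols, such as Bonawitz et al.'s, derive the masks from key agreement and handle client dropouts; here the masks are simply shared random vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 3, 4
updates = [rng.normal(size=dim) for _ in range(n_clients)]

# Each pair (i, j), i < j, agrees on a random mask; i adds it, j subtracts it
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

def masked_update(i):
    m = updates[i].copy()
    for (a, b), mask in masks.items():
        if a == i:
            m += mask
        elif b == i:
            m -= mask
    return m

# The server sees only masked updates; the pairwise masks cancel in the sum
server_sum = sum(masked_update(i) for i in range(n_clients))
print(np.allclose(server_sum, sum(updates)))  # True
```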
Privacy computing techniques have matured significantly. Federated learning enables model training across distributed data. Differential privacy provides mathematical guarantees. MPC enables secure joint computation. Homomorphic encryption allows computation on encrypted data.
Each technique involves tradeoffs between privacy strength, utility, and computational cost. The right approach depends on the use case: federated learning (plus DP) when data must stay on user devices, centralized DP when a trusted curator holds the data, MPC when a few organizations need to compute jointly, and HE when an untrusted server must operate on encrypted inputs.
The field is advancing rapidly. As techniques mature and computation becomes cheaper, expect privacy-preserving ML to become increasingly practical.