Privacy Computing Overview

Federated learning, differential privacy, secure computation, and the privacy-utility tradeoff

Published: January 2026 | Reading Time: 14 minutes | Category: AI & Machine Learning


Privacy regulations (GDPR, CCPA, HIPAA) and user expectations increasingly demand that AI systems handle sensitive data responsibly. Yet training ML models often requires large datasets—and that data might contain personal information you'd rather not expose. Privacy computing techniques enable useful computation on sensitive data while providing mathematical guarantees about what cannot be learned.

This article covers the main privacy-preserving technologies: federated learning, differential privacy, secure multi-party computation, and homomorphic encryption.

Federated Learning

Federated learning trains models across distributed datasets without centralizing the data. The key insight: move the model to the data, not the data to the model.

The Federated Learning Process

1. SERVER: Broadcast model to participating devices
2. DEVICES: Train locally on local data
3. DEVICES: Send model updates (not data) to server
4. SERVER: Aggregate updates (FedAvg or variants)
5. SERVER: Update shared model
6. Repeat until convergence
    

Google uses federated learning for keyboard prediction on Android—your phone trains locally on your typing patterns and only shares model updates, not your keystrokes.

Federated Averaging (FedAvg)

The basic federated algorithm:

Server:
  w_{k+1} = Σ_i (n_i / n) × w_i
  
Where:
  n_i = number of samples on device i
  n = total samples across all devices
  w_i = model update from device i
  
This weighted average combines updates in proportion to each device's dataset size.
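
A minimal sketch of this aggregation step in Python with NumPy (the client updates and sample counts below are made-up illustrative values):

import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model updates, proportional to local dataset size."""
    total = sum(client_sizes)
    coeffs = np.array(client_sizes, dtype=float) / total   # n_i / n for each device
    return coeffs @ np.stack(client_weights)               # sum_i (n_i / n) * w_i

# Hypothetical updates from three devices holding different amounts of data
updates = [np.array([0.2, -1.0]), np.array([0.4, -0.8]), np.array([0.1, -1.2])]
sizes = [100, 300, 600]
print(fedavg(updates, sizes))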
    

Challenges

Privacy Leakage in Federated Learning

Model updates are not the same as no information. Research has shown that shared gradients can leak properties of the underlying training data, and gradient inversion attacks can in some cases reconstruct individual training examples from the updates.

Solution: Combine federated learning with differential privacy or secure aggregation.

Differential Privacy

Differential privacy provides mathematical guarantees that individual data points cannot be identified—even with full access to the model and outputs.

The Definition

A mechanism M provides (ε, δ)-differential privacy if, for any
adjacent datasets D and D' (differing in one record) and any
set of outputs O:

Pr[M(D) ∈ O] ≤ e^ε × Pr[M(D') ∈ O] + δ

Where:
  ε = privacy budget (smaller = more private)
  δ = probability of violating ε guarantee
    

In simpler terms: removing or changing any single record doesn't significantly change the probability of any output.
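
As a concrete instance of the definition, randomized response satisfies pure ε-DP: each individual reports their true yes/no answer with probability p and the opposite otherwise, so the ratio of output probabilities for two different true answers is at most p / (1 - p). A small Python sketch (the value of p is illustrative):

import math
import random

def randomized_response(true_answer: bool, p: float = 0.75) -> bool:
    """Report the true answer with probability p, the opposite otherwise."""
    return true_answer if random.random() < p else not true_answer

# For p = 0.75 the mechanism satisfies pure epsilon-DP with epsilon = ln(p / (1 - p))
epsilon = math.log(0.75 / 0.25)   # ~1.10
print(randomized_response(True), epsilon)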

The Laplace Mechanism

Add noise to the output calibrated to the sensitivity of the query:

M(D) = f(D) + Laplace(Δf / ε)

Where:
  f(D) = query result (e.g., count, mean)
  Δf = sensitivity (max change from removing one record)
  ε = privacy parameter
  Laplace = Laplace distribution
    

For a count query, sensitivity is 1 (removing one person changes count by at most 1). For a sum, sensitivity is the maximum value any individual can contribute.
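
A minimal sketch of the Laplace mechanism applied to a count query (NumPy; the dataset is hypothetical):

import numpy as np

rng = np.random.default_rng()

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return the query result plus Laplace noise with scale sensitivity / epsilon."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

ages = np.array([34, 29, 41, 58, 23, 47])   # hypothetical dataset
true_count = int(np.sum(ages > 30))          # count query, sensitivity = 1
print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))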

The Gaussian Mechanism

For queries with many outputs or compositions:

M(D) = f(D) + N(0, σ²)

Where σ² ≥ 2 × log(1.25/δ) × (Δf)² / ε²
    

The Gaussian mechanism provides (ε, δ)-DP, whereas the Laplace mechanism provides pure ε-DP.
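
The Gaussian variant follows the same pattern, with σ taken from the bound above (the query value is hypothetical, and the bound assumes ε < 1):

import math
import numpy as np

rng = np.random.default_rng()

def gaussian_mechanism(true_value, sensitivity, epsilon, delta):
    """Add Gaussian noise with sigma satisfying the (epsilon, delta)-DP bound above."""
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon
    return true_value + rng.normal(loc=0.0, scale=sigma)

print(gaussian_mechanism(true_value=412.0, sensitivity=1.0, epsilon=0.5, delta=1e-5))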

Privacy Budget and Composition

Privacy budget (ε) depletes with each query: under basic sequential composition, the ε values of successive queries add up, so answering k queries with budgets ε_1, ..., ε_k consumes Σ ε_i in total.

Practical ε values: ε = 1-3 provides strong privacy (research usage). ε = 8-10 provides moderate privacy (product analytics). ε > 50 provides minimal privacy—often used in "anonymization" claims that don't meet academic standards.
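
A small bookkeeping sketch of sequential composition, where the ε spent on successive queries accumulates against a fixed total budget (the per-query values are illustrative):

class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=3.0)
for eps in [1.0, 1.0, 0.5]:    # three queries, each consuming part of the budget
    budget.spend(eps)
print(budget.spent)            # 2.5 of 3.0 used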

Apple's Differential Privacy

Apple uses differential privacy for on-device analytics, adding noise locally before data leaves the device so that only aggregate trends (for example, popular emoji and newly popular words typed by users) can be learned, never any individual's activity.

Differential Privacy in Machine Learning

DP-SGD (Differentially Private Stochastic Gradient Descent) trains neural networks with differential privacy guarantees:

For each batch:
  1. Compute per-example gradients
  2. Clip each per-example gradient to maximum norm C
  3. Sum the clipped gradients and add Gaussian noise scaled to C
  4. Average and apply the update
  
Result: Trained model satisfies an (ε, δ)-DP guarantee, with the privacy spent across steps tracked by a privacy accountant
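
A minimal sketch of one DP-SGD step in NumPy, assuming per-example gradients are already computed (the model, gradients, and hyperparameters are placeholders; production libraries such as Opacus or TensorFlow Privacy also track the cumulative (ε, δ) with a privacy accountant):

import numpy as np

rng = np.random.default_rng()

def dp_sgd_step(w, per_example_grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.1):
    """One DP-SGD update: clip each per-example gradient, sum, add noise, average."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]                      # clip to max norm C
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=w.shape)        # noise scaled to C
    return w - lr * noisy_sum / len(per_example_grads)

w = np.zeros(3)                                        # placeholder model parameters
grads = [rng.normal(size=3) for _ in range(8)]         # placeholder per-example gradients
print(dp_sgd_step(w, grads))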
    

OpenAI, Google, and Microsoft have published DP-trained models. Apple applies differential privacy to on-device keyboard learning.

Secure Multi-Party Computation

Secure Multi-Party Computation (MPC) enables multiple parties to compute a function on their combined inputs without revealing inputs to each other.

The Classic Example: Yao's Millionaires' Problem

Two millionaires want to know who is richer without revealing their wealth.
Solution: Use garbled circuits and oblivious transfer

Result: Both learn only who is richer, and nothing else about the other's wealth
    

Secret Sharing

Split a secret into shares distributed to parties:

Shamir's Secret Sharing:
  - Split secret into n shares
  - Any t shares can reconstruct secret
  - Fewer than t shares reveal nothing

Example: (2,3) sharing of 42
  Share 1: (1, 17)
  Share 2: (2, -8)
  Share 3: (3, -33)
  
Any 2 shares determine the line, and its value at x = 0 (the y-intercept) recovers the secret (42)
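
A minimal (t, n) Shamir sketch in Python over a prime field (the prime and secret are illustrative; real implementations use much larger fields and hardened arithmetic):

import random

PRIME = 2_147_483_647   # illustrative prime modulus

def make_shares(secret, t, n):
    """Split secret into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(42, t=2, n=3)
print(reconstruct(shares[:2]))   # any two shares recover 42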
    

MPC Protocols

Protocol | Setting | Security | Performance
Garbled Circuits | 2-party | Semihonest, malicious | Moderate
Secret Sharing | n-party | Various | Varies
GMW (Goldreich-Micali-Wigderson) | n-party | Semihonest | Good for arithmetic
SPDZ | n-party | Malicious | Slower but actively secure

Practical MPC Use Cases

Google's Private Join and Compute

Google open-sourced Private Join and Compute, a library for private set intersection with aggregation: two parties can identify the records they share and compute aggregate statistics over the associated values without revealing the underlying records to each other.

Homomorphic Encryption

Homomorphic encryption (HE) allows computation on encrypted data. You can encrypt data, perform computations, and decrypt results—without ever seeing the plaintext.

Classical encryption: to compute f(x), the server must first decrypt, so it sees the plaintext
HE: f(Enc(x)) = Enc(f(x)), so the server can compute directly on the ciphertext

Order of operations:
  1. Client encrypts data: Enc(x)
  2. Server computes on ciphertext: Enc(f(x)) = f(Enc(x))
  3. Client decrypts: Dec(Enc(f(x))) = f(x)
  
Server never sees x or f(x)
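
A toy sketch of this flow using the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The key size below is deliberately tiny and insecure, for illustration only:

import math
import random

# Toy Paillier key generation (real keys use primes of ~1024 bits or more)
p, q = 10_007, 10_009
n = p * q
n_sq = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n_sq) - 1) // n, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    return ((pow(c, lam, n_sq) - 1) // n) * mu % n

# 1. Client encrypts two values
c1, c2 = encrypt(123), encrypt(456)
# 2. Server adds them without seeing the plaintexts (ciphertext multiplication)
c_sum = (c1 * c2) % n_sq
# 3. Client decrypts the result
print(decrypt(c_sum))   # 579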
    

Types of Homomorphic Encryption

Type | Operations | Performance | Use Case
Somewhat HE (SHE) | Limited (+, -, ×) | Moderate | Specific applications
Leveled FHE | Bounded circuit depth | Slow | Research
FHEW/TFHE | Arbitrary (bootstrapping) | Very slow | Research
CKKS (approximate) | Addition, multiplication | Moderate | Machine learning

CKKS for Machine Learning

CKKS (Cheon-Kim-Kim-Song) is the scheme most practical for ML:

CKKS properties:
  - Encrypted vectors of real numbers
  - Addition and multiplication work
  - Results are approximate (rounding errors)
  - Rescaling needed after multiplications
  
ML operations possible:
  - Matrix multiplication (convolution)
  - Activation functions (approximated)
  - Polynomials
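
A short sketch of CKKS in practice, assuming the TenSEAL library (a Python wrapper around Microsoft SEAL); the parameters are illustrative and the exact API may differ between versions:

import tenseal as ts

# CKKS context: polynomial degree and modulus sizes trade off security, precision, and depth
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

enc_v1 = ts.ckks_vector(context, [1.0, 2.0, 3.0])   # encrypted vectors of real numbers
enc_v2 = ts.ckks_vector(context, [4.0, 5.0, 6.0])

enc_sum = enc_v1 + enc_v2        # elementwise addition on ciphertexts
enc_dot = enc_v1.dot(enc_v2)     # encrypted dot product

print(enc_sum.decrypt())         # approximately [5.0, 7.0, 9.0]
print(enc_dot.decrypt())         # approximately [32.0]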
    

Microsoft SEAL, IBM HElib, and PALISADE are popular HE libraries.

Performance Challenges

HE is slow: depending on the scheme and workload, encrypted computation can run 10,000-100,000x slower than the equivalent plaintext computation.

Hybrid Approaches

Practical systems combine HE with other techniques rather than running entire pipelines under encryption.

Privacy-Utility Tradeoff

More privacy requires more noise or less information, which reduces model utility. This tradeoff is fundamental.

What Each Technique Provides

Technique | Privacy Guarantee | Utility Impact | Computational Cost
Federated Learning | None alone | Minimal | Moderate
Federated + DP | Strong | Moderate | Moderate
Centralized DP | Strong | Moderate to High | Low
MPC | Strong | Minimal | High
HE | Strong | Minimal | Very High

Realistic Expectations

Combining Techniques

Production privacy systems often combine multiple approaches:

Federated Learning + Differential Privacy + Secure Aggregation

1. Federated: Model trained on distributed devices
2. DP: Each update clipped and noised
3. Secure Aggregation: Server sees only aggregate, not individual updates

Result: Individual updates remain hidden even from the aggregation server
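
A toy sketch of the secure-aggregation idea using pairwise masks that cancel in the sum (NumPy; real protocols such as Bonawitz et al.'s secure aggregation add key agreement and handling for dropped clients):

import numpy as np

def masked_updates(updates, seed_base=1234):
    """Each pair of clients (i, j) shares a random mask; i adds it, j subtracts it."""
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = np.random.default_rng(seed_base + i * n + j).normal(size=updates[0].shape)
            masked[i] += mask   # client i adds the pairwise mask
            masked[j] -= mask   # client j subtracts the same mask
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masked = masked_updates(updates)
# The server sees only masked updates, but the masks cancel in the aggregate:
print(np.sum(masked, axis=0))   # [9. 12.], equal to the true sum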
    

Conclusion

Privacy computing techniques have matured significantly. Federated learning enables model training across distributed data. Differential privacy provides mathematical guarantees. MPC enables secure joint computation. Homomorphic encryption allows computation on encrypted data.

Each technique involves tradeoffs between privacy strength, utility, and computational cost. The right approach depends on the use case: federated learning when raw data cannot leave devices, differential privacy when models or statistics will be released, and MPC or HE when multiple parties must compute together without a trusted intermediary.

The field is advancing rapidly. As techniques mature and computation becomes cheaper, expect privacy-preserving ML to become increasingly practical.