Privacy Analysis

Does Federated Learning Keep Patient Data Private?

Short answer: No. Federated learning transmits gradients, and published gradient inversion attacks show that these can be used to reconstruct the training data they were computed from.

Federated learning is marketed as a privacy solution: "data stays local, only model updates are shared." This framing omits a critical finding from peer-reviewed research: model gradients can contain enough information to reconstruct training data with high fidelity.

Discussion: ML security researcher

"I've reproduced gradient inversion attacks on medical imaging data. Using the Deep Leakage from Gradients technique (Zhu et al., 2019), I can reconstruct chest X-rays from federated learning gradient updates with recognizable diagnostic features. If your federated learning system transmits gradients from a model training on patient data, those gradients are functionally equivalent to transmitting the data itself — just in a compressed, lossy format that the research community has repeatedly shown can be inverted. The 'privacy by staying local' claim is technically true for raw data, but misleading when applied to the actual information flow in FL systems."

Key research on gradient leakage

The alternative: compute-to-data with hardware TEE

Rapha Protocol runs the full training job locally inside an Intel SGX or TDX enclave at the hospital. No gradients are transmitted, and no intermediate states leave the institution. Only the trained model weights exit, once training completes. The gradient leakage attack vector does not exist because gradients never cross the network boundary.

Community Q&A

Q: Can differential privacy fix the gradient leakage problem in FL?

"Adding DP noise to gradients can reduce inversion fidelity, but it also degrades model quality — especially on rare classes and minority populations, which are exactly the cases clinical AI needs to capture. There's no free lunch: you trade privacy for utility. With hardware TEE-based compute-to-data, you get both."

Q: What about secure aggregation in FL?

"Secure aggregation prevents the central server from seeing individual gradients, but it doesn't prevent gradient leakage. The gradients are still transmitted — just encrypted in transit. The aggregator still receives and processes them. And secure aggregation adds computational overhead that scales with the number of participants."