Clinical AI Training Regulatory Compliance Guide
Clinical AI training lives at the intersection of multiple regulatory frameworks
Training a clinical AI model involves processing health data — which triggers a cascade of regulatory requirements across jurisdictions. This guide maps the major frameworks and explains how compute-to-data architecture aligns with each.
Regulatory frameworks mapped to compute-to-data
HIPAA (United States) — 45 CFR Parts 160, 162, and 164
- Security Rule (164.308-164.316): Requires administrative (164.308), physical (164.310), and technical (164.312) safeguards for ePHI. Rapha Protocol's OPA policy engine, Rust air-gap, and SGX/TDX enclave support the administrative, physical, and technical safeguards respectively.
- Privacy Rule (164.502-164.514): Limits uses and disclosures of PHI. Under Rapha Protocol the AI company receives only trained weights, not PHI. The disclosure analysis is simplified because PHI does not change custody.
- Breach Notification Rule (164.400-164.414): Requires notification if PHI is compromised. Rapha Protocol removes the breach scenario of unauthorised access to exported PHI, because PHI is never exported.
- BAA requirement: A Business Associate Agreement is required when a third party creates, receives, maintains, or transmits PHI on behalf of a covered entity. Rapha Protocol's architecture is explicitly designed so the AI company does not receive PHI — potentially simplifying or eliminating BAA requirements between the hospital and the AI company. (The relationship between the hospital and Rapha Protocol as infrastructure operator may still require a BAA.)
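As a rough aid for the HIPAA mapping above, the following Python sketch records which Security Rule sections this guide ties to an architectural control and lists the ones it does not. The `Safeguard` type, the `SAFEGUARDS` table, and `unmapped_sections` are hypothetical names introduced for illustration. The control assignments come from the Security Rule bullet, and an entry with no control only means this guide names none for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Safeguard:
    category: str            # Security Rule safeguard category
    cfr_section: str         # section within 45 CFR Part 164
    control: Optional[str]   # architectural control named in this guide, if any


# Illustrative table: controls are the ones this guide associates with each
# category; None marks sections the guide does not tie to the architecture.
SAFEGUARDS = [
    Safeguard("administrative", "164.308", "OPA policy engine"),
    Safeguard("physical", "164.310", "Rust air-gap"),
    Safeguard("technical", "164.312", "SGX/TDX enclave"),
    Safeguard("organizational requirements", "164.314", None),
    Safeguard("policies, procedures and documentation", "164.316", None),
]


def unmapped_sections(safeguards: List[Safeguard]) -> List[str]:
    """Return the CFR sections that have no architectural control recorded."""
    return [s.cfr_section for s in safeguards if s.control is None]


if __name__ == "__main__":
    print(unmapped_sections(SAFEGUARDS))
```

A table like this makes the remaining work visible: the sections it returns have to be covered by contracts, governance, and documentation rather than by the appliance.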
UK GDPR & Data Protection Act 2018
- Data minimisation (Article 5(1)(c)): "Adequate, relevant and limited to what is necessary." Exporting full patient records to train an AI model arguably exceeds what is necessary. Returning only trained model weights is inherently minimised — the AI company receives exactly what it needs (a trained model) and nothing more (no patient data).
- Purpose limitation (Article 5(1)(b)): Data collected for care delivery cannot be repurposed for AI training without a lawful basis. Research is a permitted secondary purpose — but only with appropriate safeguards. The hospital retains control; the AI company accesses data only through the policy-governed edge appliance.
- Data Protection Impact Assessment (DPIA, Article 35): Required for processing likely to result in high risk to individuals. AI training on health data will almost always trigger DPIA requirements. Rapha Protocol's architecture provides documented technical and organisational measures (TOMs) that directly address DPIA risk factors: data minimisation, access control, encryption, audit logging, and data residency.
- International data transfers (Articles 44-49): Not triggered, provided patient data does not leave the UK institution. In that case the adequacy decision, standard contractual clauses, and binding corporate rules frameworks are not engaged because no international transfer of personal data occurs.
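The data minimisation and residency points above both come down to what is allowed to leave the institution. A minimal Python sketch of an egress allowlist follows. It assumes outputs are tagged with a kind, and the `Artifact` type, the kind labels, and `egress_decision` are hypothetical names, not part of any documented Rapha Protocol interface.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed labels for the only outputs this guide describes as leaving the
# institution: trained weights and the accompanying proof receipt.
ALLOWED_EGRESS_KINDS = frozenset({"model_weights", "proof_receipt"})


@dataclass(frozen=True)
class Artifact:
    name: str
    kind: str  # e.g. "model_weights", "proof_receipt", "patient_records"


def egress_decision(artifact: Artifact) -> Tuple[bool, str]:
    """Allow an artifact out only if its kind is on the allowlist."""
    if artifact.kind not in ALLOWED_EGRESS_KINDS:
        return False, f"kind '{artifact.kind}' is not on the egress allowlist"
    return True, "permitted output"


if __name__ == "__main__":
    print(egress_decision(Artifact("model.bin", "model_weights")))
    print(egress_decision(Artifact("cohort.csv", "patient_records")))
```

The design choice is deny by default: anything not explicitly listed, including patient records, is refused regardless of who asks for it.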
NHS DSPT & Caldicott (United Kingdom)
- DSPT Standard 3 (Training): Staff handling patient data must be appropriately trained. Rapha Protocol's OPA policy enforces role-based access controls that can be mapped to DSPT training requirements.
- DSPT Standard 4 (Data access): Access to data must be on a need-to-know basis. The edge appliance provides dataset-manifest-scoped, policy-gated, read-only access that supports this standard.
- DSPT Standard 9 (IT protection): Requires a strategy for protecting IT systems from cyber threats. The SGX/TDX enclave, kernel air-gap, OPA policy engine, and TPM 2.0 are layered security measures for clinical compute: hardware-isolated execution, an air-gapped kernel, policy enforcement, and a hardware root of trust.
- Caldicott Principle 4: "Access to patient identifiable information should be on a strict need-to-know basis." The AI model needs the data for training — but only during training, inside the enclave, under policy control. This is arguably the narrowest possible implementation of the "need to know" principle.
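The need-to-know conditions described above (manifest-scoped, policy-gated, read-only, inside the enclave) can be sketched as a single access decision. This is a plain Python stand-in for the logic, not an OPA or Rego policy, and the `Manifest` and `AccessRequest` types, their field names, and the role and dataset labels in the usage example are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Manifest:
    dataset_ids: FrozenSet[str]    # datasets approved for this training job
    allowed_roles: FrozenSet[str]  # roles permitted to run the job


@dataclass(frozen=True)
class AccessRequest:
    role: str
    dataset_id: str
    operation: str       # "read", "write", "export", ...
    inside_enclave: bool


def is_allowed(request: AccessRequest, manifest: Manifest) -> bool:
    """Grant access only when every need-to-know condition holds."""
    return (
        request.operation == "read"                      # read-only
        and request.inside_enclave                       # only inside the enclave
        and request.dataset_id in manifest.dataset_ids   # manifest-scoped
        and request.role in manifest.allowed_roles       # role-based
    )


if __name__ == "__main__":
    manifest = Manifest(frozenset({"ds-001"}), frozenset({"training-job"}))
    print(is_allowed(AccessRequest("training-job", "ds-001", "read", True), manifest))
    print(is_allowed(AccessRequest("training-job", "ds-001", "export", True), manifest))
```

Each condition is a separate conjunct, so an auditor can see which requirement a denied request failed.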
FDA (United States) — Software as a Medical Device (SaMD)
- The FDA regulates AI/ML-based Software as a Medical Device (SaMD) under the Federal Food, Drug, and Cosmetic Act and its implementing regulations in 21 CFR. The agency has published guidance on predetermined change control plans (PCCP) for AI/ML SaMD.
- For models trained using Rapha Protocol, the training data provenance is cryptographically verifiable — the proof receipt documents which dataset was used, when training occurred, and what model artifact was produced. This documentation supports regulatory submissions where training data traceability is required.
- Rapha Protocol is not itself a medical device. It is infrastructure for training models. Whether a specific trained model constitutes a medical device depends on its intended use — determined by the model developer, not by the training infrastructure.
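To illustrate what a proof receipt of the kind described above might contain, here is a minimal Python sketch that binds a dataset manifest hash, a training timestamp, and a model artifact hash under a keyed MAC. The field names and the `build_receipt` and `verify_receipt` functions are hypothetical, and HMAC-SHA256 from the standard library is a stand-in: the source does not specify the receipt format, and a real deployment would more plausibly use an asymmetric signature tied to enclave attestation.

```python
import hashlib
import hmac
import json


def _canonical(body: dict) -> bytes:
    """Serialise deterministically so the MAC is reproducible."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_receipt(dataset_manifest: bytes, model_artifact: bytes,
                  trained_at: str, key: bytes) -> dict:
    """Record which dataset was used, when, and which artifact was produced."""
    body = {
        "dataset_sha256": hashlib.sha256(dataset_manifest).hexdigest(),
        "model_sha256": hashlib.sha256(model_artifact).hexdigest(),
        "trained_at": trained_at,  # ISO 8601 string supplied by the caller
    }
    mac = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "mac": mac}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the MAC over every field except 'mac' and compare."""
    body = {k: v for k, v in receipt.items() if k != "mac"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(receipt.get("mac", "")))


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    receipt = build_receipt(b"manifest bytes", b"weights bytes",
                            "2024-01-01T00:00:00Z", key)
    print(verify_receipt(receipt, key))
```

Changing any field after the fact, such as swapping in a different dataset hash, makes verification fail, which is the property a traceability record for a regulatory submission depends on.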
This guide is informational, not legal advice. Regulatory analysis must be performed by qualified counsel for each specific deployment. Compliance depends on institutional governance, contractual agreements, and operational implementation — not on technical architecture alone.