Clinical AI Training Without Data Export
AI companies need clinical data. Hospitals can't export it.
The core tension in healthcare AI: training useful clinical models requires real patient data, but governance, regulation, and ethics block raw data from leaving the institution. Copying scans, EHR data, and lab results into cloud GPU clusters creates an unacceptable risk surface.
This is not a technology limitation. It is not a model architecture problem. It is a data custody problem — and the correct answer is not "ask hospitals to export anyway."
The compute-to-data approach
Rapha Protocol inverts the pipeline. Instead of moving data to centralised compute, the model training workload moves into the hospital environment. An edge computing appliance sits inside the institution's firewall, connected to the local PACS, EHR, or data repository. The AI model trains locally against clinical data under policy enforcement, hardware attestation, and network isolation. Only trained model weights, metrics, and cryptographic proof receipts leave the boundary.
This is not federated learning. The entire training job executes at the data source. The researcher submits a model artifact, dataset intent, output policy, and budget. The network orchestrates routing, verification, and settlement — without ever touching raw patient data.
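The four parts of a submission named above can be pictured as a small data structure. This is a minimal sketch only, not Rapha Protocol's actual API: the names `TrainingJob`, `OutputPolicy`, `PERMITTED_OUTPUTS`, `requested_outputs`, and `budget_records`, and the choice of records as the budget unit, are assumptions made for illustration.

```python
from dataclasses import dataclass

# Output kinds that may cross the hospital boundary, per the description above.
PERMITTED_OUTPUTS = frozenset({"weights", "metrics", "proof_receipt"})


@dataclass(frozen=True)
class OutputPolicy:
    # Hypothetical field: which output kinds the researcher asks to receive.
    requested_outputs: frozenset = PERMITTED_OUTPUTS


@dataclass(frozen=True)
class TrainingJob:
    model_artifact: str        # assumed to be a container image digest
    dataset_intent: str        # description of the data the job wants to train on
    output_policy: OutputPolicy
    budget_records: int        # assumed unit: maximum number of records to settle

    def validate(self) -> None:
        """Reject the job before routing if a required part is missing or unsafe."""
        if not self.model_artifact:
            raise ValueError("model_artifact is required")
        if not self.dataset_intent:
            raise ValueError("dataset_intent is required")
        if self.budget_records <= 0:
            raise ValueError("budget_records must be positive")
        extra = self.output_policy.requested_outputs - PERMITTED_OUTPUTS
        if extra:
            raise ValueError(
                f"outputs not permitted to leave the boundary: {sorted(extra)}"
            )
```

In this sketch a job that requests anything beyond weights, metrics, and a proof receipt is rejected at submission time, before any routing happens.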
What the researcher receives
- Trained model weights — fine-tuned on real clinical data inside the hospital boundary.
- Training metrics and hashes — proof that the job executed against the declared dataset.
- Cryptographic proof receipt — anchored on Polygon mainnet for auditability.
- No raw PHI, DICOM pixels, FHIR bundles, or patient identifiers — ever.
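One way to picture how a receipt can prove what ran without carrying any patient data is a record that holds only hashes. The sketch below is an assumption for illustration: the field names, the canonical-JSON encoding, and the use of SHA-256 are not taken from Rapha Protocol, and the on-chain anchoring step is not shown.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte encoding to hash.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_receipt(job_id: str, weights: bytes, metrics: dict, dataset_hash: str) -> dict:
    """Build a receipt containing only digests, never the underlying data."""
    body = {
        "job_id": job_id,
        "weights_sha256": sha256_hex(weights),
        "metrics_sha256": sha256_hex(canonical_json(metrics)),
        "dataset_sha256": dataset_hash,  # computed inside the hospital boundary
    }
    # The receipt digest commits to every field above.
    return {**body, "receipt_sha256": sha256_hex(canonical_json(body))}


def verify_receipt(receipt: dict, weights: bytes) -> bool:
    """Check the receipt is internally consistent and matches the delivered weights."""
    body = {k: v for k, v in receipt.items() if k != "receipt_sha256"}
    return (
        receipt.get("receipt_sha256") == sha256_hex(canonical_json(body))
        and body.get("weights_sha256") == sha256_hex(weights)
    )
```

Under these assumptions, a researcher who receives the weights file and the receipt can recompute the digests locally; a mismatch means either the weights or the receipt were altered after the job finished.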
Security and governance controls
Every training job passes through multiple fail-closed gates:
- OPA policy engine enforces institutional rules before compute starts — dataset allowlists, output constraints, network policy, consent requirements.
- SGX/TDX hardware attestation verifies the edge node's trusted execution environment before any model code executes.
- Rust kernel air-gap severs the WAN interface during training, so no data can be exfiltrated over the network.
- Go compliance scanner analyses model containers for network-capable dependencies (socket, requests, http.client, gRPC) before execution.
- RaphaDataLoader counts unique records before each batch reaches model code — settlement is per-record, not per-epoch.
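The fail-closed behaviour of these gates can be sketched in a few lines. This is an illustration only: the production scanner is described above as a Go component that analyses containers, whereas the hypothetical `network_imports` function here inspects a single Python source file, and `run_gates` is an assumed name, not part of Rapha Protocol.

```python
import ast
from typing import Callable, Iterable, Set, Tuple

# Module names treated as network-capable, taken from the list above
# (gRPC is imported as "grpc" in Python).
NETWORK_MODULES = {"socket", "requests", "http.client", "grpc"}


def network_imports(source: str) -> Set[str]:
    """Return the network-capable modules imported by a Python source string."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from http import client" must also match "http.client".
            names = [node.module]
            names += [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            for mod in NETWORK_MODULES:
                if name == mod or name.startswith(mod + "."):
                    found.add(mod)
    return found


def run_gates(gates: Iterable[Tuple[str, Callable[[], bool]]]) -> Tuple[bool, str]:
    """Run gates in order; return (True, "") or (False, name of the failing gate)."""
    for name, check in gates:
        try:
            passed = check() is True
        except Exception:
            passed = False  # fail closed: a gate that errors counts as a denial
        if not passed:
            return False, name
    return True, ""
```

The design point is that a job starts only when every gate explicitly returns `True`. A gate that raises, times out into an exception, or returns anything else stops the job, and later gates are never evaluated.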
UK and US compliance alignment
The architecture aligns with UK GDPR data minimisation principles, NHS Data Security and Protection Toolkit (DSPT) standards, Caldicott Principle 4 (access on a strict need-to-know basis), and HIPAA Security Rule controls (45 CFR 164.308-164.316). The system maps 34 compliance controls to NHS and HIPAA requirements. OPA policy configuration is institution-specific and must be reviewed by the deploying trust's Caldicott Guardian and DPO.
Production use requires written agreements, institutional approval, security review, privacy review, and applicable BAA/DPA analysis. Rapha Protocol is private-alpha infrastructure. Public demos must not receive real PHI or regulated production data.