Medical Imaging AI Without Data Sharing
Data sharing is the default. It should not be.
Every radiology AI company faces the same onboarding roadblock: the hospital's legal team will not sign a data sharing agreement that exports DICOM studies. The reasons are legitimate:
- DICOM headers contain PHI: patient name, MRN, date of birth, study date, accession number.
- DICOM pixel data can contain burned-in PHI: patient information rendered directly into ultrasound images, scanned documents, or screen captures.
- DICOM studies are large. A single MRI study can be hundreds of megabytes. Exporting a cohort of 10,000 studies means transferring terabytes of PHI-containing data.
- Once exported, the data is outside the hospital's control. The hospital's Data Protection Officer (DPO) and Caldicott Guardian remain accountable for data that has left their custody, a risk they are professionally incentivised to avoid.
The result: radiology AI companies spend 50-80% of their go-to-market time negotiating data access. Not building models. Not validating performance. Not serving patients. Negotiating data sharing.
Compute-to-data eliminates the sharing problem
With Rapha Protocol, the AI company does not receive DICOM data. Instead:
- The model is containerised and cryptographically signed.
- The container is deployed to the hospital's edge appliance — inside the PACS network, behind the firewall.
- The model trains locally against DICOM studies in their native location. The DICOM files are never copied, transferred, or transmitted outside the hospital.
- Patient identifiers, StudyInstanceUIDs, and AccessionNumbers are keyed-hashed with a hospital-held HMAC key. The training runtime sees hashed identifiers, not real PHI.
- Only trained model weights, validation metrics, and cryptographic proof receipts leave the institution.
- The hospital earns 70% of the training fee, settled in USDC on Polygon.
The data sharing agreement is replaced by a compute access agreement — a fundamentally simpler legal instrument because no data changes custody.
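The keyed hashing step in the list above can be sketched as follows. This is a minimal illustration, not Rapha Protocol's implementation: the choice of HMAC-SHA256, the function names `pseudonymise` and `pseudonymise_record`, and the mixing of the field name into the message are all assumptions. The field names follow the identifiers this document lists.

```python
import hashlib
import hmac

# Fields named in the text; treating them as a fixed tuple is an assumption.
HASHED_FIELDS = ("PatientID", "StudyInstanceUID", "AccessionNumber")


def pseudonymise(value: str, key: bytes, field: str) -> str:
    """Return a hex HMAC-SHA256 digest of one identifier.

    The field name is mixed into the message (an assumption, not from the
    source) so equal values in different fields hash differently.
    """
    if len(key) != 32:
        raise ValueError("expected a 32-byte hospital-held key")
    message = f"{field}:{value}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, key: bytes) -> dict:
    """Hash the identifier fields and pass every other field through unchanged."""
    return {
        name: pseudonymise(str(value), key, name) if name in HASHED_FIELDS else value
        for name, value in record.items()
    }
```

Because the key stays with the hospital, the same patient maps to the same hash across studies at that site, while anyone without the key cannot recompute or reverse the mapping by guessing identifiers.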
DICOM-specific security controls
- QIDO-RS scoping: The DICOMweb client queries only scoped study metadata. Broad PACS scans are rejected at the client level.
- PHI HMAC hashing: Patient identifiers, StudyInstanceUIDs, and AccessionNumbers are hashed with a hospital-held key before any metadata reaches the training runtime.
- Free text suppression: StudyDescription and ProtocolName are suppressed by default in metadata extraction. They can contain incidental PHI.
- Pixel data confinement: DICOM pixel data is read directly from the PACS mount into the SGX enclave. Once read, it never exists in unencrypted form outside the enclave's protected memory region.
- Output validation: The output guard rejects any file with a .dcm extension, so DICOM files cannot leave the institution alongside trained model weights.
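Three of the controls above (query scoping, free-text suppression, and the output guard) are simple enough to sketch in a few lines. The function names are hypothetical, and the definition of a "scoped" query used here (an exact StudyInstanceUID with no DICOM wildcard characters) is an assumption. The source does not say how scoping is defined.

```python
from pathlib import PurePosixPath

# Free-text attributes the text says are suppressed by default.
SUPPRESSED_TAGS = {"StudyDescription", "ProtocolName"}


def is_scoped_query(params: dict) -> bool:
    """Accept a QIDO-RS query only if it pins one exact study (assumed rule)."""
    uid = params.get("StudyInstanceUID", "")
    return bool(uid) and "*" not in uid and "?" not in uid


def suppress_free_text(metadata: dict) -> dict:
    """Drop free-text attributes that can carry incidental PHI."""
    return {k: v for k, v in metadata.items() if k not in SUPPRESSED_TAGS}


def output_allowed(filename: str) -> bool:
    """Reject any output file whose final extension is .dcm, case-insensitively."""
    return PurePosixPath(filename).suffix.lower() != ".dcm"
```

For example, `is_scoped_query({"StudyInstanceUID": "1.2.*"})` is rejected because of the wildcard, and `output_allowed("scan.DCM")` is rejected despite the upper-case extension. Note that this sketch checks only the final extension, so a name such as `scan.dcm.gz` would pass.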
Production deployment requires: real hospital DICOMweb endpoints, 32-byte hospital-held PHI HMAC keys, scoped study queries, OPA policy approval, SGX/DCAP and TPM attestation, and a configured enterprise-node trainer command. Demo-only clients are isolated from production.
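The production requirements above lend themselves to a preflight check that refuses to start if any one is missing. The sketch below is illustrative only: `DeploymentConfig`, its field names, and `preflight` are hypothetical, and the attestation and policy fields are plain booleans standing in for whatever verification the real system performs.

```python
from dataclasses import dataclass


@dataclass
class DeploymentConfig:
    """Hypothetical settings object; every field name is an assumption."""
    dicomweb_url: str = ""
    phi_hmac_key: bytes = b""
    study_uids: tuple = ()
    opa_policy_approved: bool = False
    sgx_dcap_attested: bool = False
    tpm_attested: bool = False
    trainer_command: tuple = ()
    demo_mode: bool = False


def preflight(cfg: DeploymentConfig) -> list[str]:
    """Return the list of unmet production requirements (empty means ready)."""
    checks = [
        (not cfg.demo_mode, "demo-only client cannot run in production"),
        (bool(cfg.dicomweb_url), "DICOMweb endpoint is not configured"),
        (len(cfg.phi_hmac_key) == 32, "PHI HMAC key must be 32 bytes"),
        (len(cfg.study_uids) > 0, "no scoped study query"),
        (cfg.opa_policy_approved, "OPA policy approval missing"),
        (cfg.sgx_dcap_attested, "SGX/DCAP attestation missing"),
        (cfg.tpm_attested, "TPM attestation missing"),
        (len(cfg.trainer_command) > 0, "enterprise-node trainer command not configured"),
    ]
    return [message for ok, message in checks if not ok]
```

Returning every unmet requirement at once, rather than failing on the first, lets an operator fix the whole configuration in one pass.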