AI Training on Protected Health Information
Can you train AI on PHI? Yes — if the data never leaves the covered entity.
Protected Health Information (PHI) is among the most heavily regulated categories of data in the United States. Under HIPAA, PHI includes any individually identifiable health information held or transmitted by a covered entity or business associate. Training an AI model on PHI creates a compliance puzzle: the model needs the data, but the data cannot be exported, shared, or exposed outside the covered entity's control.
The conventional solution is to de-identify the data, sign a business associate agreement (BAA), and move it to the cloud. That turns the AI company into a business associate with direct PHI custody obligations. This is not a compliance strategy. It is a liability transfer.
Compute-to-data: PHI training without PHI custody
Rapha Protocol's architecture is designed to remove the PHI custody problem. The AI company never receives, stores, processes, or transmits PHI. Instead, the model training job is dispatched into an Intel SGX/TDX enclave on the hospital's own infrastructure, where training executes locally against PHI-containing clinical data. Only trained model weights leave the enclave: numerical parameters rather than patient records.
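The flow described above can be sketched in a few lines of Python. This is an illustrative toy, not Rapha Protocol's API: the names `TrainingResult`, `fit_line`, and `run_inside_enclave` are hypothetical, the "records" are plain number pairs standing in for clinical data, and an ordinary function call stands in for the SGX/TDX boundary. The point is only the shape of the exchange: the job travels to the data, and the return value carries weights and a record count, never the records themselves.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for a clinical record: (feature, label).
Record = Tuple[float, float]


@dataclass(frozen=True)
class TrainingResult:
    """The only object this sketch allows to cross the boundary."""
    weights: Tuple[float, ...]
    records_seen: int


def fit_line(records: Sequence[Record], epochs: int = 2000,
             lr: float = 0.05) -> Tuple[float, float]:
    """Toy training job: fit y = w*x + b by gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(records)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in records) / n
        grad_b = sum((w * x + b - y) for x, y in records) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def run_inside_enclave(
    job: Callable[[Sequence[Record]], Sequence[float]],
    local_records: Sequence[Record],
) -> TrainingResult:
    """Hospital-side entry point (assumed): run the job next to the data.

    The records are passed to the job but are not part of the return
    value; only the weights and a record count are handed back.
    """
    weights = job(local_records)
    return TrainingResult(weights=tuple(weights),
                          records_seen=len(local_records))


if __name__ == "__main__":
    # Synthetic data on the line y = 2x + 1; no real patient data involved.
    hospital_records = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0),
                        (3.0, 7.0), (4.0, 9.0)]
    result = run_inside_enclave(fit_line, hospital_records)
    print(result.records_seen, [round(v, 3) for v in result.weights])
```

In a real deployment the boundary would be enforced by enclave hardware and attestation rather than by a Python function signature; the sketch only makes the data flow concrete.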
Key compliance characteristics:
- No PHI transmitted to AI company. The AI company's infrastructure never touches PHI. The secure API handles authentication and session management without PHI transit.
- No PHI stored outside the covered entity. Training data remains on hospital-controlled storage behind the hospital firewall.
- No PHI exposed during training. SGX/TDX memory encryption prevents even the hospital's own IT administrators from inspecting data in use.
- No PHI in training artifacts. RaphaDataLoader counts records. An Open Policy Agent (OPA) policy validates the output format. The output guard rejects raw-data-shaped files (.csv, .txt, .jsonl, .dcm, .fhir, .ndjson, .parquet).
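To make the output-guard idea concrete, here is a minimal sketch of an extension-based filter. It is an assumption about how such a check could look, not the protocol's actual implementation: `is_raw_data_shaped` and `filter_outputs` are hypothetical names, and the only detail taken from the list above is the set of blocked extensions.

```python
from pathlib import PurePath
from typing import Iterable, List, Tuple

# Extensions named in the list above as raw-data-shaped.
RAW_DATA_EXTENSIONS = frozenset(
    {".csv", ".txt", ".jsonl", ".dcm", ".fhir", ".ndjson", ".parquet"}
)


def is_raw_data_shaped(filename: str) -> bool:
    """Return True if any suffix of the filename is on the blocked list.

    Checking every suffix (not just the last) also catches wrapped names
    such as "export.csv.gz". The comparison is case-insensitive.
    """
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    return any(s in RAW_DATA_EXTENSIONS for s in suffixes)


def filter_outputs(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split candidate output files into (allowed, rejected) lists."""
    allowed: List[str] = []
    rejected: List[str] = []
    for name in filenames:
        (rejected if is_raw_data_shaped(name) else allowed).append(name)
    return allowed, rejected


if __name__ == "__main__":
    allowed, rejected = filter_outputs(
        ["model.safetensors", "patients.csv", "SCAN.DCM", "export.csv.gz"]
    )
    print("allowed:", allowed)
    print("rejected:", rejected)
```

An extension check is a coarse filter: a renamed file would pass it. That is presumably why the list above pairs the output guard with a separate policy check on output format.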
What compute-to-data does NOT replace
Rapha Protocol is technical infrastructure. It does not replace:
- The HIPAA Privacy Rule's requirement for patient authorisation or waiver of authorisation for research uses of PHI.
- The HIPAA Security Rule's requirement for administrative, physical, and technical safeguards. The protocol contributes technical safeguards; the covered entity must still implement administrative and physical controls.
- Institutional review board (IRB) or privacy board review where applicable under the Common Rule or the HIPAA Privacy Rule.
- State-level health data privacy laws (e.g., California's CMIA, Washington's My Health My Data Act).
- Contractual requirements in data use agreements, BAAs, and research collaboration agreements.
Not legal advice. Consult qualified healthcare regulatory counsel for your specific PHI training use case. Rapha Protocol provides technical infrastructure designed to support HIPAA compliance — it does not itself constitute HIPAA compliance.