Clinical AI compute-to-data guide

How to train AI on real clinical data without moving patient data

Short answer: do not download hospital data. Route the model into the clinical data boundary. That is the core Rapha Protocol thesis: compute moves, patient data stays.

Rapha Protocol is building private-alpha infrastructure for AI teams that need real clinical signal without creating a raw-PHI export problem. The system is designed to send approved model workloads into controlled edge nodes, enforce policy before execution, return approved artifacts, and anchor proof metadata for auditability.


Problem

Clinical AI needs real data, but the data cannot simply leave.

Useful healthcare models need real clinical signal: EHR notes, labs, outcomes, radiology workflows, device telemetry, and patient-generated health data. The blocker is not model architecture. The blocker is governance. Hospitals cannot hand raw records to every AI company that wants better training data.

The usual cloud training pattern creates the wrong risk surface: copy sensitive data out, centralize it, then try to control the blast radius. Rapha Protocol inverts the path. The model goes to the controlled environment, not the other way around.

What Rapha Protocol does

Rapha Protocol routes AI workloads into controlled clinical edge nodes.

An AI team defines the training job, dataset intent, model artifact, output rules, and compute budget. Rapha Protocol is designed to route that workload into an on-prem or controlled edge node where the clinical data already lives. The raw dataset remains inside the hospital or institution-controlled boundary.
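As an illustration, a job declaration with those parts might look like the minimal Python sketch below. It is not Rapha Protocol's actual schema. The `TrainingJob` class, its field names, and the `APPROVED_OUTPUT_TYPES` set are hypothetical. The approved output types mirror the outputs this page lists: weights, metrics, hashes, and receipts.

```python
from dataclasses import dataclass

# Hypothetical set of output types a job may request; mirrors the approved
# outputs named on this page (weights, metrics, hashes, receipts).
APPROVED_OUTPUT_TYPES = frozenset({"weights", "metrics", "hashes", "receipt"})


@dataclass(frozen=True)
class TrainingJob:
    """Hypothetical job declaration; field names are illustrative only."""

    model_artifact: str               # e.g. a content hash of the model image
    dataset_intent: str               # cohort the job asks to train against
    output_rules: frozenset           # output types the researcher wants back
    compute_budget_gpu_hours: float   # upper bound on compute for the job

    def validate(self) -> list:
        """Return a list of problems; an empty list means the declaration is well-formed."""
        errors = []
        if not self.model_artifact:
            errors.append("model_artifact is required")
        if not self.dataset_intent:
            errors.append("dataset_intent is required")
        if self.compute_budget_gpu_hours <= 0:
            errors.append("compute budget must be positive")
        unapproved = self.output_rules - APPROVED_OUTPUT_TYPES
        if unapproved:
            errors.append(f"unapproved output types: {sorted(unapproved)}")
        return errors
```

This validation only checks the shape of the declaration. Whether the cohort may actually be used is still decided at the node, by policy and manifest checks.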

The goal is simple: give AI researchers access to real clinical learning signal while giving hospitals a defensible technical boundary. The researcher receives approved outputs such as trained weights, metrics, hashes, and receipts. They do not receive raw patient records.

Security model

Policy checks happen before the model touches data.

The edge runtime is built around fail-closed controls. An Open Policy Agent (OPA) policy evaluates the workload. Dataset manifests must allowlist the selected cohort. Dataset mounts are expected to be read-only. The training runtime refuses to execute when the trusted execution environment (TEE) posture, dataset path, trainer command, or required output artifacts are invalid.
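The fail-closed pattern can be sketched as a preflight function. It evaluates every check and refuses to start the trainer if any one of them fails. This is a hypothetical Python sketch: `NodeState`, `preflight`, and `PreflightError` are illustrative names. The OPA decision, TEE verification result, and mount status are assumed to be produced elsewhere by the real runtime, not computed here.

```python
import posixpath
from dataclasses import dataclass


class PreflightError(RuntimeError):
    """Raised when any runtime check fails; the trainer must not start."""


@dataclass(frozen=True)
class NodeState:
    """Hypothetical snapshot of facts the edge runtime gathered before training."""

    policy_allows: bool             # OPA decision for this workload (computed elsewhere)
    tee_verified: bool              # TEE posture check result (computed elsewhere)
    manifest_cohorts: frozenset     # cohorts allowlisted by the dataset manifest
    dataset_root: str               # root under which approved datasets are mounted
    dataset_path: str               # path the job wants to read
    dataset_mount_readonly: bool    # whether the mount is read-only
    trainer_argv: tuple             # exact trainer command requested
    allowed_trainer_argvs: frozenset
    required_outputs: tuple         # output artifacts the job must produce


def _inside(path: str, root: str) -> bool:
    # Lexical check only; a real runtime should also resolve symlinks.
    norm = posixpath.normpath(path)
    root = posixpath.normpath(root)
    return posixpath.isabs(norm) and (norm == root or norm.startswith(root + "/"))


def preflight(state: NodeState, cohort: str) -> None:
    """Raise PreflightError unless every check passes (fail closed)."""
    checks = [
        (state.policy_allows, "policy denied the workload"),
        (state.tee_verified, "TEE posture not verified"),
        (cohort in state.manifest_cohorts, "cohort not allowlisted in manifest"),
        (_inside(state.dataset_path, state.dataset_root), "dataset path outside allowlisted root"),
        (state.dataset_mount_readonly, "dataset mount is not read-only"),
        (state.trainer_argv in state.allowed_trainer_argvs, "trainer command not allowlisted"),
        (bool(state.required_outputs), "no required output artifacts declared"),
    ]
    failures = [message for passed, message in checks if not passed]
    if failures:
        raise PreflightError("; ".join(failures))
```

Collecting every failure before raising, rather than stopping at the first, is a design choice that makes refusals easier to diagnose. Either way, the trainer never starts.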

This matters because the dangerous failure mode is silent exfiltration. Rapha Protocol treats network access, output files, logs, and trainer behavior as security boundaries. If a configured node cannot prove the required runtime posture, training should stop instead of guessing.

Output model

The output is a model artifact and receipt, not a data dump.

A successful job should return only approved outputs: trained weights, metrics, hashes, cryptographic receipt metadata, and settlement references. Raw PHI, DICOM exports, FHIR bundles, Apple Health samples, and genetic data should not be sent to the AI company, Polygon, IPFS, Vercel, Render, or any general-purpose web surface.
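One way to enforce "approved outputs only" is an exact-name allowlist over the job's output directory. Under this approach, any unexpected file blocks the whole release rather than being silently dropped. This is a hypothetical Python sketch: the artifact file names are assumptions, not a format Rapha Protocol specifies.

```python
from pathlib import PurePosixPath

# Hypothetical exact-name allowlist for a job's output directory.
APPROVED_ARTIFACTS = {
    "model.safetensors": "weights",
    "metrics.json": "metrics",
    "hashes.txt": "hashes",
    "receipt.json": "receipt",
}


def select_release(produced: list) -> list:
    """Return the files to release, or raise if anything unapproved was produced."""
    unexpected = []
    for name in produced:
        path = PurePosixPath(name)
        # Only top-level files with an allowlisted name; nested or
        # traversal paths (e.g. "export/x.dcm", "../x") are rejected.
        if len(path.parts) != 1 or path.name not in APPROVED_ARTIFACTS:
            unexpected.append(name)
    if unexpected:
        raise ValueError(f"release blocked, unapproved outputs: {sorted(unexpected)}")
    return sorted(produced)
```

Blocking the release outright, instead of filtering, treats an unexpected file as a possible exfiltration attempt that needs human review.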

That is the practical difference between a data marketplace and compute-to-data infrastructure. Rapha Protocol is not trying to sell raw patient files. It is building the control plane for training against clinical data while the data wall remains intact.

Audit and settlement

Proof metadata can be anchored, but proof is not a clinical approval.

Rapha Protocol uses public proof surfaces to make execution claims auditable. Hashes, receipts, event metadata, and settlement references can be anchored on Polygon mainnet. The clearing-vault settlement path requires trusted-attestor verification before funds can be released against a training job.
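The attestor requirement works as a gate: funds are released only if a known attestor has vouched for this exact job and commitment. The Python sketch below is hypothetical. The attestor registry, the `job_id:commitment` message format, and the use of HMAC-SHA256 are all assumptions. HMAC stands in for the public-key signatures a real attestor would more likely use, so the example runs on the standard library alone.

```python
import hashlib
import hmac

# Hypothetical registry of trusted attestors. Shared HMAC keys stand in for
# the public-key signatures a real attestor would more likely use.
TRUSTED_ATTESTORS = {"attestor-1": b"demo-key-not-for-production"}


def _mac(key: bytes, job_id: str, commitment: str) -> str:
    # Bind the attestation to one job and one commitment.
    message = f"{job_id}:{commitment}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def attest(attestor_id: str, job_id: str, commitment: str) -> str:
    """What a trusted attestor would produce after verifying the job."""
    return _mac(TRUSTED_ATTESTORS[attestor_id], job_id, commitment)


def can_release_funds(job_id: str, commitment: str, attestor_id: str, signature: str) -> bool:
    """Fail closed: unknown attestors and mismatched signatures block release."""
    key = TRUSTED_ATTESTORS.get(attestor_id)
    if key is None:
        return False
    expected = _mac(key, job_id, commitment)
    # Constant-time comparison on bytes avoids timing leaks and non-ASCII errors.
    return hmac.compare_digest(expected.encode(), signature.encode())
```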

A mainnet proof receipt proves transaction inclusion and a cryptographic commitment. It does not prove model safety, clinical validity, regulatory clearance, HIPAA compliance, de-identification, or hospital approval by itself. Those still require contracts, security review, privacy review, and institutional signoff.
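To make the distinction concrete: a commitment is a hash over the job's outputs. Checking it only confirms that the artifacts you hold are byte-for-byte the ones that were committed. This is a minimal Python sketch, assuming a SHA-256 hash over a canonical JSON manifest of per-artifact digests. Rapha Protocol's actual commitment format is not described on this page.

```python
import hashlib
import json


def commitment(job_id: str, artifacts: dict) -> str:
    """SHA-256 over a canonical JSON manifest of per-artifact SHA-256 digests."""
    manifest = {
        "job_id": job_id,
        "artifacts": {name: hashlib.sha256(blob).hexdigest() for name, blob in artifacts.items()},
    }
    # sort_keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


def verify(job_id: str, artifacts: dict, anchored: str) -> bool:
    """True only if these exact bytes match the anchored commitment."""
    return commitment(job_id, artifacts) == anchored
```

A matching commitment shows that the bytes are unchanged. It says nothing about whether the weights are safe or clinically valid, which is exactly the limit stated above.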

How it works

The workflow in five steps.

  1. Declare the job: the AI developer submits model artifact, cohort intent, output policy, and budget.
  2. Authenticate access: developer credentials and proof-session state are handled server-side, not trusted to the browser.
  3. Run at the edge: the workload executes beside local records under OPA policy and runtime checks.
  4. Return approved artifacts: the researcher receives trained weights, metrics, hashes, and receipt metadata.
  5. Verify and settle: trusted proof material can support settlement after the required attestation path is satisfied.
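The ordering of the five steps can be sketched as a small state machine. Each job may advance only one step at a time, and settlement is refused until attestation is satisfied. This is a hypothetical Python illustration: `Stage`, `Job`, and the `attested` flag are not Rapha Protocol APIs.

```python
from enum import Enum


class Stage(Enum):
    DECLARED = 1
    AUTHENTICATED = 2
    EXECUTED = 3
    ARTIFACTS_RETURNED = 4
    SETTLED = 5


ORDER = list(Stage)  # definition order: the five steps in sequence


class Job:
    """Hypothetical job record that can only move forward one step at a time."""

    def __init__(self, job_id: str):
        self.job_id = job_id
        self.stage = Stage.DECLARED  # step 1 happens on creation

    def advance(self, target: Stage, attested: bool = False) -> None:
        current = ORDER.index(self.stage)
        if current + 1 >= len(ORDER) or ORDER[current + 1] is not target:
            raise ValueError(f"cannot move from {self.stage.name} to {target.name}")
        # Step 5 is gated on the attestation path, not just on step 4 finishing.
        if target is Stage.SETTLED and not attested:
            raise PermissionError("settlement requires a satisfied attestation path")
        self.stage = target
```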

What is live, what is private alpha

Mainnet proof surface

Rapha Protocol has a public Polygon mainnet proof receipt and deployed settlement surfaces. This is a proof and audit surface, not a claim that production hospital PHI has been processed:

0xfadab8cc5e6bdb531d7ddfd64fd2a325a5dabda1c0f1eb7a21f05d15c618f9a0

Contract: 0xB27704CA8A01Bc151181D1d53E2F0eF11B39B32F


What this does and does not prove

The receipt proves that a public cryptographic commitment exists on Polygon mainnet. It does not prove clinical validity, model safety, regulatory clearance, de-identification, or healthcare compliance by itself.
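Anyone can check inclusion independently with the standard Ethereum JSON-RPC method `eth_getTransactionReceipt`, which Polygon nodes also serve. The Python sketch below assumes that the hash listed above is the transaction hash of the proof receipt. It also assumes you supply your own Polygon RPC endpoint URL; none is assumed here. `summarize` reports only inclusion and execution status, which matches the limited claim above.

```python
import json
import urllib.request
from typing import Optional

# Assumed to be the transaction hash of the proof receipt listed above.
TX_HASH = "0xfadab8cc5e6bdb531d7ddfd64fd2a325a5dabda1c0f1eb7a21f05d15c618f9a0"


def receipt_request(tx_hash: str) -> dict:
    """Build a standard Ethereum JSON-RPC request for a transaction receipt."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getTransactionReceipt",
        "params": [tx_hash],
    }


def fetch_receipt(rpc_url: str, tx_hash: str) -> Optional[dict]:
    """POST the request to a caller-supplied Polygon RPC endpoint; None if unknown."""
    body = json.dumps(receipt_request(tx_hash)).encode()
    req = urllib.request.Request(
        rpc_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp).get("result")


def summarize(receipt: Optional[dict]) -> dict:
    """Report inclusion and execution status only; nothing about safety or compliance."""
    if receipt is None:
        return {"included": False, "succeeded": False, "block": None}
    block_hex = receipt.get("blockNumber")
    return {
        "included": block_hex is not None,
        "succeeded": receipt.get("status") == "0x1",
        "block": int(block_hex, 16) if block_hex else None,
    }
```

Typical use is `summarize(fetch_receipt(rpc_url, TX_HASH))`, where `rpc_url` is an endpoint you trust.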

Important: Rapha Protocol is private-alpha software. Public demos must not receive real PHI, DICOM exports, FHIR bundles, Apple Health exports, genetic data, private keys, seed phrases, or regulated production data. Production use requires written agreements, security review, privacy review, institutional approval, applicable BAA/DPA analysis, and verified hardware attestation.
