NHS Data for AI Training
The NHS holds one of the world's most valuable clinical datasets. Almost none of it is accessible for AI training.
The UK National Health Service provides universal healthcare to around 67 million people. Every GP visit, hospital admission, A&E attendance, diagnostic scan, lab test, and prescription generates data. Aggregated across decades, this represents one of the largest and most comprehensive longitudinal clinical datasets on the planet.
For AI researchers, this data would be transformative. Models trained on NHS data could understand disease progression across entire populations, identify rare disease patterns invisible in smaller datasets, and capture real-world treatment outcomes across diverse demographic groups.
But NHS data is governed by some of the strictest data protection frameworks in the world: UK GDPR, the Data Protection Act 2018, the NHS Data Security and Protection Toolkit (DSPT), the Caldicott Principles, and the common law duty of confidentiality. Getting access through traditional channels (data releases from NHS England, which absorbed NHS Digital in 2023, plus research ethics committee approval and data sharing agreements) can take a year or more and typically grants access only to de-identified extracts for specific, pre-defined research questions.
NHS data governance: the acronyms that block AI training
- UK GDPR: The version of the EU General Data Protection Regulation retained in UK law, in force since 1 January 2021 (the EU GDPR itself applied from 2018). Article 5(1)(c) requires data minimisation: personal data must be "adequate, relevant and limited to what is necessary." Exporting millions of patient records to train an AI model arguably conflicts with this principle. Rapha Protocol's compute-to-data architecture is designed to align with it: only model weights leave, not data.
- Data Protection Act 2018: Supplements UK GDPR with specific conditions for processing health data (Schedule 1, Part 1). Processing health data requires one of these conditions. Research is a permitted condition, but only with appropriate safeguards. Compute-to-data's hardware enclave, kernel air-gap, and policy engine are intended to serve as those safeguards.
- NHS DSPT (Data Security and Protection Toolkit): Mandatory annual self-assessment for all organisations with access to NHS patient data and systems. Covers the National Data Guardian's 10 data security standards across people, process, and technology. Rapha Protocol maps 34 Open Policy Agent (OPA) policy controls directly to DSPT standards.
- Caldicott Principles: Eight principles governing the use of patient-identifiable information in the NHS. Principle 4: "Access to patient identifiable information should be on a strict need-to-know basis." Principle 7: "The duty to share information can be as important as the duty to protect patient confidentiality." Compute-to-data is designed to address both: the AI model needs access to the data (Principle 7), but that access is limited to training and the data is never exported (Principle 4).
- NHS Research Ethics Committee (REC): Required for research involving NHS patients or their data. REC approval is separate from data access, and you need both. Rapha Protocol provides the technical infrastructure for data access; the research protocol and REC application remain the researcher's responsibility.
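The mapping of policy controls to DSPT standards mentioned above can be pictured as a simple lookup table with a coverage check. The following is a minimal sketch, not Rapha Protocol's actual mapping: the control IDs are hypothetical, the assignment of each control to a standard number is illustrative only, and the standards are referred to by number (1 to 10) rather than by name.

```python
from collections import defaultdict

# Hypothetical control IDs mapped to the number (1-10) of the data security
# standard each one is assumed to support. Illustrative only; this is not an
# authoritative reading of the DSPT or of any real policy bundle.
CONTROL_TO_STANDARD = {
    "deny_raw_record_export": 1,
    "require_named_operator": 4,
    "require_signed_receipt": 5,
    "block_unpatched_runtime": 8,
    "enclave_attestation_required": 9,
}

ALL_STANDARDS = range(1, 11)  # the 10 data security standards, by number


def coverage_by_standard(mapping):
    """Group control IDs under the standard number they map to."""
    grouped = defaultdict(list)
    for control_id, standard in mapping.items():
        if standard not in ALL_STANDARDS:
            raise ValueError(f"{control_id} maps to unknown standard {standard}")
        grouped[standard].append(control_id)
    return dict(grouped)


def uncovered_standards(mapping):
    """Return the standard numbers that no control maps to yet."""
    covered = set(mapping.values())
    return [s for s in ALL_STANDARDS if s not in covered]


if __name__ == "__main__":
    print(coverage_by_standard(CONTROL_TO_STANDARD))
    print("Uncovered standards:", uncovered_standards(CONTROL_TO_STANDARD))
```

A check like `uncovered_standards` is one way a trust's information governance team could spot standards that rely on people and process controls rather than on technical policy.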
NHS data access pathways compared
NHS England Data Access Request Service (DARS, formerly run by NHS Digital)
Formal application to NHS England, which took over NHS Digital's functions in 2023, for data extracts. Requires: sponsor organisation, data sharing agreement, data protection impact assessment, DSPT compliance (the DSPT replaced the older IG Toolkit), and review by an independent data access advisory group. Typical timeline: 12-18 months. Data received: de-identified, aggregated, or pseudonymised extracts. In many cases the data cannot be used for commercial AI product development.
NHS Research Ethics Committee + Trust R&D approval
Per-trust approval for research studies. Requires: IRAS application, REC favourable opinion, HRA approval, local trust R&D sign-off, and honorary research contracts for the study team. Typical timeline: 6-12 months per trust. Data received: limited to the approved protocol. Multi-site studies carry significant administrative overhead because each trust repeats the process.
NHS AI Lab / AI Award programme
Funding programme for AI development in the NHS. Recipients include Kheiron Medical, Behold.ai, and other radiology AI companies. Provides funding and NHS partnership access through a competitive application process. Limited to UK-registered organisations. Data access still requires trust-by-trust negotiation.
Compute-to-Data (Rapha Protocol)
Technical infrastructure for NHS data access without data export. The AI company submits its model through the Rapha secure API. The model trains on NHS data inside the hospital's edge appliance, within an Intel SGX or TDX enclave and under an OPA policy. Only the trained weights leave. The NHS trust earns 70% of the fee. Timeline: technical deployment is measured in weeks, not months, but institutional governance review remains a prerequisite. Data never moves. DSPT and Caldicott alignment comes through configurable OPA policy.
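The "only trained weights leave" rule and the 70% fee share can be sketched as a deny-by-default egress check. This is a Python stand-in written for illustration: real OPA policies are written in Rego, and the `Artifact` type, the artifact kinds, and the function names below are all assumptions rather than part of the Rapha API. Only the 70% figure comes from the text.

```python
from dataclasses import dataclass
from decimal import Decimal

# Assumption: the only artifact kind permitted to leave the edge appliance.
ALLOWED_EGRESS_KINDS = frozenset({"model_weights"})

TRUST_SHARE = Decimal("0.70")  # the 70% trust share stated in the text


@dataclass(frozen=True)
class Artifact:
    name: str
    kind: str  # hypothetical labels, e.g. "model_weights", "patient_records"


def egress_allowed(artifact):
    """Deny by default: only explicitly allowed kinds may leave."""
    return artifact.kind in ALLOWED_EGRESS_KINDS


def filter_egress(artifacts):
    """Split a job's outputs into (released, blocked) lists."""
    released = [a for a in artifacts if egress_allowed(a)]
    blocked = [a for a in artifacts if not egress_allowed(a)]
    return released, blocked


def split_fee(fee):
    """Return (trust_amount, remainder) for a job fee, rounded to pence."""
    trust_amount = (fee * TRUST_SHARE).quantize(Decimal("0.01"))
    return trust_amount, fee - trust_amount


if __name__ == "__main__":
    outputs = [
        Artifact("model.bin", "model_weights"),
        Artifact("rows.csv", "patient_records"),
    ]
    released, blocked = filter_egress(outputs)
    print("released:", [a.name for a in released])
    print("blocked:", [a.name for a in blocked])
    print("fee split:", split_fee(Decimal("1000.00")))
```

Listing what may leave, rather than what may not, means any new or unexpected output type is blocked until someone explicitly allows it.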
UK-specific compliance advantage of compute-to-data
The UK Information Commissioner's Office (ICO) has repeatedly signalled that data minimisation and purpose limitation are central to lawful AI development. The ICO's guidance on AI and data protection emphasises that organisations should "consider whether you can achieve your purpose by using less data, or by using a different technique that is less intrusive."
Rapha Protocol's compute-to-data architecture is designed to implement this guidance. Because models train at the data source and raw records are never exported, less data is disclosed (only weights leave) and the processing is more auditable (every job produces a cryptographically signed proof receipt) than when de-identified NHS data is exported to cloud infrastructure.
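To make the idea of a signed proof receipt concrete, here is a minimal sketch using only the Python standard library. It is not Rapha Protocol's receipt format: the field names (`job_id`, `weights_sha256`, `signature`) are hypothetical, and HMAC-SHA256 with a shared key stands in for what would more plausibly be an asymmetric signature produced inside the enclave.

```python
import hashlib
import hmac
import json


def weights_digest(weights):
    """SHA-256 hex digest of the exported weights bytes."""
    return hashlib.sha256(weights).hexdigest()


def _canonical(body):
    """Serialise the receipt body deterministically before signing."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_receipt(job_id, weights, key):
    """Build a receipt binding a job ID to the hash of its output weights."""
    body = {"job_id": job_id, "weights_sha256": weights_digest(weights)}
    # Assumption: HMAC stands in for a real digital signature scheme.
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "signature": tag}


def verify_receipt(receipt, weights, key):
    """Check that the receipt matches these weights and was not altered."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    if body.get("weights_sha256") != weights_digest(weights):
        return False
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt.get("signature", ""))


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    receipt = make_receipt("job-001", b"fake-weights", key)
    print(receipt)
    print("valid:", verify_receipt(receipt, b"fake-weights", key))
```

An auditor holding the receipt and the exported weights can confirm that the weights are the ones the job produced, and that the job metadata was not edited afterwards, without ever seeing the training data.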
Important: Rapha Protocol is in private alpha. NHS data access requires NHS trust governance approval, Caldicott Guardian sign-off, and applicable data sharing agreements regardless of the technical architecture. Rapha Protocol is designed to support, not replace, NHS governance processes. The technical deployment reduces data export risk; it does not eliminate the need for institutional approval.