NHS Data for AI Training

The NHS holds one of the world's most valuable clinical datasets. Almost none of it is accessible for AI training.

The UK National Health Service provides universal healthcare to 67 million people. Every GP visit, hospital admission, A&E attendance, diagnostic scan, lab test, and prescription generates data. Aggregated across decades, this is one of the largest and most comprehensive longitudinal clinical datasets on the planet.

For AI researchers, this data would be transformative. Models trained on NHS data could understand disease progression across entire populations, identify rare disease patterns invisible in smaller datasets, and capture real-world treatment outcomes across diverse demographic groups.

But NHS data is governed by some of the strictest data protection frameworks in the world: UK GDPR, the Data Protection Act 2018, the NHS Data Security and Protection Toolkit (DSPT), the Caldicott Principles, and the Common Law Duty of Confidentiality. Getting access through traditional channels (NHS Digital data releases, research ethics committee approval, data sharing agreements) can take anywhere from many months to years, and typically grants access only to de-identified extracts for specific, pre-defined research questions.

NHS data governance: the acronyms that block AI training

NHS data access pathways compared

NHS Digital Data Access Request Service (DARS)

Formal application for data extracts through DARS, which has been operated by NHS England since NHS Digital merged into it in 2023. Requires: sponsor organisation, data sharing agreement, data protection impact assessment, DSPT compliance (the DSPT replaced the earlier IG Toolkit in 2018), and approval from an NHS Digital Data Access Advisory Group. Typical timeline: 12-18 months. Data received: de-identified, aggregated, or pseudonymised extracts. In many cases the data cannot be used for commercial AI product development.

NHS Research Ethics Committee + Trust R&D approval

Per-trust approval for research studies. Requires: IRAS application, REC favourable opinion, HRA approval, local trust R&D sign-off, and honorary research contracts for the study team. Typical timeline: 6-12 months per trust. Data received: limited to the approved protocol. Significant administrative overhead for multi-site studies, because each trust runs its own sign-off.

NHS AI Lab / AI Award programme

Funding programme for AI development in the NHS. Recipients include Kheiron Medical, Behold.ai, and other radiology AI companies. Provides funding and NHS partnership access. Competitive application process. Limited to UK-registered organisations. Data access still requires trust-by-trust negotiation.

Compute-to-Data (Rapha Protocol)

Technical infrastructure for NHS data access without data export. The AI company submits its model through the Rapha secure API. The model trains on NHS data inside the hospital's edge appliance, within an SGX/TDX enclave and under an OPA policy. Only the trained weights leave. The NHS trust earns 70% of the fee. Timeline: technical deployment is measured in weeks, not months, but institutional governance review remains a prerequisite. Data never moves. DSPT and Caldicott alignment is handled through configurable OPA policy.
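
The flow described above (submit a job, check it against policy, release only weights, split the fee) can be sketched in a few lines of Python. This is an illustrative sketch only: the class names, field names, and the purpose string are hypothetical and are not taken from the Rapha API, and the policy check is a plain-Python stand-in for what an OPA policy might express. The only figure taken from the text is the 70% trust share.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical names throughout; none of these come from the Rapha API.
TRUST_SHARE = Decimal("0.70")  # the 70% trust share stated in the text


@dataclass(frozen=True)
class TrainingJob:
    requester: str
    purpose: str
    requested_outputs: frozenset
    fee_gbp: Decimal


@dataclass(frozen=True)
class Policy:
    approved_purposes: frozenset
    # Assumption: the only artefact allowed to leave the appliance is weights.
    allowed_outputs: frozenset = frozenset({"model_weights"})


def evaluate(job: TrainingJob, policy: Policy) -> tuple:
    """Return (allowed, reasons). A job is allowed only if no rule is violated."""
    reasons = []
    if job.purpose not in policy.approved_purposes:
        reasons.append(f"purpose not approved: {job.purpose}")
    disallowed = job.requested_outputs - policy.allowed_outputs
    if disallowed:
        reasons.append("outputs not permitted: " + ", ".join(sorted(disallowed)))
    return (not reasons, reasons)


def split_fee(fee_gbp: Decimal) -> tuple:
    """Split a job fee into (trust_share, remainder), rounded to pence."""
    trust = (fee_gbp * TRUST_SHARE).quantize(Decimal("0.01"))
    return trust, fee_gbp - trust


if __name__ == "__main__":
    policy = Policy(approved_purposes=frozenset({"retinal_screening_research"}))
    job = TrainingJob(
        requester="example-ai-ltd",
        purpose="retinal_screening_research",
        requested_outputs=frozenset({"model_weights", "raw_records"}),
        fee_gbp=Decimal("1000.00"),
    )
    print(evaluate(job, policy))
    print(split_fee(job.fee_gbp))
```

The check is deny-by-default in spirit: a request that asks for anything beyond weights is rejected with an explicit reason rather than silently trimmed, which keeps the decision auditable. Passing such a check would not substitute for the governance approvals the trust still has to grant.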

UK-specific compliance advantage of compute-to-data

The UK Information Commissioner's Office (ICO) has repeatedly signalled that data minimisation and purpose limitation are central to lawful AI development. The ICO's guidance on AI and data protection emphasises that organisations should "consider whether you can achieve your purpose by using less data, or by using a different technique that is less intrusive."

Rapha Protocol's compute-to-data architecture directly implements this guidance. By training models at the data source without exporting raw records, the data processing is both more minimised (only weights leave) and more auditable (every job produces a cryptographically signed proof receipt) than the alternative of exporting de-identified NHS data to cloud infrastructure.
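
To make the idea of a signed proof receipt concrete, here is a minimal sketch of how a receipt could bind a job to the hash of the weights it produced. The receipt fields and function names are hypothetical, and the source does not specify the actual receipt format or signature scheme. HMAC-SHA256 from the Python standard library is used here purely as a stand-in; a real deployment would more plausibly use an asymmetric signature tied to enclave attestation, so that verifiers never hold the signing key.

```python
import hashlib
import hmac
import json


def _canonical(body: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte string to sign.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_receipt(job_id: str, policy_id: str, weights: bytes, key: bytes) -> dict:
    """Build a receipt over job metadata and the SHA-256 of the released weights.

    All field names are hypothetical illustrations, not the Rapha format.
    """
    body = {
        "job_id": job_id,
        "policy_id": policy_id,
        "outputs": ["model_weights"],
        "weights_sha256": hashlib.sha256(weights).hexdigest(),
    }
    signature = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "signature": signature}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the signature over every field except the signature itself."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(receipt.get("signature", "")))


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    receipt = make_receipt("job-001", "policy-v1", b"\x00\x01fake-weights", key)
    print(verify_receipt(receipt, key))      # valid receipt
    tampered = {**receipt, "policy_id": "policy-v2"}
    print(verify_receipt(tampered, key))     # altered after signing
```

Because the signature covers the policy identifier and the weights hash together, an auditor can confirm which policy a given set of weights was produced under, and any later change to either field invalidates the receipt.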

Important: Rapha Protocol is in private alpha. NHS data access requires NHS trust governance approval, Caldicott Guardian sign-off, and applicable data sharing agreements regardless of the technical architecture. Rapha Protocol is designed to support, not replace, NHS governance processes. The technical deployment reduces data export risk; it does not eliminate the need for institutional approval.