Oncology AI Training Data Access
Cancer AI is starving for data
Oncology is among the highest-value applications of clinical AI. Early cancer detection, treatment response prediction, and recurrence monitoring all depend on models trained on real clinical data. But cancer imaging and pathology data are among the most valuable and most tightly protected data in any healthcare system.
Every major cancer type (breast, lung, prostate, colorectal, skin) has AI startups working on detection and diagnosis models. All of them face the same blocker: access to diverse, high-quality clinical data that public datasets alone cannot provide.
Cancer imaging modalities supported
- Breast cancer — mammography (2D and 3D tomosynthesis), breast ultrasound, breast MRI, contrast-enhanced mammography.
- Lung cancer — low-dose CT screening, diagnostic chest CT, PET-CT for staging and treatment response.
- Prostate cancer — multiparametric MRI (T2, DWI, DCE), transrectal ultrasound, PSMA PET-CT.
- Colorectal cancer — CT colonography, abdominal CT, rectal MRI for staging.
- Skin cancer — dermoscopy imaging, clinical photography, total body photography for melanoma surveillance.
- Brain tumours — multiparametric MRI (T1, contrast-enhanced T1, T2, FLAIR, DWI), MR spectroscopy.
- Digital pathology — whole slide imaging (WSI) for histopathology, immunohistochemistry slides, cytology.
UK NHS cancer screening programmes
The NHS runs several cancer screening programmes that together generate millions of images annually, including:
- NHS Breast Screening Programme — mammography for women aged 50-71, ~2 million screens per year.
- NHS Bowel Cancer Screening Programme — faecal immunochemical testing with follow-up colonoscopy.
- NHS Targeted Lung Health Check programme — low-dose CT for high-risk populations, rolling out nationally.
InHealth and Alliance Medical are contracted NHS screening providers. The imaging data from these programmes represents one of the largest curated oncology imaging datasets in the world, yet it is largely inaccessible to AI researchers under current governance models.
How compute-to-data enables oncology AI research
Rapha Protocol brings the computation to the data: an edge appliance installed at the screening centre or hospital runs external AI training jobs against screening data. The data stays inside the institution; only trained model weights, metrics, and proof receipts leave. NHS screening providers earn 70% of each research fee, and AI companies get access to real screening data for model development.
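The egress rule and fee split described above can be illustrated with a minimal Python sketch. This is not Rapha Protocol's actual implementation or API: the artifact kind names, the `Artifact` class, and the `filter_egress` and `split_fee` helpers are all hypothetical, chosen only to show the idea that raw data is blocked at the boundary and that 70% of a research fee goes to the provider. The source does not say who receives the remaining 30%, so the sketch simply returns it as a remainder.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical artifact kinds. The text only states that trained model
# weights, metrics, and proof receipts may leave the institution.
ALLOWED_EGRESS = {"model_weights", "metrics", "proof_receipt"}

# Provider share of each research fee, as stated in the text (70%).
PROVIDER_SHARE = Decimal("0.70")


@dataclass(frozen=True)
class Artifact:
    kind: str  # e.g. "model_weights", "raw_image" (illustrative labels)
    name: str


def filter_egress(artifacts):
    """Split job outputs into those allowed to exit and those blocked."""
    allowed = [a for a in artifacts if a.kind in ALLOWED_EGRESS]
    blocked = [a for a in artifacts if a.kind not in ALLOWED_EGRESS]
    return allowed, blocked


def split_fee(research_fee):
    """Return (provider_amount, remainder) for one research fee."""
    fee = Decimal(str(research_fee))
    provider = (fee * PROVIDER_SHARE).quantize(Decimal("0.01"))
    return provider, fee - provider


if __name__ == "__main__":
    outputs = [
        Artifact("model_weights", "weights.bin"),
        Artifact("raw_image", "scan_0001.dcm"),  # must never leave the site
        Artifact("metrics", "metrics.json"),
    ]
    allowed, blocked = filter_egress(outputs)
    print("exits:", [a.name for a in allowed])
    print("blocked:", [a.name for a in blocked])
    print("fee split:", split_fee(1000))
```

An allow-list (rather than a block-list) is used here on the assumption that anything not explicitly permitted should stay inside the institution by default.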
Important: Screening data is sensitive NHS patient data. Any AI research involving NHS screening data requires NHS Research Ethics Committee approval, Caldicott Guardian sign-off, and institutional data sharing agreements. Rapha Protocol is the technical infrastructure — it does not replace these governance requirements.
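As an illustration of how the governance requirements above might be recorded as a pre-flight check before a job is scheduled, here is a short Python sketch. The approval keys and the `missing_approvals` and `can_schedule` functions are hypothetical and are not part of any documented Rapha Protocol interface. A check like this only records that approvals have been granted elsewhere; it does not replace the approvals themselves.

```python
# Hypothetical keys for the three governance requirements named in the text.
REQUIRED_APPROVALS = (
    "rec_approval",            # NHS Research Ethics Committee approval
    "caldicott_signoff",       # Caldicott Guardian sign-off
    "data_sharing_agreement",  # institutional data sharing agreement
)


def missing_approvals(job_approvals):
    """Return the required approvals not yet recorded as granted."""
    return [key for key in REQUIRED_APPROVALS
            if not job_approvals.get(key, False)]


def can_schedule(job_approvals):
    """A job may be scheduled only when no required approval is missing."""
    return not missing_approvals(job_approvals)


if __name__ == "__main__":
    job = {"rec_approval": True, "caldicott_signoff": False}
    print("missing:", missing_approvals(job))
    print("can schedule:", can_schedule(job))
```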