Datasets: Imaging, Labels - Grand Challenge

Data Splits 🗃️

Data is sampled into four splits, with the following use-cases:

Public Training and Development Dataset (1500 cases):
Available for all participants and researchers, to train and develop AI models. All data is fully anonymized and made available under a non-commercial CC BY-NC 4.0 license. Includes 328 cases from the ProstateX Challenge. For all updates/fixes regarding this dataset, please join the challenge and check out our dedicated forum post on this topic.

Imaging data has been released via: zenodo.org/record/6624726 (DOI: 10.5281/zenodo.6624726)
Annotations have been released and are maintained via: github.com/DIAGNijmegen/picai_labels

Private/Sequestered Training Dataset (7607 cases):
Used exclusively by the organizers to retrain the 5 top-ranking AI algorithms, with large-scale data, during the Closed Testing Phase.

Hidden Tuning Cohort (100 cases):
Used for a live, public leaderboard that enables model selection and tuning, during the Open Development Phase.

Hidden Testing Cohort (1000 cases):
Used to determine the top 5 AI algorithms at the end of the Open Development Phase. Used to benchmark AI, radiologists, and test all hypotheses at the end of the Closed Testing Phase. Includes internal testing data (unseen cases from seen centers) and external testing data (unseen cases from an unseen center). A subset of 400 cases from this cohort is used to facilitate the PI-CAI: Reader Study.

Imaging Data 🏥

The complete dataset used for the PI-CAI challenge comprises a cohort of 9000–11,000 prostate MRI exams, curated from three Dutch centers {Radboud University Medical Center (RUMC), Ziekenhuis Groep Twente (ZGT), Prostaat Centrum Noord-Nederland (PCNN)} and one Norwegian center {St. Olav’s Hospital, Trondheim University Hospital (STOH)}. Institutional review boards of all four centers have waived the need for informed patient consent, with respect to the retrospective scientific use of anonymized clinical data in this challenge.

All patient exams are of men suspected of harboring csPCa (e.g. due to elevated levels of PSA, abnormal DRE findings). Patients are included only if they do not have a history of treatment or prior ISUP ≥ 2 findings.

All patient exams include basic clinical variables {patient age, prostate volume, PSA level, PSA density} as reported in their diagnostic reports, basic acquisition variables {scanner manufacturer, scanner model name, diffusion b-value}, and bpMRI scans, acquired using Siemens Healthineers or Philips Medical Systems-based scanners with surface coils. Imaging consists of the following sequences:

Axial, sagittal and coronal T2-weighted imaging (T2W).
Axial high b-value (≥ 1000 s/mm²) diffusion-weighted imaging (DWI).
Axial apparent diffusion coefficient maps (ADC).

⚠️ Absolute intensity values of ADC scans used in the PI-CAI challenge are not universal or clinically meaningful on their own (e.g., unlike Hounsfield units (HU) in CT scans, where -1000 HU will always indicate air), due to non-standardized acquisition protocols across centers and/or inconsistent image scaling (T.L. Chenevert et al., 2014). Furthermore, PI-RADS v2 recommends that absolute ADC values should be used with caution, as these can vary substantially depending on the value and number of b-values selected, the magnet strength, the vendor, and inter-patient variability (T. Barrett et al., 2015).

For the Public Training and Development Dataset and the Private/Sequestered Training Dataset:

Every patient case will have at least three imaging sequences: axial T2W, axial DWI and axial ADC scans (i.e. files ending in `_t2w.mha`, `_hbv.mha`, `_adc.mha`). Additionally, a case can have either, both or none of the optional imaging sequences: sagittal and coronal T2W scans (i.e. files ending in `_sag.mha`, `_cor.mha` here). No patient case will include dynamic contrast-enhanced (DCE) sequences.
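As a rough illustration of the naming convention above, the following Python sketch groups filenames by case and reports which required sequences are missing. It assumes that everything before the sequence suffix identifies a case, which is not stated here; the function names and example filenames are hypothetical.

```python
from pathlib import Path

REQUIRED = ("_t2w.mha", "_hbv.mha", "_adc.mha")  # axial T2W, high b-value DWI, ADC
OPTIONAL = ("_sag.mha", "_cor.mha")              # sagittal and coronal T2W


def group_sequences(filenames):
    """Map each case prefix to the set of sequence suffixes found for it."""
    cases = {}
    for name in filenames:
        base = Path(name).name
        for suffix in REQUIRED + OPTIONAL:
            if base.endswith(suffix):
                # Assumption: the text before the suffix identifies the case.
                cases.setdefault(base[: -len(suffix)], set()).add(suffix)
                break
    return cases


def missing_required(cases):
    """Return {case prefix: sorted missing required suffixes} for incomplete cases."""
    return {
        case: sorted(set(REQUIRED) - found)
        for case, found in cases.items()
        if not set(REQUIRED) <= found
    }
```

Calling `missing_required(group_sequences(paths))` on a list of downloaded file paths should return an empty dictionary if every case has its three axial sequences.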

For the Hidden Tuning Cohort and the Hidden Testing Cohort:

Every patient case will have exactly five imaging sequences: axial, sagittal and coronal T2W; axial DWI and axial ADC scans (i.e. files ending in `_t2w.mha`, `_sag.mha`, `_cor.mha`, `_hbv.mha`, `_adc.mha` here). For part of the Hidden Testing Cohort, DCE sequences will be available only to radiologists participating in the PI-CAI: Reader Study; they will not be available to AI algorithms at any stage of this grand challenge.

To dive deeper into the clinical significance of different prostate MRI sequences, and why they are useful for csPCa detection/diagnosis, feel free to have a look at:

R.R.M. Engels, B. Israël, A.R. Padhani, J.O. Barentsz, "Multiparametric Magnetic Resonance Imaging for the Detection of Clinically Significant Prostate Cancer: What Urologists Need to Know. Part 1: Acquisition", European Urology. DOI: 10.1016/j.eururo.2019.09.021

B. Israël, M. van der Leest, M. Sedelaar, A.R. Padhani, P. Zámecnik, J.O. Barentsz, "Multiparametric Magnetic Resonance Imaging for the Detection of Clinically Significant Prostate Cancer: What Urologists Need to Know. Part 2: Interpretation", European Urology. DOI: 10.1016/j.eururo.2019.10.024

Clinical and Scanner Information 🧪

For the Public Training and Development Dataset and the Private/Sequestered Training Dataset:

PSA⁰, prostate volume⁰, PSA density⁰, patient age^, MRI scanner manufacturer^, MRI scanner model name^ and diffusion b-value of the high b-value DWI/HBV scan^, will be available to every AI algorithm per case.

For the Hidden Tuning Cohort and the Hidden Testing Cohort:

PSA^, prostate volume^¹, PSA density^², patient age^, MRI scanner manufacturer^, MRI scanner model name^ and diffusion b-value of the high b-value DWI/HBV scan^, will be available to every AI algorithm per case.

⁰ available, if value is reported during clinical routine
¹ if value is not reported during clinical routine, it is retrospectively calculated by an expert radiologist
² if value is not reported during clinical routine, it is retrospectively calculated from the PSA and prostate volume
^ always available
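
Since PSA density may be missing in the training datasets, participants might derive it in the same way as footnote ² describes, i.e. PSA divided by prostate volume. The sketch below assumes PSA in ng/mL and prostate volume in mL; the record keys (`psa`, `prostate_volume`, `psad`) are hypothetical and do not necessarily match the released files.

```python
def psa_density(psa, prostate_volume):
    """Return PSA / prostate volume, or None if it cannot be computed."""
    if psa is None or prostate_volume is None or prostate_volume <= 0:
        return None
    return psa / prostate_volume


def fill_psa_density(record):
    """Return a copy of `record` with a missing 'psad' derived from the other fields."""
    filled = dict(record)
    if filled.get("psad") is None:
        # Hypothetical keys; a reported value is never overwritten.
        filled["psad"] = psa_density(filled.get("psa"), filled.get("prostate_volume"))
    return filled
```

If PSA or prostate volume is itself missing, the density stays `None`, so any model consuming these variables still needs a strategy for missing values.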

Image Registration 🖼️

Imaging sequences (T2W, DWI, ADC) for each case in the Public Training and Development Dataset and the Private/Sequestered Training Dataset are in principle not co-registered. Although the vast majority are reasonably well-aligned, there are several cases with substantial deviations. For 54/9107 (0.6%) of training cases, we did perform manual image registration. We expect all participants to handle misalignment in the remaining cases in their algorithm design as they see fit (if deemed necessary), to incentivize the development of automatic co-registration methods or AI models that are robust to training on misaligned sequences. We believe that this is the only way of developing AI models using thousands of cases, as manual registration is too labour-intensive at scale.

However, we can confirm that all sequences for each case in the Hidden Tuning Cohort and the Hidden Testing Cohort are co-registered by the organizers (given that we only want to evaluate diagnostic performance, and thereby try to minimize the effects of external factors). Manual registration, when deemed necessary, is performed using ITK-SNAP v3.80 (rigid transformation with six degrees of freedom for 3D translation and rotation). This was performed for 85/1000 (8.5%) of the test cases.
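
To make the "six degrees of freedom" concrete, here is a minimal sketch of a rigid transform built from three rotation angles and three translations. It is only an illustration of the concept, not how ITK-SNAP parameterizes or stores its transforms; the rotation order (rotate about x, then y, then z) and the function names are assumptions.

```python
import math


def rigid_matrix(rx, ry, rz, tx, ty, tz):
    """Build a 4x4 homogeneous rigid transform.

    rx, ry, rz are rotations in radians about the x, y and z axes,
    composed as R = Rz @ Ry @ Rx (an assumed convention);
    tx, ty, tz are translations in the same units as the coordinates.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx, tx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx, ty],
        [-sy, cy * sx, cy * cx, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply_rigid(matrix, point):
    """Apply the transform to a 3D point: rotate first, then translate."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix[:3])
```

Because a rigid transform has no scaling or shearing terms, it can correct patient movement between sequences but not geometric distortion within a sequence.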

Annotations ✍️

Annotations for the Private/Sequestered Training Dataset, Hidden Tuning Cohort and Hidden Testing Cohort will not be released publicly. Annotations for the Public Training and Development Dataset have been released and are maintained via: github.com/DIAGNijmegen/picai_labels

Human Expert-Derived Annotations

Voxel-level csPCa lesion annotations are delineated and/or patient-level csPCa outcomes are recorded, by one of 10 trained investigators or 1 radiology resident, under supervision of one of 3 expert radiologists, at RUMC, PCNN or STOH. Each annotation is derived using all available MRI scans, diagnostic reports (radiology, pathology) and whole-mount prostatectomy specimens (if applicable). Lesion delineations are created using ITK-SNAP v3.80.

Out of the 1500 cases shared in the Public Training and Development Dataset, 1075 cases have benign tissue or indolent PCa (i.e. their labels should be empty or full of 0s) and 425 cases have csPCa (i.e. their labels should have lesion blobs of value 2, 3, 4 or 5). Out of these 425 positive cases, only 220 cases carry an annotation derived by a human expert. The remaining 205 positive cases have not been annotated. In other words, only 17% (220/1295) of the annotations provided in picai_labels/csPCa_lesion_delineations/human_expert should have csPCa lesion annotations, while the remaining 83% (1075/1295) of annotations should be empty. This is intentional, because it is practically infeasible to annotate all lesions at the scale of the Private/Sequestered Training Dataset (7607 cases). Hence, we encourage participants to develop methods that can account for, or make use of, non-annotated cases in the Public Training and Development Dataset as well.
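
The bookkeeping above can be checked with a few lines of Python. The sketch also includes a hypothetical helper that decides whether an annotation contains a csPCa lesion, assuming the voxel values have already been read into a flat sequence of integers and that any value other than 2, 3, 4 or 5 counts as background.

```python
TOTAL, NEGATIVE, POSITIVE, ANNOTATED_POSITIVE = 1500, 1075, 425, 220

unannotated_positive = POSITIVE - ANNOTATED_POSITIVE          # 205 cases
human_expert_files = NEGATIVE + ANNOTATED_POSITIVE            # 1295 annotations
share_with_lesions = ANNOTATED_POSITIVE / human_expert_files  # about 0.17
share_empty = NEGATIVE / human_expert_files                   # about 0.83

CSPCA_VALUES = {2, 3, 4, 5}  # lesion values stated for positive cases


def annotation_has_cspca(voxel_values):
    """Return True if any voxel carries a csPCa lesion value."""
    return any(int(v) in CSPCA_VALUES for v in voxel_values)
```

For a 3D label array, flattening it first (for example with an array library's `ravel` or `flatten`) would give a suitable input for `annotation_has_cspca`.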

Human expert-derived csPCa annotations have been provided for the Public Training and Development Dataset via the picai_labels repo in two formats:

Original Annotations (picai_labels/csPCa_lesion_delineations/human_expert/original):
All axial bpMRI sequences (T2W, DWI, ADC) per case were used to localize and annotate csPCa lesions. However, depending on the annotator/center and their preference, some annotations have been created at the spatial resolution and orientation of the T2W image, while others have been created at the resolution and orientation of the DWI/ADC images. Either way, for every annotation in this folder, all lesion delineations (if any) will always clearly map to observations in DWI/ADC imaging.

Resampled Annotations (picai_labels/csPCa_lesion_delineations/human_expert/resampled):
For a given case, we expect all submitted AI models to predict a csPCa detection map with the same spatial dimensions and resolution as the T2W image. Hence, we have also converted and provided all original annotations at the same dimensions and spatial resolution as their corresponding T2W images here.

AI-Derived Annotations

At RUMC, we deal with non-annotated training cases with a semi-supervised learning strategy (Bosma et al., 2022). We have released AI-derived csPCa lesion annotations for all 1500 cases in the Public Training and Development Dataset (picai_labels/csPCa_lesion_delineations/AI), using this method. Participants can choose to use these AI-derived annotations for non-annotated training cases or use their own methodology for the same. In a similar manner (see algorithm), we have also released AI-derived whole-gland segmentations of the prostate for all 1500 cases in the Public Training and Development Dataset:
picai_labels/anatomical_delineations/whole_gland/AI

Reference Standard 🧬

Hidden Tuning and Testing Cohorts

For accurate validation of AI and human-reader performance, and in turn, to substantiate any conclusions derived from PI-CAI, a strong reference standard for csPCa is crucial. The PI-CAI reference standard aims to utilize the best possible evidence to define the ground-truth for every case in the validation and testing cohorts, i.e. histologically-confirmed (ISUP ≥ 2) positives, and histopathology (ISUP ≤ 1) or MRI (PI-RADS ≤ 2) negatives, with follow-up (≥ 3 years), as detailed below:

Patients with negative MRI (i.e. benign or carrying PI-RADS 1–2 lesions) generally do not undergo biopsies or RP and lack histologically-confirmed evidence for the absence of csPCa. It is likely that they do not harbor csPCa, but a small percentage (<1% at RUMC; Venderink et al., 2019) can still be missed. To alleviate this, up to 40% of the validation and testing cohorts is composed of multi-center patient data from the 4M cohort (van der Leest et al., 2019), where all patients with negative MRI had received systematic biopsies and subsequent grading was supervised by an expert uropathologist (> 25 years of experience). In other words, by using data from the 4M cohort, we are able to acquire histopathology evidence for a large fraction of the patient population that is encountered, but typically not histologically-confirmed, during clinical routine.

Biopsies alone can still be prone to undersampling csPCa, especially in the case of smaller lesions (Srivastava et al., 2019). Hence, all negative cases (negative MRI and/or histopathology) in the validation and testing cohorts are confirmed with follow-up data (e.g. using the national Dutch Pathology Registry (PALGA) for centers based in The Netherlands). Negative patient exams found to be positive (via MRI or histopathology) in ≥ 3 years of follow-up were inspected with an expert radiologist for retrospective signs of potentially missed csPCa. If the presence of csPCa can be definitively confirmed, they are included as positive cases; otherwise, they are excluded. Negative patient exams with 100% csPCa diagnosis-free survival (DFS) after at least 3 years are included.

Training Datasets

Patient cases used for the training datasets of PI-CAI are annotated with the same reference standard as used for the ProstateX challenge, i.e. histopathology (ISUP ≥ 2) positives, and histopathology (ISUP ≤ 1) or MRI (PI-RADS ≤ 2) negatives, without follow-up.

Figure. Typical workflow used to establish the ground-truth for each lesion in the hidden validation and testing cohorts. If systematic biopsies (SysBx) were performed in addition to MRI-targeted biopsies (MRBx), then SysBx findings are only used to upgrade the ISUP score, not to downgrade it. If RP is performed, its corresponding findings supersede those of any prior histopathology or radiology findings. Cases for which pathology findings cannot be localized on MRI (e.g. MRI-invisible lesions without prostatectomy specimen, SysBx diagnostic reports with ambiguous or missing location information) are excluded.
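
The precedence rules in the caption above can be summarized in a short Python sketch. It is a simplified reading of the workflow, not the organizers' actual pipeline: it ignores the follow-up requirement and the exclusion of findings that cannot be localized, and the function and parameter names are hypothetical.

```python
def final_isup(mrbx=None, sysbx=None, rp=None):
    """Combine histopathology findings into one ISUP grade.

    mrbx, sysbx, rp: ISUP grades from MRI-targeted biopsy, systematic
    biopsy and radical prostatectomy, or None where not performed.
    Returns None if no histopathology is available.
    """
    if rp is not None:
        return rp  # prostatectomy findings supersede earlier findings
    grades = [g for g in (mrbx, sysbx) if g is not None]
    # Taking the maximum means SysBx can upgrade but never downgrade MRBx.
    return max(grades) if grades else None


def is_cspca(isup=None, pirads=None):
    """Return True for ISUP >= 2, False for ISUP <= 1 or (without
    histopathology) PI-RADS <= 2, and None if the case is undetermined."""
    if isup is not None:
        return isup >= 2
    if pirads is not None and pirads <= 2:
        return False
    return None
```

For example, `final_isup(mrbx=2, sysbx=1)` stays at 2 because SysBx cannot downgrade, while `final_isup(mrbx=3, sysbx=3, rp=1)` returns 1 because the prostatectomy result takes precedence.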