Data Splits
Data is sampled into four splits, with the following use-cases:
- Public Training and Development Dataset (1500 cases):
Available for all participants and researchers, to train and develop AI models. All data is fully anonymized and made available under a non-commercial CC BY-NC 4.0 license. Includes 328 cases from the ProstateX Challenge. For all updates/fixes regarding this dataset, please join the challenge and check out our dedicated forum post on this topic.
Imaging data has been released via: zenodo.org/record/6624726 (DOI: 10.5281/zenodo.6624726)
Annotations have been released and are maintained via: github.com/DIAGNijmegen/picai_labels
- Private/Sequestered Training Dataset (7607 cases):
Used exclusively by the organizers to retrain the five top-ranking AI algorithms with large-scale data during the Closed Testing Phase.
- Hidden Tuning Cohort (100 cases):
Used for a live, public leaderboard that enables model selection and tuning, during the Open Development Phase.
- Hidden Testing Cohort (1000 cases):
Used to determine the top 5 AI algorithms at the end of the Open Development Phase, and to benchmark AI algorithms and radiologists and test all hypotheses at the end of the Closed Testing Phase. Includes internal testing data (unseen cases from seen centers) and external testing data (unseen cases from an unseen center). A subset of 400 cases from this cohort is used for the PI-CAI: Reader Study.
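As a quick consistency check, the split sizes listed above can be tallied in a few lines of Python. The dictionary below is only a hand-copied summary of the numbers in this section (the names SPLITS and READER_STUDY_SUBSET are illustrative), not an official metadata file:

```python
# Case counts per split, copied by hand from the list above (illustrative only).
SPLITS = {
    "Public Training and Development Dataset": 1500,
    "Private/Sequestered Training Dataset": 7607,
    "Hidden Tuning Cohort": 100,
    "Hidden Testing Cohort": 1000,
}
# The reader study uses a subset of the Hidden Testing Cohort.
READER_STUDY_SUBSET = 400

total = sum(SPLITS.values())
print(f"Total cases across all splits: {total}")

# A subset can never be larger than the cohort it is drawn from.
assert READER_STUDY_SUBSET <= SPLITS["Hidden Testing Cohort"]
```

The resulting total of 10,207 cases falls within the 9,000 to 11,000 exam range quoted for the complete dataset.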
Imaging Data
The complete dataset used for the PI-CAI challenge comprises a cohort of 9,000 to 11,000 prostate MRI exams, curated from three Dutch centers {Radboud University Medical Center (RUMC), Ziekenhuis Groep Twente (ZGT), Prostaat Centrum Noord-Nederland (PCNN)} and one Norwegian center {St. Olav's Hospital, Trondheim University Hospital (STOH)}. Institutional review boards of all four centers have waived the need for informed patient consent, with respect to the retrospective scientific use of anonymized clinical data in this challenge.
All patient exams are of men suspected of harboring csPCa (e.g. due to elevated PSA levels or abnormal DRE findings). Patients are included only if they have no history of treatment or prior ISUP ≥ 2 findings.
All patient exams include basic clinical variables {patient age, prostate volume, PSA level, PSA density} as reported in their diagnostic reports, basic acquisition variables {scanner manufacturer, scanner model name, diffusion b-value}, and bpMRI scans, acquired using Siemens Healthineers or Philips Medical Systems-based scanners with surface coils. Imaging consists of the following sequences:
- Axial, sagittal and coronal T2-weighted imaging (T2W).
- Axial high b-value (≥ 1000 s/mm²) diffusion-weighted imaging (DWI).
- Axial apparent diffusion coefficient maps (ADC).
⚠️ Absolute intensity values of ADC scans used in the PI-CAI challenge are not universal or clinically meaningful on their own (unlike, e.g., Hounsfield units (HU) in CT scans, where -1000 HU always indicates air), due to non-standardized acquisition protocols across centers and/or inconsistent image scaling (T.L. Chenevert et al., 2014). Furthermore, PI-RADS v2 recommends that absolute ADC values be used with caution, as they can vary substantially depending on the value and number of b-values selected, the magnet strength, the vendor, and inter-patient variability (T. Barrett et al., 2015).
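Because absolute ADC values are not comparable across centers, deep-learning pipelines often rescale each scan using its own intensity statistics. The sketch below shows one such choice, per-scan z-score normalization with NumPy. It is an illustrative assumption, not a preprocessing step prescribed by PI-CAI, and it only compensates for global linear scaling differences, not for differences in b-values, magnet strength or vendor:

```python
import numpy as np


def zscore_normalize(volume: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Rescale a scan to zero mean and unit variance using its own statistics.

    Assumption: per-scan z-scoring is one common workaround for non-standardized
    intensities; it is not the challenge's official preprocessing.
    """
    volume = volume.astype(np.float64)
    # eps guards against division by zero for constant-valued volumes.
    return (volume - volume.mean()) / (volume.std() + eps)


# Two synthetic "ADC maps" that differ only by a constant scale factor,
# mimicking inconsistent image scaling between two hypothetical centers.
rng = np.random.default_rng(0)
adc_center_a = rng.uniform(500.0, 2000.0, size=(4, 8, 8))
adc_center_b = adc_center_a * 10.0

norm_a = zscore_normalize(adc_center_a)
norm_b = zscore_normalize(adc_center_b)
print(np.allclose(norm_a, norm_b, atol=1e-4))
```

Because the two synthetic maps differ only by a constant factor, their normalized versions coincide; real inter-center differences are generally not this simple.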
For the Public Training and Development Dataset and the Private/Sequestered Training Dataset:
- Every patient case will have at least three imaging sequences: axial T2W, axial DWI and axial ADC scans (i.e. files ending in _t2w.mha, _hbv.mha and _adc.mha). Additionally, a case can have either, both or neither of the following optional imaging sequences: sagittal and coronal T2W scans (i.e. files ending in _sag.mha and _cor.mha). No patient case will include dynamic contrast-enhanced (DCE) sequences.
For the Hidden Tuning Cohort and the Hidden Testing Cohort:
- Every patient case will have exactly five imaging sequences: axial, sagittal and coronal T2W; axial DWI; and axial ADC scans (i.e. files ending in _t2w.mha, _sag.mha, _cor.mha, _hbv.mha and _adc.mha). For part of the Hidden Testing Cohort, DCE sequences will be available only to radiologists participating in the PI-CAI: Reader Study; they will not be available to AI algorithms at any stage of this grand challenge.
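The file-naming rules for the training datasets and the hidden cohorts lend themselves to a simple sanity check. The following Python sketch assumes each case is given as a list of file names sharing a common prefix (the prefix case001 is hypothetical); check_case is an illustrative helper, not part of any PI-CAI tooling:

```python
# Suffixes taken from the rules above.
REQUIRED = ("_t2w.mha", "_hbv.mha", "_adc.mha")  # axial T2W, axial high b-value DWI, axial ADC
OPTIONAL = ("_sag.mha", "_cor.mha")              # sagittal and coronal T2W


def check_case(filenames: list[str], hidden_cohort: bool) -> list[str]:
    """Return a list of problems; an empty list means the case matches the rules.

    Hypothetical helper: hidden_cohort=True applies the Hidden Tuning/Testing
    rule (exactly five sequences), False applies the training-dataset rule.
    """
    problems = []
    found = {s for s in REQUIRED + OPTIONAL if any(f.endswith(s) for f in filenames)}

    # Axial T2W, DWI and ADC are mandatory in every split.
    for suffix in REQUIRED:
        if suffix not in found:
            problems.append(f"missing required sequence {suffix}")

    # Anything else (e.g. a DCE scan) is not expected for AI algorithms.
    for f in filenames:
        if not f.endswith(REQUIRED + OPTIONAL):
            problems.append(f"unexpected file {f}")

    if hidden_cohort:
        # Sagittal and coronal T2W become mandatory, for exactly five files.
        for suffix in OPTIONAL:
            if suffix not in found:
                problems.append(f"missing {suffix} (required in hidden cohorts)")
        if len(filenames) != 5:
            problems.append(f"expected exactly 5 files, got {len(filenames)}")
    return problems


training_case = ["case001_t2w.mha", "case001_hbv.mha", "case001_adc.mha", "case001_sag.mha"]
print(check_case(training_case, hidden_cohort=False))  # []
print(check_case(training_case, hidden_cohort=True))
# ['missing _cor.mha (required in hidden cohorts)', 'expected exactly 5 files, got 4']
```

The same case passes the training-dataset rule (sagittal only is allowed) but fails the hidden-cohort rule, which requires all five sequences.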
To dive deeper into the clinical significance of different prostate MRI sequences, and why they are useful for csPCa detection/diagnosis, feel free to have a look at:
Clinical and Scanner Information
For the Public Training and Development Dataset and the Private/Sequestered Training Dataset:
- PSA⁰, prostate volume⁰, PSA density⁰, patient age^, MRI scanner manufacturer^, MRI scanner model name^ and diffusion b-value of the high b-value DWI/HBV scan^ will be available to every AI algorithm per case.
For the Hidden Tuning Cohort and the Hidden Testing Cohort:
- PSA^, prostate volume^¹, PSA density^², patient age^, MRI scanner manufacturer^, MRI scanner model name^ and diffusion b-value of the high b-value DWI/HBV scan^ will be available to every AI algorithm per case.
⁰ available, if the value is reported during clinical routine
¹ if the value is not reported during clinical routine, it is retrospectively calculated by an expert radiologist
² if the value is not reported during clinical routine, it is retrospectively calculated from the PSA and prostate volume
^ always available
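For the hidden cohorts, a PSA density that was not reported during clinical routine is calculated retrospectively from the PSA and prostate volume, while in the training datasets these values are only present when they were reported. PSA density is conventionally PSA divided by prostate volume; the sketch below applies that definition as a fallback and otherwise returns None, so a model can treat the value as missing. The ClinicalInfo class and resolve_psa_density function are hypothetical, and the units (PSA in ng/mL, volume in mL) are assumed because this section does not state them:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClinicalInfo:
    """Hypothetical per-case container; None marks a value that was not reported."""
    psa: Optional[float] = None              # assumed unit: ng/mL
    prostate_volume: Optional[float] = None  # assumed unit: mL
    psa_density: Optional[float] = None      # assumed unit: ng/mL per mL


def resolve_psa_density(info: ClinicalInfo) -> Optional[float]:
    """Return the reported PSA density, or derive it as PSA / prostate volume.

    Returns None when neither the reported value nor both inputs are available,
    which can happen in the training datasets.
    """
    if info.psa_density is not None:
        return info.psa_density
    # A missing or zero volume makes the ratio undefined.
    if info.psa is None or not info.prostate_volume:
        return None
    return info.psa / info.prostate_volume


print(resolve_psa_density(ClinicalInfo(psa=8.0, prostate_volume=40.0)))
print(resolve_psa_density(ClinicalInfo(psa=8.0)))
```

Preferring the reported value over the derived one mirrors the footnotes: the calculation is only a fallback when clinical routine did not provide a PSA density.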