Cohort

CRC Prediction: patients who were histologically diagnosed with adenocarcinoma of the colon. Resection samples are used to evaluate AI learning models.

Scanned image data of resected tissue slides will be provided. All cases include colon tumor tissue diagnosed at SNUH, SNUBH, and SMG-SNU BMC from January 2005 to June 2018. All personal labels in the scanned images were removed to protect patient privacy.

All cases are randomly assigned to the training, validation, and test sets. All whole-slide images (WSIs) were stained with hematoxylin and eosin and scanned with the Aperio AT2 at 40X magnification.


Context Information

  1. Case number: a randomly assigned number that replaces the label of the original specimen.
  2. Pathological information: organ (colon), histology (adenocarcinoma), molecular subtypes (MSI classification).
  3. Additional clinicopathological data are not provided.


Training Data: 47 WSIs

We provide (1) the original WSIs, (2) XML annotations made by pathologists, and (3) the MSI classification made by pathologists.


The details are as follows:

  • WSI: Original scanned image, compressed in SVS format.
  • XML annotation for tumor area: One or more closed regions outlined by a colored line. The overall tumor area is defined as the boundary enclosing dispersed viable tumor cell nests, necrosis, and peri- and intratumoral stromal tissue.
  • Non-masked regions (= normal tissue) contain no MSI: Masked regions may or may not include MSI, but non-masked regions are all normal tissue. Thus, the non-masked regions carry no MSI information.
  • Blank area in the masked regions (= empty space = background): Because the task is based on weak supervision, it has been set to be more challenging than last year's. The blank area has therefore not been removed, and participants have to deal with it if necessary. (For the criteria that define a blank area, refer to last year's challenge, PAIP2019.)
  • MSI classification: The MSI classification is one of the molecular subtypes of colorectal cancer. An “MSI-H” classification by pathologists means that the whole tumor area has been identified as “microsatellite instability” with two or more microsatellite markers. This classification is provided for every case in CSV format.
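
The masking rules above suggest one way to derive tile-level labels from the slide-level MSI classification. The following is a minimal sketch, not part of the official challenge tooling. It assumes the tumor polygons have already been read from the XML annotation as lists of (x, y) vertices (the XML schema is not described here, so parsing is left out), and that each tile is given as a list of RGB pixels. The function names and the whiteness thresholds in `is_blank` are hypothetical illustrations; they are not the PAIP2019 blank-area criteria.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
Pixel = Tuple[int, int, int]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting test against one closed annotation region."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_blank(pixels: Iterable[Pixel],
             white_level: int = 220,             # hypothetical threshold
             max_white_fraction: float = 0.9     # hypothetical threshold
             ) -> bool:
    """Treat a tile as background when most of its pixels are near white."""
    pixels = list(pixels)
    if not pixels:
        return True
    white = sum(1 for r, g, b in pixels if min(r, g, b) >= white_level)
    return white / len(pixels) >= max_white_fraction


def label_tile(center: Point,
               tumor_polygons: List[Sequence[Point]],
               slide_label: str,
               pixels: Iterable[Pixel]) -> str:
    """Assign a weak tile label from the slide-level MSI classification."""
    x, y = center
    # Non-masked regions are all normal tissue and carry no MSI information.
    if not any(point_in_polygon(x, y, p) for p in tumor_polygons):
        return "normal"
    # Blank areas inside the mask are not removed from the data.
    if is_blank(pixels):
        return "blank"
    # Inside the mask the tile only inherits the slide label (weak label).
    return slide_label


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
tissue = [(180, 60, 120)] * 16
background = [(255, 255, 255)] * 16
print(label_tile((5, 5), [square], "MSI-H", tissue))
print(label_tile((15, 5), [square], "MSI-H", tissue))
print(label_tile((5, 5), [square], "MSI-H", background))
```

A tile inside the mask inherits the slide label only as a weak label, since masked regions may or may not actually contain MSI.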


Validation Data: 31 WSIs

  • Thirty-one patient cases with adenocarcinoma resection. There is one slide per case, and each slide has at least one tumor area.
  • Provided slides are randomly mixed. Slides are not annotated (e.g., no “whole tumor area” annotation is given).
  • Results generated from the Validation Data can be submitted for scoring. You can submit results as many times as needed, and the leaderboard will be updated based on the highest score obtained.


Test Data: 40 WSIs

  • Forty patient cases with resected colon or colon biopsy specimens. There is one WSI per case, and each slide has at least one tumor area.
  • Provided slides are randomly mixed. Slides are not annotated (e.g., no “whole tumor area” annotation is given).
  • The Test Data does not include the Validation Data and is used for final scoring. Once you submit results for the Test Data, they will be evaluated, but the leaderboard will not be updated.
  • The list of the final top 10 contestants will be announced immediately after the final submission deadline, and the contestant ranking will be announced at the workshop.

Access to the Data