About the Example MoNuSeg Dataset

From https://monuseg.grand-challenge.org: Training data containing 30 images and around 22,000 nuclear boundary annotations was previously released to the public as part of a dataset article in IEEE Transactions on Medical Imaging in 2017.

  • The training dataset (images and annotations) can be downloaded from https://drive.google.com/file/d/1ZgqFJomqQGNnsx7w7QBzQQMVA16lbVCA/view

Instructions for downloading and preprocessing the data can be found in the corresponding notebook.

[Figure: MoNuSeg cover image (MonNuSegCover.png)]

Setup

In this section, you will set up the training environment, install all dependencies, and connect to the Google Drive folder that contains the prepared datasets.

!pip install -Uqq deepflash2
import numpy as np
from deepflash2.all import *
from pathlib import Path

Settings

Prior to training and prediction, directories need to be specified and parameters need to be set. For convenience, existing Google Drive folders can be used.

# Connect to drive
try:
  from google.colab import drive
  drive.mount('/gdrive')
except ModuleNotFoundError:
  print('Google Drive is not available.')

SEED = 0 # We used seeds [0,1,2] in our experiments
OUTPUT_PATH = Path("/content/predictions") # Save predictions here
MODEL_PATH = Path("/content/models") # Save models here
TRAINED_MODEL_PATH = Path('/gdrive/MyDrive/deepflash2-paper/models/')
DATA_PATH = Path('/gdrive/MyDrive/deepflash2-paper/data')

#################### Parameters ####################
DATASET = 'monuseg' 
mask_directory='masks_preprocessed'

# Datasets have different numbers of classes - 2 in case of monuseg
num_classes = 2
# Diameters are calculated using the median sizes from the respective training sets - 21 in case of monuseg
diameter = 21 

# Create deepflash2 config class
cfg = Config(random_state=SEED,
             num_classes=num_classes, scale=1.)
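The diameter of 21 above is the median nucleus size from the training set. A hypothetical sketch of how such a median equivalent diameter could be derived from integer-labeled instance masks (toy data, not the deepflash2 implementation; `median_equivalent_diameter` is a name introduced here for illustration):

```python
import numpy as np

def median_equivalent_diameter(instance_masks):
    """Median diameter of a circle with the same area as each labeled instance."""
    diameters = []
    for mask in instance_masks:
        labels, counts = np.unique(mask, return_counts=True)
        areas = counts[labels > 0]  # drop background (label 0)
        diameters.extend(np.sqrt(4 * areas / np.pi))
    return float(np.median(diameters))

# Two toy "nuclei" of 9 and 25 pixels in one instance mask
toy = np.zeros((10, 10), dtype=int)
toy[0:3, 0:3] = 1   # area 9
toy[4:9, 4:9] = 2   # area 25
print(round(median_equivalent_diameter([toy]), 2))  # → 4.51
```

On real data, the masks would be the preprocessed training annotations rather than synthetic arrays.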

Data preprocessing

  • Initialize EnsembleLearner
  • Plot images and masks to verify that they are correctly loaded
train_data_path = DATA_PATH/DATASET/'train'
ensemble_path = MODEL_PATH/DATASET/f'{SEED+1}' 

el = EnsembleLearner(image_dir='images', 
                     mask_dir=mask_directory, 
                     config=cfg, 
                     path=train_data_path, 
                     ensemble_path=ensemble_path)

el.ds.show_data(max_n=2)
Found 37 images in "/gdrive/MyDrive/deepflash2-paper/data/monuseg/train/images".
Found 37 corresponding masks.
Preprocessing data
100.00% [37/37 00:37<00:00]
Calculated stats {'channel_means': array([164.25767168, 114.08958843, 153.99864962]), 'channel_stds': array([49.35636671, 50.20060809, 40.10077354]), 'max_tiles_per_image': 4}
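The channel statistics reported above are per-channel means and standard deviations over all training pixels. A minimal sketch of that computation, using random toy RGB images in place of the 37 MoNuSeg training images:

```python
import numpy as np

# Toy stand-ins for RGB training images of shape (H, W, 3)
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(32, 32, 3)).astype(float) for _ in range(4)]

# Stack all pixels and reduce over everything except the channel axis
pixels = np.concatenate([im.reshape(-1, 3) for im in images])
channel_means = pixels.mean(axis=0)
channel_stds = pixels.std(axis=0)
print(channel_means.shape, channel_stds.shape)  # → (3,) (3,)
```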

Train models

  • Train a model ensemble with 5 models. For each model:
    • 1 epoch of fine-tuning with frozen encoder weights
    • 25 epochs of training with all weights unfrozen
  • You can skip this step and use the trained models from our paper (see next section).
el.fit_ensemble()
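The freeze/unfreeze schedule above can be sketched in plain PyTorch, assuming a generic encoder-decoder model (`TinySegModel` is a hypothetical stand-in, not the deepflash2 architecture):

```python
import torch.nn as nn

class TinySegModel(nn.Module):
    """Hypothetical encoder-decoder stand-in for a segmentation network."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, 3, padding=1)
        self.decoder = nn.Conv2d(8, 2, 1)
    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinySegModel()

# Phase 1: freeze the (pretrained) encoder, fine-tune the decoder for 1 epoch
for p in model.encoder.parameters():
    p.requires_grad = False
print(sum(p.numel() for p in model.parameters() if p.requires_grad))  # → 18

# Phase 2: unfreeze all weights and train for 25 epochs
for p in model.parameters():
    p.requires_grad = True
```

Freezing the encoder first lets the randomly initialized decoder adapt without disturbing the pretrained features.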

Prediction on test set

We save

  • Segmentations:
    • Semantic segmentation masks (.png)
    • Instance segmentation masks (.tif) using the cellpose flow representations
  • Uncertainties
    • Uncertainty masks (.png)
    • Foreground uncertainty scores U in the 'uncertainty_scores.csv' file
test_data_path = DATA_PATH/DATASET/'test'

# Use the trained model from our paper
ensemble_name = f'{DATASET}_ensemble_{SEED+1}.pt'
ensemble_trained_dir = Path("/content/trained_models")/DATASET
ensemble_trained_dir.mkdir(exist_ok=True, parents=True)
ensemble_trained_path = ensemble_trained_dir/ensemble_name
!wget -O {ensemble_trained_path.as_posix()} https://github.com/matjesg/deepflash2/releases/download/model_library/{ensemble_name}

# Uncomment to use your own trained model from the section above
# ensemble_trained_path = ensemble_path

# Save the predictions here
prediction_path = OUTPUT_PATH/DATASET/f'{SEED+1}'

cfg.instance_labels = True # Test masks are saved as instance labels
ep = EnsemblePredictor('images',
                       'masks', # Remove this argument if no masks are available
                        path=test_data_path, 
                        config=cfg, 
                        ensemble_path=ensemble_trained_path) 

Predict, save, and show semantic segmentation masks

  • Calculate similarity scores (Dice score) on the test set
  • Only show two example files
_ = ep.get_ensemble_results(export_dir=prediction_path)
_ = ep.score_ensemble_results()
ep.show_ensemble_results(files=['TCGA-2Z-A9J9-01A-01-TS1.tif', 'TCGA-44-2665-01B-06-BS6.tif'])
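The Dice score used above measures the overlap between predicted and ground-truth masks. A minimal sketch on toy binary masks (not deepflash2's internal implementation):

```python
import numpy as np

def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2 * intersection / (pred.sum() + target.sum())

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(a, b))  # 2*2/(3+3) ≈ 0.667
```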

Predict, save, and show instance segmentation masks

  • Calculate similarity scores (Dice score) on the test set
  • Only show two example files
ep.config.cellpose_diameter=diameter
_ = ep.get_cellpose_results(export_dir=prediction_path)
_ = ep.score_cellpose_results()
ep.show_cellpose_results(files=['TCGA-2Z-A9J9-01A-01-TS1.tif', 'TCGA-44-2665-01B-06-BS6.tif'])

Save and show results table

ep.df_ens.to_csv(prediction_path/'uncertainty_scores.csv', index=False)
display(ep.df_ens[['file', 'dice_score', 'mAP_class1', 'uncertainty_score']])
file dice_score mAP_class1 uncertainty_score
0 TCGA-2Z-A9J9-01A-01-TS1.tif 0.806269 0.353489 0.323787
1 TCGA-44-2665-01B-06-BS6.tif 0.841194 0.413365 0.322402
2 TCGA-69-7764-01A-01-TS1.tif 0.792159 0.309253 0.414765
3 TCGA-A6-6782-01A-01-BS1.tif 0.787972 0.276940 0.416899
4 TCGA-AC-A2FO-01A-01-TS1.tif 0.754228 0.252180 0.400931
5 TCGA-AO-A0J2-01A-01-BSA.tif 0.813167 0.314197 0.336633
6 TCGA-CU-A0YN-01A-02-BSB.tif 0.817137 0.388173 0.339663
7 TCGA-EJ-A46H-01A-03-TSC.tif 0.828593 0.395314 0.320607
8 TCGA-FG-A4MU-01B-01-TS1.tif 0.853829 0.393040 0.280658
9 TCGA-GL-6846-01A-01-BS1.tif 0.833096 0.414631 0.367452
10 TCGA-HC-7209-01A-01-TS1.tif 0.832757 0.384773 0.361541
11 TCGA-HT-8564-01Z-00-DX1.tif 0.809505 0.328213 0.390579
12 TCGA-IZ-8196-01A-01-BS1.tif 0.838419 0.426437 0.338511
13 TCGA-ZF-A9R5-01A-01-TS1.tif 0.877717 0.502591 0.275315
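The foreground uncertainty scores can be used to prioritize images for expert review, e.g. by ranking the saved results. A minimal sketch assuming the columns of 'uncertainty_scores.csv' (toy values, not the scores above):

```python
import pandas as pd

# Toy scores mimicking the columns of 'uncertainty_scores.csv'
df = pd.DataFrame({
    'file': ['a.tif', 'b.tif', 'c.tif'],
    'uncertainty_score': [0.41, 0.28, 0.33],
})

# Rank files by foreground uncertainty, highest first → review candidates
review = df.sort_values('uncertainty_score', ascending=False)
print(review['file'].tolist())  # → ['a.tif', 'c.tif', 'b.tif']
```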