
Nuclear Segmentation Pipelines

The GoNuclear repository hosts the code and guides for the pipelines used in the paper A deep learning-based toolkit for 3D nuclei segmentation and quantitative analysis in cellular and tissue context. It is structured into four folders, each described in this documentation:

  • stardist/ contains a 3D StarDist training and inference pipeline, run-stardist.
  • plantseg/ contains configuration files for training and inference with PlantSeg.
  • cellpose/ contains scripts for training and inference with Cellpose.
  • evaluation/ contains modules for evaluating the segmentation results.

Tools and Workflows

StarDist

See GoNuclear Documentation - run-stardist for more details.

This is one of the most important contributions of this repository. If your nuclei are more or less uniform in shape, please consider using the run-stardist pipeline in this repository. It generates separate, round instance segmentation masks for your nuclei images.

  • The code and tutorial for running StarDist inference are in the stardist/ folder
  • The pretrained model is automatically downloaded during inference (also available at BioImage.IO: StarDist Plant Nuclei 3D ResNet)
  • An example of segmentation results is shown below.

[Figure: StarDist raw image and segmentation]

PlantSeg

See GoNuclear Documentation - PlantSeg for more details.

If your nuclei have irregular shapes, please consider using the PlantSeg pipeline. It generates instance masks for your nuclei images regardless of nucleus size and shape.

  • The code and tutorial for running PlantSeg inference are in the plantseg/ folder
  • The pretrained model is automatically downloaded during inference (also available at BioImage.IO: PlantSeg Plant Nuclei 3D UNet)
  • An example of segmentation results is shown below.

[Figure: PlantSeg raw image and GASP segmentation]

Cellpose

See GoNuclear Documentation - Cellpose for more details.

  • The guide for running Cellpose inference and training is in the cellpose/ folder

Data and Models

Training Data and Trained Models

The training data are publicly available on BioImage Archive S-BIAD1026. I organised them in the following structure:

Training data
├── 2d/
│   ├── isotropic/
│   │   ├── gold/
│   │   └── initial/
│   └── original/
│       ├── gold/
│       └── README.txt
└── 3d_all_in_one/
    ├── 1135.h5
    ├── 1136.h5
    ├── 1137.h5
    ├── 1139.h5
    └── 1170.h5

Models
├── cellpose/
│   ├── cyto2_finetune/
│   │   └── gold/
│   ├── nuclei_finetune/
│   │   ├── gold/
│   │   └── initial/
│   └── scratch_trained/
│       └── gold/
├── plantseg/
│   └── 3dunet/
│       ├── gold/
│       ├── initial/
│       ├── platinum/
│       └── train_example.yml
└── stardist/
    ├── resnet/
    │   ├── gold/
    │   ├── initial/
    │   └── platinum/
    ├── train_example.yml
    └── unet/
        └── gold/

An example of the raw image:

[Figure: example raw image]

Some key information about the training data is listed below:

original_voxel_size = {  # z, y, x
    1135: [0.28371836501901143, 0.12678642066720086, 0.12678642066720086],  # validation
    1136: [0.2837183895131086,  0.12756971653115998, 0.12756971653115998],  # training
    1137: [0.2837183895131086,  0.1266211463645486,  0.1266211463645486 ],  # training
    1139: [0.2799036917562724,  0.12674335484590543, 0.12674335484590543],  # training
    1170: [0.27799632231404964, 0.12698523961670266, 0.12698522349145364],  # training
}  # [0.2837, 0.1268, 0.1268] is taken as the median
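The closing comment states that [0.2837, 0.1268, 0.1268] is taken as the median voxel size. A minimal sketch using only the Python standard library that reproduces this per-axis median from the table, and reports how much coarser z is sampled than x, could look like this. The helper name `per_axis_median` is hypothetical (not part of GoNuclear), and the unit of the voxel sizes is not stated in this README.

```python
from statistics import median

# Voxel sizes (z, y, x) copied from the table above.
# The unit is not stated in this README (micrometres would be an assumption).
original_voxel_size = {
    1135: [0.28371836501901143, 0.12678642066720086, 0.12678642066720086],
    1136: [0.2837183895131086, 0.12756971653115998, 0.12756971653115998],
    1137: [0.2837183895131086, 0.1266211463645486, 0.1266211463645486],
    1139: [0.2799036917562724, 0.12674335484590543, 0.12674335484590543],
    1170: [0.27799632231404964, 0.12698523961670266, 0.12698522349145364],
}


def per_axis_median(table):
    """Median of each axis (z, y, x) across all volumes in `table` (hypothetical helper)."""
    columns = zip(*table.values())  # transpose: one tuple of values per axis
    return [median(col) for col in columns]


median_voxel = per_axis_median(original_voxel_size)
rounded = [round(v, 4) for v in median_voxel]

# Anisotropy: ratio of z spacing to x spacing
anisotropy = median_voxel[0] / median_voxel[2]

print(rounded)               # [0.2837, 0.1268, 0.1268]
print(round(anisotropy, 2))  # 2.24
```

With an odd number of volumes the median is simply the middle value per axis, so it is always one of the measured spacings rather than an interpolated number.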

original_median_extents = {  # z, y, x
    1135: [16, 32, 33],  # validation
    1136: [16, 32, 32],  # training
    1137: [16, 32, 32],  # training
    1139: [16, 32, 33],  # training
    1170: [16, 29, 30],  # training
}  # [16, 32, 32] is taken as the median
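The same per-axis median applies to the extents. The standard-library sketch below is an illustration of the arithmetic only, not the code used to produce the numbers above. It also multiplies the median extent by the median voxel size to get a physical size (in whatever unit the voxel sizes use), and converts the z extent into the number of voxels it would span on an isotropic grid with the y/x spacing (an assumed target spacing).

```python
from statistics import median

# Median nucleus extents in voxels (z, y, x), copied from the table above.
original_median_extents = {
    1135: [16, 32, 33],
    1136: [16, 32, 32],
    1137: [16, 32, 32],
    1139: [16, 32, 33],
    1170: [16, 29, 30],
}
# Median voxel size (z, y, x) as stated in this README.
median_voxel_size = [0.2837, 0.1268, 0.1268]

# Per-axis median across volumes, mirroring how the voxel size was summarised.
median_extent = [median(axis) for axis in zip(*original_median_extents.values())]

# Physical extent = extent in voxels * voxel size (same unit as the voxel size).
physical_extent = [e * s for e, s in zip(median_extent, median_voxel_size)]

# Voxels the z extent would span if z were resampled to the x spacing (assumption).
iso_extent_z = median_extent[0] * median_voxel_size[0] / median_voxel_size[2]

print(median_extent)                          # [16, 32, 32]
print([round(p, 2) for p in physical_extent])  # [4.54, 4.06, 4.06]
print(round(iso_extent_z, 1))                 # 35.8
```

On an isotropic grid the z extent becomes roughly 36 voxels, close to the 32 voxels along y and x, so the nuclei have roughly comparable physical size along all three axes.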

Note for training Cellpose: StarDist and PlantSeg models are best trained on the images in their original form, i.e. the linked dataset as provided gives the best results. Cellpose, however, only takes 2D training data, so for Cellpose the 3D images were rescaled to isotropic voxels and cut into 2D slices. These include all XY, XZ and YZ slices, shuffled by a random prefix in the file name. The 2D slices are saved as TIFF files and are provided along with the 3D images in the same BioImage Archive S-BIAD1026 repository.
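The Cellpose data preparation described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the script used to build S-BIAD1026: the target isotropic spacing is assumed to be the median y/x spacing, rescaling uses nearest-neighbour resampling along z (the README does not say which interpolation was used), and the helper names `rescale_z_nearest` and `isotropic_slices` as well as the file-name pattern are hypothetical.

```python
import numpy as np


def rescale_z_nearest(volume, z_factor):
    """Resample a (z, y, x) array along z only, by nearest neighbour.

    Hypothetical helper. Nearest neighbour keeps label IDs intact; for raw
    intensities a smoother interpolation would usually be preferred.
    """
    nz = volume.shape[0]
    new_nz = int(round(nz * z_factor))
    # Map each output plane to the input plane it falls in
    idx = np.minimum((np.arange(new_nz) / z_factor).astype(int), nz - 1)
    return volume[idx]


def isotropic_slices(raw, labels, rng=None):
    """Cut every XY, XZ and YZ slice from matching isotropic 3D volumes.

    Returns (file_name, raw_slice, label_slice) tuples. The random prefix is
    drawn as a permutation, so it is unique, and it shuffles the slices when
    the files are listed alphabetically.
    """
    assert raw.shape == labels.shape, "raw and labels must align"
    rng = np.random.default_rng() if rng is None else rng
    cuts = []
    # Slicing along z gives XY planes, along y gives XZ, along x gives YZ
    for axis, plane in enumerate(("xy", "xz", "yz")):
        for i in range(raw.shape[axis]):
            cuts.append((f"{plane}_{i:04d}",
                         np.take(raw, i, axis=axis),
                         np.take(labels, i, axis=axis)))
    prefixes = rng.permutation(len(cuts))
    return [(f"{int(p):06d}_{name}.tif", r, lab)
            for p, (name, r, lab) in zip(prefixes, cuts)]


# Toy usage with random stand-in data (not real nuclei)
rng = np.random.default_rng(0)
raw = rng.random((16, 40, 40)).astype(np.float32)
labels = (raw > 0.5).astype(np.uint16)
z_factor = 0.2837 / 0.1268  # median z / x spacing quoted in this README
items = isotropic_slices(rescale_z_nearest(raw, z_factor),
                         rescale_z_nearest(labels, z_factor), rng)
print(len(items))  # 36 XY + 40 XZ + 40 YZ = 116
```

Each slice could then be written with `tifffile.imwrite(name, array)`. Because a raw slice and its label slice share one name, they would need to go to separate folders or get a distinguishing suffix; drawing one prefix per pair keeps images and masks matched after shuffling.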

Preparing Data for Inference

Both HDF5 and TIFF files can be used directly for run-stardist and PlantSeg inference. See the respective GoNuclear documentation for more details.
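Before running inference, it can help to confirm that a file is one of the two supported formats and to inspect what an HDF5 file contains. The sketch below uses `h5py` and `tifffile`, imported lazily so that only the library for your format needs to be installed. The dataset key `"raw"` is an assumption: check the key names in your own files (for example with `list_datasets`) and the respective GoNuclear documentation for the key each pipeline expects. All function names here are hypothetical.

```python
from pathlib import Path

HDF5_SUFFIXES = {".h5", ".hdf5", ".hdf"}
TIFF_SUFFIXES = {".tif", ".tiff"}


def detect_format(path):
    """Classify an input file as 'hdf5' or 'tiff' from its extension."""
    suffix = Path(path).suffix.lower()
    if suffix in HDF5_SUFFIXES:
        return "hdf5"
    if suffix in TIFF_SUFFIXES:
        return "tiff"
    raise ValueError(f"unsupported file type: {path}")


def load_volume(path, dataset="raw"):
    """Load a 3D volume from HDF5 or TIFF; the HDF5 key "raw" is an assumption."""
    if detect_format(path) == "hdf5":
        import h5py  # lazy import: TIFF-only users need not install h5py
        with h5py.File(path, "r") as f:
            return f[dataset][...]
    import tifffile  # lazy import: HDF5-only users need not install tifffile
    return tifffile.imread(path)


def list_datasets(path):
    """Print every dataset in an HDF5 file with its shape and dtype."""
    import h5py

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
```

For example, `list_datasets("1135.h5")` would show which keys one of the training files contains before you pass it to either pipeline.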

Cite

If you find this work useful, please cite our paper and the respective tools' papers:

@article{vijayan2024deep,
  title={A deep learning-based toolkit for 3D nuclei segmentation and quantitative analysis in cellular and tissue context},
  author={Vijayan, Athul and Mody, Tejasvinee Atul and Yu, Qin and Wolny, Adrian and Cerrone, Lorenzo and Strauss, Soeren and Tsiantis, Miltos and Smith, Richard S and Hamprecht, Fred A and Kreshuk, Anna and others},
  journal={Development},
  volume={151},
  number={14},
  year={2024},
  publisher={The Company of Biologists}
}