Snuffy: Efficient Whole Slide Image Classifier

1Sharif University of Technology, 2Amirkabir University of Technology (Tehran Polytechnic)

ECCV 2024

Overview

Overview of Snuffy. (a) WSIs are segmented into 256 × 256 patches at 20× magnification, and embeddings are extracted from each patch with a pre-trained ViT. These embeddings are then fed into Snuffy for patch- and WSI-level classification. (b) The connectivity matrix illustrates Snuffy's attention sparsity pattern: Class-related Global Attentions appear as darker vertical or horizontal lines (the darker, the more important), Diagonal Attentions are shown in pink, and Random Global Attentions in the lightest pink.
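
Below is a minimal sketch of the embedding step in panel (a), assuming a timm ViT-S/16 backbone and a resize of each 256 × 256 tile to 224 × 224; the backbone choice, preprocessing, and function name embed_patches are illustrative assumptions, not the repository's exact pipeline.

import timm
import torch
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed backbone: ViT-S/16 from timm with the classifier head removed,
# so the forward pass returns pooled embeddings of shape (N, 384).
vit = timm.create_model("vit_small_patch16_224", pretrained=True, num_classes=0)
vit = vit.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(224),   # 256x256 tiles resized to the ViT input size
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

@torch.no_grad()
def embed_patches(patch_images):
    """Encode a list of 256x256 PIL patches into one bag of embeddings."""
    batch = torch.stack([preprocess(p) for p in patch_images]).to(device)
    return vit(batch)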

Abstract

Whole Slide Image (WSI) classification with multiple instance learning (MIL) in digital pathology faces significant computational challenges. Current methods mostly rely on extensive self-supervised learning (SSL) for satisfactory performance, requiring long training periods and considerable computational resources. At the same time, forgoing in-domain pre-training also hurts performance, owing to the domain shift from natural images to WSIs. We introduce the Snuffy architecture, a novel MIL-pooling method based on sparse transformers that mitigates performance loss with limited pre-training and enables continual few-shot pre-training as a competitive option. Our sparsity pattern is tailored for pathology and is theoretically proven to be a universal approximator with the tightest probabilistic sharp bound to date on the number of layers for sparse transformers. We demonstrate Snuffy's effectiveness on the CAMELYON16 and TCGA Lung cancer datasets, achieving superior WSI- and patch-level accuracies. The code is available at https://github.com/jafarinia/snuffy.

Video

Method

Snuffy architecture comprises two key components:

  1. Self-Supervised Continual Pre-Training with PEFT: We leverage Parameter-Efficient Fine-Tuning (PEFT) in the pathology domain, adopting AdaptFormer for its effective design (see the adapter sketch after this list).
  2. Snuffy MIL-Pooling Architecture: Inspired by the complexity of cancer biology and the role of the tissue microenvironment in detection, we present the Snuffy MIL-pooling architecture, which introduces a new sparsity pattern for sparse transformers (a mask-construction sketch also follows this list).
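
For item 1, here is a hedged sketch of an AdaptFormer-style bottleneck adapter running in parallel with a frozen ViT block's MLP; the class names, bottleneck size, and scaling factor are illustrative assumptions, not the repository's exact implementation.

import torch
import torch.nn as nn

class ParallelAdapter(nn.Module):
    """AdaptFormer-style bottleneck adapter run in parallel with a ViT MLP.

    The bottleneck size and scaling factor are illustrative defaults; only
    the adapter parameters are meant to be trained.
    """
    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)
        self.scale = scale
        nn.init.zeros_(self.up.weight)   # start as a no-op branch
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * self.up(self.act(self.down(x)))


class AdaptedMLP(nn.Module):
    """Wraps a frozen ViT block's MLP with a trainable parallel adapter branch."""
    def __init__(self, mlp: nn.Module, dim: int):
        super().__init__()
        self.mlp = mlp                       # frozen pre-trained MLP
        self.adapter = ParallelAdapter(dim)  # trainable

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x) + self.adapter(x)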
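
For item 2, the sketch below builds a connectivity mask combining the three attention types from panel (b) of the overview figure: class-related global attentions for top-scoring patches (full rows and columns), diagonal attentions, and random global attentions. How the top patches are selected and the number of random globals are assumptions for illustration only.

import torch

def snuffy_mask(num_patches: int, top_idx: torch.Tensor,
                num_random: int = 10, generator=None) -> torch.Tensor:
    """Boolean attention mask (True = attend) mixing the three sparsity types:
    class-related global attentions for top-scoring patches, diagonal
    attentions (each patch attends to itself), and random global attentions.
    `top_idx` and `num_random` are illustrative inputs.
    """
    mask = torch.eye(num_patches, dtype=torch.bool)   # diagonal attentions

    # Class-related global attentions: top patches attend to, and are
    # attended by, every other patch (full horizontal and vertical lines).
    mask[top_idx, :] = True
    mask[:, top_idx] = True

    # Random global attentions: a few randomly chosen patches also go global.
    rand_idx = torch.randperm(num_patches, generator=generator)[:num_random]
    mask[rand_idx, :] = True
    mask[:, rand_idx] = True

    return mask

# Usage: block disallowed positions before the attention softmax.
# attn_scores = attn_scores.masked_fill(~mask, float("-inf"))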

Results

Comparison of Performance & Efficiency

There are two families within our architecture:

  • Efficient Snuffy: trained initially on a natural-image dataset, then continually trained with PEFT on WSIs.
  • Exhaustive Snuffy: trained from scratch on WSIs.

Both families utilize the Snuffy MIL-pooling architecture; a sketch of the Efficient training setup follows.
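
As a hedged illustration of the Efficient setup, the sketch below freezes the pre-trained ViT and routes gradients only to the adapter and MIL-pooling parameters; the names vit and mil_pooling, the "adapter" naming convention, and the learning rate are placeholders, not the repository's API.

import torch

def configure_efficient_snuffy(vit, mil_pooling, lr: float = 1e-4):
    """Freeze the pre-trained backbone; train only adapter + MIL-pooling parameters.

    Assumes adapter parameters contain "adapter" in their names (as in the
    adapter sketch above); `mil_pooling` stands in for the Snuffy MIL-pooling
    head. These names and the learning rate are illustrative.
    """
    trainable = []
    for name, param in vit.named_parameters():
        if "adapter" in name:
            param.requires_grad = True       # adapters receive gradients
            trainable.append(param)
        else:
            param.requires_grad = False      # backbone stays frozen
    trainable += list(mil_pooling.parameters())
    return torch.optim.AdamW(trainable, lr=lr)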

Performance (AUC) vs. efficiency (size and time) trade-off on CAMELYON16. Although Efficient Snuffy's performance may be slightly inferior to Exhaustive Snuffy's, both methods significantly outperform existing benchmarks in Region-of-Interest (ROI) detection and WSI classification, setting a new state of the art (SOTA).

Leaderboard

AUC chart on CAMELYON16: Snuffy is SOTA.

Region of Interests

Qualitative view of ROIs recognized by Snuffy through its patch classification. (a) An example WSI from the test set of the CAMELYON16 dataset. (b) ROIs identified by Snuffy, with black lines delineating the ground-truth ROIs.

BibTeX

@misc{jafarinia2024snuffyefficientslideimage,
      title={Snuffy: Efficient Whole Slide Image Classifier}, 
      author={Hossein Jafarinia and Alireza Alipanah and Danial Hamdi and Saeed Razavi and Nahal Mirzaie and Mohammad Hossein Rohban},
      year={2024},
      eprint={2408.08258},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.08258}, 
}