MedMNIST
Overview
MedMNIST is a large-scale, MNIST-like collection of standardized biomedical images. In stable-datasets it is exposed through the MedMNIST class, and the dataset that is actually loaded depends on the variant selected with config_name (e.g. dermamnist, pathmnist). Each variant provides train, validation, and test splits.
All 2D variants are pre-processed to 28×28 images and all 3D variants are pre-processed to 28×28×28 volumes, with corresponding labels.
Variants (2D)
2D variants use 28×28 images.
| Variant | Classes | Train | Validation | Test | Notes |
|---|---|---|---|---|---|
|  | 9 | 89,996 | 10,004 | 7,180 |  |
|  | 14 | 78,468 | 11,219 | 22,433 | multi-label |
|  | 7 | 7,007 | 1,003 | 2,005 |  |
|  | 4 | 97,477 | 10,832 | 1,000 |  |
|  | 2 | 4,708 | 524 | 624 |  |
|  | 5 | 1,080 | 120 | 400 | ordinal regression |
|  | 2 | 546 | 78 | 156 |  |
|  | 8 | 11,959 | 1,712 | 3,421 |  |
|  | 8 | 165,466 | 23,640 | 47,280 |  |
|  | 11 | 34,561 | 6,491 | 17,778 |  |
|  | 11 | 12,975 | 2,392 | 8,216 |  |
|  | 11 | 13,932 | 2,452 | 8,827 |  |
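To see how a variant's examples are divided between splits, the sizes in the table can be converted into fractions. The sketch below uses the counts from the 7-class row (7,007 / 1,003 / 2,005); split_fractions is an illustrative helper and not part of stable-datasets.

```python
def split_fractions(train: int, validation: int, test: int) -> dict:
    """Return each split's share of the total number of examples."""
    total = train + validation + test
    if total <= 0:
        raise ValueError("split sizes must sum to a positive number")
    return {
        "train": train / total,
        "validation": validation / total,
        "test": test / total,
    }


# Split sizes copied from the 7-class row of the table above.
fractions = split_fractions(train=7_007, validation=1_003, test=2_005)
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```

For this row the split works out to roughly 70/10/20. Other rows differ noticeably, so it is worth checking the ratio for the variant you plan to use.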
Variants (3D)
3D variants use 28×28×28 volumes.
| Variant | Classes | Train | Validation | Test |
|---|---|---|---|---|
|  | 11 | 971 | 161 | 610 |
|  | 2 | 1,158 | 165 | 310 |
|  | 2 | 1,188 | 98 | 298 |
|  | 3 | 1,027 | 103 | 240 |
|  | 2 | 1,335 | 191 | 382 |
|  | 2 | 1,230 | 177 | 352 |
Data Structure
When accessing an example using ds[i], you will receive a dictionary with the following keys.
2D variants
| Key | Type | Description |
|---|---|---|
| image | PIL.Image.Image | 28×28 image |
| label | int / list[int] | Class label (range depends on the selected variant; a list for multi-label variants such as chestmnist) |
3D variants
| Key | Type | Description |
|---|---|---|
| image | list | 28×28×28 volume |
| label | int | Class label (range depends on the selected variant) |
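Since the label can be either a single integer or a list of integers, downstream code may want to branch on its type. The following is a minimal sketch that runs on hand-built dictionaries instead of real dataset samples; describe_label is a hypothetical helper, and it assumes the default (non-tensor) output format.

```python
def describe_label(sample: dict) -> str:
    """Classify a sample's label as single-label (int) or multi-label (list of ints)."""
    label = sample["label"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(label, int) and not isinstance(label, bool):
        return "single-label"
    if isinstance(label, (list, tuple)) and all(isinstance(v, int) for v in label):
        return f"multi-label ({len(label)} classes)"
    raise TypeError(f"unexpected label type: {type(label).__name__}")


# Hand-built stand-ins for ds[i]; the "image" values are placeholders.
single = {"image": None, "label": 3}
multi = {"image": None, "label": [1 if i in (1, 12) else 0 for i in range(14)]}

print(describe_label(single))
print(describe_label(multi))
```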
Usage Example
Basic Usage (2D variant)
from stable_datasets.images.med_mnist import MedMNIST
# Pick a 2D variant via config_name
variant = "dermamnist"
ds_train = MedMNIST(split="train", config_name=variant)
ds_val = MedMNIST(split="validation", config_name=variant)
ds_test = MedMNIST(split="test", config_name=variant)
sample = ds_train[0]
print(sample.keys()) # {"image", "label"}
image = sample["image"] # PIL.Image.Image
label = sample["label"] # int
# Optional: make it PyTorch-friendly
ds_train_torch = ds_train.with_format("torch")
Basic Usage (2D multi-label variant: chestmnist)
from stable_datasets.images.med_mnist import MedMNIST
variant = "chestmnist"
ds_train = MedMNIST(split="train", config_name=variant)
sample = ds_train[0]
image = sample["image"] # PIL.Image.Image
label = sample["label"] # multi-label vector (length 14)
# Example: indices of positive labels
positives = [i for i, v in enumerate(label) if int(v) == 1]
print("positive label indices:", positives)
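The index extraction above has a natural inverse: building a multi-hot vector from a list of positive indices. The sketch below works on a hand-written 14-element label, not a real chestmnist sample; positive_indices and to_multi_hot are illustrative names and not part of stable-datasets.

```python
def positive_indices(label) -> list:
    """Return the indices whose entry equals 1 in a multi-hot label vector."""
    return [i for i, v in enumerate(label) if int(v) == 1]


def to_multi_hot(indices, num_classes: int) -> list:
    """Build a multi-hot vector of length num_classes from positive indices."""
    vector = [0] * num_classes
    for i in indices:
        if not 0 <= i < num_classes:
            raise ValueError(f"index {i} is outside [0, {num_classes})")
        vector[i] = 1
    return vector


# Hand-written example label with 14 entries (positives at indices 2 and 9).
label = to_multi_hot([2, 9], num_classes=14)
print(positive_indices(label))
```

Round-tripping through both helpers returns the original vector, which makes a convenient sanity check when writing a custom collate or metric function.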
Basic Usage (3D variant)
from stable_datasets.images.med_mnist import MedMNIST
variant = "organmnist3d"
ds_train = MedMNIST(split="train", config_name=variant)
sample = ds_train[0]
image = sample["image"] # nested list, shape (28, 28, 28)
label = sample["label"] # int
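Because the 3D image is returned as a nested list, it has no shape attribute. A small helper can recover the shape and pull out a 2D slice for inspection. This sketch runs on a synthetic all-zero volume; nested_shape and middle_slice are hypothetical helpers that assume a regular (non-ragged) nesting.

```python
def nested_shape(data) -> tuple:
    """Return the shape of a regularly nested list, e.g. (28, 28, 28)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]  # assumes every sibling list has the same length
    return tuple(shape)


def middle_slice(volume):
    """Return the central 2D slice along the first axis."""
    return volume[len(volume) // 2]


# Synthetic stand-in for sample["image"] from a 3D variant.
volume = [[[0 for _ in range(28)] for _ in range(28)] for _ in range(28)]

print(nested_shape(volume))
print(nested_shape(middle_slice(volume)))
```

If NumPy is available, numpy.asarray(volume).shape reports the same shape and is usually the more practical choice for real workloads.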
References
Official website: https://medmnist.com/
License: CC BY 4.0
Citation
@article{medmnistv2,
title={MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification},
author={Yang, Jiancheng and Shi, Rui and Wei, Donglai and Liu, Zequan and Zhao, Lin and Ke, Bilian and Pfister, Hanspeter and Ni, Bingbing},
journal={Scientific Data},
volume={10},
number={1},
pages={41},
year={2023},
publisher={Nature Publishing Group UK London}
}