DataModule#
- class stable_ssl.data.DataModule(train: dict | DictConfig | DataLoader | None = None, test: dict | DictConfig | DataLoader | None = None, val: dict | DictConfig | DataLoader | None = None, predict: dict | DictConfig | DataLoader | None = None, **kwargs)[source]#
Bases: LightningDataModule
PyTorch Lightning DataModule for handling train/val/test/predict dataloaders.
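For orientation, a minimal sketch of constructing this DataModule directly from prebuilt dataloaders, based only on the signature above (the random TensorDatasets are placeholders standing in for real data):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    from stable_ssl.data import DataModule

    # Placeholder datasets standing in for real train/val data.
    train_ds = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
    val_ds = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))

    datamodule = DataModule(
        train=DataLoader(train_ds, batch_size=32, shuffle=True),
        val=DataLoader(val_ds, batch_size=32),
    )

Per the signature, each split also accepts a dict or DictConfig describing the dataloader instead of an already-built DataLoader.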
- load_state_dict(state_dict)[source]#
Called when loading a checkpoint, implement to reload datamodule state given datamodule state_dict.
- Parameters:
state_dict – the datamodule state returned by state_dict().
- predict_dataloader()[source]#
An iterable or collection of iterables specifying prediction samples.
For more information about multiple dataloaders, see this section.
It’s recommended that all data downloads and preparation happen in prepare_data().
Related hooks: predict(), prepare_data(), setup().
Note
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying prediction samples.
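A minimal sketch of overriding this hook in a LightningDataModule subclass, following the generic Lightning pattern rather than anything stable_ssl-specific; load_unlabeled_images is a hypothetical helper:

    from lightning import LightningDataModule
    from torch.utils.data import DataLoader

    class MyDataModule(LightningDataModule):
        def setup(self, stage):
            if stage == "predict":
                # Hypothetical helper that builds a Dataset of inference inputs.
                self.predict_ds = load_unlabeled_images("data/inference")

        def predict_dataloader(self):
            # No shuffling, so predictions come back in input order.
            return DataLoader(self.predict_ds, batch_size=64)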
- setup(stage)[source]#
Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.
- Parameters:
stage – either 'fit', 'validate', 'test', or 'predict'
Example:

    class LitModel(...):
        def __init__(self):
            self.l1 = None

        def prepare_data(self):
            download_data()
            tokenize()

            # don't do this
            self.something = ...

        def setup(self, stage):
            data = load_data(...)
            self.l1 = nn.Linear(28, data.num_classes)
- state_dict()[source]#
Called when saving a checkpoint, implement to generate and save datamodule state.
- Returns:
A dictionary containing datamodule state.
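Together with load_state_dict() above, this hook lets a datamodule survive checkpoint round-trips. A minimal sketch, where samples_seen is a hypothetical piece of state worth persisting:

    from lightning import LightningDataModule

    class ResumableDataModule(LightningDataModule):
        def __init__(self):
            super().__init__()
            # Hypothetical state that should survive checkpointing.
            self.samples_seen = 0

        def state_dict(self):
            # Saved into the checkpoint alongside model and trainer state.
            return {"samples_seen": self.samples_seen}

        def load_state_dict(self, state_dict):
            # Restores the value saved by state_dict() above.
            self.samples_seen = state_dict["samples_seen"]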
- teardown(stage: str)[source]#
Called at the end of fit (train + validate), validate, test, or predict.
- Parameters:
stage – either 'fit', 'validate', 'test', or 'predict'
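A small sketch of the usual setup()/teardown() pairing, here releasing per-stage scratch space (the temporary-directory use is illustrative):

    import shutil
    import tempfile

    from lightning import LightningDataModule

    class MyDataModule(LightningDataModule):
        def setup(self, stage):
            # Scratch space created once per stage.
            self.cache_dir = tempfile.mkdtemp()

        def teardown(self, stage):
            # stage is 'fit', 'validate', 'test', or 'predict'.
            shutil.rmtree(self.cache_dir, ignore_errors=True)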
- test_dataloader()[source]#
An iterable or collection of iterables specifying test samples.
For more information about multiple dataloaders, see this section.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
Warning
Do not assign state in prepare_data().
Related hooks: test(), prepare_data(), setup().
Note
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Note
If you don’t need a test dataset and a test_step(), you don’t need to implement this method.
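Since the return type is an iterable or collection of iterables, a subclass can evaluate several test sets in one run. A sketch assuming clean_test_ds and corrupted_test_ds were built in setup():

    from lightning import LightningDataModule
    from torch.utils.data import DataLoader

    class MyDataModule(LightningDataModule):
        def test_dataloader(self):
            # A sequence of loaders: test_step receives a dataloader_idx
            # telling it which dataset the batch came from.
            return [
                DataLoader(self.clean_test_ds, batch_size=64),
                DataLoader(self.corrupted_test_ds, batch_size=64),
            ]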
- train_dataloader()[source]#
An iterable or collection of iterables specifying training samples.
For more information about multiple dataloaders, see this section.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
Warning
Do not assign state in prepare_data().
Related hooks: fit(), prepare_data(), setup().
Note
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
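A sketch of the reload behavior described above, assuming train_ds was built in setup(); by default the loader is built once, and the Trainer flag opts into rebuilding it:

    from lightning import LightningDataModule, Trainer
    from torch.utils.data import DataLoader

    class MyDataModule(LightningDataModule):
        def train_dataloader(self):
            # Called once by default; called again every epoch only if
            # reload_dataloaders_every_n_epochs is set on the Trainer.
            return DataLoader(self.train_ds, batch_size=32, shuffle=True)

    # Rebuild the training dataloader at the start of every epoch.
    trainer = Trainer(reload_dataloaders_every_n_epochs=1)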
- val_dataloader()[source]#
An iterable or collection of iterables specifying validation samples.
For more information about multiple dataloaders, see this section.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
It’s recommended that all data downloads and preparation happen in prepare_data().
Related hooks: fit(), validate(), prepare_data(), setup().
Note
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Note
If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.
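A minimal sketch mirroring the test case above, assuming val_ds was built in setup():

    from lightning import LightningDataModule
    from torch.utils.data import DataLoader

    class MyDataModule(LightningDataModule):
        def val_dataloader(self):
            # Deterministic order keeps validation metrics comparable across epochs.
            return DataLoader(self.val_ds, batch_size=128)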