stable_pretraining.data
The data module provides tools for dataset handling, transforms, sampling, and data loading in self-supervised learning workflows.
Core Components
|
PyTorch Lightning DataModule for handling train/val/test/predict dataloaders. |
|
Custom collate function that optionally builds an affinity (or “graph”) matrix based on a specified field. |
|
Base dataset class with transform support and PyTorch Lightning integration. |
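As an illustration of the affinity-matrix idea behind the collate function listed above, here is a minimal pure-Python sketch. It is not the library's implementation: the function name `collate_with_affinity`, the `"affinity"` output key, and the rule that two samples are connected when their field values are equal are all assumptions made for this example.

```python
def collate_with_affinity(samples, field="label"):
    """Batch a list of dict samples and add a same-value affinity matrix.

    Hypothetical helper: the name, the "affinity" key, and the equality
    rule are assumptions for illustration only.
    """
    keys = samples[0].keys()
    # Turn a list of dicts into a dict of lists (one list per column).
    batch = {key: [sample[key] for sample in samples] for key in keys}
    values = batch[field]
    # affinity[i][j] is 1 when samples i and j share the same field value.
    batch["affinity"] = [[int(a == b) for b in values] for a in values]
    return batch


samples = [
    {"image": [0.1, 0.2], "label": 0},
    {"image": [0.3, 0.4], "label": 1},
    {"image": [0.5, 0.6], "label": 0},
]
batch = collate_with_affinity(samples)
```

In this sketch the first and third samples share a label, so they are connected in the resulting matrix, while the second sample is connected only to itself.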
Real Data Wrappers
|
Wrapper for PyTorch datasets with custom column naming and transforms. |
|
Hugging Face dataset wrapper with transform and column manipulation support. |
|
Subset of a dataset at specified indices. |
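The "subset at specified indices" wrapper can be pictured with a small sketch. The class below, `IndexSubset`, is a hypothetical stand-in written for this example and assumes only that the wrapped dataset supports integer indexing; the library's own class may differ in name and features.

```python
class IndexSubset:
    """View of a dataset restricted to the given indices (illustrative only)."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, position):
        # Map the position within the subset to an index in the full dataset.
        return self.dataset[self.indices[position]]


full_dataset = ["a", "b", "c", "d", "e"]
subset = IndexSubset(full_dataset, [4, 0, 2])
items = [subset[i] for i in range(len(subset))]
```

The subset stores only indices, so the underlying data is not copied.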
Synthetic Data Generators
|
Gaussian Mixture Model dataset for synthetic data generation. |
|
Dataset for Minari reinforcement learning data with step-based access. |
|
Dataset for Minari reinforcement learning data with episode-based access. |
|
Generate Swiss Roll dataset points. |
|
Generate 2D Perlin noise. |
|
Generate 3D Perlin noise at given coordinates. |
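For readers unfamiliar with the Swiss Roll dataset mentioned above, the following sketch generates points using the commonly used parametrization (a spiral in the x-z plane extruded along y). The function name `swiss_roll_points`, its parameters, and the constants are assumptions for this example and are not taken from the library's generator.

```python
import math
import random


def swiss_roll_points(n_samples, noise=0.0, seed=0):
    """Return (points, angles) for a 3D Swiss roll (illustrative sketch)."""
    rng = random.Random(seed)
    points, angles = [], []
    for _ in range(n_samples):
        # Angle along the spiral, between 1.5*pi and 4.5*pi.
        t = 1.5 * math.pi * (1 + 2 * rng.random())
        # Position along the axis of the roll.
        height = 21 * rng.random()
        point = (
            t * math.cos(t) + rng.gauss(0.0, noise),
            height + rng.gauss(0.0, noise),
            t * math.sin(t) + rng.gauss(0.0, noise),
        )
        points.append(point)
        angles.append(t)
    return points, angles


points, angles = swiss_roll_points(200)
```

The returned angle is the position of each point along the unrolled manifold, which is often used to color plots or to evaluate learned embeddings.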
Noise Models
|
Categorical distribution for sampling discrete values with given probabilities. |
|
Exponential mixture noise model for data augmentation or sampling. |
|
Exponential-normal noise model combining exponential and normal distributions. |
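Sampling discrete values with given probabilities, as the categorical entry above describes, can be sketched with the standard library alone. `sample_categorical` is a hypothetical helper for this example; the library's class may expose a different interface.

```python
import random


def sample_categorical(values, probabilities, n_samples, seed=0):
    """Draw n_samples items from values according to probabilities."""
    if len(values) != len(probabilities):
        raise ValueError("values and probabilities must have the same length")
    if abs(sum(probabilities) - 1.0) > 1e-8:
        raise ValueError("probabilities must sum to 1")
    rng = random.Random(seed)
    # random.Random.choices draws with replacement using the given weights.
    return rng.choices(values, weights=probabilities, k=n_samples)


draws = sample_categorical([0.1, 0.5, 0.9], [0.2, 0.3, 0.5], n_samples=1000)
```

A typical use would be picking a noise level or augmentation strength per sample from a small set of allowed values.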
Samplers
|
Samples elements randomly. |
|
Wraps another sampler to yield a mini-batch of indices. |
|
Wraps another sampler to yield a mini-batch of indices. |
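The phrase "wraps another sampler to yield a mini-batch of indices" describes the following pattern, shown here as a plain generator. The name `batch_indices` and the `drop_last` flag are chosen for this sketch and are not the library's API.

```python
def batch_indices(sampler, batch_size, drop_last=False):
    """Group indices produced by any iterable sampler into mini-batches."""
    batch = []
    for index in sampler:
        batch.append(index)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, smaller batch unless the caller asked to drop it.
    if batch and not drop_last:
        yield batch


all_batches = list(batch_indices(range(7), batch_size=3))
full_batches = list(batch_indices(range(7), batch_size=3, drop_last=True))
```

Because the wrapper only consumes an iterable of indices, the inner sampler can be sequential, random, or anything else that yields integers.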
Utility Functions
|
Fold a tensor containing multiple views back into separate views. |
|
Randomly split a dataset into non-overlapping new datasets of given lengths. |
|
Download a file from a URL with progress tracking. |
|
Download multiple files concurrently. |
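To make the random-split utility above concrete, here is a sketch that splits the index range of a dataset into non-overlapping parts of given lengths. `random_split_indices` is a hypothetical helper that returns index lists; the actual utility may return dataset objects instead.

```python
import random


def random_split_indices(n_items, lengths, seed=0):
    """Shuffle range(n_items) and cut it into consecutive, disjoint chunks."""
    if sum(lengths) != n_items:
        raise ValueError("lengths must sum to the dataset size")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    splits, start = [], 0
    for length in lengths:
        splits.append(indices[start:start + length])
        start += length
    return splits


train_idx, val_idx, test_idx = random_split_indices(10, [6, 2, 2])
```

Fixing the seed keeps the split reproducible across runs, which matters when validation metrics are compared between experiments.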
Modules
Dataset statistics for normalization. |
|
Synthetic and simulated data generators. |