cerebras.modelzoo.data.common.HDF5IterableDataset.RestartableDataLoader#

class cerebras.modelzoo.data.common.HDF5IterableDataset.RestartableDataLoader(*args, **kwargs)[source]#

Bases: torch.utils.data.DataLoader

This custom dataloader class specifies the ‘state_dict’, ‘aggregate_state_dict’, ‘load_state_dict’ and ‘deaggregate_state_dict’ methods. These methods dictate what worker state information is stored (local or global streaming info) and how it is to be aggregated and retrieved. To deterministically restart an instance of HDF5IterableDataset, it requires the number of samples already seen in the previous run. This info is stored in the samples_streamed key inside the worker state dict. Upon restart, the load_state_dict method sets the samples_seen class variable which determines the number of samples to be skipped.

Methods

`aggregate_state_dict`
`deaggregate_state_dict`
`load_state_dict`
`state_dict`

cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDataset

cerebras.modelzoo.data.common.SyntheticDataProcessor