cerebras.modelzoo.data.common.h5_map_dataset.dataset

Classes

HDF5Dataset

Dynamically reads samples from disk for use with map-style (index-based) dataset paradigms.

MultiModalHDF5Dataset

Specialized HDF5 dataset class that handles image preprocessing in multimodal datasets. Functionality is largely the same as HDF5Dataset, with added image loading and preprocessing.

Parameters: params (dict): a dictionary containing the following added fields:

- "img_data_dir" (str): the path to the directory containing the images.
- "fp16_type" (str): the half-precision dtype to which the image is cast.
- "image_data_size" (list[int]): the final C x H x W shape of the image.
- "transforms" (list[dict]): a specification of the torchvision transforms.
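As a rough illustration of the added fields, the sketch below builds a params dictionary and runs a light sanity check on it. Only the four key names come from the description above. The path, the "bfloat16" value, the image shape, the schema of each "transforms" entry, and the helper check_multimodal_params are all assumptions made for this example and are not part of the library.

```python
# Hypothetical values throughout; only the four key names are documented.
multimodal_params = {
    "img_data_dir": "/path/to/images",  # placeholder directory
    "fp16_type": "bfloat16",            # assumed value for the half dtype
    "image_data_size": [3, 224, 224],   # C x H x W
    "transforms": [
        # The per-entry schema is an assumption, not the documented format.
        {"name": "resize", "size": [224, 224]},
        {"name": "to_tensor"},
    ],
}


def check_multimodal_params(params):
    """Illustrative sanity check on the added fields (not a library function)."""
    required = ("img_data_dir", "fp16_type", "image_data_size", "transforms")
    missing = [key for key in required if key not in params]
    if missing:
        raise KeyError(f"missing multimodal fields: {missing}")

    size = params["image_data_size"]
    if len(size) != 3 or not all(isinstance(d, int) and d > 0 for d in size):
        raise ValueError("image_data_size must be three positive ints (C, H, W)")

    if not all(isinstance(t, dict) for t in params["transforms"]):
        raise ValueError("transforms must be a list of dicts")
    return True


check_multimodal_params(multimodal_params)
```

In practice these fields would sit alongside the parameters that HDF5Dataset already expects. Consult the class reference for the accepted values.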

RestartableDataLoader

The state required for deterministic restart of HDF5Dataset instances is the total number of samples streamed globally, which is consumed by the sampler.
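To see why a single global sample count can be enough for a deterministic restart, consider the minimal sketch below. It is not the RestartableDataLoader implementation: ResumableSampler, its seeded per-epoch shuffle, and the "samples_streamed" key are hypothetical names chosen for illustration. The idea is that if the index order is a pure function of the seed and the epoch, the count alone identifies the next sample.

```python
import random


class ResumableSampler:
    """Hypothetical sampler: seeded per-epoch shuffle, resumable from a count."""

    def __init__(self, num_samples, seed=0, samples_streamed=0):
        self.num_samples = num_samples
        self.seed = seed
        self.samples_streamed = samples_streamed  # total samples yielded so far

    def _epoch_order(self, epoch):
        # The order depends only on (seed, epoch), so it can be rebuilt after
        # a restart. Recomputed per sample here for simplicity, not speed.
        order = list(range(self.num_samples))
        random.Random(self.seed + epoch).shuffle(order)
        return order

    def __iter__(self):
        # Streams indices indefinitely, moving to a new shuffle each epoch.
        while True:
            epoch, offset = divmod(self.samples_streamed, self.num_samples)
            index = self._epoch_order(epoch)[offset]
            self.samples_streamed += 1
            yield index

    def state_dict(self):
        return {"samples_streamed": self.samples_streamed}

    def load_state_dict(self, state):
        self.samples_streamed = state["samples_streamed"]
```

Saving state_dict() after some number of samples and loading it into a fresh sampler built with the same seed continues the stream from the same position, so the resumed run sees the indices an uninterrupted run would have seen.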