cerebras.modelzoo.data.common.input_utils.check_sharding_sanity

cerebras.modelzoo.data.common.input_utils.check_sharding_sanity(examples_per_file, batch_size, num_workers, drop_last)

Checks that the given sharding produces at least one batch.

Note that this check assumes the data is sharded across workers the same way shard_and_shuffle_data shards it.

Parameters
  • examples_per_file (list) – Number of examples in each file for this task.

  • batch_size (int) – Batch size of the model.

  • num_workers (int) – Number of workers to use in the dataloader.

  • drop_last (bool) – Whether the last incomplete batch of the dataloader is dropped.

Raises

ValueError – If no batches are generated with the given sharding.
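
Example

The following is a minimal, self-contained sketch of the kind of check described above. It is not the library's implementation. The function name sketch_check_sharding is hypothetical, and the sketch assumes that files are assigned to workers round-robin and that the check fails only when no worker produces any batch. The actual behaviour depends on how shard_and_shuffle_data distributes the files.

```python
from typing import List


def sketch_check_sharding(
    examples_per_file: List[int],
    batch_size: int,
    num_workers: int,
    drop_last: bool,
) -> int:
    """Hypothetical stand-in for a sharding sanity check.

    Returns the total number of batches across all workers and raises
    ValueError if that total is zero.
    """
    if batch_size <= 0 or num_workers <= 0:
        raise ValueError("batch_size and num_workers must be positive.")

    total_batches = 0
    for worker_id in range(num_workers):
        # Assumption: worker i reads files i, i + num_workers, i + 2 * num_workers, ...
        worker_examples = sum(examples_per_file[worker_id::num_workers])
        full_batches, remainder = divmod(worker_examples, batch_size)
        total_batches += full_batches
        # A trailing partial batch only counts when it is not dropped.
        if remainder and not drop_last:
            total_batches += 1

    if total_batches == 0:
        raise ValueError(
            "No batches are generated with the given sharding. "
            "Try a smaller batch_size, fewer workers, or drop_last=False."
        )
    return total_batches
```

Under these assumptions, four files of 10 examples each with batch_size=16 and drop_last=True yield 2 batches when num_workers=2 (each worker sees 20 examples), but raise ValueError when num_workers=4, because each worker then sees only 10 examples and its single incomplete batch is dropped. With drop_last=False the same four-worker setup yields 4 partial batches.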