cerebras.modelzoo.data.common.input_utils

Functions

check_sharding_sanity

Checks whether the given sharding generates at least one batch.

cluster_config

Returns (ClusterSpec, TaskSpec). The TaskSpec contains the following fields:

get_data_for_task

Distributes files with a given number of examples such that each distributed task has access to exactly the same number of examples.

is_distributed

Returns True if DDP is enabled.

num_tasks

shard_list_contiguous

Shards a list by splitting it into num_workers contiguous segments. Only the `worker_id`th shard is returned. If the length of the list is not divisible by the number of workers, the last worker will be assigned all remainder elements.
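The contiguous scheme described above can be illustrated with a small standalone sketch. This is not the library's implementation: the function name `shard_contiguous_sketch` and its argument order are assumptions made for illustration, and only the behaviour stated in the summary (equal-sized contiguous segments, remainder assigned to the last worker) is modelled.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_contiguous_sketch(
    items: Sequence[T], worker_id: int, num_workers: int
) -> List[T]:
    """Hypothetical helper: return the `worker_id`th contiguous shard."""
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    # Every worker gets the same base number of elements.
    per_worker = len(items) // num_workers
    start = worker_id * per_worker
    # The last worker also takes all remainder elements.
    if worker_id == num_workers - 1:
        end = len(items)
    else:
        end = start + per_worker
    return list(items[start:end])


files = [f"file_{i}" for i in range(7)]
shards = [shard_contiguous_sketch(files, w, 3) for w in range(3)]
# With 7 files and 3 workers, workers 0 and 1 each get 2 files
# and worker 2 gets the remaining 3.
```

Note that with this scheme the last shard can be noticeably larger than the others when the list length is not divisible by the number of workers.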

shard_list_interleaved

Shards a list by assigning consecutive elements to alternating workers.
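Assigning consecutive elements to alternating workers amounts to a round-robin distribution. A minimal sketch of that idea follows. The name `shard_interleaved_sketch` and its signature are assumptions for illustration and may not match the library function.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_interleaved_sketch(
    items: Sequence[T], worker_id: int, num_workers: int
) -> List[T]:
    """Hypothetical helper: return every `num_workers`th element,
    starting at index `worker_id`."""
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    # Slicing with a step hands out elements round-robin.
    return list(items[worker_id::num_workers])


elements = list(range(7))
interleaved = [shard_interleaved_sketch(elements, w, 3) for w in range(3)]
# Worker 0 sees indices 0, 3, 6; worker 1 sees 1, 4; worker 2 sees 2, 5.
```

Compared with contiguous sharding, shard sizes here differ by at most one element.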

shard_list_of_chunks_contiguous

Shards a list of chunks by distributing contiguous segments of each chunk across shards.

task_id

Classes

ShardedSampler

Sampler that restricts data loading to a subset of the dataset. Modified from: https://pytorch.org/docs/stable/_modules/torch/utils/data/distributed.html#DistributedSampler

SubsetSequentialSampler

Samples elements sequentially, starting from given start_index,