cerebras.modelzoo.data.nlp.dpo.DPOSyntheticDataset.DPOSyntheticDataProcessor

class cerebras.modelzoo.data.nlp.dpo.DPOSyntheticDataset.DPOSyntheticDataProcessor

Bases: object

Synthetic dataset generator.

Parameters

params (dict) – Dictionary of training input parameters used to create the dataset.

Expects the following fields:

  • “num_examples” (int): Number of training examples.

  • “vocab_size” (int): Vocabulary size.

  • “max_seq_length” (int): Maximum length of the sequence to generate.

  • “batch_size” (int): Batch size.

  • “shuffle” (bool): Flag to enable data shuffling.

  • “shuffle_seed” (int): Shuffle seed.
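
As an illustration of the fields listed above, a `params` dictionary might look like the following sketch. Only the key names and their types come from the field list; the values are arbitrary placeholders, and `EXPECTED_TYPES` and `check_params` are hypothetical helpers written for this example, not part of the library.

```python
# Hypothetical values; only the key names and types come from the field list.
params = {
    "num_examples": 1000,
    "vocab_size": 32000,
    "max_seq_length": 128,
    "batch_size": 8,
    "shuffle": True,
    "shuffle_seed": 0,
}

# Documented type of each expected field.
EXPECTED_TYPES = {
    "num_examples": int,
    "vocab_size": int,
    "max_seq_length": int,
    "batch_size": int,
    "shuffle": bool,
    "shuffle_seed": int,
}


def check_params(candidate):
    """Hypothetical helper (not a library function): list problems in a params dict."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in candidate:
            problems.append(f"missing field: {key}")
            continue
        value = candidate[key]
        # bool is a subclass of int in Python, so reject it for int fields explicitly.
        if expected is int and isinstance(value, bool):
            problems.append(f"{key}: expected int, got bool")
        elif not isinstance(value, expected):
            problems.append(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
    return problems
```

Calling `check_params(params)` on the dictionary above returns an empty list. Whether the processor itself validates these fields, or supplies defaults for missing ones, is not stated here.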

Methods

create_dataloader

Create dataloader.

__init__(params)
create_dataloader()

Create dataloader.

Returns

dataloader
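
Putting the constructor and `create_dataloader()` together, typical usage might be sketched as below. The import path is taken from the fully qualified class name on this page; the wrapper function `make_synthetic_dpo_dataloader` and the example values are hypothetical, and the structure of the batches the dataloader yields is not documented here, so the sketch does not assume one.

```python
def make_synthetic_dpo_dataloader(params):
    """Hypothetical wrapper: build the processor and return its dataloader."""
    # Imported lazily so this function can be defined even where the
    # Cerebras Model Zoo package is not installed.
    from cerebras.modelzoo.data.nlp.dpo.DPOSyntheticDataset import (
        DPOSyntheticDataProcessor,
    )

    processor = DPOSyntheticDataProcessor(params)
    return processor.create_dataloader()


# Placeholder values; the keys follow the documented fields of `params`.
example_params = {
    "num_examples": 1000,
    "vocab_size": 32000,
    "max_seq_length": 128,
    "batch_size": 8,
    "shuffle": True,
    "shuffle_seed": 0,
}
```

With the package installed, `make_synthetic_dpo_dataloader(example_params)` would return whatever `create_dataloader()` returns.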