cerebras.modelzoo.data_preparation.nlp.t5.utils.construct_denoising_objective

cerebras.modelzoo.data_preparation.nlp.t5.utils.construct_denoising_objective(tokens, vocab_size, sos_token, eos_token, rng)

Formats a raw sequence into a corrupted sequence and corresponding denoising targets.

Parameters

tokens (list): A list of uncorrupted token indices.
vocab_size (int): The size of the vocabulary.
sos_token (int): The index of the SOS token in the vocabulary.
eos_token (int): The index of the EOS token in the vocabulary.
rng (np.random.Generator): The numpy random generator to be used as the source of randomness for this function.

Returns

A tuple (feature_dict, label) of denoising source and target numpy arrays.
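
Example

The following is a minimal sketch of what a denoising objective with this signature can look like, using T5-style span corruption. It is not the modelzoo implementation. The function name `sketch_denoising_objective`, the `noise_density` parameter, the sentinel scheme (ids counting down from `vocab_size - 1`), and the `feature_dict` keys `input_ids` and `decoder_input_ids` are all assumptions made for illustration, and the sketch assumes a non-empty `tokens` list.

```python
import numpy as np


def sketch_denoising_objective(tokens, vocab_size, sos_token, eos_token, rng,
                               noise_density=0.15):
    """Illustrative span corruption; hypothetical, not the library's code."""
    tokens = np.asarray(tokens, dtype=np.int64)

    # Assumption: each token is independently marked as noise.
    noise_mask = rng.random(len(tokens)) < noise_density
    if not noise_mask.any():
        # Guarantee at least one corrupted position.
        noise_mask[rng.integers(len(tokens))] = True

    source, target = [], []
    sentinel = vocab_size - 1  # assumption: sentinels count down from the top id
    prev_noise = False
    for tok, is_noise in zip(tokens.tolist(), noise_mask.tolist()):
        if is_noise:
            if not prev_noise:
                # A new span starts: one sentinel marks it in source and target.
                source.append(sentinel)
                target.append(sentinel)
                sentinel -= 1
            target.append(tok)
        else:
            source.append(tok)
        prev_noise = is_noise

    # Hypothetical key names; the decoder input is the target shifted right.
    feature_dict = {
        "input_ids": np.array(source + [eos_token], dtype=np.int64),
        "decoder_input_ids": np.array([sos_token] + target, dtype=np.int64),
    }
    label = np.array(target + [eos_token], dtype=np.int64)
    return feature_dict, label


# Usage: a seeded generator makes the corruption reproducible.
rng = np.random.default_rng(0)
features, label = sketch_denoising_objective(
    list(range(10, 30)), vocab_size=32000, sos_token=0, eos_token=1, rng=rng
)
```

Passing the generator in explicitly, as the documented signature does, keeps the corruption deterministic for a given seed, which is useful when preprocessing must be repeatable across runs.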