cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils.pad_helper

cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils.pad_helper(samples_lst, diff, fim_pad_tok_id)

Helper for padding. All diff padding tokens are placed in the last sequence of samples_lst; the other sequences are left unchanged.

Parameters
  • samples_lst (List[List[int]]) – List of lists that contain token ids

  • diff (int) – Number of padding tokens to add

  • fim_pad_tok_id (int) – Id for padding token

Returns

List of lists of token ids with padding

Return type

List[List[int]]
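
Example

The behavior described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name pad_last_sequence_sketch is hypothetical, and the sketch assumes that the padding tokens go at the end of the last sequence and that the input list is copied rather than modified in place. Neither detail is stated in this reference.

```python
from typing import List


def pad_last_sequence_sketch(
    samples_lst: List[List[int]], diff: int, fim_pad_tok_id: int
) -> List[List[int]]:
    """Hypothetical stand-in for pad_helper, for illustration only."""
    # Copy every sequence so the caller's input is not modified
    # (an assumption; the real helper may mutate its argument).
    padded = [list(seq) for seq in samples_lst]
    if not padded or diff <= 0:
        # Nothing to pad (assumed handling of edge cases).
        return padded
    # Assumption: all padding tokens go at the end of the last sequence.
    padded[-1] = padded[-1] + [fim_pad_tok_id] * diff
    return padded


samples = [[5, 6, 7], [8, 9]]
result = pad_last_sequence_sketch(samples, diff=3, fim_pad_tok_id=0)
print(result)  # [[5, 6, 7], [8, 9, 0, 0, 0]]
```

Only the last inner list grows by diff entries, so the total token count across all sequences increases by exactly diff.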