modelzoo.transformers.data_processing.scripts.hdf5_preprocessing.utils.create_features_summarization_vsl
- modelzoo.transformers.data_processing.scripts.hdf5_preprocessing.utils.create_features_summarization_vsl(bin, max_sequence_length, num_pad, pad_id=0, eos_id=0, sep_id=None, inverted_mask=False, input_ids_dtype='int32', input_mask_dtype='int32', labels_dtype='int32', attention_span_dtype='int32', position_ids_dtype='int32')
Given a list of VSL (variable sequence length) sequences, generate input features and labels.
- Parameters
bin (list(sequence)) – List of VSL sequences.
max_sequence_length (int) – Maximum sequence length for data writes.
num_pad (int) – Number of padding tokens in the sequence.
pad_id (int) – Id for pad token. Defaults to 0.
eos_id (int) – Id for end of sequence token. Defaults to 0.
sep_id (int) – Id for separator token. Defaults to None.
inverted_mask (bool) – If True, invert the mask for runtime execution. Defaults to False.
input_ids_dtype (str) – Dtype as string for input ids. Defaults to int32.
input_mask_dtype (str) – Dtype as string for input mask. Defaults to int32.
labels_dtype (str) – Dtype as string for labels. Defaults to int32.
attention_span_dtype (str) – Dtype as string for keys attention span in VSL. Defaults to int32.
position_ids_dtype (str) – Dtype as string for position ids in VSL. Defaults to int32.
- Returns
Tuple containing features and labels.
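
The kind of features named in the parameters above (input ids, input mask, labels, attention span, position ids) can be illustrated with a minimal sketch. This is not the modelzoo implementation. The helper name `sketch_vsl_features`, the assumption that each sequence is a flat list of token ids, the next-token label shift, the mask convention (1 for real tokens, 0 for padding), and the dict return type are all hypothetical choices made for illustration. Here the padding count is derived from the packed length instead of being passed in as `num_pad`.

```python
import numpy as np


def sketch_vsl_features(sequences, max_sequence_length, pad_id=0, dtype="int32"):
    """Hypothetical illustration only; not the modelzoo implementation.

    Packs several token sequences into one fixed-length sample and builds
    per-token features. Each sequence is assumed to be a flat list of ids.
    """
    input_ids, input_mask, labels = [], [], []
    attention_span, position_ids = [], []

    for tokens in sequences:
        # Assumed next-token setup: inputs drop the last token,
        # labels drop the first token.
        n = len(tokens) - 1
        input_ids.extend(tokens[:-1])
        labels.extend(tokens[1:])
        input_mask.extend([1] * n)
        # Position ids restart at 0 for every packed sequence.
        position_ids.extend(range(n))
        # Attention span counts down the tokens remaining in this sequence.
        attention_span.extend(range(n - 1, -1, -1))

    # Padding count is derived here (assumption); it fills the sample
    # up to max_sequence_length.
    num_pad = max_sequence_length - len(input_ids)
    if num_pad < 0:
        raise ValueError("packed sequences exceed max_sequence_length")

    input_ids.extend([pad_id] * num_pad)
    labels.extend([pad_id] * num_pad)
    for column in (input_mask, attention_span, position_ids):
        column.extend([0] * num_pad)

    return {
        "input_ids": np.array(input_ids, dtype=dtype),
        "input_mask": np.array(input_mask, dtype=dtype),
        "labels": np.array(labels, dtype=dtype),
        "attention_span": np.array(attention_span, dtype=dtype),
        "position_ids": np.array(position_ids, dtype=dtype),
    }


# Two short sequences packed into a sample of length 8.
features = sketch_vsl_features([[5, 6, 7, 2], [8, 9, 2]], max_sequence_length=8)
```

In this sketch the two sequences contribute 3 and 2 input tokens, so the last 3 slots are padding. Restarting position ids and counting down the attention span per sequence is one way to keep packed sequences from being treated as a single long sequence; the actual function may encode this differently.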