cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.create_bertsum_feature

cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.create_bertsum_feature(input_ids, segment_ids, cls_indices, labels, max_sequence_length, max_cls_tokens, pad_id)

Creates the feature dict for the BertSum model after padding the inputs.

Parameters

input_ids (list) – Token ids to pad.
segment_ids (list) – Segment ids to pad.
cls_indices (list) – Class ids to pad.
labels (list) – Labels to pad.
max_sequence_length (int) – Maximum sequence length.
max_cls_tokens (int) – Maximum number of class tokens.
pad_id (int) – Padding id.
tokenize (callable) – Method to tokenize the input sequence.

Returns

Feature dict with the following keys:

* 'input_tokens': Numpy array with input token indices. Shape: (max_sequence_length), dtype: int32.
* 'attention_mask': Numpy array with attention mask. Shape: (max_sequence_length), dtype: int32.
* 'token_type_ids': Numpy array with segment ids. Shape: (max_sequence_length), dtype: int32.
* 'labels': Numpy array with labels. Shape: (max_cls_tokens), dtype: int32.
* 'cls_indices': Numpy array with class indices. Shape: (max_cls_tokens).
* 'cls_weights': Numpy array with class weights. Shape: (max_cls_tokens).
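
The padding described above can be illustrated with a minimal sketch. This is not the Model Zoo implementation: the function name `pad_bertsum_feature_sketch` is hypothetical, and several details are assumptions that the reference text does not state, namely that the attention mask and class weights are 1 for real entries and 0 for padding, that segment ids, class indices, and labels are padded with 0, and that 'cls_indices' and 'cls_weights' use int32.

```python
import numpy as np


def pad_bertsum_feature_sketch(
    input_ids,
    segment_ids,
    cls_indices,
    labels,
    max_sequence_length,
    max_cls_tokens,
    pad_id,
):
    """Hypothetical stand-in that pads inputs into the documented dict layout."""
    num_tokens = len(input_ids)
    num_cls = len(cls_indices)
    if len(segment_ids) != num_tokens or len(labels) != num_cls:
        raise ValueError("mismatched input lengths")
    if num_tokens > max_sequence_length or num_cls > max_cls_tokens:
        raise ValueError("inputs exceed the requested padded lengths")

    seq_pad = max_sequence_length - num_tokens
    cls_pad = max_cls_tokens - num_cls

    def to_array(values):
        return np.array(values, dtype=np.int32)

    return {
        # Token ids are padded with pad_id up to max_sequence_length.
        "input_tokens": to_array(list(input_ids) + [pad_id] * seq_pad),
        # Assumption: 1 marks a real token, 0 marks padding.
        "attention_mask": to_array([1] * num_tokens + [0] * seq_pad),
        # Assumption: segment ids are padded with 0.
        "token_type_ids": to_array(list(segment_ids) + [0] * seq_pad),
        # Assumption: labels and class indices are padded with 0.
        "labels": to_array(list(labels) + [0] * cls_pad),
        "cls_indices": to_array(list(cls_indices) + [0] * cls_pad),
        # Assumption: weight 1 for real class entries, 0 for padded ones.
        "cls_weights": to_array([1] * num_cls + [0] * cls_pad),
    }


# Example call with made-up token ids: 7 tokens, 2 class entries.
feature = pad_bertsum_feature_sketch(
    input_ids=[101, 7, 8, 102, 101, 9, 102],
    segment_ids=[0, 0, 0, 0, 1, 1, 1],
    cls_indices=[0, 4],
    labels=[1, 0],
    max_sequence_length=10,
    max_cls_tokens=3,
    pad_id=0,
)
```

Under these assumptions, every sequence-level array has shape (max_sequence_length,) and every class-level array has shape (max_cls_tokens,), matching the shapes listed in the Returns section.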
