cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.create_bertsum_feature

cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.create_bertsum_feature(input_ids, segment_ids, cls_indices, labels, max_sequence_length, max_cls_tokens, pad_id)

Creates the feature dict for the BertSum model after padding each input to its fixed length.

Parameters
  • input_ids (list) – Token ids to pad.

  • segment_ids (list) – Segment ids to pad.

  • cls_indices (list) – Indices of the [CLS] tokens to pad.

  • labels (list) – Labels to pad.

  • max_sequence_length (int) – Maximum sequence length.

  • max_cls_tokens (int) – Maximum number of [CLS] tokens.

  • pad_id (int) – Padding id.

  • tokenize (callable) – Method to tokenize the input sequence.

Returns

Feature dict which includes the keys:

  • ’input_tokens’: Numpy array with input token indices.

    shape: (max_sequence_length), dtype: int32.

  • ’attention_mask’: Numpy array with attention mask.

    shape: (max_sequence_length), dtype: int32.

  • ’token_type_ids’: Numpy array with segment ids.

    shape: (max_sequence_length), dtype: int32.

  • ’labels’: Numpy array with labels.

    shape: (max_cls_tokens), dtype: int32.

  • ’cls_indices’: Numpy array with class indices.

    shape: (max_cls_tokens).

  • ’cls_weights’: Numpy array with class weights.

    shape: (max_cls_tokens).
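
The padding described above can be illustrated with a minimal standalone sketch. This is not the library's implementation: `pad_to` and `sketch_bertsum_feature` are hypothetical names, and several details are assumptions made for illustration only (right-padding, truncation of over-long inputs, a pad value of 0 for segment ids, labels and [CLS] indices, and mask/weight values of 1 for real entries and 0 for padding). Plain Python lists are used to keep the sketch dependency-free, whereas the documented method returns Numpy arrays.

```python
def pad_to(values, length, pad_value):
    """Truncate or right-pad `values` to exactly `length` items (hypothetical helper)."""
    values = list(values)[:length]
    return values + [pad_value] * (length - len(values))


def sketch_bertsum_feature(input_ids, segment_ids, cls_indices, labels,
                           max_sequence_length, max_cls_tokens, pad_id):
    """Illustrative stand-in for create_bertsum_feature, not the real method."""
    # Number of real (unpadded) entries that survive truncation.
    num_tokens = min(len(input_ids), max_sequence_length)
    num_cls = min(len(cls_indices), max_cls_tokens)
    return {
        "input_tokens": pad_to(input_ids, max_sequence_length, pad_id),
        # Assumption: 1 marks a real token, 0 marks padding.
        "attention_mask": pad_to([1] * num_tokens, max_sequence_length, 0),
        # Assumption: segment ids are padded with 0.
        "token_type_ids": pad_to(segment_ids, max_sequence_length, 0),
        # Assumption: labels and [CLS] indices are padded with 0.
        "labels": pad_to(labels, max_cls_tokens, 0),
        "cls_indices": pad_to(cls_indices, max_cls_tokens, 0),
        # Assumption: weight 1 for real [CLS] positions, 0 for padded ones.
        "cls_weights": pad_to([1] * num_cls, max_cls_tokens, 0),
    }


# Two sentences, each starting with a [CLS] token (made-up token ids).
feature = sketch_bertsum_feature(
    input_ids=[101, 7592, 102, 101, 2088, 102],
    segment_ids=[0, 0, 0, 1, 1, 1],
    cls_indices=[0, 3],
    labels=[1, 0],
    max_sequence_length=8,
    max_cls_tokens=3,
    pad_id=0,
)
```

To match the documented return types, each list could be wrapped with `numpy.asarray(values, dtype=numpy.int32)` where the documentation specifies int32.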