cerebras.modelzoo.data_preparation.nlp.bert.bertsum_data_processor

Common pre-processing functions for BERTSUM data.

Functions

check_output

convert_to_json_files

Formats tokenized input files into simpler JSON files.

create_parser

tokenize

Split sentences and perform tokenization.

Classes

BertData

Converts input into BERT format.

JsonConverter

JsonConverter simplifies the input and converts it into JSON files containing source and target (summarized) texts.

RougeBasedLabelsFormatter

Based on the reference n-grams, RougeBasedLabelsFormatter selects the input sentences with the highest ROUGE score computed against the reference.

Tokenizer

Tokenizes files from the input path and writes the results to the output path.
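
The ROUGE-based selection described for RougeBasedLabelsFormatter can be illustrated with a small greedy sketch. This is not the module's implementation: the helper names (`ngrams`, `rouge_f1`, `greedy_select`), the set-based ROUGE-1 plus ROUGE-2 F1 score, and the `max_sentences` parameter are all assumptions made for illustration. The sketch repeatedly adds the sentence that most improves the combined score against the reference and stops when no sentence helps.

```python
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> Set[NGram]:
    """Return the set of n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def rouge_f1(candidate: Set[NGram], reference: Set[NGram]) -> float:
    """Simplified, set-based ROUGE-N F1 (an assumption, not the official metric)."""
    if not candidate or not reference:
        return 0.0
    overlap = len(candidate & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def greedy_select(
    sentences: List[List[str]],
    reference: List[List[str]],
    max_sentences: int = 3,  # hypothetical cap on selected sentences
) -> List[int]:
    """Greedily pick sentence indices that maximize ROUGE-1 + ROUGE-2 F1."""
    ref_tokens = [tok for sent in reference for tok in sent]
    ref_1, ref_2 = ngrams(ref_tokens, 1), ngrams(ref_tokens, 2)
    sent_1 = [ngrams(s, 1) for s in sentences]
    sent_2 = [ngrams(s, 2) for s in sentences]

    selected: List[int] = []
    best_score = 0.0
    for _ in range(max_sentences):
        best_idx = -1
        for i in range(len(sentences)):
            if i in selected:
                continue
            cand = selected + [i]
            # Pool the n-grams of all sentences in the candidate selection.
            cand_1 = set().union(*(sent_1[j] for j in cand))
            cand_2 = set().union(*(sent_2[j] for j in cand))
            score = rouge_f1(cand_1, ref_1) + rouge_f1(cand_2, ref_2)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx == -1:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
    return sorted(selected)
```

The returned indices can be turned into per-sentence binary labels, for example `[1 if i in picked else 0 for i in range(len(sentences))]`, which is the usual shape of extractive summarization targets.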