cerebras.modelzoo.data.nlp.bert.bert_utils

Functions

build_vocab

Loads the vocab file.

Parameters:
vocab_file (str): Path to the vocab file.
do_lower (bool): Whether the tokens should be converted to lower case.
oov_token (str): Token reserved for out-of-vocabulary tokens.
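To make the three parameters concrete, here is a minimal standalone sketch of loading a vocab. It is not the Model Zoo implementation. The helper name `load_vocab_sketch`, the one-token-per-line file layout, the `"[UNK]"` default, the return value, and the choice to lower-case the OOV token together with the other tokens are all assumptions made for illustration.

```python
import os
import tempfile


def load_vocab_sketch(vocab_file, do_lower=True, oov_token="[UNK]"):
    """Hypothetical helper: map each token in a one-token-per-line file to an id."""
    if do_lower:
        # Assumption: the OOV token is lower-cased along with everything else.
        oov_token = oov_token.lower()
    token_to_id = {}
    with open(vocab_file, "r", encoding="utf-8") as handle:
        for line in handle:
            token = line.strip()
            if not token:
                continue
            if do_lower:
                token = token.lower()
            # Keep the first id when lower-casing makes two entries collide.
            token_to_id.setdefault(token, len(token_to_id))
    if oov_token not in token_to_id:
        raise ValueError(f"OOV token {oov_token!r} is missing from {vocab_file}")
    return token_to_id, oov_token


# Demo on a throwaway four-token vocab file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vocab.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("[PAD]\n[UNK]\nHello\nworld\n")
    vocab, oov = load_vocab_sketch(path)

# Unknown tokens fall back to the id of the OOV token.
ids = [vocab.get(tok, vocab[oov]) for tok in ["hello", "there"]]
print(ids)  # [2, 1]
```

The actual function may return a different object (for example, the Vocab class listed below); check its signature before relying on this shape.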

convert_to_unicode

Converts text to unicode, assuming utf-8 input.
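A conversion of this kind usually amounts to passing `str` through and decoding `bytes` as UTF-8. The sketch below shows that pattern under the assumption that undecodable bytes are dropped. `to_unicode_sketch` is a hypothetical name, and the error handling of the real function may differ.

```python
def to_unicode_sketch(text):
    """Hypothetical helper: return `text` as str, decoding bytes as UTF-8."""
    if isinstance(text, str):
        return text
    if isinstance(text, bytes):
        # Assumption: invalid byte sequences are silently dropped.
        return text.decode("utf-8", "ignore")
    raise TypeError(f"Unsupported input type: {type(text)!r}")


decoded = to_unicode_sketch(b"caf\xc3\xa9")
unchanged = to_unicode_sketch("already text")
```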

create_masked_lm_predictions

Creates the predictions for the masked LM objective.
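For orientation, the sketch below illustrates the masking scheme described in the BERT paper: pick roughly 15% of the non-special positions, then replace 80% of them with `[MASK]`, 10% with a random vocab word, and leave 10% unchanged. It is an illustration under those assumptions, not the Model Zoo code. The name `mask_tokens_sketch`, its parameters, and its return values are hypothetical.

```python
import random


def mask_tokens_sketch(tokens, vocab_words, rng, masked_lm_prob=0.15, max_predictions=20):
    """Hypothetical helper: return (masked tokens, masked positions, original labels)."""
    # Special tokens are never candidates for masking.
    candidates = [i for i, tok in enumerate(tokens) if tok not in ("[CLS]", "[SEP]")]
    rng.shuffle(candidates)
    num_to_predict = min(max_predictions, max(1, int(round(len(tokens) * masked_lm_prob))))
    positions = sorted(candidates[:num_to_predict])

    masked = list(tokens)
    labels = []
    for index in positions:
        draw = rng.random()
        if draw < 0.8:
            masked[index] = "[MASK]"  # 80%: replace with the mask token
        elif draw < 0.9:
            masked[index] = rng.choice(vocab_words)  # 10%: replace with a random word
        # Remaining 10%: keep the original token in place.
        labels.append(tokens[index])
    return masked, positions, labels


tokens = ["[CLS]", "the", "quick", "brown", "fox", "jumps",
          "over", "the", "lazy", "dog", ".", "[SEP]"]
vocab_words = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog", "."]
masked, positions, labels = mask_tokens_sketch(tokens, vocab_words, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the selection reproducible, which helps when comparing preprocessing runs.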

get_meta_data

Read data from meta files.

get_whole_word_span

Returns the whole word start and end indices.
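As an illustration of what a whole-word span means for WordPiece tokens, the sketch below expands a token index to cover every piece of the same word. It assumes continuation pieces carry a `##` prefix and that the end index is exclusive. `whole_word_span_sketch` is a hypothetical name, and the real function's arguments and index convention may differ.

```python
def whole_word_span_sketch(tokens, index):
    """Hypothetical helper: return (start, end) of the word containing tokens[index]."""
    start = index
    # Walk left while the current piece is a continuation of the previous one.
    while start > 0 and tokens[start].startswith("##"):
        start -= 1
    end = index + 1
    # Walk right over the continuation pieces that follow.
    while end < len(tokens) and tokens[end].startswith("##"):
        end += 1
    return start, end  # end is exclusive


pieces = ["[CLS]", "un", "##believ", "##able", "story", "[SEP]"]
middle_span = whole_word_span_sketch(pieces, 2)  # (1, 4)
single_span = whole_word_span_sketch(pieces, 4)  # (4, 5)
```

Whole-word masking then masks every index in the span together instead of a single piece.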

parse_text

Postprocessing of the CSV file.

shard_and_shuffle_data

Shard the data across the processes.
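One common way to shard data across processes is a strided split followed by a seeded shuffle, sketched below. Whether the Model Zoo function uses this strategy is an assumption. The name `shard_and_shuffle_sketch` and the parameters `task_id`, `num_tasks`, `shuffle`, and `seed` are hypothetical.

```python
import random


def shard_and_shuffle_sketch(items, task_id, num_tasks, shuffle=True, seed=0):
    """Hypothetical helper: give process `task_id` every `num_tasks`-th item."""
    if not 0 <= task_id < num_tasks:
        raise ValueError("task_id must be in [0, num_tasks)")
    shard = list(items[task_id::num_tasks])
    if shuffle:
        # Offset the seed by task_id so each process gets its own reproducible order.
        random.Random(seed + task_id).shuffle(shard)
    return shard


files = [f"part-{i:02d}.csv" for i in range(7)]
shards = [shard_and_shuffle_sketch(files, task, num_tasks=3) for task in range(3)]
```

With a strided split every item lands in exactly one shard, and shard sizes differ by at most one.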

Classes

Vocab

Class to store vocab-related attributes.