cerebras.modelzoo.data.nlp.bert.bert_utils.build_vocab

cerebras.modelzoo.data.nlp.bert.bert_utils.build_vocab(vocab_file, do_lower, oov_token)

Load the vocab file.

Parameters

vocab_file (str) – Path to the vocab file.

do_lower (bool) – Whether the tokens should be converted to lower case.

oov_token (str) – Token reserved for out-of-vocabulary tokens.

Returns

A tuple (vocab, vocab_size) with:

  • vocab (dict) – Contains the words from the vocab file as keys and their indices as values.

  • vocab_size (int) – Size of the resulting vocab.
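The reference above does not say how the file is parsed. The following is a minimal sketch of the documented contract under assumptions that are not confirmed by the source: one token per line, indices assigned in file order starting at 0, and the OOV token appended if the file does not already contain it. `build_vocab_sketch` and `token_to_id` are hypothetical names, not part of `cerebras.modelzoo`, and the real `build_vocab` may assign indices differently.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def build_vocab_sketch(
    vocab_file: str, do_lower: bool, oov_token: str
) -> Tuple[Dict[str, int], int]:
    """Hypothetical illustration of build_vocab, not the library code.

    Assumes one token per line; each new token gets the next free index,
    starting at 0.
    """
    vocab: Dict[str, int] = {}
    with open(vocab_file, "r", encoding="utf-8") as fin:
        for line in fin:
            token = line.strip()
            if not token:
                continue  # skip blank lines
            if do_lower:
                token = token.lower()
            # Lower-casing can merge entries such as "The" and "the";
            # keep the index of the first occurrence.
            if token not in vocab:
                vocab[token] = len(vocab)
    # Assumption: reserve an index for the OOV token if the file lacks it.
    if oov_token not in vocab:
        vocab[oov_token] = len(vocab)
    return vocab, len(vocab)


def token_to_id(
    vocab: Dict[str, int], token: str, oov_token: str, do_lower: bool
) -> int:
    """Map a token to its index, falling back to the OOV token's index."""
    if do_lower:
        token = token.lower()
    return vocab.get(token, vocab[oov_token])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "vocab.txt"
        path.write_text("<unk>\nThe\nthe\ncat\n", encoding="utf-8")
        vocab, vocab_size = build_vocab_sketch(
            str(path), do_lower=True, oov_token="<unk>"
        )
        print(vocab)       # {'<unk>': 0, 'the': 1, 'cat': 2}
        print(vocab_size)  # 3
        print(token_to_id(vocab, "dog", "<unk>", do_lower=True))  # 0
```

Returning the size alongside the dict mirrors the documented tuple; in this sketch it always equals `len(vocab)`, because lower-cased duplicates are collapsed before counting.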