cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils.wikitext_detokenizer

cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils.wikitext_detokenizer(string)

Detokenizer for WikiText data. Applies special handling to specific substrings in the input.

Parameters

string (str) – String to detokenize before tokenization.

Returns

Detokenized string
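
Example

The following is a minimal, self-contained sketch of the kind of substring handling a WikiText detokenizer performs. It is not the library's implementation: the function name `sketch_wikitext_detokenize` and the specific rules shown are assumptions chosen for illustration. The only facts relied on are that the WikiText corpus writes in-number and in-word separators as ` @-@ `, ` @,@ `, and ` @.@ `, and that it puts a space before punctuation.

```python
import re


def sketch_wikitext_detokenize(string: str) -> str:
    """Hypothetical illustration: undo a few WikiText tokenization artifacts."""
    # WikiText encodes separators inside numbers and compounds with "@" markers.
    string = string.replace(" @-@ ", "-")
    string = string.replace(" @,@ ", ",")
    string = string.replace(" @.@ ", ".")
    # Drop the space that precedes common punctuation marks.
    string = re.sub(r" ([,.;:!?])", r"\1", string)
    return string


raw = "The 1 @,@ 000 @-@ metre race took 2 @.@ 5 minutes , roughly ."
clean = sketch_wikitext_detokenize(raw)
print(clean)  # The 1,000-metre race took 2.5 minutes, roughly.
```

Going by the signature above, the library function would presumably be called the same way, with a single `str` argument and a `str` result, after importing `wikitext_detokenizer` from `cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils`. The exact set of substitutions it applies may differ from this sketch.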