cerebras.modelzoo.data_preparation.nlp.tokenizers.BPETokenizer.get_pairs

cerebras.modelzoo.data_preparation.nlp.tokenizers.BPETokenizer.get_pairs(word)

Return the set of adjacent symbol pairs in a word.

The word is represented as a tuple of symbols, where each symbol is a variable-length string.
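
The behavior described above can be illustrated with a minimal sketch. This is not the library's source: `get_pairs_sketch` is a hypothetical stand-in that assumes the pairs are adjacent symbols, as in the common byte-pair-encoding helper of the same name. Returning an empty set for words with fewer than two symbols is also an assumption of this sketch.

```python
from typing import Set, Tuple


def get_pairs_sketch(word: Tuple[str, ...]) -> Set[Tuple[str, str]]:
    """Hypothetical stand-in for get_pairs: collect adjacent symbol pairs.

    `word` is a tuple of symbols; each symbol may be one or more characters.
    """
    # zip(word, word[1:]) pairs each symbol with its right-hand neighbor.
    # For an empty or single-symbol word this yields nothing, so the result
    # is an empty set (an assumption of this sketch, not documented behavior).
    return set(zip(word, word[1:]))


# Symbols may be multi-character, e.g. after earlier merges produced "lo".
pairs = get_pairs_sketch(("lo", "w", "e", "r"))
print(sorted(pairs))  # [('e', 'r'), ('lo', 'w'), ('w', 'e')]
```

Because the result is a set, a pair that occurs more than once in the word appears only once, and the order of the pairs is not preserved.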