cerebras.modelzoo.data.nlp.gpt.InferenceDataProcessor.get_token_ids

cerebras.modelzoo.data.nlp.gpt.InferenceDataProcessor.get_token_ids(text: str, tokenizer: Union[tokenizers.Tokenizer, transformers.PreTrainedTokenizerBase]) → List[int]

Get encoded token ids from a string using the specified tokenizer.

Parameters
  • text (str) – The input string.

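  • tokenizer (Union[tokenizers.Tokenizer, transformers.PreTrainedTokenizerBase]) – A Hugging Face tokenizer: either a Tokenizer from the tokenizers library or a PreTrainedTokenizerBase from the transformers library.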

Returns

List of token ids.

Return type

List[int]
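
Example

The signature accepts two tokenizer types whose encode methods return different shapes: tokenizers.Tokenizer.encode returns an Encoding object with an ids attribute, while transformers tokenizers return a plain list of ints from encode. The sketch below illustrates how a helper could normalize both into List[int]. It is not the ModelZoo implementation, which this page does not show; get_token_ids_sketch and the two toy tokenizer classes are hypothetical names introduced only so the example runs without third-party packages.

```python
from typing import Any, List


def get_token_ids_sketch(text: str, tokenizer: Any) -> List[int]:
    """Hypothetical stand-in for get_token_ids (an assumption, not the real code)."""
    encoded = tokenizer.encode(text)
    # tokenizers.Tokenizer.encode returns an Encoding exposing `.ids`;
    # transformers tokenizers return a plain list of ints from encode().
    ids = encoded.ids if hasattr(encoded, "ids") else encoded
    return list(ids)


# Toy stand-ins that only mimic the two return shapes. They are not real
# tokenizers: each "token id" here is simply the length of a word.
class _ToyEncoding:
    def __init__(self, ids: List[int]) -> None:
        self.ids = ids


class ToyTokenizersStyle:
    def encode(self, text: str) -> _ToyEncoding:
        return _ToyEncoding([len(word) for word in text.split()])


class ToyTransformersStyle:
    def encode(self, text: str) -> List[int]:
        return [len(word) for word in text.split()]


ids_a = get_token_ids_sketch("hello big world", ToyTokenizersStyle())
ids_b = get_token_ids_sketch("hello big world", ToyTransformersStyle())
print(ids_a, ids_b)
```

With a real tokenizer, the same call pattern would apply: pass the input string and a loaded tokenizers.Tokenizer or transformers tokenizer object, and expect a flat list of integer ids back.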