cerebras.modelzoo.data_preparation.nlp.chunk_data_processing.fim_data_token_generator
FIMTokenGenerator Module
This module provides the FIMTokenGenerator class, which extends the LMDataTokenGenerator class for fill-in-the-middle (FIM) tasks.
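In fill-in-the-middle training, a document is typically cut into a prefix, a middle, and a suffix. The middle is then moved to the end behind sentinel tokens, so that a left-to-right model learns to infill. The sketch below illustrates that transformation in the prefix-suffix-middle (PSM) ordering. It is a generic illustration of the technique, not FIMTokenGenerator's actual implementation. The function `fim_transform`, its `fim_rate` parameter, and the sentinel IDs `PREFIX_ID`, `SUFFIX_ID` and `MIDDLE_ID` are hypothetical.

```python
import random
from typing import List, Optional

# Hypothetical sentinel token IDs; a real tokenizer defines its own FIM tokens.
PREFIX_ID = 1001
SUFFIX_ID = 1002
MIDDLE_ID = 1003


def fim_transform(
    tokens: List[int],
    fim_rate: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Hypothetical sketch: apply a PSM fill-in-the-middle permutation.

    With probability ``fim_rate`` the sequence is split at two random cut
    points into prefix/middle/suffix and reordered as
    [PRE] prefix [SUF] suffix [MID] middle; otherwise it is returned as-is.
    """
    rng = rng or random.Random()
    if rng.random() >= fim_rate:
        # Keep this sample as ordinary left-to-right training data.
        return list(tokens)

    # Draw two cut points in [0, len(tokens)] and sort them so lo <= hi.
    lo, hi = sorted(rng.randint(0, len(tokens)) for _ in range(2))
    prefix, middle, suffix = tokens[:lo], tokens[lo:hi], tokens[hi:]

    # PSM ordering: the middle comes last, so it is predicted from both sides.
    return [PREFIX_ID] + prefix + [SUFFIX_ID] + suffix + [MIDDLE_ID] + middle
```

For example, `fim_transform(list(range(10)), fim_rate=1.0, rng=random.Random(0))` always returns a permuted sequence that is three tokens longer than its input. This sketch omits details that a production pipeline may handle, such as the suffix-prefix-middle (SPM) variant, EOS placement, and truncation to a maximum sequence length.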
- Usage:
    from cerebras.modelzoo.data_preparation.nlp.chunk_data_processing.fim_data_token_generator import FIMTokenGenerator
    # Initialize the token generator (params, tokenizer_impl, eos_id and pad_id must already be defined)
    token_generator = FIMTokenGenerator(params, tokenizer_impl, eos_id, pad_id)
    # Tokenize and encode text data
    tokenized_data, stats = token_generator.encode("Your sample text to process.")
Classes
FIMTokenGenerator: Initializes the FIMTokenGenerator class.