modelzoo.transformers.data_processing.scripts.chunk_preprocessing.lm_data_token_generator
LMDataTokenGenerator Module
This module provides the LMDataTokenGenerator class, which processes text data and creates features suitable for language modeling tasks.
Usage:
    tokenizer = LMDataTokenGenerator(dataset_params, max_sequence_length, tokenizer)
    tokenized_features = tokenizer.encode("Sample text for processing.")
Functions
Given a list of token_ids, generate input sequence and labels.
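To illustrate what "generate input sequence and labels" typically means for next-token prediction, here is a minimal sketch. It is not the module's implementation: the helper name `make_lm_features`, the `pad_id` default, and the returned dictionary keys are all assumptions made for this example. It assumes labels are the inputs shifted left by one position and that short sequences are right-padded.

```python
from typing import Dict, List


def make_lm_features(
    token_ids: List[int],
    max_sequence_length: int,
    pad_id: int = 0,  # assumed padding id, not taken from the module
) -> Dict[str, List[int]]:
    """Hypothetical helper: build next-token-prediction features."""
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form an input/label pair")

    # Keep one extra token so inputs and labels can each be max_sequence_length long.
    window = token_ids[: max_sequence_length + 1]

    # Labels are the inputs shifted left by one position.
    input_ids = window[:-1]
    labels = window[1:]

    # Right-pad both sequences and mark real tokens in the attention mask.
    num_real = len(input_ids)
    num_pad = max_sequence_length - num_real
    return {
        "input_ids": input_ids + [pad_id] * num_pad,
        "labels": labels + [pad_id] * num_pad,
        "attention_mask": [1] * num_real + [0] * num_pad,
    }


features = make_lm_features([11, 12, 13, 14], max_sequence_length=5)
print(features)
```

In this sketch the mask lets a loss function skip padded positions; the actual module may represent padding and masking differently.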
Classes
Initialize the LMDataTokenGenerator class.