modelzoo.transformers.data_processing.scripts.chunk_preprocessing.summarization_data_token_generator
SummarizationTokenGenerator Module
This module provides the SummarizationTokenGenerator class, which tokenizes prompt/completion data and creates features suitable for summarization tasks. The class uses the BPETokenizer from the modelzoo.transformers.data_processing.tokenizers package for tokenization.
Usage:
    token_generator = SummarizationTokenGenerator(dataset_params, max_sequence_length, tokenizer)
    tokenized_features = token_generator.encode(("prompt_text", "completion_text"))
Functions
Given a list of prompt_ids and completion_ids, generate input sequence and labels.
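As a rough illustration of what generating an input sequence and labels from prompt_ids and completion_ids can look like, here is a minimal sketch. It is not this module's implementation: the function name build_summarization_features, the eos_id and pad_id parameters, the loss_mask output, and the shift-by-one labeling scheme are all assumptions made for the example.

```python
from typing import Dict, List


def build_summarization_features(
    prompt_ids: List[int],
    completion_ids: List[int],
    max_sequence_length: int,
    eos_id: int = 0,
    pad_id: int = 0,
) -> Dict[str, List[int]]:
    """Hypothetical helper (not the module's actual function)."""
    # Join prompt and completion, then mark the end of the sequence.
    tokens = prompt_ids + completion_ids + [eos_id]
    # Keep one extra token so inputs and labels both fit after shifting.
    tokens = tokens[: max_sequence_length + 1]

    # Next-token prediction: labels are the inputs shifted left by one.
    input_ids = tokens[:-1]
    labels = tokens[1:]

    # Assumed convention: only completion (and EOS) tokens count toward the loss.
    # The label at position i is tokens[i + 1].
    loss_mask = [1 if i + 1 >= len(prompt_ids) else 0 for i in range(len(labels))]

    # Right-pad every feature to the fixed sequence length.
    num_pad = max_sequence_length - len(input_ids)
    return {
        "input_ids": input_ids + [pad_id] * num_pad,
        "labels": labels + [pad_id] * num_pad,
        "loss_mask": loss_mask + [0] * num_pad,
    }


features = build_summarization_features([1, 2, 3], [4, 5], max_sequence_length=8, eos_id=9)
```

Masking the prompt positions is one common choice for summarization fine-tuning, so that the model is trained only on the completion; whether this module does the same is not stated here.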
Classes
Initialize the SummarizationTokenGenerator class.