cerebras.modelzoo.data_preparation.data_preprocessing.vsl_finetuning_token_generator

This module provides the VSLFinetuningTokenGenerator class, which extends FinetuningTokenGenerator to process tokenized text data for variable-length sequence summarization (VSLS). The class includes methods for processing chunks of tokenized text, encoding documents for text summarization, and packing the tokenized data more compactly by merging shorter sequences so that each merged sequence fits within a specified maximum sequence length.
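The merging step described above can be illustrated with a minimal sketch. It assumes a greedy first-fit strategy and a hypothetical helper named pack_sequences; neither the name nor the strategy is taken from the module, whose actual merging logic may differ.

```python
from typing import List


def pack_sequences(
    sequences: List[List[int]], max_sequence_length: int
) -> List[List[List[int]]]:
    """Greedy first-fit packing (hypothetical helper, not the module's API).

    Each tokenized sequence is placed into the first bin that still has
    room, so the total token count of a bin never exceeds
    max_sequence_length.
    """
    bins: List[List[List[int]]] = []
    bin_lengths: List[int] = []
    for seq in sequences:
        if len(seq) > max_sequence_length:
            # Assumption: sequences that cannot fit at all are dropped.
            continue
        for i, used in enumerate(bin_lengths):
            if used + len(seq) <= max_sequence_length:
                bins[i].append(seq)
                bin_lengths[i] += len(seq)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([seq])
            bin_lengths.append(len(seq))
    return bins


packed = pack_sequences(
    [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]], max_sequence_length=5
)
print(packed)  # [[[1, 2, 3], [4, 5]], [[6, 7, 8, 9], [10]]]
```

In this example four short sequences fit into two bins of length 5, so less of each training sample is spent on padding than if every sequence were padded separately.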

Functions

create_features_finetuning_vsl

Given a list of VSL sequences, generate input features and labels.
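As a rough illustration of what generating input features and labels from packed sequences can involve, the sketch below builds next-token-prediction features for one bin. The function name build_features, the pad_id parameter, and the output keys are assumptions made for this example; they are not the signature or return format of create_features_finetuning_vsl.

```python
from typing import Dict, List


def build_features(
    bin_sequences: List[List[int]], max_sequence_length: int, pad_id: int = 0
) -> Dict[str, List[int]]:
    """Hypothetical sketch: turn one bin of packed sequences into features."""
    input_ids: List[int] = []
    labels: List[int] = []
    input_mask: List[int] = []
    position_ids: List[int] = []
    for seq in bin_sequences:
        # Next-token prediction: inputs are tokens[:-1], labels are tokens[1:].
        inputs, targets = seq[:-1], seq[1:]
        input_ids.extend(inputs)
        labels.extend(targets)
        input_mask.extend([1] * len(inputs))
        # Positions restart at 0 for every packed sequence.
        position_ids.extend(range(len(inputs)))
    pad = max_sequence_length - len(input_ids)
    if pad < 0:
        raise ValueError("bin does not fit in max_sequence_length")
    # Pad every feature to the fixed length; padded positions are masked out.
    return {
        "input_ids": input_ids + [pad_id] * pad,
        "labels": labels + [pad_id] * pad,
        "input_mask": input_mask + [0] * pad,
        "position_ids": position_ids + [0] * pad,
    }


features = build_features([[5, 6, 7], [8, 9]], max_sequence_length=5)
print(features["input_ids"])     # [5, 6, 8, 0, 0]
print(features["labels"])        # [6, 7, 9, 0, 0]
print(features["input_mask"])    # [1, 1, 1, 0, 0]
print(features["position_ids"])  # [0, 1, 0, 0, 0]
```

A real fine-tuning pipeline would typically also mask prompt tokens out of the loss and record which positions belong to which sequence; both are omitted here for brevity.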

Classes

VSLFinetuningTokenGenerator

Token generator for variable-length sequence summarization (VSLS).