modelzoo.transformers.data_processing.scripts.hdf5_preprocessing.hdf5_dataset_preprocessors

Functions

split_text_and_tokenize

Splits the text into smaller sequences, each at most max_tok_len long, and then tokenizes each of those sequences.
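Below is a minimal sketch of the split-then-tokenize idea. It rests on three assumptions. First, max_tok_len counts characters of the raw text. Second, the tokenizer is any callable that maps a string to a list of integer token IDs. Third, the per-chunk token lists are concatenated into one list. The function name split_text_and_tokenize_sketch and its signature are hypothetical and do not reproduce the module's actual API.

```python
from typing import Callable, List


def split_text_and_tokenize_sketch(
    text: str,
    tokenize: Callable[[str], List[int]],
    max_tok_len: int = 2000,
) -> List[int]:
    """Hypothetical sketch, not the modelzoo implementation.

    Cuts `text` into consecutive chunks of at most `max_tok_len`
    characters (assumption: length is measured in characters),
    tokenizes each chunk separately, and concatenates the token IDs.
    """
    if max_tok_len <= 0:
        raise ValueError("max_tok_len must be a positive integer")

    token_ids: List[int] = []
    # Step through the text in fixed-size windows; the last chunk may be shorter.
    for start in range(0, len(text), max_tok_len):
        chunk = text[start : start + max_tok_len]
        # Tokenize each small chunk independently, so the tokenizer never
        # sees the full (possibly very long) input in a single call.
        token_ids.extend(tokenize(chunk))
    return token_ids


if __name__ == "__main__":
    # Toy character-level "tokenizer" used only for illustration.
    def char_tokenizer(s: str) -> List[int]:
        return [ord(c) for c in s]

    print(split_text_and_tokenize_sketch("hello world", char_tokenizer, max_tok_len=4))
```

Cutting at fixed character offsets can split a word across two chunks, which changes how a subword tokenizer encodes that word. A more careful version might move each cut back to the nearest whitespace before tokenizing.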

Classes

LMDataPreprocessor

SummarizationPreprocessor