modelzoo.transformers.data_processing.scripts.chunk_preprocessing

chunk_data_preprocessor

This module implements a generic data preprocessor called ChunkDataPreprocessor.

create_hdf5_dataset

Script to generate an HDF5 dataset for GPT models.

data_reader

This module contains helper functions and classes that read data from different formats, process it, and save it in HDF5 format.

fim_data_token_generator

FIMTokenGenerator Module

lm_data_token_generator

LMDataTokenGenerator Module

summarization_data_token_generator

SummarizationTokenGenerator Module

utils