cerebras.modelzoo.data_preparation.data_preprocessing.nlg_token_generator.NLGTokenGenerator

class cerebras.modelzoo.data_preparation.data_preprocessing.nlg_token_generator.NLGTokenGenerator(max_seq_length)

Bases: object

Token generator for NLG datasets such as E2E, DART, and WebNLG. Assumes the dataset has already been tokenized. Expects .jsonl input files that contain a “context” and a “completion” key. Used with GptHDF5DataProcessor.
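The input layout described above can be illustrated with a minimal sketch that writes and re-reads a small .jsonl file using only the Python standard library. It does not call NLGTokenGenerator or GptHDF5DataProcessor. The helper name `read_nlg_jsonl`, the file name `sample.jsonl`, and the sample records are hypothetical, and the plain-string values are an assumption, since the exact value representation is not specified here.

```python
import json
import os
import tempfile

# Keys named in the class description; everything else here is illustrative.
REQUIRED_KEYS = ("context", "completion")


def read_nlg_jsonl(path):
    """Yield one dict per non-empty line, checking that both keys are present.

    Hypothetical helper for illustration; not part of the Cerebras Model Zoo.
    """
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            missing = [key for key in REQUIRED_KEYS if key not in record]
            if missing:
                raise KeyError(f"line {line_no}: missing keys {missing}")
            yield record


# Made-up sample records; the string values are an assumption.
sample = [
    {"context": "name[Aromi], eatType[coffee shop]", "completion": "Aromi is a coffee shop."},
    {"context": "name[Zizzi], food[Italian]", "completion": "Zizzi serves Italian food."},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl")
    # One JSON object per line is what makes the file JSON Lines.
    with open(path, "w", encoding="utf-8") as f:
        for record in sample:
            f.write(json.dumps(record) + "\n")
    records = list(read_nlg_jsonl(path))

print(len(records), "records read")
```

A check like this can catch records with a missing key before the file is handed to a preprocessing pipeline.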

Methods

encode

parse_semantic_data_array