cerebras.modelzoo.data_preparation.nlp.chunk_data_processing.nlg_token_generator.NLGTokenGenerator

class cerebras.modelzoo.data_preparation.nlp.chunk_data_processing.nlg_token_generator.NLGTokenGenerator

Bases: object

Token generator for NLG datasets such as E2E, DART, and WebNLG. Assumes the dataset has already been tokenized. Expects .jsonl input files in which each record contains a “context” key and a “completion” key. Used with GptHDF5DataProcessor.
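A minimal sketch of preparing and checking an input file in the shape described above. It assumes (this page does not say so) that "already tokenized" means each “context” and “completion” value is a list of integer token IDs. The token IDs, the file name `nlg_example.jsonl`, and the helpers `write_jsonl` and `read_jsonl` are hypothetical and are not part of the Cerebras Model Zoo.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical pre-tokenized examples. ASSUMPTION: "context" and "completion"
# each hold a list of integer token IDs produced by an earlier tokenization step.
records = [
    {"context": [101, 2054, 2003], "completion": [1037, 3231, 102]},
    {"context": [101, 7592], "completion": [2088, 102]},
]

REQUIRED_KEYS = {"context", "completion"}


def write_jsonl(path, rows):
    """Write one JSON object per line (the .jsonl convention)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl(path):
    """Read a .jsonl file and check that every record has both required keys."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            missing = REQUIRED_KEYS - set(row)
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            rows.append(row)
    return rows


# Round-trip the records through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "nlg_example.jsonl"
    write_jsonl(path, records)
    loaded = read_jsonl(path)

print(len(loaded), loaded[0]["completion"])  # 2 [1037, 3231, 102]
```

Validating the keys up front surfaces a malformed record with its line number instead of failing later inside the processing pipeline.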

Methods

encode

__init__(max_seq_length)
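This page does not document what `encode` returns. As a hedged illustration of how a `max_seq_length` budget is commonly applied to a context/completion pair in GPT-style fine-tuning data, the function below concatenates the two token lists, truncates, pads to `max_seq_length`, and builds a loss mask that covers only completion tokens. The name `sketch_encode`, the pad ID 0, the one-position label shift, and the returned field names are all assumptions; this is not the behavior of `NLGTokenGenerator.encode`.

```python
from typing import Dict, List


def sketch_encode(
    context: List[int],
    completion: List[int],
    max_seq_length: int,
    pad_id: int = 0,  # ASSUMPTION: pad token ID; real tokenizers differ
) -> Dict[str, List[int]]:
    """Hypothetical context/completion encoder; NOT the Cerebras implementation."""
    # Keep one extra token so that inputs and shifted labels both fit the budget.
    tokens = (context + completion)[: max_seq_length + 1]
    # Inputs are all tokens but the last; labels are shifted by one position,
    # so each input position is trained to predict the next token.
    input_ids = tokens[:-1]
    labels = tokens[1:]
    # The label at position i is tokens[i + 1]; it is a completion token
    # iff i + 1 >= len(context). Only those positions contribute to the loss.
    loss_mask = [1 if i + 1 >= len(context) else 0 for i in range(len(labels))]
    attention_mask = [1] * len(input_ids)
    # Right-pad every field to exactly max_seq_length.
    pad = max_seq_length - len(input_ids)
    return {
        "input_ids": input_ids + [pad_id] * pad,
        "attention_mask": attention_mask + [0] * pad,
        "labels": labels + [pad_id] * pad,
        "loss_mask": loss_mask + [0] * pad,
    }


out = sketch_encode(context=[5, 6, 7], completion=[8, 9], max_seq_length=6)
print(out["input_ids"])  # [5, 6, 7, 8, 0, 0]
print(out["loss_mask"])  # [0, 0, 1, 1, 0, 0]
```

Masking the context keeps the model from being trained to reproduce its prompt, which is the usual reason NLG datasets are stored as separate context and completion fields.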