modelzoo.transformers.pytorch.bert.fine_tuning.qa.input.BertQADataProcessor.BertQADataProcessor#

class modelzoo.transformers.pytorch.bert.fine_tuning.qa.input.BertQADataProcessor.BertQADataProcessor[source]#

Bases: torch.utils.data.IterableDataset

Reads a CSV file containing the input token ids and label_ids. Creates attention_masks and segment_ids on the fly.

Parameters

params – dict containing input parameters for creating the dataset.

Expects the following fields:

  • “data_dir” (str or list of str): Path to the metadata files.

  • “batch_size” (int): Batch size.

  • “shuffle” (bool): Flag to enable data shuffling.

  • “shuffle_buffer” (int): Shuffle buffer size.

  • “shuffle_seed” (int): Shuffle seed.

  • “num_workers” (int): Number of PyTorch data workers (see PyTorch docs).

  • “prefetch_factor” (int): How much data to prefetch for better performance (see PyTorch docs).

  • “persistent_workers” (bool): For multi-worker dataloaders, controls whether the workers are recreated at the end of each epoch (see PyTorch docs).

  • “max_sequence_length” (int): Maximum sequence length for the model.
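
For illustration, a params dictionary covering these fields might look like the following sketch; the concrete values (paths, batch size, and so on) are placeholders, not recommended settings.

    # Hypothetical values; the field names follow the list above.
    params = {
        "data_dir": "./qa_metadata/",  # path(s) to the metadata files
        "batch_size": 32,
        "shuffle": True,
        "shuffle_buffer": 16384,
        "shuffle_seed": 1,
        "num_workers": 4,
        "prefetch_factor": 2,
        "persistent_workers": True,
        "max_sequence_length": 384,
    }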

Methods

create_dataloader

Classmethod to create the dataloader object.

load_buffer

Generator to read the data in chunks of size data_buffer.

__call__(*args: Any, **kwargs: Any) → Any#

Call self as a function.

__init__(params)[source]#
static __new__(cls, *args: Any, **kwargs: Any) → Any#
create_dataloader()[source]#

Classmethod to create the dataloader object.
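
A minimal usage sketch, assuming a params dictionary like the one above and the import path shown in the page title:

    from modelzoo.transformers.pytorch.bert.fine_tuning.qa.input.BertQADataProcessor import (
        BertQADataProcessor,
    )

    # Construct the processor from the input params, then build the dataloader.
    processor = BertQADataProcessor(params)
    dataloader = processor.create_dataloader()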

load_buffer()[source]#

Generator to read the data in chunks of size data_buffer.

Returns: Yields the data stored in the data_buffer.
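
Because the class is an IterableDataset whose samples are read through this generator, the usual way to consume the data is to iterate over the dataloader built above; the loop below is only a sketch and makes no assumption about the exact batch structure.

    # Continuing the sketch above: pull batches from the dataloader.
    for batch in dataloader:
        # Batches carry the token ids and label ids read from the files,
        # plus the attention masks and segment ids created on the fly.
        print(batch)
        break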