cerebras.modelzoo.common.utils.input.utils.SamplesSaver

class cerebras.modelzoo.common.utils.input.utils.SamplesSaver

Bases: object

Manages chunking and saving of data samples stored as NumPy arrays.

Parameters
  • data_dir – Path to the mounted directory where the samples are dumped

  • max_file_size – Maximum file size (in bytes) for the .npy samples file(s)

  • filename_prefix – (Optional) filename prefix for the .npy file(s)

Methods

add_sample

Adds the NumPy array to an internally maintained list of data samples and dumps the list to file if its total size exceeds the max_file_size threshold.

delete_data_dumps

Cleans up by deleting all dumped data.

flush

Dumps any remaining data samples not yet written to file.

Attributes

dataset_size

Returns the total number of data samples.

samples_files

Returns the list of .npy file(s).

__init__(data_dir: str, max_file_size: int, filename_prefix: Optional[str] = None)
Parameters
  • data_dir – Path to the mounted directory where the samples are dumped

  • max_file_size – Maximum file size (in bytes) for the .npy samples file(s)

  • filename_prefix – (Optional) filename prefix for the .npy file(s)
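
Because max_file_size is given in bytes, it can help to estimate how many samples fit in one file before choosing a value. The sketch below does that arithmetic with NumPy's `nbytes` attribute. The sample shape, dtype, and the 1 MiB threshold are illustrative assumptions, and the estimate ignores the small .npy header; how SamplesSaver itself measures size is not stated here.

```python
import numpy as np

# Illustrative sample: 128 int32 token ids (shape and dtype are assumptions).
sample = np.zeros((128,), dtype=np.int32)

# nbytes is the size of the raw array data: 128 elements * 4 bytes each.
bytes_per_sample = sample.nbytes

# Hypothetical threshold of 1 MiB, expressed in bytes as the parameter expects.
max_file_size = 1 << 20

# Rough number of samples per file, ignoring the .npy header overhead.
samples_per_file = max_file_size // bytes_per_sample

print(bytes_per_sample, samples_per_file)
```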

property dataset_size: int

Returns the total number of data samples.

property samples_files: List[str]

Returns the list of .npy file(s).

add_sample(data_sample: numpy.array) → None

Adds the NumPy array to an internally maintained list of data samples and dumps the list to file if its total size exceeds the max_file_size threshold.

Parameters

data_sample – NumPy array holding one data sample
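
The buffer-then-dump behavior described above can be sketched as follows. This is a minimal stand-in, not the Cerebras implementation: the class name `ChunkedSaverSketch`, the default prefix, the file naming scheme, and the choice to check the threshold after appending are all assumptions, and samples are assumed to share one shape so they can be stacked.

```python
import os
import tempfile
from typing import List, Optional

import numpy as np


class ChunkedSaverSketch:
    """Hypothetical stand-in illustrating threshold-triggered dumps."""

    def __init__(self, data_dir: str, max_file_size: int,
                 filename_prefix: Optional[str] = None):
        self.data_dir = data_dir
        self.max_file_size = max_file_size
        self.prefix = filename_prefix or "samples"  # assumed default
        self._buffer: List[np.ndarray] = []
        self._buffer_bytes = 0
        self.samples_files: List[str] = []
        self.dataset_size = 0
        os.makedirs(data_dir, exist_ok=True)

    def add_sample(self, data_sample: np.ndarray) -> None:
        # Buffer the sample, then dump once the buffered bytes exceed the limit.
        self._buffer.append(data_sample)
        self._buffer_bytes += data_sample.nbytes
        self.dataset_size += 1
        if self._buffer_bytes > self.max_file_size:
            self.flush()

    def flush(self) -> None:
        # Write whatever is buffered; do nothing if the buffer is empty.
        if not self._buffer:
            return
        index = len(self.samples_files)
        path = os.path.join(self.data_dir, f"{self.prefix}_{index}.npy")
        np.save(path, np.stack(self._buffer))
        self.samples_files.append(path)
        self._buffer, self._buffer_bytes = [], 0


with tempfile.TemporaryDirectory() as tmp:
    saver = ChunkedSaverSketch(tmp, max_file_size=1024)
    for i in range(11):
        # Each float32 sample of length 64 occupies 256 bytes.
        saver.add_sample(np.full(64, i, dtype=np.float32))
    saver.flush()  # writes the one sample still sitting in the buffer
    shapes = [np.load(f).shape for f in saver.samples_files]
    sample_count = saver.dataset_size

print(shapes, sample_count)
```

Under this literal reading of "exceeds", a file can overshoot the threshold by up to one sample; whether the real class behaves the same way is not stated in this reference.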

flush()

Dumps any remaining data samples not yet written to file.

delete_data_dumps() → None

Cleans up by deleting all dumped data.