cerebras.modelzoo.common.utils.input.utils.SamplesSaver

class cerebras.modelzoo.common.utils.input.utils.SamplesSaver(data_dir, max_file_size, filename_prefix=None)

Bases: object

Manages the chunking and saving of data samples stored as NumPy arrays.

Constructs a SamplesSaver instance.

Parameters

data_dir (str) – Path to the mounted directory where the samples are dumped
max_file_size (int) – Maximum file size (in bytes) for the .npy samples file(s)
filename_prefix (Optional[str]) – Filename prefix for the .npy file(s)

Methods

`add_sample`	Adds the NumPy array to the internally maintained list of data samples and dumps the list to file if the total size exceeds the max_file_size threshold.
`delete_data_dumps`	Cleans up by deleting all dumped data.
`flush`	Dumps any remaining data samples not yet written to file.

Attributes

`dataset_size`	Returns the total number of data samples.
`samples_files`	Returns the list of .npy file(s).
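
The methods and attributes listed above suggest a simple workflow: add samples one at a time, flush the remainder, then read back the bookkeeping. The helper below is a minimal sketch of that workflow, not part of the Model Zoo. The name `dump_samples` is hypothetical, and the helper relies only on the interface documented on this page (`add_sample`, `flush`, `samples_files`, `dataset_size`). The path, size limit, and prefix in the commented usage are illustrative values.

```python
from typing import Any, Iterable, List, Tuple


def dump_samples(saver: Any, samples: Iterable[Any]) -> Tuple[List[Any], int]:
    """Feed samples to a SamplesSaver-like object and return its bookkeeping.

    `saver` is expected to expose the documented interface:
    add_sample(), flush(), samples_files and dataset_size.
    """
    for sample in samples:
        # May trigger a dump to file once the buffered size exceeds max_file_size.
        saver.add_sample(sample)
    # Write whatever is still buffered so that trailing samples are not lost.
    saver.flush()
    return saver.samples_files, saver.dataset_size


# Intended use with the real class (requires the Cerebras Model Zoo package;
# all argument values below are placeholders):
#
#   from cerebras.modelzoo.common.utils.input.utils import SamplesSaver
#
#   saver = SamplesSaver(
#       data_dir="/path/to/mounted/dir",
#       max_file_size=1024 ** 3,
#       filename_prefix="eval_samples",
#   )
#   files, total = dump_samples(saver, my_numpy_arrays)
#   saver.delete_data_dumps()  # clean up once the files are no longer needed
```

Calling `flush` after the loop matters because, per the description above, `add_sample` only writes to file when the size threshold is exceeded, so the last few samples may still be held in memory.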

property samples_files: List[Tuple[str, int]]

Returns the list of .npy file(s).

add_sample(data_sample)

Adds the NumPy array to the internally maintained list of data samples and dumps the list to file if the total size exceeds the max_file_size threshold.
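
To make the buffer-and-dump behaviour concrete, here is a simplified stand-in written for illustration. It is not the Model Zoo implementation, and several details are assumptions: the class name `ChunkedSaverSketch`, the default prefix `"samples"`, the file naming scheme, dumping the buffer before a sample that would push it past the limit, measuring size with `ndarray.nbytes` (raw array data, excluding the `.npy` header), and reading each `(str, int)` tuple as a file path paired with a sample count.

```python
import os
import tempfile
from typing import List, Optional, Tuple

import numpy as np


class ChunkedSaverSketch:
    """Hypothetical stand-in that mimics the documented behaviour."""

    def __init__(self, data_dir: str, max_file_size: int,
                 filename_prefix: Optional[str] = None) -> None:
        self._data_dir = data_dir
        self._max_file_size = max_file_size
        self._prefix = filename_prefix or "samples"  # assumed default
        self._buffer: List[np.ndarray] = []
        self._buffer_bytes = 0
        self._files: List[Tuple[str, int]] = []  # assumed: (path, sample count)

    def add_sample(self, data_sample: np.ndarray) -> None:
        # Assumption: dump the buffer first if this sample would exceed the limit.
        if self._buffer and self._buffer_bytes + data_sample.nbytes > self._max_file_size:
            self.flush()
        self._buffer.append(data_sample)
        self._buffer_bytes += data_sample.nbytes

    def flush(self) -> None:
        if not self._buffer:
            return
        name = f"{self._prefix}_{len(self._files)}.npy"  # assumed naming scheme
        path = os.path.join(self._data_dir, name)
        np.save(path, np.stack(self._buffer))  # assumes equally shaped samples
        self._files.append((path, len(self._buffer)))
        self._buffer, self._buffer_bytes = [], 0

    @property
    def samples_files(self) -> List[Tuple[str, int]]:
        return list(self._files)

    @property
    def dataset_size(self) -> int:
        return sum(count for _, count in self._files) + len(self._buffer)


def demo() -> List[Tuple[str, int]]:
    """Save five 32-byte samples with a 64-byte limit and list the chunks."""
    with tempfile.TemporaryDirectory() as tmp:
        saver = ChunkedSaverSketch(tmp, max_file_size=64)
        for i in range(5):
            saver.add_sample(np.full(4, i, dtype=np.int64))  # 4 * 8 = 32 bytes
        saver.flush()
        return [(os.path.basename(p), n) for p, n in saver.samples_files]
```

With these settings `demo()` yields three chunks holding 2, 2, and 1 samples. The last chunk exists only because of the final `flush()` call.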

delete_data_dumps()

Cleans up by deleting all dumped data.
