cerebras.modelzoo.tools.checkpoint_converters.streaming_checkpoints.StreamingShardedHFReader

class cerebras.modelzoo.tools.checkpoint_converters.streaming_checkpoints.StreamingShardedHFReader

Bases: object

Reads sharded HuggingFace checkpoints in a streaming manner instead of loading every shard into memory at once. The underlying checkpoint is read-only.

Only one shard is held in memory at a time. As a result, accessing keys in random order may be slow, because each switch between shards requires loading a new shard. It is recommended that keys be accessed in the order given by self.keys() or self.__iter__(), since keys that belong to the same shard appear consecutively in that order.
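
The effect of access order can be illustrated with a toy reader. The sketch below is not the Cerebras implementation: `ToyShardedReader`, the JSON shard files, and the `loads` counter are hypothetical stand-ins, chosen so the example runs with only the standard library. It keeps a single shard in memory and counts how often a shard has to be loaded.

```python
import json
import os
import tempfile


class ToyShardedReader:
    """Hypothetical stand-in that keeps at most one shard in memory."""

    def __init__(self, index_file):
        self._dir = os.path.dirname(index_file)
        with open(index_file) as f:
            # HuggingFace-style index: parameter name -> shard file name.
            self._weight_map = json.load(f)["weight_map"]
        self._current_name = None
        self._current_shard = None
        self.loads = 0  # number of shard loads, for illustration only

    def keys(self):
        # Stable sort by shard name, so same-shard keys are consecutive.
        return sorted(self._weight_map, key=lambda k: self._weight_map[k])

    def __getitem__(self, key):
        shard_name = self._weight_map[key]
        if shard_name != self._current_name:
            # Switching cost: replace the cached shard with the new one.
            with open(os.path.join(self._dir, shard_name)) as f:
                self._current_shard = json.load(f)
            self._current_name = shard_name
            self.loads += 1
        return self._current_shard[key]


def count_loads(index_file, order):
    """Read every key in `order` and report how many shard loads occurred."""
    reader = ToyShardedReader(index_file)
    for key in order:
        reader[key]
    return reader.loads


with tempfile.TemporaryDirectory() as tmp:
    # Two toy shards stored as JSON (real shards hold tensors, not ints).
    shards = {
        "shard-1.json": {"a": 1, "b": 2},
        "shard-2.json": {"c": 3, "d": 4},
    }
    weight_map = {}
    for name, tensors in shards.items():
        with open(os.path.join(tmp, name), "w") as f:
            json.dump(tensors, f)
        for key in tensors:
            weight_map[key] = name
    index_file = os.path.join(tmp, "toy.index.json")
    with open(index_file, "w") as f:
        json.dump({"weight_map": weight_map}, f)

    ordered = ToyShardedReader(index_file).keys()
    in_order_loads = count_loads(index_file, ordered)
    interleaved_loads = count_loads(index_file, ["a", "c", "b", "d"])
```

In this toy setup, reading keys in `keys()` order loads each shard once, while the interleaved order reloads a shard on every access. With real checkpoint shards, each reload means reading a large file from disk again.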

Parameters

index_file – Path to .index.json file.
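
For orientation, a HuggingFace index file is a JSON document whose `weight_map` entry maps each parameter name to the shard file that contains it. The snippet below is a minimal sketch of that layout, assuming made-up parameter and shard names; the helper `keys_by_shard` is hypothetical and only shows how keys group by shard.

```python
import json
from collections import defaultdict

# Hypothetical contents of an index file; names and sizes are invented.
index_text = """
{
  "metadata": {"total_size": 16},
  "weight_map": {
    "embed.weight": "pytorch_model-00001-of-00002.bin",
    "layer.0.weight": "pytorch_model-00001-of-00002.bin",
    "lm_head.weight": "pytorch_model-00002-of-00002.bin"
  }
}
"""


def keys_by_shard(index):
    """Group parameter names by the shard file that stores them."""
    grouped = defaultdict(list)
    for key, shard in index["weight_map"].items():
        grouped[shard].append(key)
    return dict(grouped)


grouped = keys_by_shard(json.loads(index_text))
```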

Methods

items

keys

load_shard

values

__init__(index_file: str) → None