cerebras.modelzoo.data_preparation.nlp.pile.download.get_urls_from_split

cerebras.modelzoo.data_preparation.nlp.pile.download.get_urls_from_split(split)

Get the URLs for the given split of the dataset.

Parameters

split (str) – The dataset split to get URLs for.

Returns

List of URLs, each containing a jsonl.zst file name, for downloading.
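
Example

The following is a minimal sketch of the kind of mapping described above, from a split name to a list of jsonl.zst URLs. It is not the module's actual implementation: the function name `sketch_urls_from_split`, the base URL, the shard count, the accepted split names, and the file layout are all assumptions chosen for illustration.

```python
from typing import List

# Hypothetical values for illustration only; the real module defines its own
# download location and shard layout.
BASE_URL = "https://example.com/pile"
NUM_TRAIN_SHARDS = 3


def sketch_urls_from_split(split: str) -> List[str]:
    """Return a list of jsonl.zst URLs for a split (illustrative sketch)."""
    if split == "train":
        # Assumed layout: the training split is spread over numbered shards.
        return [
            f"{BASE_URL}/train/{i:02d}.jsonl.zst"
            for i in range(NUM_TRAIN_SHARDS)
        ]
    if split in ("val", "test"):
        # Assumed layout: the smaller splits are a single file each.
        return [f"{BASE_URL}/{split}.jsonl.zst"]
    raise ValueError(f"Unknown split: {split!r}")


train_urls = sketch_urls_from_split("train")
val_urls = sketch_urls_from_split("val")
```

In this sketch the caller receives a plain list of strings and can pass each entry to whatever download helper it prefers. The documented function is called the same way, with a single `split` string, but the split names it accepts should be checked in its source.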