cerebras.modelzoo.data_preparation.nlp.pile.download

Functions

debug_or_download_individual_file

Download a single file from a URL to the specified file path.
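The function name suggests a mode that inspects a download without performing it. The following is a minimal standard-library sketch of a single-file download under that reading. The helper name `download_file` and the `debug` flag (print the transfer instead of running it) are hypothetical and are not the Cerebras implementation.

```python
import os
import urllib.request


def download_file(url: str, filepath: str, debug: bool = False) -> str:
    """Download `url` to `filepath` (hypothetical helper, not the Cerebras API)."""
    if debug:
        # Assumed debug behavior: report the transfer without touching the network.
        print(f"[debug] would download {url} -> {filepath}")
        return filepath
    # Make sure the destination directory exists before writing.
    parent = os.path.dirname(filepath)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # urlretrieve copies the response body into the target file.
    urllib.request.urlretrieve(url, filepath)
    return filepath
```

For example, `download_file("https://example.com/data.jsonl.zst", "data/data.jsonl.zst", debug=True)` only prints the planned transfer. The URL is a placeholder.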

download_pile

Download The Pile dataset from The Eye website (the-eye.eu).
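The Pile ships as many large compressed shards, so a downloader typically loops over a list of URLs. The sketch below fetches such a list concurrently. It assumes a thread pool, a helper named `download_all`, and a rule that names each local file after the last component of its URL. All three are illustrative choices. The actual `download_pile` may download sequentially or organize files differently.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def download_all(urls, out_dir, max_workers=4):
    """Download every URL into `out_dir` in parallel (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)

    def fetch(url):
        # Name the local file after the last path component of the URL,
        # e.g. ".../train/00.jsonl.zst" -> "00.jsonl.zst".
        name = os.path.basename(urlparse(url).path)
        dest = os.path.join(out_dir, name)
        urllib.request.urlretrieve(url, dest)
        return dest

    # Threads suit this I/O-bound work; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

A thread pool overlaps network waits without the overhead of separate processes. Shards with clashing basenames would overwrite each other under this naming rule.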

download_tokenizer_files

Download the files needed for tokenization during dataset creation.
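Tokenizer files are small and reused across runs, so skipping any that are already on disk makes the download step cheap to repeat. The sketch below assumes the input is a mapping from local filename to URL. The helper name `ensure_tokenizer_files` and the skip-if-present behavior are hypothetical and are not taken from the Cerebras source.

```python
import os
import urllib.request


def ensure_tokenizer_files(file_urls, tokenizer_dir):
    """Fetch tokenizer files not already on disk (hypothetical helper).

    `file_urls` maps a local filename to the URL it should be downloaded from.
    Returns the filenames that were actually downloaded.
    """
    os.makedirs(tokenizer_dir, exist_ok=True)
    downloaded = []
    for name, url in file_urls.items():
        dest = os.path.join(tokenizer_dir, name)
        # Skip files left by a previous run so repeated invocations are cheap.
        if os.path.exists(dest):
            continue
        urllib.request.urlretrieve(url, dest)
        downloaded.append(name)
    return downloaded
```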

get_urls_for_tokenizer_files

Get the URLs for downloading the files needed for tokenization.
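URL construction for the tokenizer files can be expressed as a filename-to-URL mapping. This sketch assumes a GPT-2-style BPE tokenizer, whose vocabulary is stored in `encoder.json` and `vocab.bpe`. The helper name `tokenizer_file_urls` and the `base_url` parameter are hypothetical placeholders. The real host used by the Cerebras script is not specified here.

```python
def tokenizer_file_urls(base_url, filenames=("encoder.json", "vocab.bpe")):
    """Map each tokenizer filename to its download URL (hypothetical helper).

    The default filenames assume a GPT-2-style BPE tokenizer; `base_url` is a
    placeholder, not the location used by the Cerebras script.
    """
    base = base_url.rstrip("/")
    return {name: f"{base}/{name}" for name in filenames}
```

For example, `tokenizer_file_urls("https://example.com/tok/")` returns `{"encoder.json": "https://example.com/tok/encoder.json", "vocab.bpe": "https://example.com/tok/vocab.bpe"}`.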

get_urls_from_split

Get the URLs for a given split of the dataset.
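The sketch below shows how split names might map to shard URLs. It assumes the layout of the original Pile release: 30 training shards named `00.jsonl.zst` through `29.jsonl.zst` under `train/`, plus single `val.jsonl.zst` and `test.jsonl.zst` files. The helper name `urls_for_split` and the `base_url` parameter are hypothetical. The Cerebras function may use a different host or layout.

```python
def urls_for_split(base_url, split, num_train_shards=30):
    """Return the shard URLs for one split of The Pile (hypothetical helper)."""
    base = base_url.rstrip("/")
    if split == "train":
        # Training data is assumed to be sharded into zero-padded, numbered files.
        return [f"{base}/train/{i:02d}.jsonl.zst" for i in range(num_train_shards)]
    if split in ("val", "test"):
        # Validation and test sets are assumed to be single files.
        return [f"{base}/{split}.jsonl.zst"]
    raise ValueError(f"unknown split: {split!r} (expected 'train', 'val' or 'test')")
```

Raising on an unknown split name surfaces typos early rather than producing an empty download.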

main

Main entry point for running the download script.

parse_args

Define the argument parser for the user's command-line arguments.
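The sketch below defines a command-line interface with `argparse` from the standard library. Every flag here (`--data_dir`, `--split`, `--debug`) is hypothetical and chosen only to mirror the functions listed above. Consult the module itself for its real options.

```python
import argparse


def build_parser():
    """Command-line interface sketch; every flag here is hypothetical."""
    parser = argparse.ArgumentParser(
        description="Download The Pile and its tokenizer files."
    )
    parser.add_argument(
        "--data_dir", required=True, help="Directory to store downloaded files."
    )
    parser.add_argument(
        "--split",
        choices=("train", "val", "test"),
        default="train",
        help="Which dataset split to download.",
    )
    parser.add_argument(
        "--debug",
        action="store_true",
        help="Print what would be downloaded instead of downloading.",
    )
    return parser


def parse_cli(argv=None):
    # argv=None makes argparse read sys.argv[1:]; pass a list for testing.
    return build_parser().parse_args(argv)
```

Accepting an optional `argv` list keeps the parser testable without touching `sys.argv`.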