cerebras.modelzoo.data_preparation.nlp.pubmed

Downloader

Wrapper script to download PubMed datasets. Reference: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT

TextFormatting

Script to format the PubMed Fulltext commercial dataset and the abstracts from the PubMed Baseline and Update files
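
As an illustration of the abstract-formatting step, here is a minimal sketch that pulls abstracts out of PubMed-style XML and normalizes each one to a single line. It is not the module's implementation: the function name extract_abstracts and the trimmed SAMPLE_XML record are hypothetical, and the sketch assumes the abstracts live in AbstractText elements under each PubmedArticle, as in the public PubMed XML layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample in the shape of a PubMed record set.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <Abstract>
          <AbstractText>First sentence.
          Second sentence.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <ArticleTitle>No abstract here</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_abstracts(xml_text):
    """Return one whitespace-normalized abstract per article that has one."""
    root = ET.fromstring(xml_text)
    abstracts = []
    for article in root.iter("PubmedArticle"):
        # An abstract may be split across several AbstractText elements.
        parts = ["".join(node.itertext()) for node in article.iter("AbstractText")]
        # Collapse newlines and repeated spaces so each abstract fits on one line.
        text = " ".join(" ".join(parts).split())
        if text:  # skip articles without an abstract
            abstracts.append(text)
    return abstracts


abstracts = extract_abstracts(SAMPLE_XML)
for line in abstracts:
    print(line)
```

Writing one document per line keeps the output easy to split in a later sharding step; the real script may use a different output layout.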

TextSharding

Script to shard the dataset into separate train and test dataset files
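
The idea of sharding into train and test sets can be sketched as follows. This is an assumption-laden illustration, not the TextSharding class itself: the function shard_documents, its parameters (n_train_shards, n_test_shards, test_fraction, seed), and the round-robin assignment are all hypothetical choices, and the sketch keeps shards in memory instead of writing files.

```python
import random


def shard_documents(documents, n_train_shards, n_test_shards,
                    test_fraction=0.1, seed=0):
    """Split documents into train/test sets, then spread each set over shards."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    docs = list(documents)
    rng.shuffle(docs)

    # Hold out a fraction of the shuffled documents for the test set.
    n_test = round(len(docs) * test_fraction)
    test_docs, train_docs = docs[:n_test], docs[n_test:]

    def round_robin(items, n_shards):
        # Assign item i to shard i mod n_shards so shard sizes differ by at most 1.
        shards = [[] for _ in range(n_shards)]
        for i, item in enumerate(items):
            shards[i % n_shards].append(item)
        return shards

    return round_robin(train_docs, n_train_shards), round_robin(test_docs, n_test_shards)


docs = [f"document {i}" for i in range(20)]
train_shards, test_shards = shard_documents(docs, n_train_shards=4, n_test_shards=2)
print([len(s) for s in train_shards], [len(s) for s in test_shards])
```

Each shard could then be written to its own file, one document per line, so that downstream preprocessing can read shards in parallel.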