cerebras.modelzoo.data_preparation.nlp.write_csv_ner

File: write_csv_ner.py

Creates pre-processed CSV files for the Data Processor from the raw NER dataset CSV files.

Based on https://github.com/NVIDIA/DeepLearningExamples/blob/master/TensorFlow/LanguageModeling/BERT/run_ner.py with minor modifications.

Example Usage:

python write_csv_ner.py --data_dir /cb/ml/language/datasets/blurb/data_generation/data/BC5CDR-chem/ --vocab_file /cb/ml/language/datasets/pubmed_abstracts_baseline_fulltext_vocab/Pubmed_fulltext_vocab.txt --output_dir /cb/ml/language/datasets/ner-pt/bc5cdr-chem-csv --do_lower_case

Functions

convert_examples_to_features_and_write

convert_single_example

update_parser

Add required command-line arguments.
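As an illustration only, the flags from the example usage above could be registered on an argparse parser roughly as follows. This is a minimal sketch, not the Model Zoo implementation: the function name sketch_update_parser, the help strings, the required/optional split, and the short relative paths are all assumptions; only the four flag names come from the example usage.

```python
import argparse


def sketch_update_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical stand-in for update_parser (assumed behavior, not the real code)."""
    # Flag names are taken from the example usage; everything else
    # (help text, which flags are required) is an assumption.
    parser.add_argument("--data_dir", required=True,
                        help="Directory containing the raw NER dataset files.")
    parser.add_argument("--vocab_file", required=True,
                        help="Path to the vocabulary file.")
    parser.add_argument("--output_dir", required=True,
                        help="Directory where the CSV files are written.")
    # A boolean switch: present on the command line means True.
    parser.add_argument("--do_lower_case", action="store_true",
                        help="Lower-case the input text before tokenization.")
    return parser


# Parse an argument list shaped like the example usage (paths shortened here).
parser = sketch_update_parser(argparse.ArgumentParser())
args = parser.parse_args([
    "--data_dir", "data/BC5CDR-chem/",
    "--vocab_file", "vocab.txt",
    "--output_dir", "out/bc5cdr-chem-csv",
    "--do_lower_case",
])
```

Passing an explicit list to parse_args makes it possible to try the sketch without touching the real command line.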

write_csv_files

Classes

InputFeatures

InputFeatures(tokens, labels)
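For orientation, a container with this constructor signature might look like the sketch below. The class name InputFeaturesSketch, the attribute names, and the sample tokens and tags are assumptions made for illustration; the page documents only the signature InputFeatures(tokens, labels).

```python
class InputFeaturesSketch:
    """Hypothetical container mirroring the InputFeatures(tokens, labels) signature."""

    def __init__(self, tokens, labels):
        # Assumption: both arguments are stored unchanged as attributes.
        self.tokens = tokens
        self.labels = labels


# Illustrative values only; the tag scheme used by the real script is not shown here.
features = InputFeaturesSketch(
    tokens=["aspirin", "reduces", "fever"],
    labels=["B", "O", "O"],
)
```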