cerebras.modelzoo.data_preparation.nlp.data_dedup

deduplicate_dataset

generate_connected_components

generate_duplicate_pairs

This script generates duplicate pairs.
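
As a rough illustration of what generating duplicate pairs can look like, the sketch below compares every pair of documents by the Jaccard similarity of their character n-grams and reports the pairs above a threshold. It is not the implementation used by this module: the function names (`char_ngrams`, `jaccard`, `find_duplicate_pairs`), the n-gram size, and the `threshold` default are all assumptions made for the example, and the exhaustive pairwise comparison is only practical for small inputs.

```python
from __future__ import annotations

from itertools import combinations


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of lowercase character n-grams in `text` (hypothetical helper)."""
    text = text.lower()
    if len(text) < n:
        # Texts shorter than n contribute themselves as a single gram.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicate_pairs(
    docs: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair scoring at or above `threshold`.

    `threshold` is an assumed default for this sketch, not a documented value.
    """
    grams = {doc_id: char_ngrams(text) for doc_id, text in docs.items()}
    pairs = []
    # Exhaustive O(n^2) comparison over sorted ids, so output order is stable.
    for left, right in combinations(sorted(grams), 2):
        score = jaccard(grams[left], grams[right])
        if score >= threshold:
            pairs.append((left, right, score))
    return pairs
```

Calling `find_duplicate_pairs` on a dictionary that maps document ids to text returns the matching id pairs with their scores. Larger corpora usually replace the exhaustive comparison with an approximate scheme such as MinHash with locality-sensitive hashing.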

to_hash