cerebras.modelzoo.data_preparation.nlp.data_dedup.generate_duplicate_pairs.optimal_param

cerebras.modelzoo.data_preparation.nlp.data_dedup.generate_duplicate_pairs.optimal_param(threshold, num_perm, false_positive_weight, false_negative_weight)

Compute the optimal MinHashLSH parameter that minimizes the weighted sum of the false-positive and false-negative probabilities.
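The page does not show how the optimization is carried out, so the following is only a minimal sketch of one common approach for MinHash LSH, not the Model Zoo implementation. It assumes the parameter being tuned is a pair (number of bands `b`, rows per band `r`) with `b * r <= num_perm`, and that the error probabilities are areas under the banding collision curve `1 - (1 - s**r)**b` on either side of `threshold`. The names `sketch_optimal_param`, `_integrate`, `_collision_probability`, and the `steps` argument are hypothetical and introduced here for illustration.

```python
def _integrate(f, lo, hi, steps=200):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


def _collision_probability(s, b, r):
    """Chance that two items with Jaccard similarity s match in at least one band."""
    return 1.0 - (1.0 - s ** r) ** b


def sketch_optimal_param(threshold, num_perm,
                         false_positive_weight, false_negative_weight):
    """Hypothetical stand-in: brute-force search over (bands, rows).

    Assumption: the returned value is a (bands, rows) tuple; the source
    page does not state the return type.
    """
    best_error = float("inf")
    best = (0, 0)
    for b in range(1, num_perm + 1):
        # Rows per band are limited so that b * r never exceeds num_perm.
        for r in range(1, num_perm // b + 1):
            # False positives: pairs below the threshold that still collide.
            fp = _integrate(
                lambda s: _collision_probability(s, b, r), 0.0, threshold)
            # False negatives: pairs above the threshold that never collide.
            fn = _integrate(
                lambda s: 1.0 - _collision_probability(s, b, r), threshold, 1.0)
            error = false_positive_weight * fp + false_negative_weight * fn
            if error < best_error:
                best_error = error
                best = (b, r)
    return best
```

Under these assumptions, a call such as `sketch_optimal_param(0.8, 128, 0.5, 0.5)` returns the band and row counts with the lowest weighted error. Raising `false_negative_weight` relative to `false_positive_weight` pushes the search toward more bands with fewer rows, which catches more true duplicates at the cost of extra candidate pairs.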