cerebras.modelzoo.data.nlp.t5.t5_utils.random_spans_noise_mask

cerebras.modelzoo.data.nlp.t5.t5_utils.random_spans_noise_mask(length, noise_density=0.15, mean_noise_span_length=3.0, rng=None)

Generates a noise mask consisting of random spans of noise tokens. The number of noise tokens and the number of noise and non-noise spans are fixed by the arguments and computed as follows:

num_noise_tokens = round(length * noise_density)
num_nonnoise_spans = num_noise_spans = round(num_noise_tokens / mean_noise_span_length)
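As a worked example of the two formulas above, the following sketch computes the counts for a hypothetical sequence of 100 tokens with the default arguments. It assumes Python's built-in `round`; the library may round differently or clamp the results, so treat the numbers as illustrative.

```python
# Illustrative values only: length=100 is a hypothetical input,
# the other two are the documented defaults.
length = 100
noise_density = 0.15
mean_noise_span_length = 3.0

# Assumption: Python's built-in round() stands in for the library's rounding.
num_noise_tokens = round(length * noise_density)
num_noise_spans = round(num_noise_tokens / mean_noise_span_length)
num_nonnoise_spans = num_noise_spans
num_nonnoise_tokens = length - num_noise_tokens

print(num_noise_tokens)     # 15
print(num_noise_spans)      # 5
print(num_nonnoise_tokens)  # 85
```

With these values, 15 noise tokens are split into 5 noise spans (3 tokens each on average), interleaved with 5 non-noise spans covering the remaining 85 tokens.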

Spans alternate between non-noise and noise, beginning with non-noise. Subject to the above restrictions, all masks are equally likely.

Parameters

length (int): Length of the incoming token sequence.
noise_density (float): Approximate fraction of tokens marked as noise in the output mask.
mean_noise_span_length (float): Target mean length of a noise span; used to derive the number of noise spans.
rng (np.random.Generator): The NumPy random generator to be used as the source of randomness for this function.

Returns

A boolean np.array with shape [length].
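One way to sample a mask with the properties described above is sketched below. This is not the library's source: the name `sketch_random_spans_noise_mask` and its helper are hypothetical, the fallback to `np.random.default_rng()` when `rng` is `None` is an assumption, and the sketch assumes `length` is large enough that every span can hold at least one token. It also assumes that `True` marks a noise token.

```python
import numpy as np


def sketch_random_spans_noise_mask(
    length, noise_density=0.15, mean_noise_span_length=3.0, rng=None
):
    """Hypothetical re-implementation for illustration only."""
    # Assumption: fall back to a fresh generator when none is supplied.
    rng = np.random.default_rng() if rng is None else rng

    num_noise_tokens = int(round(length * noise_density))
    num_noise_spans = int(round(num_noise_tokens / mean_noise_span_length))
    num_nonnoise_tokens = length - num_noise_tokens

    def random_segmentation(num_items, num_segments):
        # Pick num_segments - 1 distinct cut points uniformly at random,
        # so each segment has at least one item and all splits are equally likely.
        is_cut = np.arange(num_items - 1) < num_segments - 1
        rng.shuffle(is_cut)
        segment_id = np.cumsum(np.concatenate([[False], is_cut]))
        return np.bincount(segment_id, minlength=num_segments)

    noise_lengths = random_segmentation(num_noise_tokens, num_noise_spans)
    nonnoise_lengths = random_segmentation(num_nonnoise_tokens, num_noise_spans)

    # Interleave as [non-noise, noise, non-noise, noise, ...].
    interleaved = np.stack([nonnoise_lengths, noise_lengths], axis=1).reshape(-1)

    # Mark the first token of every span after the first, then number the spans.
    span_starts = np.cumsum(interleaved)[:-1]
    start_indicator = np.zeros(length, dtype=np.int64)
    start_indicator[span_starts] = 1
    span_num = np.cumsum(start_indicator)

    # Odd-numbered spans are the noise spans.
    return (span_num % 2) == 1


mask = sketch_random_spans_noise_mask(100, rng=np.random.default_rng(0))
print(mask.shape, mask.dtype, int(mask.sum()))
```

Passing a seeded `np.random.Generator` makes the sampled mask reproducible, which is useful when the same masking must be replayed across runs.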