modelzoo.transformers.pytorch.transformer_utils

Functions

`build_broadcastable_attention_mask`	Create broadcastable attention mask (full or causal) so that masked positions are ignored.
`create_2D_autoregressive_mask`	Creates a reverted autoregressive (upper triangular) mask where the 0s refer to the tokens
`create_2D_full_mask`	Create autoregressive (triangular) mask.
`create_vsl_autoregressive_mask`	Create autoregressive (triangular) mask for variable sequence length.
`get_extended_attention_mask`	Makes broadcastable attention and causal masks so that future and masked tokens are ignored. Parameters: `attention_mask` (`torch.Tensor`): mask with ones indicating tokens to attend to and zeros for tokens to ignore; `input_shape` (`Tuple[int]`): the shape of the input to the model (required for causal masks); `causal` (`bool`): if enabled, the returned mask will be causal; `device` (`torch.device`): the device of the input to the model.
`make_key_padding_mask_broadcastable`	Makes broadcastable key_padding masks so that padding tokens are ignored.
`make_sparse_mask_broadcastable`	Create broadcastable sparse mask so that masked positions are ignored.
`smooth_loss`	Add label smoothing to the loss function. This is a workaround for label smoothing in our system.
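
The mask helpers listed above share one idea: build a mask of allowed positions, then turn it into a form that can be added to attention scores. The sketch below illustrates that idea only. It uses plain Python lists instead of `torch.Tensor` objects to stay dependency-free, and the helper names (`causal_mask`, `combine_with_padding`, `to_additive`) are hypothetical, not functions from this module. It assumes the convention stated for `get_extended_attention_mask`: ones mark tokens to attend to, zeros mark tokens to ignore.

```python
# Illustrative sketch only: plain lists stand in for torch tensors, and
# these helpers are hypothetical, not the modelzoo implementations.

NEG_INF = float("-inf")


def causal_mask(seq_len):
    """Return a seq_len x seq_len mask: 1 = may attend, 0 = future position."""
    return [
        [1 if key <= query else 0 for key in range(seq_len)]
        for query in range(seq_len)
    ]


def combine_with_padding(causal, key_padding):
    """Zero out key columns that are padding.

    key_padding holds 1 for real tokens and 0 for padding tokens.
    """
    return [
        [allowed * keep for allowed, keep in zip(row, key_padding)]
        for row in causal
    ]


def to_additive(mask):
    """Convert a 1/0 mask to additive form: 0.0 where allowed, -inf where masked."""
    return [[0.0 if allowed else NEG_INF for allowed in row] for row in mask]


# Four positions, the last one is padding.
mask = combine_with_padding(causal_mask(4), key_padding=[1, 1, 1, 0])
additive = to_additive(mask)

for row in mask:
    print(row)
```

Adding the additive mask to the raw attention scores before the softmax drives the masked positions to zero probability, which is why an additive form is a common choice. The actual functions in this module may use a different fill value, dtype, or mask polarity.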
