modelzoo.vision.pytorch.dit.layers.DiTDecoderLayer.DiTDecoderLayer

class modelzoo.vision.pytorch.dit.layers.DiTDecoderLayer.DiTDecoderLayer

Bases: modelzoo.common.pytorch.layers.TransformerDecoderLayer.TransformerDecoderLayer

Methods

forward – Pass the inputs (and mask) through the decoder layer.

reset_parameters

__call__(*args: Any, **kwargs: Any) → Any

Call self as a function.

__init__(gate_res=True, **kwargs)
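
Keyword arguments beyond gate_res are presumably forwarded to the base TransformerDecoderLayer. A minimal construction sketch follows; the keyword names (d_model, nhead, dim_feedforward) are assumptions borrowed from torch.nn.TransformerDecoderLayer and are not documented on this page.

    from modelzoo.vision.pytorch.dit.layers.DiTDecoderLayer import DiTDecoderLayer

    # gate_res=True is the documented default; the remaining kwargs are assumed
    # to be accepted by the base TransformerDecoderLayer.
    layer = DiTDecoderLayer(
        gate_res=True,
        d_model=768,           # assumed hidden size
        nhead=12,              # assumed number of attention heads
        dim_feedforward=3072,  # assumed feed-forward width
    )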
static __new__(cls, *args: Any, **kwargs: Any) → Any
forward(tgt: torch.Tensor, memory: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, tgt_key_padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None, rotary_position_embedding_helper: Optional[modelzoo.common.pytorch.model_utils.RotaryPositionEmbeddingHelper.RotaryPositionEmbeddingHelper] = None, past_kv: Optional[Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]]] = None, cache_present_kv: bool = False, self_attn_position_bias: Optional[torch.Tensor] = None, cross_attn_position_bias: Optional[torch.Tensor] = None, **extra_args) → Union[torch.Tensor, Tuple[torch.Tensor, Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]]]]

Pass the inputs (and mask) through the decoder layer.

Parameters
  • tgt – the sequence to the decoder layer (required).

  • memory – the sequence from the last layer of the encoder (optional; defaults to None).

  • tgt_mask – the mask for the tgt sequence (optional).

  • memory_mask – the mask for the memory sequence (optional).

  • tgt_key_padding_mask – the mask for the tgt keys per batch (optional).

  • memory_key_padding_mask – the mask for the memory keys per batch (optional).

  • rotary_position_embedding_helper (Optional[RotaryPositionEmbeddingHelper]) – A helper class to apply rotary embedding on the input tensor.

  • past_kv – Past keys and values for self attention and (if applicable) cross attention modules. Key/value tensors have shape [batch_size, num_heads, seq_length, embed_dim / num_heads]. (optional).

  • cache_present_kv – Specifies whether the present keys and values must be cached and returned. Needed to speed up computation when the decoder is called within an autoregressive loop. (optional).

  • self_attn_position_bias – the tensor containing the position bias to apply in self-attention; can be obtained from relative or ALiBi position embeddings. (optional).

  • cross_attn_position_bias – the tensor containing the position bias to apply in cross-attention, analogous to self_attn_position_bias. (optional).

Shape:

see the docs in the Transformer class.
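
A hedged sketch of calling forward, reusing the layer instance from the construction sketch above. The tensor shapes and batch-first layout are illustrative assumptions, memory defaults to None when no cross-attention input is supplied, and any DiT-specific conditioning inputs would be passed through **extra_args and are omitted here.

    import torch

    batch_size, seq_len, d_model = 2, 256, 768       # assumed shapes
    tgt = torch.randn(batch_size, seq_len, d_model)   # decoder input sequence

    # Optional additive float mask over the target sequence (-inf above the diagonal).
    tgt_mask = torch.triu(
        torch.full((seq_len, seq_len), float("-inf")), diagonal=1
    )

    out = layer(tgt, tgt_mask=tgt_mask)

    # With cache_present_kv=True the layer additionally returns the present
    # key/value cache, per the return annotation above (sketch only; the cache
    # layout is not verified here).
    out, present_kv = layer(tgt, cache_present_kv=True)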