modelzoo.vision.pytorch.dit.layers.DiTDecoderLayer.DiTDecoderLayer#
- class modelzoo.vision.pytorch.dit.layers.DiTDecoderLayer.DiTDecoderLayer[source]#
Bases:
modelzoo.common.pytorch.layers.TransformerDecoderLayer.TransformerDecoderLayer
Methods
- forward – Pass the inputs (and mask) through the decoder layer.
- reset_parameters
- __call__(*args: Any, **kwargs: Any) → Any #
Call self as a function.
- static __new__(cls, *args: Any, **kwargs: Any) → Any #
- forward(tgt: torch.Tensor, memory: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, tgt_key_padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None, rotary_position_embedding_helper: Optional[modelzoo.common.pytorch.model_utils.RotaryPositionEmbeddingHelper.RotaryPositionEmbeddingHelper] = None, past_kv: Optional[Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]]] = None, cache_present_kv: bool = False, self_attn_position_bias: Optional[torch.Tensor] = None, cross_attn_position_bias: Optional[torch.Tensor] = None, **extra_args) → Union[torch.Tensor, Tuple[torch.Tensor, Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]]]] [source]#
Pass the inputs (and mask) through the decoder layer.
- Parameters
tgt – the sequence to the decoder layer (required).
memory – the sequence from the last layer of the encoder (optional).
tgt_mask – the mask for the tgt sequence (optional).
memory_mask – the mask for the memory sequence (optional).
tgt_key_padding_mask – the mask for the tgt keys per batch (optional).
memory_key_padding_mask – the mask for the memory keys per batch (optional).
rotary_position_embedding_helper (Optional[RotaryPositionEmbeddingHelper]) – A helper class to apply rotary embedding on the input tensor.
past_kv – Past keys and values for the self-attention and (if applicable) cross-attention modules. Key/value tensors have shape [batch_size, num_heads, seq_length, embed_dim / num_heads] (optional).
cache_present_kv – Specifies whether the present keys and values must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop (optional).
self_attn_position_bias – the tensor containing the position bias to apply in self-attention; can be obtained from relative or ALiBi position embeddings.
cross_attn_position_bias – the tensor containing the position bias to apply in cross-attention; can be obtained from relative or ALiBi position embeddings.
- Shape:
see the docs in Transformer class.
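The following is a minimal usage sketch of the forward interface described above. The constructor arguments (d_model, nhead) are assumptions mirroring the parent TransformerDecoderLayer, and any DiT-specific conditioning normally supplied through **extra_args is omitted; consult the class source for the authoritative signature.

import torch

from modelzoo.vision.pytorch.dit.layers.DiTDecoderLayer import DiTDecoderLayer

# Assumed constructor arguments, mirroring the parent TransformerDecoderLayer;
# the real layer may require additional DiT-specific arguments.
layer = DiTDecoderLayer(d_model=512, nhead=8)

batch_size, seq_len, d_model = 2, 16, 512
tgt = torch.randn(batch_size, seq_len, d_model)

# Single forward pass; encoder memory and masks are omitted here.
out = layer(tgt)

# Autoregressive-style usage: request the present key/value cache and feed it
# back on the next step via past_kv so earlier positions are not recomputed.
out, present_kv = layer(tgt, cache_present_kv=True)
next_step = torch.randn(batch_size, 1, d_model)
out_next = layer(next_step, past_kv=present_kv)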