modelzoo.vision.pytorch.dit.modeling_dit.DiT#

class modelzoo.vision.pytorch.dit.modeling_dit.DiT[source]#

Bases: torch.nn.Module

Methods

forward

forward_dit

forward_dit_with_cfg

Forward pass of DiT, but also batches the unconditional forward pass for classifier-free guidance.

reset_parameters

__call__(*args: Any, **kwargs: Any) → Any#

Call self as a function.

__init__(num_diffusion_steps, schedule_name, beta_start, beta_end, embedding_dropout_rate=0.0, embedding_nonlinearity='silu', position_embedding_type='learned', hidden_size=768, num_hidden_layers=12, layer_norm_epsilon=1e-05, num_heads=12, attention_module_str='aiayn_attention', extra_attention_params={}, attention_type='scaled_dot_product', attention_softmax_fp32=True, dropout_rate=0.0, nonlinearity='gelu', attention_dropout_rate=0.0, use_projection_bias_in_attention=True, use_ffn_bias_in_attention=True, filter_size=3072, use_ffn_bias=True, initializer_range=0.02, default_initializer=None, projection_initializer=None, position_embedding_initializer=None, init_conv_like_linear=False, attention_initializer=None, ffn_initializer=None, timestep_embeddding_initializer=None, label_embedding_initializer=None, head_initializer=None, norm_first=True, latent_size=[32, 32], latent_channels=4, patch_size=[16, 16], use_conv_patchified_embedding=False, frequency_embedding_size=256, num_classes=1000, label_dropout_rate=0.1, block_type=BlockType.ADALN_ZERO, use_conv_transpose_unpatchify=False)[source]#
static __new__(cls, *args: Any, **kwargs: Any) → Any#
forward_dit_with_cfg(noised_latent, label, timestep, guidance_scale, num_cfg_channels=3)[source]#

Forward pass of DiT that also batches the unconditional forward pass for classifier-free guidance. Assumes the inputs are already batched, with conditional and unconditional halves concatenated along the batch dimension.

Note: For exact reproducibility reasons, classifier-free guidance is applied to only three channels by default, hence num_cfg_channels defaults to 3. The standard approach to CFG applies it to all channels.
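The blending described above can be illustrated with a minimal NumPy sketch (not the library's implementation; the helper name `apply_cfg` and the tensor layout are assumptions for illustration). It takes a batched model output whose first half is conditional and second half is unconditional, applies the guidance formula `uncond + s * (cond - uncond)` to only the first `num_cfg_channels` channels, and passes the remaining channels through unchanged:

```python
import numpy as np

def apply_cfg(model_out, guidance_scale, num_cfg_channels=3):
    """Sketch of channel-limited classifier-free guidance.

    model_out: array of shape (2N, C, H, W); rows 0..N-1 are the
    conditional outputs, rows N..2N-1 the unconditional ones.
    """
    cond, uncond = np.split(model_out, 2, axis=0)

    # Guidance is applied only to the first num_cfg_channels channels.
    eps_c = cond[:, :num_cfg_channels]
    eps_u = uncond[:, :num_cfg_channels]
    guided = eps_u + guidance_scale * (eps_c - eps_u)

    # Remaining channels (e.g. predicted variance) pass through
    # from the conditional half without guidance.
    rest = cond[:, num_cfg_channels:]
    half = np.concatenate([guided, rest], axis=1)

    # Duplicate so the output keeps the batched (2N, C, H, W) layout.
    return np.concatenate([half, half], axis=0)
```

With `num_cfg_channels` equal to the channel count, this reduces to standard CFG; the default of 3 reproduces the channel-limited behavior noted above.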