modelzoo.vision.pytorch.dit.samplers.DDIMSampler.DDIMSampler#

class modelzoo.vision.pytorch.dit.samplers.DDIMSampler.DDIMSampler[source]#

Bases: modelzoo.vision.pytorch.dit.samplers.SamplerBase.SamplerBase

Denoising Diffusion Implicit Models (DDIM) is a scheduler that extends the denoising procedure introduced in Denoising Diffusion Probabilistic Models (DDPMs) with non-Markovian guidance.

For more details, see the original paper: https://arxiv.org/abs/2010.02502
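For reference, a single DDIM step applies the update from Eqn. 12 of the paper, with the noise scale σ_t from Eqn. 16 controlled by eta. The sketch below uses DDPM notation (ᾱ_t is the cumulative product of 1 − β_t) and omits the model-predicted variance term that this sampler additionally accepts (see step below):

```latex
x_{t-1} = \sqrt{\bar\alpha_{t-1}}
          \left( \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}} \right)
        + \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t, t)
        + \sigma_t \epsilon_t,
\qquad
\sigma_t = \eta \sqrt{\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}} \sqrt{1 - \frac{\bar\alpha_t}{\bar\alpha_{t-1}}}
```

With eta = 0 the step is deterministic (σ_t = 0); with eta = 1, σ_t matches the DDPM posterior noise scale.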

Parameters
  • num_diffusion_steps (int) – number of diffusion steps used to train the model.

  • beta_start (float) – the starting beta value of inference.

  • beta_end (float) – the final beta value.

  • schedule_name (str) – the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from: linear.

  • eta (float) – weight of the added noise in a diffusion step. Refer to Eqn. 16 of the DDIM paper: the sampler behaves as DDPM when η=1 and as DDIM when η=0.

  • clip_sample (bool, default False) – option to clip predicted sample for numerical stability.

  • set_alpha_to_one (bool, default True) – each diffusion step uses the alpha product at that step and at the previous step. For the final step there is no previous alpha. When this option is True, the previous alpha product is fixed to 1; otherwise it uses the alpha product at step 0.

  • thresholding (bool, default False) – whether to use the “dynamic thresholding” method (introduced by Imagen, https://arxiv.org/abs/2205.11487). Note that the thresholding method is unsuitable for latent-space diffusion models (such as stable-diffusion).

  • dynamic_thresholding_ratio (float, default 0.995) – the ratio for the dynamic thresholding method. Default is 0.995, the same as Imagen (https://arxiv.org/abs/2205.11487). Valid only when thresholding=True.

  • sample_max_value (float, default 1.0) – the threshold value for dynamic thresholding. Valid only when thresholding=True.

  • clip_sample_range (float, default 1.0) – the maximum magnitude for sample clipping. Valid only when clip_sample=True.

  • rescale_betas_zero_snr (bool, default False) – whether to rescale the betas to have zero terminal SNR (proposed by https://arxiv.org/pdf/2305.08891.pdf). This can enable the model to generate very bright and dark samples instead of limiting it to samples with medium brightness. Loosely related to [--offset_noise](https://github.com/huggingface/diffusers/blob/74fd735eb073eb1d774b1ab4154a0876eb82f055/examples/dreambooth/train_dreambooth.py#L506).

  • use_clipped_model_output (bool) – if True, compute a “corrected” model_output from the clipped predicted original sample. This is necessary because the predicted original sample is clipped to [-1, 1] when self.config.clip_sample is True. If no clipping has happened, the “corrected” model_output coincides with the one provided as input, and use_clipped_model_output has no effect.

  • num_inference_steps (str) – string containing comma-separated numbers indicating the step count per section. For example, if num_diffusion_steps is 300 and num_inference_steps=`10,15,20`, then the first 100 timesteps are strided to 10 timesteps, the second 100 to 15 timesteps, and the final 100 to 20 timesteps. Pass either custom_timesteps or num_inference_steps, but not both (a construction sketch follows this list).

  • custom_timesteps (List[int]) – list of timesteps to be used during sampling, in decreasing order. Pass either custom_timesteps or num_inference_steps, but not both.
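A minimal construction sketch, assuming the module path shown in the class header; the hyperparameter values are illustrative, and num_inference_steps is passed in the comma-separated string form described above:

```python
from modelzoo.vision.pytorch.dit.samplers.DDIMSampler import DDIMSampler

# Deterministic DDIM sampling (eta=0.0) over a linear beta schedule,
# striding 1000 training steps down to 50 inference steps (one section).
sampler = DDIMSampler(
    num_diffusion_steps=1000,
    beta_start=0.0001,
    beta_end=0.02,
    schedule_name="linear",
    eta=0.0,                   # 0.0 -> DDIM, 1.0 -> DDPM-like noise
    num_inference_steps="50",  # comma-separated string, one count per section
)

# Alternatively, pass an explicit decreasing list of timesteps
# (mutually exclusive with num_inference_steps):
# sampler = DDIMSampler(num_diffusion_steps=1000,
#                       custom_timesteps=[999, 749, 499, 249, 0])
```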

Methods

  • previous_timestep – Returns the previous timestep based on the current timestep.

  • set_timesteps – Computes the timesteps to be used during sampling.

  • step – Predicts the sample at the previous timestep by reversing the SDE.

__init__(num_diffusion_steps: int = 1000, beta_start: float = 0.0001, beta_end: float = 0.02, schedule_name: str = 'linear', eta: float = 0.0, clip_sample: bool = False, set_alpha_to_one: bool = True, thresholding: bool = False, dynamic_thresholding_ratio: float = 0.995, sample_max_value: float = 1.0, clip_sample_range: float = 1.0, rescale_betas_zero_snr: bool = False, use_clipped_model_output: bool = False, num_inference_steps: Optional[int] = None, custom_timesteps: Optional[List[int]] = None)[source]#
previous_timestep(timestep)[source]#

Returns the previous timestep based on the current timestep. Depends on the timesteps computed in self.set_timesteps.

set_timesteps(num_diffusion_steps, num_inference_steps, custom_timesteps)[source]#

Computes the timesteps to be used during sampling.

Parameters
  • num_diffusion_steps (int) – Total number of steps the model was trained on

  • num_inference_steps (str) – string containing comma-separated numbers indicating the step count per section. For example, if num_diffusion_steps is 300 and num_inference_steps=`10,15,20`, then the first 100 timesteps are strided to 10 timesteps, the second 100 to 15 timesteps, and the final 100 to 20 timesteps. Pass either custom_timesteps or num_inference_steps, but not both (a sketch of this striding follows this list).

  • custom_timesteps (List[int]) – user-specified list of timesteps to be used during sampling, in decreasing order.
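The striding described above can be sketched as a standalone helper; this is illustrative only, not the sampler's actual implementation, and the function name is hypothetical:

```python
def strided_timesteps(num_diffusion_steps: int, num_inference_steps: str):
    """Split the diffusion range into equal sections and stride each
    section down to the requested number of timesteps."""
    counts = [int(c) for c in num_inference_steps.split(",")]
    section_len = num_diffusion_steps // len(counts)
    timesteps = []
    for i, count in enumerate(counts):
        start = i * section_len
        stride = section_len / count
        timesteps += [start + round(stride * j) for j in range(count)]
    return sorted(timesteps, reverse=True)  # sampling runs from high noise to low

# strided_timesteps(300, "10,15,20") -> 45 timesteps in decreasing order:
# 20 taken from [200, 300), 15 from [100, 200), and 10 from [0, 100).
```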

step(pred_noise: torch.FloatTensor, pred_var: torch.FloatTensor, timestep: int, sample: torch.FloatTensor, generator=None, return_dict: bool = True) → Union[modelzoo.vision.pytorch.dit.samplers.DDIMSampler.DDIMSamplerOutput, Tuple][source]#

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).

Parameters
  • pred_noise (torch.FloatTensor) – predicted eps output from learned diffusion model.

  • pred_var (torch.FloatTensor) – model-predicted values used in the variance computation; `υ` in Eqn. 15.

  • timestep (int) – current discrete timestep in the diffusion chain.

  • sample (torch.FloatTensor) – the current sample (x_t) in the diffusion chain.

  • generator – random number generator.

  • return_dict (bool) – option for returning a tuple rather than a DDIMSamplerOutput class instance.

Returns

DDIMSamplerOutput (with keys prev_sample, pred_original_sample) if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the prev_sample tensor and the second element is pred_original_sample.

Return type

Union[DDIMSamplerOutput, Tuple]
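An end-to-end usage sketch built only from the signatures documented above. The dummy model, the tensor shapes, and the assumption that set_timesteps stores the schedule on a `timesteps` attribute are all illustrative, not guaranteed by this API:

```python
import torch

from modelzoo.vision.pytorch.dit.samplers.DDIMSampler import DDIMSampler


def dummy_model(x: torch.Tensor, t: int):
    """Stand-in for a trained diffusion network (e.g. DiT) that returns a
    noise prediction and a variance-interpolation value per element."""
    return torch.randn_like(x), torch.zeros_like(x)


sampler = DDIMSampler(num_diffusion_steps=1000)
# Compute the inference schedule explicitly using the documented signature.
sampler.set_timesteps(num_diffusion_steps=1000, num_inference_steps="50", custom_timesteps=None)

sample = torch.randn(1, 4, 32, 32)           # start from Gaussian noise; shape is illustrative
for t in sampler.timesteps:                  # assumption: set_timesteps populates `timesteps`
    pred_noise, pred_var = dummy_model(sample, t)
    out = sampler.step(pred_noise, pred_var, t, sample)
    sample = out.prev_sample                 # DDIMSamplerOutput key documented above
```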