.. _supported-pt-learning-rate-schedulers:

Supported PyTorch Learning Rate Schedulers
==========================================

:ref:`lrscheduler`
    Cerebras-specific learning rate scheduler base class.

:ref:`ConstantLR`
    Maintains a constant learning rate for each parameter group (no decay).

:ref:`PolynomialLR`
    Decays the learning rate of each parameter group using a polynomial function over the given ``decay_steps``.

:ref:`ExponentialLR`
    Decays the learning rate of each parameter group by ``decay_rate`` every step.

:ref:`InverseExponentialTimeDecayLR`
    Decays the learning rate inverse-exponentially over time.

:ref:`InverseSquareRootDecayLR`
    Decays the learning rate with an inverse square root schedule over time.

:ref:`CosineDecayLR`
    Applies a cosine decay schedule.

:ref:`SequentialLR`
    Receives a list of schedulers that are expected to be called sequentially during the optimization process, along with milestone points that give the exact intervals indicating which scheduler should be called at a given step.

:ref:`PiecewiseConstantLR`
    Adjusts the learning rate to a predefined constant at each milestone and holds that value until the next milestone.

:ref:`MultiStepLR`
    Decays the learning rate of each parameter group by ``gamma`` once the number of steps reaches one of the milestones.

:ref:`StepLR`
    Decays the learning rate of each parameter group by ``gamma`` every ``step_size`` steps.

:ref:`CosineAnnealingLR`
    Sets the learning rate of each parameter group using a cosine annealing schedule, where :math:`\eta_{max}` is set to the initial learning rate and :math:`T_{cur}` is the number of steps since the last restart in SGDR.

:ref:`LambdaLR`
    Sets the learning rate of each parameter group to the initial learning rate times a given function (specified by overriding ``set_lr_lambda``).
:ref:`CosineAnnealingWarmRestarts`
    Sets the learning rate of each parameter group using a cosine annealing schedule, where :math:`\eta_{max}` is set to the initial learning rate, :math:`T_{cur}` is the number of steps since the last restart, and :math:`T_i` is the number of steps between two warm restarts in SGDR.

:ref:`MultiplicativeLR`
    Multiplies the learning rate of each parameter group by the supplied coefficient.

:ref:`ChainedScheduler`
    Chains a list of learning rate schedulers, calling each of them in sequence at every step.

:ref:`CyclicLR`
    Sets the learning rate of each parameter group according to the cyclical learning rate (CLR) policy.

:ref:`OneCycleLR`
    Sets the learning rate of each parameter group according to the 1cycle learning rate policy.
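The cosine annealing schedules above follow the SGDR formula :math:`\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\pi \frac{T_{cur}}{T_{max}}\right)\right)`. The following is a minimal pure-Python sketch of that formula for illustration only, not the Cerebras or PyTorch implementation; the function name and the ``eta_min`` default of 0 are assumptions for the example.

.. code-block:: python

    import math

    def cosine_annealing_lr(step, eta_max, total_steps, eta_min=0.0):
        """Sketch of SGDR cosine annealing: interpolate from eta_max down
        to eta_min as `step` runs from 0 to `total_steps` (the T_cur/T_max
        ratio in the formula)."""
        return eta_min + 0.5 * (eta_max - eta_min) * (
            1.0 + math.cos(math.pi * step / total_steps)
        )

    lr_start = cosine_annealing_lr(0, 0.1, 100)    # equals the initial lr, 0.1
    lr_mid = cosine_annealing_lr(50, 0.1, 100)     # roughly halfway, ~0.05
    lr_end = cosine_annealing_lr(100, 0.1, 100)    # decays to eta_min, 0.0

The warm-restart variant resets :math:`T_{cur}` to zero at each restart and uses :math:`T_i` in place of a single fixed ``total_steps``, so the curve repeats between restarts.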