PyTorch Learning Rate Scheduling

You can schedule learning rates for your PyTorch models. See custom learning rate scheduler classes in cerebras.framework.torch.optim.lr_scheduler.

Configuring learning rates

Learning rates can be configured much in the same way as in a typical PyTorch workflow. For example:

from modelzoo.common.pytorch.optim import lr_scheduler

optimizer: torch.optim.Optimizer = ...

scheduler = lr_scheduler.PiecewiseConstant(
    learning_rates=[0.1, 0.001, 0.0001],
    milestones=[1000, 2000],
)

Unlike a typical PyTorch workflow, Cerebras learning rate schedulers must be stepped every iteration, not once per epoch. This more closely matches the behavior of Cerebras kernels.

For example:

with cbtorch.Session(dataloader, mode="train") as session:
    for epoch in range(num_epochs):
        for inputs, targets in dataloader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            scheduler.step()  # step the LR scheduler every iteration

Supported learning rate schedulers

The following is a list of supported learning rate schedulers, with their configuration parameters:


Note: The parameter names listed here differ from those used in the configuration YAML files. The YAML names were chosen to match the names used in TensorFlow; the names used in our custom classes match the names used in our kernels.

  1. Constant

    Required Params:

    • val: The learning rate.

    Optional Params:

    • decay_steps: The number of steps to use this learning rate. Does nothing if it is the last scheduler specified.

  2. Exponential

    Required Params:

    • learning_rate: The initial learning rate.

    • decay_steps: The number of steps to decay for.

    • decay_rate: The rate at which to decay.

    Optional Params:

    • staircase: If True, decay the learning rate at discrete intervals rather than continuously. Default value: False.

  3. PiecewiseConstant

    Required Params:

    • learning_rates: The learning rate values.

    • milestones: The step boundaries at which the learning rate changes.


      Note: The number of learning rates must be exactly one greater than the number of milestones.

  4. Polynomial

    Required Params:

    • learning_rate: The initial learning rate.

    • end_learning_rate: The end learning rate.

    • decay_steps: The number of steps to decay for.

    • power: The exponent value of the polynomial.


      Note: Only linear polynomial learning rate scheduling is supported at this time; that is, the only supported value of power is 1.0.
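
As a rough sketch, the decay rules of the four schedulers above can be written as plain functions of the step count. This assumes the standard TensorFlow-style decay semantics that the YAML parameter names are said to follow; these helpers are illustrative only, not part of the Cerebras API.

```python
import math

def constant(val, step):
    # Constant: the same learning rate at every step.
    return val

def exponential(learning_rate, decay_steps, decay_rate, step, staircase=False):
    # Exponential: learning_rate * decay_rate ** (step / decay_steps).
    # With staircase=True the exponent is floored, so the rate drops in
    # discrete jumps every decay_steps steps instead of continuously.
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)
    return learning_rate * decay_rate ** exponent

def piecewise_constant(learning_rates, milestones, step):
    # PiecewiseConstant: len(learning_rates) == len(milestones) + 1.
    # Use learning_rates[i] until step reaches milestones[i], then advance.
    for boundary, lr in zip(milestones, learning_rates):
        if step < boundary:
            return lr
    return learning_rates[-1]

def polynomial(learning_rate, end_learning_rate, decay_steps, step, power=1.0):
    # Polynomial: interpolate from learning_rate down to end_learning_rate
    # over decay_steps steps; power=1.0 (the only supported value) makes
    # this a straight linear decay.
    frac = min(step, decay_steps) / decay_steps
    return (learning_rate - end_learning_rate) * (1 - frac) ** power + end_learning_rate
```

For example, `piecewise_constant([0.1, 0.001, 0.0001], [1000, 2000], step)` reproduces the PiecewiseConstant configuration shown earlier: 0.1 before step 1000, 0.001 until step 2000, and 0.0001 afterward.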