PyTorch Learning Rate Scheduling¶
You can schedule learning rates for your PyTorch models. See the custom learning rate scheduler classes in cerebras.framework.torch.optim.lr_scheduler.
Configuring learning rates¶
Learning rates can be configured in much the same way as in a typical PyTorch workflow. For example:
from modelzoo.common.pytorch.optim import lr_scheduler

optimizer: torch.optim.Optimizer = ...

scheduler = lr_scheduler.PiecewiseConstant(
    optimizer,
    learning_rates=[0.1, 0.001, 0.0001],
    milestones=[1000, 2000],
)
Stepping¶
Unlike a typical PyTorch workflow, Cerebras learning rate schedulers must be stepped every iteration rather than once per epoch. This more closely matches the behavior of the Cerebras kernels.
For example:
with cbtorch.Session(dataloader, mode="train") as session:
    for epoch in range(num_epochs):
        for inputs, targets in dataloader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            scaler(loss).backward()
            optimizer.step()
            scheduler.step()  # lr_scheduler step
Supported learning rate schedulers¶
The following is a list of supported learning rate schedulers, with their configuration parameters:
Important
The parameter names listed here will differ from how they are specified in the configuration YAML files. The names used in the configuration YAML files were chosen to match the names used in TensorFlow. The names used in our custom classes match the names used in our kernels.
Constant
Required Params:
val: The learning rate.
Optional Params:
decay_steps: The number of steps to use this learning rate. Has no effect if this is the last scheduler specified.
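For illustration, here is a minimal sketch of constructing a Constant scheduler. It assumes the class follows the same constructor pattern as the PiecewiseConstant example above; the exact signature may differ.

from modelzoo.common.pytorch.optim import lr_scheduler

optimizer: torch.optim.Optimizer = ...

# Assumed constructor pattern, following the PiecewiseConstant example above.
# Hold the learning rate at 0.001 for 1000 steps; decay_steps has no effect
# if this is the last scheduler specified.
scheduler = lr_scheduler.Constant(
    optimizer,
    val=0.001,
    decay_steps=1000,
)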
Exponential
Required Params:
learning_rate: The initial learning rate.
decay_steps: The number of steps to decay over.
decay_rate: The rate at which to decay.
Optional Params:
staircase: If True, decay the learning rate at discrete intervals rather than continuously. Default value: False.
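For illustration, a sketch of an Exponential scheduler using the parameter names listed above, again assuming the same constructor pattern as the PiecewiseConstant example (the exact signature may differ):

# Assumed constructor pattern; parameter names as listed above.
# Start at 0.1 and decay by a factor of 0.5 every 1000 steps.
scheduler = lr_scheduler.Exponential(
    optimizer,
    learning_rate=0.1,
    decay_steps=1000,
    decay_rate=0.5,
    staircase=False,
)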
PiecewiseConstant
Required Params:
learning_rates: The learning rate values.
milestones: The step boundaries at which to change learning rate values.
Alert
The number of learning rates must be exactly one greater than the number of milestones.
Polynomial
Required Params:
learning_rate: The initial learning rate.
end_learning_rate: The final learning rate.
decay_steps: The number of steps to decay over.
power: The exponent of the polynomial.
Alert
Only linear polynomial learning rate scheduling is supported at this time, so the only supported value of power is 1.0.
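For illustration, a sketch of a Polynomial scheduler with the only supported power, 1.0 (a linear decay), again assuming the same constructor pattern as the PiecewiseConstant example (the exact signature may differ):

# Assumed constructor pattern; parameter names as listed above.
# Linearly decay from 0.1 to 0.0 over 10000 steps.
scheduler = lr_scheduler.Polynomial(
    optimizer,
    learning_rate=0.1,
    end_learning_rate=0.0,
    decay_steps=10000,
    power=1.0,
)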