.. _pytorch-dynamic-loss-scaling:

PyTorch Dynamic Loss Scaling
============================

.. attention::
    This document presents dynamic loss scaling for PyTorch. For TensorFlow,
    see :ref:`tf-dynamic-loss-scaling`.

.. seealso::
    :ref:`dynamic-loss-scaling` on the Cerebras system.

Dynamic loss scaling is supported for PyTorch. It is configurable via the
``cbtorch.amp.GradScaler`` module. The following configuration parameters are
supported:

- ``loss_scale``: Must be ``"dynamic"`` for dynamic loss scaling.
- ``initial_loss_scale``: Default value: ``2e15``.
- ``steps_per_increase``: Default value: ``2000``.
- ``min_loss_scale``: Default value: ``2e-14``.
- ``max_loss_scale``: Default value: ``2e15``.
- ``max_gradient_norm``: For dynamic loss scaling with global gradient
  clipping. See :ref:`pytorch-gradient-clipping`.

These can be passed in via the ``amp.GradScaler`` constructor. For example:

.. code-block:: python

    from cerebras.framework.torch import amp

    scaler = amp.GradScaler(
        loss_scale="dynamic",  # DLS optimizer (loss_scale == "dynamic")
        initial_loss_scale=2e15,
        steps_per_increase=2000,
        min_loss_scale=2e-14,
        max_loss_scale=2e15,
        max_gradient_norm=...,
    )

The ``GradScaler`` is used to wrap the loss tensor and scale it before the
backward pass:

.. code-block:: python

    from cerebras.framework.torch import amp

    ...

    scaler = amp.GradScaler(...)

    ...

    for inputs in dataloader:
        loss = model(inputs)
        scaler(loss).backward()
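The backward pass above produces gradients multiplied by the current loss
scale, so the optimizer step and the loss-scale update must account for that
scale. The sketch below illustrates the same dynamic loss scaling lifecycle
using the standard ``torch.cuda.amp.GradScaler`` from stock PyTorch; it is an
analogue for comparison only, not the ``cbtorch.amp`` API, and the model,
optimizer, and data names are placeholders.

.. code-block:: python

    # Illustrative analogue with stock PyTorch's torch.cuda.amp.GradScaler.
    # The Cerebras ``scaler(loss).backward()`` call shown above plays the
    # role of ``scaler.scale(loss).backward()`` here.
    import torch

    model = torch.nn.Linear(16, 1).cuda()                        # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)      # placeholder optimizer
    dataloader = [torch.randn(8, 16).cuda() for _ in range(10)]  # placeholder data

    # Dynamic loss scaling: start with a large scale, halve it on overflow,
    # and grow it again after a fixed number of successful steps.
    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0**15,    # analogous to initial_loss_scale
        growth_interval=2000,  # analogous to steps_per_increase
    )

    for inputs in dataloader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(inputs).mean()
        scaler.scale(loss).backward()  # scale the loss before the backward pass
        scaler.step(optimizer)         # unscales gradients; skips the step on overflow
        scaler.update()                # adjusts the loss scale for the next step

The ``step``/``update`` pair is what makes the scale *dynamic*: overflowing
gradients shrink the scale, while a long run of clean steps grows it back
toward ``max_loss_scale``.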