.. _pytorch-dynamic-loss-scaling:

PyTorch Dynamic Loss Scaling
============================

.. attention::
    This document presents dynamic loss scaling for PyTorch. For TensorFlow,
    see :ref:`tf-dynamic-loss-scaling`.

.. seealso::
    :ref:`dynamic-loss-scaling` on the Cerebras system.

Dynamic loss scaling is supported for PyTorch. It is configurable via the
``cbtorch.amp.GradScaler`` module. The following configuration parameters are
supported:

- ``loss_scale``: Must be ``"dynamic"`` for dynamic loss scaling.
- ``initial_loss_scale``: Default value: ``2e15``.
- ``steps_per_increase``: Default value: ``2000``.
- ``min_loss_scale``: Default value: ``2e-14``.
- ``max_loss_scale``: Default value: ``2e15``.
- ``max_gradient_norm``: For dynamic loss scaling with global gradient
  clipping. See :ref:`pytorch-gradient-clipping`.

These can be passed in via the ``amp.GradScaler`` constructor. For example:

.. code-block:: python

    from cerebras.framework.torch import amp

    scaler = amp.GradScaler(
        loss_scale="dynamic",  # DLS optimizer (loss_scale == "dynamic")
        initial_loss_scale=2e15,
        steps_per_increase=2000,
        min_loss_scale=2e-14,
        max_loss_scale=2e15,
        max_gradient_norm=...,
    )

The ``GradScaler`` is used to wrap the loss tensor and scale it before the
backward pass:

.. code-block:: python

    from cerebras.framework.torch import amp

    ...

    scaler = amp.GradScaler(...)

    ...

    for inputs in dataloader:
        loss = model(inputs)
        scaler(loss).backward()
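The backward pass above produces gradients multiplied by the current loss
scale, so the optimizer step and the loss-scale update must account for that
scale. The sketch below illustrates the same dynamic loss scaling lifecycle
using the standard ``torch.cuda.amp.GradScaler`` from stock PyTorch; it is an
analogue for comparison only, not the ``cbtorch.amp`` API, and the model,
optimizer, and data names are placeholders.

.. code-block:: python

    # Illustrative analogue with stock PyTorch's torch.cuda.amp.GradScaler.
    # The Cerebras ``scaler(loss).backward()`` call shown above plays the
    # role of ``scaler.scale(loss).backward()`` here.
    import torch

    model = torch.nn.Linear(16, 1).cuda()                        # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)      # placeholder optimizer
    dataloader = [torch.randn(8, 16).cuda() for _ in range(10)]  # placeholder data

    # Dynamic loss scaling: start with a large scale, halve it on overflow,
    # and grow it again after a fixed number of successful steps.
    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0**15,    # analogous to initial_loss_scale
        growth_interval=2000,  # analogous to steps_per_increase
    )

    for inputs in dataloader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(inputs).mean()
        scaler.scale(loss).backward()  # scale the loss before the backward pass
        scaler.step(optimizer)         # unscales gradients; skips the step on overflow
        scaler.update()                # adjusts the loss scale for the next step

The ``step``/``update`` pair is what makes the scale *dynamic*: overflowing
gradients shrink the scale, while a long run of clean steps grows it back
toward ``max_loss_scale``.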