TensorFlow Dynamic Loss Scaling#

Attention

This document presents dynamic loss scaling for TensorFlow. For PyTorch, see PyTorch Dynamic Loss Scaling.

See also

Dynamic Loss Scaling on Cerebras system.

Enabling dynamic loss scaling#

To enable dynamic loss scaling (DLS) with TensorFlow, use the Trainer optimizer, which is supported on the CS system.

Trainer#

The Trainer optimizer builds the train ops from the given configuration parameters. It initializes several parameters that apply to DLS, such as the initial loss scaling factor and the number of steps before the loss scale factor changes. These settings are optimized for the CS system.
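
To make the roles of these parameters concrete, the following is a minimal sketch of a generic dynamic loss scaling policy in plain Python. It is not the CSDynamicLossScale implementation. The class name DynamicLossScaleSketch, the parameter names initial_loss_scale, steps_per_increase and multiplier, and their default values are all assumptions chosen for illustration. The policy shown, which grows the scale after a run of overflow-free steps and shrinks it and skips the update on overflow, is the commonly used scheme and may differ from what the Trainer does on the CS system.

```python
import math


class DynamicLossScaleSketch:
    """Illustrative dynamic loss scale; NOT the CSDynamicLossScale implementation."""

    def __init__(self, initial_loss_scale=2.0 ** 15, steps_per_increase=2000, multiplier=2.0):
        # All three names and defaults are assumptions made for this sketch.
        self.loss_scale = float(initial_loss_scale)
        self.steps_per_increase = steps_per_increase
        self.multiplier = multiplier
        self.good_steps = 0  # consecutive steps with finite gradients

    def update(self, grads):
        """Adjust the loss scale; return True if this step's update should be applied."""
        if all(math.isfinite(g) for g in grads):
            self.good_steps += 1
            if self.good_steps >= self.steps_per_increase:
                # Enough stable steps in a row: try a larger scale.
                self.loss_scale *= self.multiplier
                self.good_steps = 0
            return True
        # Overflow (inf or NaN gradient): shrink the scale and skip the update.
        self.loss_scale = max(self.loss_scale / self.multiplier, 1.0)
        self.good_steps = 0
        return False
```

In a training loop built on this sketch, the loss would be multiplied by loss_scale before computing gradients, and the gradients divided by the same value before the weight update whenever update() returns True.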

Parameters#

params: Input. Datatype dict. Configuration parameters for the Trainer optimizer.

tf_summary: Input. Datatype bool. Flag that enables summaries. Defaults to False.

mixed_precision: Input. Datatype bool. Flag that enables mixed precision. Defaults to False.

Example#

The following is an example showing how to use the Trainer optimizer in your code:

First, create an instance of the Trainer optimizer in the __init__(self) method of your model class.

# Model trainer
self.trainer = Trainer(
    params=params["optimizer"],
    tf_summary=tf_summary,
    mixed_precision=params["training"]["mixed_precision"],
)

Then build the train ops.

def build_train_ops(self, total_loss):
    """
    Set up the optimizer and build the train ops.
    """
    return self.trainer.build_train_ops(total_loss)
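
The two snippets above can be seen working together in the following self-contained sketch, which runs without the Model Zoo. StubTrainer is a hypothetical stand-in that only records its arguments; it is not the real Trainer, and the real build_train_ops returns TensorFlow train ops rather than a dictionary. The class name ExampleModel and the keys inside params["optimizer"] are likewise assumptions. Only the params["optimizer"] and params["training"]["mixed_precision"] lookups come from the example above.

```python
class StubTrainer:
    """Hypothetical stand-in for the Model Zoo Trainer; it only records its arguments."""

    def __init__(self, params, tf_summary=False, mixed_precision=False):
        self.params = params
        self.tf_summary = tf_summary
        self.mixed_precision = mixed_precision

    def build_train_ops(self, total_loss):
        # The real Trainer builds TensorFlow train ops here; this stub just
        # returns a description of what it was asked to do.
        return {"loss": total_loss, "mixed_precision": self.mixed_precision}


class ExampleModel:
    """Hypothetical model class showing where the Trainer calls are placed."""

    def __init__(self, params, tf_summary=False):
        # Model trainer, constructed as in the example above.
        self.trainer = StubTrainer(
            params=params["optimizer"],
            tf_summary=tf_summary,
            mixed_precision=params["training"]["mixed_precision"],
        )

    def build_train_ops(self, total_loss):
        """Set up the optimizer and build the train ops."""
        return self.trainer.build_train_ops(total_loss)


params = {
    # The keys inside "optimizer" are placeholders, not documented settings.
    "optimizer": {"optimizer_type": "sgd", "learning_rate": 0.001},
    "training": {"mixed_precision": True},
}
model = ExampleModel(params)
train_op = model.build_train_ops(total_loss=0.25)
```

Replacing StubTrainer with the Trainer class from the Model Zoo gives the arrangement described in this section.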

For more details on CSDynamicLossScale and the Trainer optimizer, refer to the code in the Cerebras Model Zoo repository.

Note

To access the Python code for CSDynamicLossScale and the Trainer optimizer, you need read permission for the Cerebras Model Zoo Git repository.

  • The CSDynamicLossScale object in the Cerebras Graph Compiler (CGC) implements dynamic loss scaling. See LossScale.py.

  • This CSDynamicLossScale object is used by the Trainer optimizer. See Trainer.py.