Summarize scalars and tensors

It is often useful to track values of interest during training. This section introduces APIs for summarizing tensors in a PyTorch model and inspecting them during or after a run.

To enable summaries, a SummaryWriter must first be created and passed to the DataExecutor object that is used to execute the model.

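import cerebras_pytorch as cstorch

# Create a writer that stores TensorBoard events files under log_dir
writer = cstorch.utils.tensorboard.SummaryWriter(log_dir="/path/to/logdir")
# Pass the writer to the executor that runs the model (other arguments elided)
executor = cstorch.utils.data.DataExecutor(
    dataloader, ..., writer=writer,
)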

Scalar Summaries

Motivation

It is often useful to visualize scalar values, such as the learning rate or gradient norms, during training. The summarize_scalar API lets you summarize scalar tensors in your model. These summaries are written to TensorBoard events files and can be visualized using TensorBoard.

How to Use Scalar Summaries

The scalar summary API is available as part of the cerebras_pytorch package. To summarize a scalar tensor S, add the following statement to the model definition code:

import cerebras_pytorch as cstorch
cstorch.summarize_scalar("my_scalar_tensor", S)

During training, the value of S is periodically written to the TensorBoard events file, where it can be visualized in TensorBoard.

Note

If no SummaryWriter was passed to the DataExecutor, summarize_scalar is a no-op and no summaries are written.
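The no-op behavior described in the note can be pictured with the toy model below. It is a minimal, self-contained sketch that does not use cerebras_pytorch at all: ToyWriter, set_writer, and toy_summarize_scalar are hypothetical names, and unlike the real summarize_scalar, the toy function takes the step explicitly because no executor is tracking it. ToyWriter.add_scalar only loosely echoes the tag/value/step arguments of PyTorch's SummaryWriter.add_scalar.

```python
from typing import List, Optional, Tuple


class ToyWriter:
    """Hypothetical stand-in for a SummaryWriter that keeps scalars in memory."""

    def __init__(self) -> None:
        self.scalars: List[Tuple[str, float, int]] = []

    def add_scalar(self, tag: str, value: float, step: int) -> None:
        self.scalars.append((tag, value, step))


# Writer registered by a (hypothetical) executor; None means "no writer passed".
_active_writer: Optional[ToyWriter] = None


def set_writer(writer: Optional[ToyWriter]) -> None:
    """Register or clear the writer, loosely like passing writer= to an executor."""
    global _active_writer
    _active_writer = writer


def toy_summarize_scalar(name: str, value: float, step: int) -> None:
    """Record a scalar if a writer is registered; otherwise do nothing."""
    if _active_writer is None:
        return  # no writer: the call is a no-op
    _active_writer.add_scalar(name, value, step)
```

Calling toy_summarize_scalar before set_writer records nothing; after set_writer(ToyWriter()), each call appends a (name, value, step) tuple to the writer.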

Note

Some common scalar tensors that are generally worth tracking during training are already implemented in the Cerebras Model Zoo. They are disabled by default; to enable them, set log_summaries: True in the optimizer section of the params file passed to a Cerebras Model Zoo run.
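If you prepare params programmatically (for example, after loading the params file into a dictionary), the setting from the note can be applied as in the sketch below. Only the optimizer section and the log_summaries key come from this page; enable_log_summaries is a hypothetical helper, and the learning_rate key in the usage line is a placeholder for whatever other optimizer settings your params file contains.

```python
from typing import Any, Dict


def enable_log_summaries(params: Dict[str, Any]) -> Dict[str, Any]:
    """Set optimizer.log_summaries = True in a params dictionary, in place.

    Hypothetical helper: creates the optimizer section if it is missing and
    leaves every other key untouched.
    """
    params.setdefault("optimizer", {})["log_summaries"] = True
    return params


# Usage: "learning_rate" is a placeholder for existing optimizer settings
params = enable_log_summaries({"optimizer": {"learning_rate": 0.1}})
```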

Tensor Summaries

Motivation

The section above describes how to summarize scalar values for visualization in TensorBoard. In some cases, however, you want to summarize tensors of arbitrary shape. Because only scalar summaries are visualized in TensorBoard, we provide a separate API, very similar to summarize_scalar, for summarizing tensors of any shape.

How to Use Tensor Summaries

The tensor summary API is available as part of the cerebras_pytorch package. To summarize a tensor T, add the following statement to the model definition code:

import cerebras_pytorch as cstorch
cstorch.summarize_tensor("my_tensor", T)

Under the hood, we mark the provided tensor as an output of the graph and fetch its value at every log step (similar to losses and other scalar summaries). This value is then written out to a file and can later be retrieved through the SummaryReader API (see below).
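The capture-at-log-steps idea can be illustrated with the in-memory toy below. It is only a sketch of the bookkeeping under simple assumptions (a step is a log step when it is divisible by log_steps, and values are keyed by name and step); ToyTensorSummaries and its methods are hypothetical and do not reflect how cerebras_pytorch stores summaries on disk.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyTensorSummaries:
    """Hypothetical in-memory model of tensor summaries captured at log steps."""

    def __init__(self, log_steps: int) -> None:
        self.log_steps = log_steps
        self._store: Dict[Tuple[str, int], Any] = {}

    def capture(self, name: str, value: Any, step: int) -> None:
        # Keep the value only on log steps; other steps are skipped,
        # analogous to fetching graph outputs only when logging.
        if step % self.log_steps == 0:
            self._store[(name, step)] = value

    def names(self) -> List[str]:
        # All distinct summary names, similar in spirit to tensor_names()
        return sorted({name for name, _ in self._store})

    def read(self, name: str, step: int) -> Optional[Any]:
        # None when nothing was captured under this name at this step
        return self._store.get((name, step))
```

For example, with log_steps=2 and captures on steps 1 through 4, only steps 2 and 4 are stored, so read("features", 3) returns None.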

Here’s a simple example where we summarize the input features and the last layer’s logits of a fully connected network:

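import torch.nn as nn
import cerebras_pytorch as cstorch

class FC(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc_layer = nn.Linear(in_features, out_features)

    def forward(self, features):
        # Summarize the input features
        cstorch.summarize_tensor("features", features)
        logits = self.fc_layer(features)
        # Summarize the logits produced by the last layer
        cstorch.summarize_tensor("last_layer_logits", logits)
        return logits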

To retrieve the saved values of these tensors during or after a run, use the SummaryReader API, which supports listing all available tensor names and fetching a tensor by name for a given step. The SummaryReader constructor takes a single argument: the path to a TensorBoard events file or to a directory containing TensorBoard events files. The location of the tensor summaries is inferred from these events files, since there is a one-to-one mapping between TensorBoard events files and tensor summary directories.

In the example above, we added summaries for features and last_layer_logits. We can then use the SummaryReader API to load the summarized values of these tensors at a given step:

>>> import cerebras_pytorch as cstorch
>>> reader = cstorch.utils.tensorboard.SummaryReader("model_dir/train")
>>> reader.tensor_names()   # Grab all tensor summary names
['features', 'last_layer_logits']
>>> reader.read_tensor("features", 2)   # Load tensor "features" from step 2
TensorDescriptor(step=2, tensor=tensor([[2, 4],
        [6, 8]]), utctime='2023-02-07T05:45:29.017264')
>>> reader.read_tensor("non_existing", 100)   # Load a non-existing tensor
WARNING:root:No tensor with name non_existing has been summarized at step 100

SummaryReader.read_tensor() returns one or more TensorDescriptor objects. TensorDescriptor is a plain data structure that holds:

step: The step at which this tensor was summarized.
utctime: The UTC time at which the value was saved.
tensor: The summarized value.
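Because read_tensor() may return a single descriptor, several, or (as in the non-existing example above) apparently nothing, downstream code may want one uniform shape. The sketch below assumes two things not stated on this page: that the "more than one" case comes back as a list or tuple, and that a missing tensor yields None. TensorDescriptorLike is a hypothetical stand-in with the same three fields, used only so the example runs without cerebras_pytorch.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class TensorDescriptorLike:
    """Hypothetical stand-in with the same fields as TensorDescriptor."""
    step: int
    utctime: str
    tensor: Any


def as_descriptor_list(result: Any) -> List[Any]:
    """Normalize a read_tensor()-style result into a (possibly empty) list.

    Assumptions: None means nothing was summarized, and multiple
    descriptors arrive as a list or tuple.
    """
    if result is None:
        return []  # nothing summarized under that name at that step
    if isinstance(result, (list, tuple)):
        return list(result)  # already several descriptors
    return [result]  # a single descriptor
```

With a real reader, you might write `for desc in as_descriptor_list(reader.read_tensor("features", 2)): ...` and then access desc.step, desc.utctime, and desc.tensor uniformly.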

Limitations

Adding tensor summaries may change how the graph is lowered and can therefore result in a different compile. This is because marking a tensor as an output may prevent it from being pruned away by certain operation fusions. The overall computation, however, should be identical; only its representation differs.
