Performance Flags

The Cerebras PyTorch API provides a set of performance- and debugging-related flags. See Performance/Debug Flags for a comprehensive list of the available flags.

This page covers how to set these performance flags using the Trainer class.

Prerequisites

Before continuing, read Trainer Overview and Trainer Configuration Overview, which explain the basics of running Model Zoo models. This document uses the tools and configurations outlined in those pages.

Scoped Flags

While you can set the flags directly via the Cerebras PyTorch API, you will often want different flag values for different stages of a run, for example training versus validation.

As such, we provide two callbacks to facilitate this: ScopedTrainFlags and ScopedValidateFlags.

With these callbacks, you can set different performance flags for training and validation.
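To illustrate the general idea of a scoped flag, here is a minimal, self-contained sketch in plain Python. It does not use the Cerebras API: the `FLAGS` dictionary and the `scoped_flags` helper are hypothetical stand-ins, assumed only for illustration. It shows a value that is overridden for the duration of one phase and then restored.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a global flag store; this is NOT the Cerebras API.
FLAGS = {"csx.performance.micro_batch_size": "auto"}


@contextmanager
def scoped_flags(overrides):
    """Temporarily apply `overrides` to FLAGS and restore the old values on exit.

    Raises KeyError if an override names a flag that is not in FLAGS.
    """
    saved = {name: FLAGS[name] for name in overrides}
    FLAGS.update(overrides)
    try:
        yield
    finally:
        # Restore the previous values even if the body raised an exception.
        FLAGS.update(saved)


def current_micro_batch_size():
    return FLAGS["csx.performance.micro_batch_size"]


seen = {}
with scoped_flags({"csx.performance.micro_batch_size": 4}):
    seen["train"] = current_micro_batch_size()      # value visible while "training"
with scoped_flags({"csx.performance.micro_batch_size": 2}):
    seen["validate"] = current_micro_batch_size()   # value visible while "validating"
seen["after"] = current_micro_batch_size()          # global value is back in effect

print(seen)
```

The point of the sketch is that each phase sees its own value and the global setting is untouched afterwards. How the real callbacks apply and restore flags is not described on this page.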

For example, one of the most important flags that you may want to set is the micro batch size (see working_with_microbatches for more details on micro batching).

You could set it globally through the cerebras.pytorch.backends.csx.performance.micro_batch_size flag. To use different micro batch sizes for training and validation instead, configure the callbacks as follows.

trainer:
  init:
    ...
    callbacks:
    - ScopedTrainFlags:
        csx.performance.micro_batch_size: auto
    - ScopedValidateFlags:
        csx.performance.micro_batch_size: 2
  ...
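To make the shape of this configuration explicit, the sketch below writes out the plain Python structure that a YAML parser would produce for the snippet above, with the elided ("...") parts omitted. The `flags_by_callback` helper is hypothetical and exists only to walk the structure; it is not part of the Trainer.

```python
# Plain-Python equivalent of the YAML above; elided ("...") sections are left out.
config = {
    "trainer": {
        "init": {
            # `callbacks` is a list of single-key mappings:
            # callback name -> {dotted flag name: value}.
            "callbacks": [
                {"ScopedTrainFlags": {"csx.performance.micro_batch_size": "auto"}},
                {"ScopedValidateFlags": {"csx.performance.micro_batch_size": 2}},
            ],
        },
    },
}


def flags_by_callback(cfg):
    """Collect {callback_name: flags} from the callbacks list (hypothetical helper)."""
    result = {}
    for entry in cfg["trainer"]["init"]["callbacks"]:
        for name, flags in entry.items():
            result[name] = flags
    return result


scoped = flags_by_callback(config)
print(scoped["ScopedTrainFlags"])
print(scoped["ScopedValidateFlags"])
```

Note that the flag name is a single dotted string key, and that `auto` loads as a string while `2` loads as an integer.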

Conclusion

Setting performance flags in the Trainer helps you optimize and debug training and validation. With the ScopedTrainFlags and ScopedValidateFlags callbacks, you can apply different settings to different stages of your workflow, such as distinct micro batch sizes for training and validation, so that each stage of a Model Zoo run uses the values best suited to it.

Further Reading

To learn more about how you can extend the capabilities of the Trainer class, you can check out: