Model is too large to fit on the device#

The memory requirements of your model are too large to fit on the device. Please see below for model-specific workarounds:

  • Transformer models: compile again with the batch size set to 1 to determine whether the specified maximum sequence length is feasible. If it is, try a smaller batch size, or enable gradient accumulation for supported models (all NLP models except T5) by setting the use_cs_grad_accum parameter to True in the runconfig section of your model’s yaml file.

  • Vision models (CNNs): try manually decreasing the batch size and/or the image/volume size.
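
For the Transformer workaround above, a minimal sketch of the relevant yaml fragment might look like the following. Only use_cs_grad_accum in the runconfig section is taken from this page; the surrounding keys and values are illustrative placeholders, and the exact layout of your model’s yaml file may differ:

```yaml
# Hypothetical excerpt from a model yaml file.
# Only use_cs_grad_accum is documented here; other keys are placeholders.
runconfig:
  use_cs_grad_accum: True   # enable gradient accumulation (all NLP models except T5)
  # ... other runconfig parameters ...
```

With gradient accumulation enabled, the effective batch size is accumulated over multiple smaller micro-batches, which can allow a model that fails to compile at the original batch size to fit on the device.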

Note

For more information on using gradient accumulation while training on the Cerebras cluster, see Train with gradient accumulation.

Observed Error#

Model is too large to fit on the device. This can happen because of a large batch size, large input tensor dimensions, or other network parameters. Please refer to the Troubleshooting section in the documentation for potential workarounds.