Model is too large to fit on the device
Your model's memory requirements exceed the capacity of the device. See below for model-specific workarounds:
Transformer models: recompile with the batch size set to 1 to determine whether the specified maximum sequence length fits on the device. If it does, try a smaller batch size, or enable gradient accumulation for supported models (all NLP models except T5) by setting the use_cs_grad_accum parameter to True in the runconfig section of your model's YAML file.
Vision models (CNNs): try manually decreasing the batch size and/or the image/volume size.
For more information on using gradient accumulation while training on the Cerebras cluster, see Train with gradient accumulation.
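As a sketch of the transformer workaround described above, the relevant YAML might look as follows. The use_cs_grad_accum parameter and the runconfig section come from this document; the train_input section and batch_size key are assumptions about the config layout and may differ in your model's YAML file.

```yaml
# Hypothetical excerpt of a model YAML config.
runconfig:
  use_cs_grad_accum: True  # enable gradient accumulation (all NLP models except T5)
train_input:
  batch_size: 1            # assumed key: start at 1 to check whether the
                           # maximum sequence length fits on the device
```

If compilation succeeds at batch size 1, increase the batch size gradually to find the largest value that still fits.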
The full error message reads:

> Model is too large to fit on the device. This can happen because of a large batch size, large input tensor dimensions, or other network parameters. Please refer to the Troubleshooting section in the documentation for potential workarounds.