Model is too large to fit in memory

Observed Error

Model is too large to fit in memory. This can happen because of a large batch size, large input tensor dimensions, or other network parameters. Please refer to the Troubleshooting section in the documentation for potential workarounds

Causes and Possible Solutions

Your model requires more memory than is available on the device. Potential workarounds include:

* On transformer models, compile again on one CS-2 system with the batch size set to 1 to determine whether the specified maximum sequence length is feasible.
* Try a smaller batch size per device, or enable batch tiling (transformer models only) by setting the micro_batch_size parameter in the train_input or eval_input section of your model’s YAML file (see working_with_microbatches).
    * If you already ran with batch tiling at a specific micro_batch_size value, try compiling with a smaller micro_batch_size. The Using “explore” to Search for a Near-Optimal Microbatch Size flow can recommend performant micro-batch sizes that fit in memory.
* On CNN models, where batch tiling isn’t supported, try manually decreasing the batch size and/or the image or volume size.
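
As a rough sketch of where the micro_batch_size setting sits, the fragment below shows a hypothetical excerpt of a model's YAML file. Only the section names (train_input, eval_input) and the parameter names (batch_size, micro_batch_size) come from the text above. The numeric values are placeholders chosen for illustration, and all other keys a real configuration needs are omitted.

```yaml
# Hypothetical excerpt; values are placeholders, other required keys omitted.
train_input:
    batch_size: 256        # global batch size (see the note on batch_size)
    micro_batch_size: 32   # decrease this value if compilation still runs out of memory
eval_input:
    batch_size: 256
    micro_batch_size: 32   # can be set independently of the train_input value
```

If a given micro_batch_size still does not fit, lowering it and compiling again is the workaround described above.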

Note

For more information on working with batch tiling and selecting performant micro_batch_size values, see working_with_microbatches.

Note

The batch_size parameter set in the YAML configuration is the global batch size. The batch size per CS-2 system is therefore the global batch size divided by the number of CS-2 systems used.
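
The relationship in this note can be written as a short worked calculation. The symbols below are this example's own notation, not names from the configuration, and the numbers are illustrative only.

```latex
% B_global: batch_size from the YAML configuration (global batch size)
% N: number of CS-2 systems used
% B_system: resulting batch size per CS-2 system
\[
  B_{\text{system}} = \frac{B_{\text{global}}}{N},
  \qquad \text{e.g.} \quad \frac{256}{4} = 64 .
\]
```

In this example, a global batch_size of 256 run on 4 CS-2 systems gives each system a batch of 64, so it is the per-system value of 64, not 256, that has to fit in device memory.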
