Automatic batch exploration#

Overview#

Selecting batch size#

The effective training batch size impacts the achievable performance of a model. If the model fits in device memory, it can be trained with the given batch size directly; otherwise, it can be trained with gradient accumulation. However, there is a fundamental constraint relating the effective batch_size, the micro-batch size, and the number of CSX devices: batch_size/num_csx must be a multiple of micro_batch_size. Users can define the micro-batch size themselves or delegate to the compiler to find the optimal micro-batch size at the cost of increased compile time. In either case, the effective batch size the user provides impacts the achievable performance.
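For example, a hypothetical configuration with an effective batch_size of 240 on two CSX systems gives a per-system batch of 120, which is a multiple of a micro_batch_size of 30 and therefore satisfies the constraint. The section placement and the numeric values below are illustrative assumptions, not recommendations:

  runconfig:
    num_csx: 2               # number of CSX systems
  train_input:
    batch_size: 240          # effective batch size; 240 / 2 = 120 per system
    micro_batch_size: 30     # 120 is a multiple of 30, so this configuration is valid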

Therefore, finding the best-performing micro-batch size, and consequently batch size, is crucial for getting the best training performance. However, manual batch exploration, in which the user compiles with different effective batch sizes when gradient accumulation is enabled, can be very time-consuming.

The automatic batch exploration flow provides an effortless way to select the best-performing micro-batch size, independent of the user-provided effective batch size. As the flow explores the micro-batch vs. performance search space, it recommends interim performant micro-batch sizes along with a confidence score for the optimality of each recommendation. This approach saves time and effort while ensuring optimal performance.

Procedure#

To enable automatic batch exploration, modify the YAML file for your model by setting the following parameters:

  1. Set use_cs_grad_accum to “True” in the runconfig section of the YAML file.

  2. Set the num_csx and batch_size parameters. These parameters are needed to guide the compiler stack as an initial data point, but their values do not impact the micro-batch size recommended by the flow. This batch size can be the same as the default batch size defined in ModelZoo for the model.

  3. Set micro_batch_size to “explore” in the train_input or eval_input section of the YAML file (see the combined configuration sketch after this list).
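Taken together, the three settings might look like the following sketch of the relevant YAML sections. The placement of num_csx and batch_size under runconfig and train_input, respectively, and the numeric values are assumptions for illustration; keep the sections and defaults your model's ModelZoo YAML already defines:

  runconfig:
    use_cs_grad_accum: True
    num_csx: 2
  train_input:
    batch_size: 240            # initial data point only; does not constrain the recommendation
    micro_batch_size: explore  # let the flow search for the best micro-batch size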

Expected Output#

The flow issues informative messages that report recommended micro-batch sizes as the exploration proceeds. These messages can be found in the run*.log file under local_compile_<train/eval>/model_dir/. Below is a sample of such messages:

BEXP202 19:39:45 GMT  Current recommended micro_batch_size: 1, estimated performance: 1.00x base, confidence score: 22 (out of 100)
BEXP202 19:40:22 GMT  Current recommended micro_batch_size: 2, estimated performance: 1.20x base, confidence score: 45 (out of 100)
BEXP202 19:40:39 GMT  Current recommended micro_batch_size: 2, estimated performance: 1.20x base, confidence score: 67 (out of 100)
BEXP202 19:41:11 GMT  Current recommended micro_batch_size: 4, estimated performance: 1.23x base, confidence score: 90 (out of 100)

The confidence score will not reach 100 to account for slight mis-correlation with the actual hardware run. After selecting micro_batch_size, you can set the global batch size to micro_batch_size * num_csx. If you need a specific batch size due to hyper-parameter considerations, you can select a nearby value as long as the implicit per-box batch size, batch_size/num_csx, is evenly divisible by micro_batch_size.
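As an illustration, take the final recommendation from the sample output above (micro_batch_size: 4) and assume num_csx: 2; both the system count and the resulting batch sizes below are hypothetical. The YAML for the actual training run would then pin the recommended micro-batch size and pick a compatible global batch size:

  runconfig:
    use_cs_grad_accum: True
    num_csx: 2
  train_input:
    micro_batch_size: 4   # recommended value from the exploration run
    batch_size: 8         # micro_batch_size * num_csx; 80 also works, since 80 / 2 = 40 is divisible by 4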

Implementation notes#

  • The automatic batch exploration flow was first released in version 2.1.0. Although it finds the optimal micro-batch size in most cases, there are some known sub-optimality issues that will be addressed in future releases.

  • Automatic batch exploration support is limited to LLM networks. A runtime error is issued if you attempt to use automatic batch exploration on vision networks.

  • Running this feature on GPT models should take about an hour. We will improve the runtime in future releases.