This script takes a user-provided SP (standard parameterization) GPT-2/3 yaml config and generates a corresponding muP yaml config.

This script requires the hyperparameters you tuned on the base model (lr_base, std_base, and m_embed) and accepts the following arguments:
1) (required) input SP yaml config
2) (optional) base model hidden dimension
3) (optional) base learning rate
4) (optional) base initialization standard deviation
5) (optional) embedding output multiplier
6) (optional) output path for the generated muP yaml config

Ensure that the input config has two sequential Linear learning-rate schedulers: the first should perform linear warm-up and the second linear decay.

The default values for the optional arguments base_lr, base_init_std, and m_embed are the "Empirically Tuned Values" from the Cerebras-GPT paper: https://arxiv.org/abs/2304.03208. The default base_layer_width is set to 256, the base width used in that paper.
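As a rough illustration of the conversion, the standard muP rules used in the Cerebras-GPT setup scale the hidden-layer learning rate and the initialization standard deviation by the width multiplier between the target and base models. The sketch below is hypothetical (the function name, argument names, and returned keys are illustrative, not the script's actual API):

```python
def sp_to_mup(d_model, d_base=256, lr_base=6.0e-3, std_base=0.08, m_embed=10.0):
    """Derive muP hyperparameters for a target hidden size from base-model values.

    Follows the standard muP scaling rules (as in arXiv:2304.03208);
    defaults mirror the base values described above.
    """
    m_width = d_model / d_base  # width multiplier relative to the base model
    return {
        # Hidden-layer Adam learning rate scales inversely with width.
        "lr": lr_base / m_width,
        # Initialization std scales with 1/sqrt(width multiplier).
        "init_std": std_base / m_width**0.5,
        # Embedding output multiplier is transferred unchanged.
        "embeddings_scale": m_embed,
        # Output logits are scaled down by the width multiplier.
        "output_logits_scale": 1.0 / m_width,
    }
```

For example, converting a config with hidden size 1024 against the default base width of 256 gives a 4x width multiplier, so the learning rate is divided by 4 and the initialization std by 2.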

Example usage: python convert_config_to_mup.py -i <path/to/yaml>/params_gpt3_tiny.yaml -d_base 256 -lr_base 6.e-3 -std_base 0.08 -m_base 10.