Custom PyTorch training script spawns multiple compile jobs
A custom PyTorch training or evaluation script spawns multiple compile jobs, or recursively re-executes itself in an infinite loop.
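The main cause is that the script's entry point is not protected by an if __name__ == "__main__": guard. At several points during execution, for example during weight transfer or when surrogate jobs are created, subprocesses are started. If such a subprocess re-imports the main script, which is what Python's multiprocessing module does with the "spawn" start method, every top-level statement in the script runs again. That includes the code that launches compilation, so each subprocess can start yet another compile job.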
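The following sketch shows the mechanism without starting any real jobs. It uses runpy.run_path to execute a hypothetical train.py twice. The first run uses the module name "__main__", which is what python train.py does. The second run uses "__mp_main__", the name CPython's multiprocessing gives the main script when a spawned child re-runs it. The LOG list and its "launch compile job" entry are illustrative stand-ins for the code that actually launches compilation.

```python
import runpy
import tempfile
import textwrap
from pathlib import Path

# Hypothetical unguarded script: the "launch" runs on every execution of the file.
UNGUARDED = textwrap.dedent("""
    LOG.append("launch compile job")
""")

# Hypothetical guarded script: the "launch" runs only for the real entry point.
GUARDED = textwrap.dedent("""
    def main():
        LOG.append("launch compile job")

    if __name__ == "__main__":
        main()
""")


def simulate(source, run_name):
    """Run `source` as a script file under `run_name` and return what it logged.

    run_name="__main__" mimics `python train.py`; run_name="__mp_main__"
    mimics a spawned multiprocessing child re-running the main script.
    """
    log = []
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "train.py"
        path.write_text(source)
        # init_globals injects the shared LOG list into the script's namespace.
        runpy.run_path(str(path), init_globals={"LOG": log}, run_name=run_name)
    return log


if __name__ == "__main__":
    for name in ("__main__", "__mp_main__"):
        print(name, "unguarded:", simulate(UNGUARDED, name),
              "guarded:", simulate(GUARDED, name))
```

Under "__mp_main__" the unguarded script still logs a launch, which corresponds to a subprocess starting another compile job. The guarded script logs nothing.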
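Add an if __name__ == "__main__": guard to your Python script. Move the top-level code that builds the model and launches compilation or training into a function, for example main(), and call that function only from inside the guard.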
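A minimal sketch of the guarded layout follows. The function names build_config and run_training, and the config values, are hypothetical placeholders for your own model setup and for the call that launches compilation and training.

```python
# train.py: a guarded training script (illustrative layout only).


def build_config():
    # Side-effect-free setup. It is harmless if a subprocess re-imports
    # this module and this definition is evaluated again.
    return {"batch_size": 32, "num_steps": 10}


def run_training(config):
    # Hypothetical placeholder for the code that compiles and runs the model.
    # It launches work, so it must be reached only from the guarded block.
    print(f"launching job with {config}")
    return config["num_steps"]


def main():
    config = build_config()
    return run_training(config)


# Only the process started with `python train.py` executes main().
# Subprocesses that re-import this file skip this block.
if __name__ == "__main__":
    main()
```

Module-level code is limited to imports and definitions. Re-importing the file in a subprocess therefore defines functions but launches nothing.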