.. _csrun-cpu:

The ``csrun_cpu`` Script
========================

Use the ``csrun_cpu`` script to validate and compile your model on a CPU node before running it on the CS system. See :ref:`validate-and-compile-on-cpu`.


.. important::

	Follow the :ref:`cs-command-line-pattern` to use the ``csrun_cpu`` Bash script correctly.


.. _fig-csrun-cpu-compile:
.. figure:: /images/cs-cpu-compile.png
    :align: center
    :width: 400 px

.. _config-csrun-cpu:

Configuring ``csrun_cpu``
-------------------------

Before you can use ``csrun_cpu``, the system administrator must configure the variables in this script. Follow these guidelines:

- The ``csrun_cpu`` and ``csrun_wse`` scripts are used together in the Cerebras ML workflow, so place both in a commonly accessible location. Because they are executable Bash scripts, you can place them in the same location as other executables.
- Make sure the scripts are executable, for example with ``chmod +x csrun_cpu csrun_wse``.
- Make sure the directory containing the scripts is included in the ``PATH`` variable.
- Edit the ``csrun_cpu`` script and set the variables in it. The following code section of the ``csrun_cpu`` script shows where these variables are located:

    .. code-block:: bash

        # All that needs to be set by system admins for different systems is here
        ########################################################################
        # sif image location
        SINGULARITY_IMAGE=

        # Comma-separated string of directories to mount.
        # ex: MOUNT_DIRS="/data/,/home/"
        # Note that the current directory is always mounted, so there is no need to add $(pwd)
        MOUNT_DIRS=

        # Default slurm cluster settings (must be set)
        DEF_NODES=
        DEF_TASKS_PER_NODE=
        DEF_CPUS_PER_TASK=

        #### More slurm configurations (recommended but not required) #####
        # The name of the GRES resource.
        GRES_RESOURCE=

        # The GRES node associated with the gres resource
        GRES_NODE=
        ########################################################################

    The values of these variables depend on the location of the SIF image, the default Slurm configuration, the default directories to mount, and so on. These settings are used by the Cerebras compiler when running jobs on the CS system. Consult Cerebras support if you need help setting the Slurm defaults.

- To confirm that the setup is correct, run ``csrun_cpu python`` on the command line. This should launch a Python interpreter inside the Cerebras Singularity container.
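
The placement, permission, and ``PATH`` guidelines above can be checked with a short script. The following is a minimal sketch, not part of the Cerebras software; the function name ``check_cs_script`` is hypothetical. It only checks that a script is found on ``PATH`` and is executable; it does not check the variables set inside the script.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper (not shipped with Cerebras software): verify that a
    # script is found on PATH and has execute permission.
    check_cs_script() {
        local name="$1"
        local path
        # command -v prints the resolved path when NAME is found on PATH
        if ! path="$(command -v "$name")"; then
            echo "ERROR: $name not found on PATH" >&2
            return 1
        fi
        if [ ! -x "$path" ]; then
            echo "ERROR: $path is not executable (try: chmod +x $path)" >&2
            return 1
        fi
        echo "OK: $name -> $path"
    }

For example, after sourcing this function, run ``check_cs_script csrun_cpu && check_cs_script csrun_wse`` from a user account to confirm that both scripts are reachable.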

.. important::

    Specifying ``GRES_RESOURCE`` and ``GRES_NODE`` avoids conflicts when scheduling CS-2 jobs with Slurm. Consult the system administrator and Cerebras support to configure these variables.
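
For illustration only, a filled-in configuration section might look like the following. Every value here is a hypothetical placeholder: the image path, mount directories, Slurm defaults, and GRES names depend entirely on your cluster, so confirm them with your system administrator and Cerebras support.

.. code-block:: bash

    # Hypothetical example values for the configuration section of csrun_cpu.
    # Replace every value with the settings for your own cluster.
    ########################################################################
    # sif image location (placeholder path and version)
    SINGULARITY_IMAGE="/path/to/cbcore-[version-number].sif"

    # Comma-separated string of directories to mount (hypothetical paths)
    MOUNT_DIRS="/data/,/home/"

    # Default Slurm cluster settings (hypothetical values)
    DEF_NODES=1
    DEF_TASKS_PER_NODE=1
    DEF_CPUS_PER_TASK=16

    # GRES settings (hypothetical names)
    GRES_RESOURCE=cs
    GRES_NODE=cs2-host-node
    ########################################################################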


csrun_cpu
---------

.. code-block:: bash

    >csrun_cpu --help
    Usage: csrun_cpu [--help] [--alloc_node] [--mount_dirs] [--use-sbatch] command_to_execute
    ...
    ...
    ...

Description
~~~~~~~~~~~

Runs the given ``<command_to_execute>`` inside the Cerebras environment on a CPU node.

Arguments
~~~~~~~~~

- ``command_to_execute``: A user command, such as ``python run.py`` or ``bash``, that is executed inside the Cerebras container on a CPU node.

- ``--alloc_node``: (Optional) Set this to ``False`` if you do not want to reserve the CPU node exclusively to execute ``command_to_execute``. Default is ``True``.

- ``--mount_dirs``: (Optional) String of comma-separated paths to mount in addition to the default paths configured in ``csrun_cpu``. Default is an empty string, that is, only the paths configured in ``csrun_cpu`` are mounted.

- ``--use-sbatch``: (Optional) If set, ``csrun_cpu`` uses ``sbatch`` to submit a batch script to Slurm that executes ``command_to_execute``. ``sbatch`` exits immediately after submitting the script, and the job stays in the Slurm queue of pending jobs until resources are allocated.
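
Because ``--mount_dirs`` takes a single comma-separated string, a wrapper script may find it convenient to keep the extra directories in a Bash array and join them. This is a minimal sketch; the function name ``join_mount_dirs`` is hypothetical and not part of ``csrun_cpu``, and it assumes that no path contains a comma.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper: join directory paths into the comma-separated
    # string expected by csrun_cpu --mount_dirs. Assumes no path contains a comma.
    join_mount_dirs() {
        local IFS=,
        # "$*" joins all arguments using the first character of IFS
        printf '%s\n' "$*"
    }

    extra_dirs=(/data/ml /lab/ml)
    mount_arg="$(join_mount_dirs "${extra_dirs[@]}")"
    # mount_arg is now "/data/ml,/lab/ml"

You could then pass the result as ``csrun_cpu --mount_dirs="$mount_arg" python run.py --mode=train --validate_only``.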

.. important::

	You must compile on a CPU node from within the Cerebras Singularity container. The ``csrun_cpu`` script ensures that your compile is run within the Cerebras Singularity container.


Examples
~~~~~~~~

.. code-block:: bash

    csrun_cpu --mount_dirs="/data/ml,/lab/ml" python run.py --mode=train --validate_only

- Mounts ``/data/ml`` and ``/lab/ml`` in addition to the default mount directories, and then executes the command ``python run.py --mode=train --validate_only``, which runs validation inside the Cerebras container on a CPU node.


.. code-block:: bash

    csrun_cpu --alloc_node=True --use-sbatch python run.py --mode=train --compile_only

- Submits an ``sbatch`` job to Slurm that reserves the whole CPU node and executes the command ``python run.py --mode=train --compile_only``, which runs the compilation inside the Cerebras container on the reserved CPU node.

.. code-block:: bash

    csrun_cpu python

- Launches a Python interpreter inside the Cerebras container on a CPU node.


.. _validate-only:

Validate only
-------------

In ``validate_only`` mode, the Cerebras Graph Compiler (CGC) runs in a lightweight verification mode. In this mode, the CGC runs only the first few stages of the compilation stack, up through kernel matching.

This step is fast and lets you iterate quickly on your model code. It tells you whether you are using any functionality that is unsupported by either XLA or the Cerebras stack. Also see :ref:`benefits`.

Here is an example command:

.. code-block:: bash

    csrun_cpu --mount_dirs="/data/ml,/lab/ml" python run.py --mode=train --validate_only

- The above command mounts ``/data/ml`` and ``/lab/ml`` directories, in addition to the default mount directories, and then executes the Python command: ``python run.py --mode=train --validate_only``. The Python command validates whether your training graph is supported by the Cerebras software. This ``csrun_cpu`` command will automatically spin up the Cerebras container on a CPU node to run this step.

A successful run in this mode validates the following:

- Your model code is fully CS-compatible.
- Your model correctly translates through XLA.
- Your model is supported by the available Cerebras kernels.

.. note::

    A successful run in the ``validate_only`` mode does not guarantee that your model will compile. Compilation may still fail in lower-level stages of the Cerebras stack. However, errors that occur beyond this stage are not issues with your model but with the Cerebras software stack, and should be reported to the Cerebras Support Team.

.. _compile-only:

Compile only
------------

In ``compile_only`` mode, the CGC performs full compilation through all stages of the Cerebras Software Stack and generates a CS system executable. It does not run the executable on the CS system. However, when ``compile_only`` mode succeeds, your model is likely to run on the CS system. Also see :ref:`benefits`.

Here is an example command:

.. code-block:: bash

    csrun_cpu --alloc_node=True python run.py --mode=train --compile_only

- The above command reserves the whole CPU node and executes the Python command: ``python run.py --mode=train --compile_only``. The Python command compiles a mapping of the training graph on the reserved CPU node. 

You can save time by reusing the compiled artifacts from the ``compile_only`` session when you later run this network on the CS system. This lets you skip the compile step on the CS system. Also see :ref:`benefits`.

.. note::

    Use the ``csrun_wse`` script to run on the CS system.

.. _hw-requirements-compile-only:

Hardware resource recommendations
---------------------------------

When you compile models with the ``compile_only`` option within the Cerebras environment, we recommend a minimum of 64 GB of memory and 8 CPU cores. Make sure that these resources are dedicated to the compile and are not shared.

For example, suppose you compile a model with the ``compile_only`` option from within the Cerebras Singularity container, as shown below:

.. code-block:: bash

    csrun_cpu python run.py --mode=train --compile_only

If the available hardware resources are below the recommended minimum, the compile may fail with the following error:

.. code-block:: text

      cerebras.cigar.stack.CerebrasStackError: [Cerebras Internal Error (source "plangen")]
      Compilation internal error at stage plangen.
      [...]
      terminate called after throwing an instance of 'std::bad_alloc'
      what():  std::bad_alloc
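
To check a node against the recommended minimums before compiling, you could run a quick pre-flight check such as the following sketch. It is not part of the Cerebras software; the function names are hypothetical, and it assumes a Linux node where ``nproc`` and ``/proc/meminfo`` are available.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical pre-flight check for the recommended compile_only minimums:
    # 64 GB of memory and 8 CPU cores. Linux only.

    MIN_CORES=8
    # 64 GB (decimal) expressed in KiB, the unit that /proc/meminfo labels "kB".
    MIN_MEM_KB=$(( 64 * 1000 * 1000 * 1000 / 1024 ))

    # Pure comparison, kept separate so it is easy to test.
    # Usage: meets_compile_minimum CORES MEM_KB
    meets_compile_minimum() {
        local cores="$1" mem_kb="$2"
        [ "$cores" -ge "$MIN_CORES" ] && [ "$mem_kb" -ge "$MIN_MEM_KB" ]
    }

    # Read this node's values and report the result.
    check_this_node() {
        local cores mem_kb
        cores="$(nproc)"   # CPUs usable by this process (respects CPU affinity)
        mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"  # total node memory
        if meets_compile_minimum "$cores" "$mem_kb"; then
            echo "OK: ${cores} cores, ${mem_kb} kB memory"
        else
            echo "WARNING: ${cores} cores, ${mem_kb} kB memory is below the recommended minimum" >&2
            return 1
        fi
    }

For example, start a shell on a CPU node with ``csrun_cpu bash``, source the file there, and call ``check_this_node``. ``MemTotal`` reports the node's total memory rather than the memory Slurm has granted to your job, so treat the result as a rough check, not a guarantee.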

Validate and compile outside the CS system cluster
--------------------------------------------------

To validate and compile from outside the Cerebras cluster, do not use Slurm to invoke the standard Singularity container. Instead, launch an interactive shell in the Cerebras Singularity container directly, specifying the path to the Cerebras Singularity image, and run the commands from that shell:

.. code-block:: bash

  singularity shell --cleanenv -B {data folders to attach} {path/to/singularity}/cbcore-[version-number].sif

  # Then, inside the container shell:

  # Full compile
  python run.py --mode=train --compile_only

  # Validation only
  python run.py --mode=train --validate_only

.. important::

    We recommend that you first iterate on your model using the ``validate_only`` option of ``run.py``.
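
Following that recommendation, a small wrapper can run validation first and start the full compile only if validation succeeds. This is a minimal sketch, not a Cerebras-provided script. It uses only the ``run.py`` options shown on this page; the ``CS_LAUNCHER`` variable is a hypothetical convenience that defaults to ``csrun_cpu`` on the cluster and can be set to ``env`` to run ``python`` directly, for example inside the Singularity shell described above.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical wrapper: validate first, then compile only if validation passes.
    # CS_LAUNCHER (hypothetical) selects how commands are launched:
    #   default "csrun_cpu" on the cluster; set it to "env" to run python
    #   directly, for example inside a singularity shell outside the cluster.
    CS_LAUNCHER="${CS_LAUNCHER:-csrun_cpu}"

    validate_then_compile() {
        local mode="${1:-train}"
        echo "Step 1: validate_only (mode=${mode})"
        if ! "$CS_LAUNCHER" python run.py --mode="$mode" --validate_only; then
            echo "Validation failed; fix the model before compiling." >&2
            return 1
        fi
        echo "Step 2: compile_only (mode=${mode})"
        "$CS_LAUNCHER" python run.py --mode="$mode" --compile_only
    }

Skipping the compile after a failed validation avoids tying up a CPU node for a compile that cannot succeed.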

.. _sbatch-mode:

Sbatch mode
------------

By default, ``csrun_cpu`` uses ``srun``: Slurm allocates resources, and ``csrun_cpu`` exits once the Slurm job is finished. With the ``--use-sbatch`` flag, ``csrun_cpu`` instead uses ``sbatch`` to submit a batch script to Slurm that executes ``command_to_execute``. ``sbatch`` exits immediately after submitting the script, and the job stays in the Slurm queue of pending jobs until resources are allocated.

The command used is stored in the file ``CS_<date>.log``, and the standard output and standard error are stored in ``CS_<date>_<slurm_job_id>.out``.

If a CPU node dedicated to the CS-2 is specified with ``GRES_NODE``, ``csrun_cpu`` avoids using that node for compilation tasks.
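
After submitting with ``--use-sbatch``, you can monitor the pending job with the standard Slurm command ``squeue -u "$USER"`` and then inspect the newest output file. The following sketch is not part of ``csrun_cpu``; the function name is hypothetical, and it assumes the ``CS_*.out`` files are in the directory you pass in (the current directory by default).

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper: print the newest CS_<date>_<slurm_job_id>.out file
    # in a directory (default: the current directory).
    latest_cs_output() {
        local dir="${1:-.}"
        local newest="" f
        for f in "$dir"/CS_*.out; do
            [ -e "$f" ] || continue          # the glob matched nothing
            if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
                newest="$f"
            fi
        done
        [ -n "$newest" ] || return 1         # no output files found
        printf '%s\n' "$newest"
    }

For example, ``tail -f "$(latest_cs_output)"`` follows the output of the most recently written job file.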