.. _cs-pytorch-pl-k8s:

Pipeline K8s Workflow
=====================

The Cerebras-recommended workflow for Pipeline models now uses Kubernetes (K8s) as the orchestrating software to manage resources and coordinate communication between the CS system and the other components within the Cerebras Wafer-Scale Cluster. This guide helps you get started with running pipelined execution on the cluster with the K8s workflow.

.. Note::
    We are transitioning from the Slurm-based workflow to the appliance model for running Pipeline models on our latest Wafer-Scale Cluster. During this transition, we provide similar scripts that map almost one-to-one to the Slurm scripts, with the eventual goal of moving entirely to the new appliance workflow.

The Kubernetes-based workflow for pipeline models requires setting up appliance mode as a prerequisite, which simplifies running models across the Wafer-Scale Cluster. The first few steps cover the appliance setup, which is mostly a one-time user setup.

Perform the following steps to set up appliance mode as a prerequisite for running pipeline models in the K8s workflow:

1. Ensure that the admin setup is complete. Check with your sysadmin.

2. Follow the first-time user setup procedure.

Admin setup
-----------

Your admin should have set up the following:

- Kubernetes is set up.
- Cluster management is already running on the appliance and is ready to interact with the user node.
- The TLS certificate is generated, and its location is known.
- Python 3.7 is available.
- The path to the Cerebras packages is available.
- The sysadmin has populated a ``.yaml`` file with the default distribution of resources to be used.

First-time user setup
---------------------

The first time you use this mode, you must set it up as shown below.

.. Note::
    Make sure that you have the TLS certificate available from your sysadmin. You will need it to communicate between the user node and the Wafer-Scale Cluster. Your admin will have shared the path to this file during the setup.

1. Set up the Python virtual environment using Python 3.7. Create the environment named ``venv_appliance`` using the following command:

   .. code-block:: bash

       python3.7 -m venv venv_appliance

   There are three main packages available: the ``cerebras_appliance`` software package, the ``cerebras_tensorflow`` package if you wish to use TensorFlow, and the ``cerebras_pytorch`` package if you wish to use PyTorch. However, to run models in Pipeline with K8s, you only need to install the ``cerebras_appliance`` software package.

2. Enter the following commands on the user node (make sure to execute the commands in this order, installing the appliance wheel first):

   .. code-block:: bash

       source venv_appliance/bin/activate

       pip install /cerebras_appliance-___-py3-none-any.whl --find-links=
       pip install /cerebras_pytorch-___-py3-none-any.whl --find-links=

The wheel provides two scripts: ``csrun_cpu`` and ``csrun_wse``. These scripts serve the same function as the scripts previously available for Slurm. ``csrun_cpu`` is for jobs that do not use the wafer-scale engine, while ``csrun_wse`` is for jobs that utilize the wafer-scale engine.
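Once the packages are installed, you can sanity-check the environment before moving on. The following is a minimal sketch, assuming the wheels above installed cleanly and placed ``csrun_cpu`` and ``csrun_wse`` on your ``PATH``; the exact package names and versions reported by ``pip`` depend on the wheels your sysadmin provided.

.. code-block:: bash

    # Activate the appliance virtual environment created above
    source venv_appliance/bin/activate

    # List the installed Cerebras packages (versions depend on your wheels)
    pip list | grep -i cerebras

    # Confirm the helper scripts shipped in the wheel are available
    which csrun_cpu csrun_wse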
Run pipeline models on the CS system / Cerebras Wafer-Scale Cluster
====================================================================

Clone the reference samples
---------------------------

1. Log in to your Wafer-Scale Cluster.

2. Activate the virtual environment (this exposes the commands used below).

   .. code-block:: bash

       source venv_appliance/bin/activate

3. Clone the reference samples repository to your preferred location in your home directory.

   .. code-block:: bash

       git clone https://github.com/Cerebras/modelzoo

4. Navigate to the model directory.

   .. code-block:: bash

       cd cerebras/modelzoo/fc_mnist/pytorch/

Compile on CPU
--------------

Cerebras recommends that you first compile your model successfully on a CPU node from the cluster before running it on the CS system.

- You can run in ``validate_only`` mode, which runs a fast, lightweight verification. In this mode, the compilation only runs through the first few stages, up until kernel library matching.
- After a successful ``validate_only`` run, you can run full compilation with ``compile_only`` mode.

This section of the quick start shows how to execute these steps on a CPU node.

.. Tip::
    The ``validate_only`` step is very fast, enabling you to rapidly iterate on your model code. Without needing access to the CS system wafer-scale engine, you can determine in this ``validate_only`` step if you are using any PyTorch layer or functionality that is unsupported by either XLA or CGC.

Follow these steps to compile on a CPU (uses the FC-MNIST example from the Model Zoo git repository):

1. Run the compilation in ``validate_only`` mode.

   .. code-block:: bash

       csrun_cpu --admin-defaults="/path/to/admin-defaults.yaml" --mount-dirs="/data/ml,/lab/ml" python run.py --mode train --validate_only

2. Run the full compilation process in ``compile_only`` mode. This step runs the full compilation through all stages of the Cerebras software stack to generate a CS system executable.

   .. code-block:: bash

       csrun_cpu --admin-defaults="/path/to/admin-defaults.yaml" --mount-dirs="/data/ml,/lab/ml" python run.py --mode train --compile_only

When the above compilation is successful, the model is guaranteed to run on the CS system. You can also use ``validate_only`` mode to run pre-compilations of many different model configurations offline so you can more fully use the allotted CS system cluster time.

.. Tip::
    The compiler detects whether a binary already exists for a particular model config and skips compiling on the fly during training if it detects one.

Train and evaluate on CPU/GPU
-----------------------------

To train on the CPU directly, since you are within a Python environment, you can call Python directly. Training on GPU may require generating a new virtual environment configured for your GPU hardware requirements. To set up the environment for GPU, refer to `these requirements `_.

.. code-block:: bash

    python run.py --mode train --params=params.yaml
    python run.py --mode eval --params=params.yaml

Run the model on the CS system
------------------------------

You can also run training and eval on the CPU without any code changes before running on the CS system.

.. code-block:: bash

    csrun_wse --admin-defaults="/path/to/admin-defaults.yaml" --mount-dirs="/data/ml,/lab/ml" python run.py --mode=train --params=params.yaml

The command above mounts the directories ``/data/ml`` and ``/lab/ml`` to the container (in addition to the default mount directories) and then trains the FC-MNIST model on the CS system available at the provided IP address.

To run an eval job on the CS system, enter the following command:

.. code-block:: bash

    csrun_wse --admin-defaults="/path/to/admin-defaults.yaml" --mount-dirs="/data/ml,/lab/ml" python run.py --mode=eval --eval_steps=1000

You can view the exact options using ``csrun_wse --help``.
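For reference, the commands from the preceding sections can be chained into a single end-to-end iteration. This is only a recap of the commands shown above; the ``--admin-defaults`` path and the mount directories are placeholders that your sysadmin provides.

.. code-block:: bash

    # 1. Fast, lightweight verification (stops after kernel library matching)
    csrun_cpu --admin-defaults="/path/to/admin-defaults.yaml" \
        --mount-dirs="/data/ml,/lab/ml" \
        python run.py --mode train --validate_only

    # 2. Full compilation to a CS system executable
    csrun_cpu --admin-defaults="/path/to/admin-defaults.yaml" \
        --mount-dirs="/data/ml,/lab/ml" \
        python run.py --mode train --compile_only

    # 3. Train on the CS system, then run eval
    csrun_wse --admin-defaults="/path/to/admin-defaults.yaml" \
        --mount-dirs="/data/ml,/lab/ml" \
        python run.py --mode=train --params=params.yaml

    csrun_wse --admin-defaults="/path/to/admin-defaults.yaml" \
        --mount-dirs="/data/ml,/lab/ml" \
        python run.py --mode=eval --eval_steps=1000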
Output files and artifacts
--------------------------

The output files and artifacts include a model directory (``model_dir``) that contains all the results and artifacts of the latest run, including:

- Compile directory (``cs_``)
- ``performance.json`` file
- Checkpoints
- TensorBoard event files
- ``yaml`` files

Compile dir – The directory containing the ``cs_``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The compilation artifacts generated during and after compilation are stored in the ``<model_dir>/cs_`` directory. Compilation logs and intermediate outputs are helpful for debugging compilation issues. The ``xla_service.log`` file should contain information about the status of the compilation and whether it passed or failed. In case of failure, it should print an error message and stack trace in ``xla_service.log``.

``performance.json`` file and its parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The performance directory should contain the ``performance.json`` file, located at ``<model_dir>/performance/performance.json``. It contains the information listed below:

- ``compile_time`` - The amount of time it took to compile the model and generate the Cerebras executable.
- ``est_samples_per_sec`` - The estimated performance in terms of samples per second based on the Cerebras compile. Note that this number is theoretical and actual performance may vary.
- ``programming_time`` - The time taken to prepare the system and load it with the compiled model.
- ``samples_per_sec`` - The actual performance of your run execution.
- ``suspected_input_bottleneck`` - This is a beta feature. It indicates whether you are input-starved and need more input workers to feed the Cerebras system.
- ``total_samples`` - The total number of samples iterated during the execution.
- ``total_time`` - The total time it took to complete the total samples.

Checkpoints
~~~~~~~~~~~

Checkpoints are stored in ``<model_dir>/checkpoint_*.mdl``. They are saved with the frequency specified in the ``runconfig`` file.

TensorBoard event files
~~~~~~~~~~~~~~~~~~~~~~~

TensorBoard event files are stored in the ``<model_dir>/train/`` directory.

``yaml`` files content after the run
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``yaml`` file is stored in the train directory. This ``yaml`` file contains information about the specifics of the run, such as the model-specific configuration (for example, ``dropout`` and ``activation_fn``), the optimizer type and optimizer parameters, the input data configuration (such as ``batch_size`` and shuffle), and the run configuration (such as ``max_steps``, ``checkpoint_steps``, and ``num_epochs``).
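After a run finishes, you can inspect the artifacts described above directly from the user node. The following is a minimal sketch, assuming your run wrote its artifacts to a local ``model_dir`` directory and that TensorBoard is installed in your environment.

.. code-block:: bash

    # List the run artifacts (compile directory, checkpoints, event files, yaml)
    ls model_dir/

    # Pretty-print the performance summary (compile_time, samples_per_sec, ...)
    python -m json.tool model_dir/performance/performance.json

    # Point TensorBoard at the event files written during training
    tensorboard --logdir model_dir/train/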