.. _weight-streaming-quickstart:

Weight Streaming Quickstart
===========================

Weight streaming (WS) is one of Cerebras' execution modes, ideal for training extreme-scale models. To learn more about the differences between pipelined mode and WS mode, visit :ref:`cerebras-execution-modes`.

This page provides details on the following two topics:

1. Changing the execution mode (system admins only)
2. Training a model in WS mode (all users)

Section 1: Changing the execution mode (system admins only)
-----------------------------------------------------------

This step can be performed by system administrators only. If you want the execution mode changed to Weight Streaming, contact your system admin. A system administrator with access can log in to the CS-2 system and change the execution mode.

To change the execution mode, follow these steps:

**Step 1**: Log in to the CS system.

**Step 2**: Check the current ``execmode`` using the command below:

.. code-block:: bash

    cs> config execmode show

The following message appears:

.. code-block:: bash

    Configured Execution Mode : PIPELINED

**Step 3**: To change ``execmode``, transition the system to the ``STANDBY`` state using the command below:

.. code-block:: bash

    cs> system standby

The following message appears:

.. code-block:: bash

    This puts the system in standby. Do you want to proceed?

Select ``yes``.

**Step 4**: Change ``execmode`` to Weight Streaming using the command below:

.. code-block:: bash

    cs> config execmode setup

Select ``Weight Streaming``. The following message appears:

.. code-block:: bash

    Selected execution mode configuration:
    ✔ Weight Streaming

**Step 5**: Activate the system:

.. code-block:: bash

    cs> system activate

The system reboots and is now activated in Weight Streaming mode.

Section 2: Training a model in WS mode (all users)
--------------------------------------------------

After the system admin updates the system to WS mode, follow the steps below to run a training job on a single CS-2.
In CSoft R1.4.0, TF implementations of GPT-2, GPT-3XL (1.3B params), and GPT-J (6B params) are supported in WS mode on a single CS-2 with an existing support cluster via compatibility mode. You can access a reference implementation of GPT-J in TF as an example in the `Cerebras Reference Implementations <https://github.com/Cerebras/cerebras_reference_implementations>`_ repo.

**Step 1**: Clone the Reference Implementations repository

To clone the Cerebras Reference Implementations repository, use the following commands:

.. code-block:: bash

    git clone https://github.com/Cerebras/cerebras_reference_implementations.git

.. code-block:: bash

    cd cerebras_reference_implementations/gptj

**Step 2**: Run the model on a CS system

Here, we use the wrapper script ``csrun_wse`` to compile and execute the code on the CS-2 system. See :ref:`csrun-wse` for more information.

.. code-block:: bash

    csrun_wse --total-nodes 14 \
        --tasks-per-node 8 \
        --cpus-per-task 16 \
        --single-task-nodes 2 \
        --cyclic \
        python-ws run.py --model_dir model_dir \
            --cs_ip \
            --params configs/params_continuous_pretraining.yaml \
            --mode train \
            --max_steps

The above command trains the GPT-J model for ``--max_steps`` by executing on the CS system at the IP address specified in the ``--cs_ip`` flag. Note that for Weight Streaming, you use ``python-ws``. Weight Streaming execution requires at least two single-task CPU nodes, specified here using ``--single-task-nodes 2``.

When the command executes, you will see output similar to the following:

.. code-block:: bash

    srun: job ... queued and waiting for resources
    srun: job ... has been allocated resources
    INFO:tensorflow:Checkpoints and summaries will be saved in: model_dir
    ...
    INFO:tensorflow:Running the TF Client
    INFO:tensorflow:Calling model_fn.
    ...
    INFO:tensorflow:Done calling model_fn.
    ...
    INFO:tensorflow:Completed weight initialization on CPU in: ... seconds
    ...
    INFO:tensorflow:Calling model_fn.
    ...
    INFO:tensorflow:Loading CPU pre-initialized weights took ... seconds
    INFO:tensorflow:Saving checkpoint at global step 0
    ...
    ...: I tensorflow/compiler/jit/xla_compilation_cache.cc:241] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
    INFO:tensorflow:global_step = 1, loss = ...
    INFO:tensorflow:global_step = 2, loss = ...
    ...
    INFO:tensorflow:global_step = 10, loss = ...
    ...
    INFO:tensorflow:Saving checkpoint at global step ...
    INFO:tensorflow:Training finished with ... samples in ... seconds, ... samples/second.
    INFO:tensorflow:Loss for final step: ...

.. note::

    Compilation of large-scale models in WS mode typically takes a very long time and can run over an hour. Reducing compile time is an active effort at Cerebras.

.. note::

    Offline compilation is not supported in WS mode in CSoft R1.4. The ``--validation-only`` and ``--compile-only`` flags are not supported, and the model is recompiled with every execution.

Output files and artifacts
--------------------------

The output files and artifacts include:

* Model directory (``model_dir``) - Contains all of the results and artifacts of the latest run, including:

  + Compile directory (``tfcs_``)
  + ``performance.json`` file
  + Checkpoints
  + Tensorboard event files
  + ``yaml`` files

Model directory and its structure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The model directory (``model_dir``) contains all of the results and artifacts of the latest run. If you go into the ``model_dir`` directory, the following subdirectories are present.

Compile dir - The directory containing the ``tfcs_``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compilation artifacts generated during and after compilation are stored in the ``tfcs_`` directory. Compilation logs and intermediate outputs are helpful for debugging compilation issues.

``performance.json`` file and its parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is a ``performance`` directory that should contain the ``performance.json`` file.
This file contains the information listed below:

* ``compile_time`` - The amount of time it took to compile the model and generate the Cerebras executable.
* ``est_samples_per_sec`` - The estimated performance, in samples per second, based on the Cerebras compile. Note that this number is theoretical; actual performance may vary.
* ``programming_time`` - The time taken to prepare the system and load it with the compiled model.
* ``samples_per_sec`` - The actual performance of your run execution.
* ``suspected_input_bottleneck`` - This is a beta feature. It indicates whether you are input-starved and need more input workers to feed the Cerebras system.
* ``total_samples`` - The total gross samples that were iterated during the execution.
* ``total_time`` - The total time it took to complete the total samples.

Checkpoints
~~~~~~~~~~~

Checkpoints are stored in ``/model-ckpt*``.

Tensorboard event files
~~~~~~~~~~~~~~~~~~~~~~~

Tensorboard event files are stored in the ``train`` directory.

``yaml`` files content after the run
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``yaml`` file is stored in the ``train`` directory. This yaml file contains information about the specifics of the run, such as the model-specific configuration (e.g., ``dropout``, ``activation_fn``), the optimizer type and optimizer parameters, the input data configuration (such as ``batch_size`` and shuffle), and the run configuration (such as ``max_steps``, ``checkpoint_steps``, and ``num_epochs``).
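The ``performance.json`` metrics above can be inspected programmatically. The sketch below is a minimal example of reading and sanity-checking them; the JSON contents are hypothetical and the values are illustrative only, not output from a real run.

.. code-block:: python

    import json

    # Hypothetical contents of a performance.json file; the field names
    # follow the list above, but the values are illustrative only.
    perf_json = """
    {
      "compile_time": 3600.0,
      "est_samples_per_sec": 120.0,
      "programming_time": 300.0,
      "samples_per_sec": 110.0,
      "suspected_input_bottleneck": false,
      "total_samples": 110000,
      "total_time": 1000.0
    }
    """

    perf = json.loads(perf_json)

    # samples_per_sec should be roughly total_samples / total_time.
    measured = perf["total_samples"] / perf["total_time"]
    print(f"measured throughput: {measured:.1f} samples/sec")
    print(f"reported throughput: {perf['samples_per_sec']:.1f} samples/sec")
    if perf["suspected_input_bottleneck"]:
        print("input pipeline may be starving the system; consider more input workers")

Comparing ``samples_per_sec`` against ``est_samples_per_sec`` in this way gives a quick read on how close a run came to the compile-time estimate.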
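The saved ``yaml`` file can likewise be loaded to recover the exact configuration a run used. The sketch below assumes PyYAML is installed; the section names and values are a hypothetical excerpt, not the actual schema of any particular model's params file.

.. code-block:: python

    import yaml  # PyYAML; assumed available

    # Hypothetical excerpt of a saved run yaml; the keys mirror the
    # categories described above, but the layout is illustrative only.
    params_yaml = """
    model:
      dropout: 0.1
      activation_fn: gelu
    optimizer:
      optimizer_type: adamw
    train_input:
      batch_size: 32
      shuffle: true
    runconfig:
      max_steps: 1000
      checkpoint_steps: 100
    """

    params = yaml.safe_load(params_yaml)
    print("batch_size:", params["train_input"]["batch_size"])
    print("max_steps:", params["runconfig"]["max_steps"])

Loading the saved yaml rather than the original config file guarantees you are looking at the values the run actually used, including any defaults filled in at launch time.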