PyTorch: Quick-Start Guide

If you are new to Cerebras, begin with this quick-start guide. It gives you a high-level overview of the Cerebras system workflow before you move on to in-depth development.

Cerebras Wafer-Scale Clusters are composed of CS-2 systems, MemoryX and SwarmX nodes, input pre-processing servers, and the associated internal network switches. These clusters support two execution modes to accommodate ML models of different sizes:

  • Layer pipelined: In this mode, all the layers of the network are loaded together onto the Cerebras WSE. This mode is selected for neural network models that fit entirely on the WSE, approximately up to 1B parameters.

  • Weight streaming: In this mode, one layer of the neural network model is loaded at a time. This layer-by-layer mode is used to run extremely large models (>1B parameters).
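The choice between the two modes depends mainly on model size. The sketch below illustrates that rule of thumb. It assumes the approximate 1B-parameter boundary stated above is a hard cutoff, although the real limit depends on the model. The helper names `count_parameters` and `suggest_execution_mode`, and the mode strings they return, are hypothetical illustrations and not part of any Cerebras API.

```python
import math

# Approximate boundary from the guide: models up to roughly 1B parameters fit
# entirely on the WSE (layer pipelined); larger ones need weight streaming.
# Assumption: treated here as a hard cutoff, which the real system is not.
APPROX_PIPELINE_LIMIT = 1_000_000_000


def count_parameters(shapes):
    """Hypothetical helper: total scalar parameters for a list of tensor shapes."""
    return sum(math.prod(shape) for shape in shapes)


def suggest_execution_mode(num_params, limit=APPROX_PIPELINE_LIMIT):
    """Hypothetical helper: map a parameter count to an execution mode name."""
    if num_params < 0:
        raise ValueError("num_params must be non-negative")
    return "layer_pipelined" if num_params <= limit else "weight_streaming"


# Example: a small MLP (784x512 weight + bias, 512x10 weight + bias).
mlp_shapes = [(784, 512), (512,), (512, 10), (10,)]
n = count_parameters(mlp_shapes)
print(n, suggest_execution_mode(n))  # prints: 407050 layer_pipelined
```

For a real PyTorch model, you would typically get the count with `sum(p.numel() for p in model.parameters())`. As noted below, the Cerebras PyTorch models currently support only pipelined execution, so a "weight_streaming" result means that the model cannot yet run through the PyTorch path.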

Cerebras Wafer-Scale Clusters rely on Kubernetes internally to manage various resources.

The Cerebras PyTorch models currently support only pipelined execution. Support for weight streaming is a work in progress and will be made available soon.

To run models in pipelined mode on Cerebras Wafer-Scale Clusters, Cerebras uses Kubernetes for internal resource allocation instead of Slurm, which was used in past releases. Pipelined jobs can be launched with scripts similar to those used in the earlier Slurm-based workflow. To run small- to medium-sized models on a Wafer-Scale Cluster in pipelined mode, follow the steps in Pipeline K8s Workflow.

Note

If you would like to run pipelined (PL) models only and have not yet upgraded to a Wafer-Scale Cluster, you can still use the Slurm-based workflow on your Original Cerebras Support-Cluster. Note that this workflow does not support weight streaming.

If you are ready to start developing or adapting your own PyTorch code for a CS system

Skip to Workflow for PyTorch on CS for an in-depth development guide using PyTorch for Cerebras.