Cerebras Wafer-Scale Cluster

The Cerebras Wafer-Scale cluster is designed to train neural networks with near-perfect linear scaling across millions of cores, without the complexity of distributed computing. The cluster has these components:

One or more CS-2 systems, powered by the Wafer-Scale Engine (WSE). The CS-2s run the core training and inference computations of a neural network. Each rack-mounted CS-2 contains one WSE, which the system powers, cools, and supplies with data. For more information, see the WSE-2 datasheet, the virtual tour of the CS-2, and the CS-2 white paper.
MemoryX technology, which stores a model's weights and intelligently streams them to the CS-2 systems.
SwarmX technology, which integrates multiple CS-2s into one Cerebras cluster so that they work together to train a single model. SwarmX broadcasts weights from MemoryX to the CS-2 systems and reduces (sums) gradients in the opposite direction.
Input preprocessing servers, which preprocess training samples before they are sent to the CS-2 systems for training, inference, and evaluation.
Management servers, which orchestrate and schedule the resources of the Cerebras cluster.
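The broadcast and reduce roles described for SwarmX can be illustrated with a small conceptual sketch. This is not Cerebras code or a Cerebras API: the function names (`broadcast`, `reduce_sum`, `local_gradient`), the toy linear model, and the assumption that each system processes a different slice of the batch are all hypothetical and chosen only to show the data flow.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input features, target)


def broadcast(weights: Sequence[float], num_systems: int) -> List[List[float]]:
    """Hand every system its own copy of the same weights (weight store -> systems)."""
    return [list(weights) for _ in range(num_systems)]


def reduce_sum(per_system_grads: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum of the gradients returned by each system (systems -> weight store)."""
    return [sum(values) for values in zip(*per_system_grads)]


def local_gradient(weights: Sequence[float], samples: Sequence[Sample]) -> List[float]:
    """Gradient of sum(0.5 * (w.x - y)^2) over the given samples for a toy linear model."""
    grad = [0.0] * len(weights)
    for x, y in samples:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
    return grad


# Hypothetical setup: two systems, four samples split between them.
weights = [1.0, -2.0]
samples: List[Sample] = [
    ([1.0, 2.0], 0.0),
    ([3.0, 1.0], 1.0),
    ([0.0, 1.0], -1.0),
    ([2.0, 2.0], 3.0),
]
num_systems = 2
shards = [samples[i::num_systems] for i in range(num_systems)]

copies = broadcast(weights, num_systems)                         # weights fan out
grads = [local_gradient(w, s) for w, s in zip(copies, shards)]   # each system computes
total = reduce_sum(grads)                                        # gradients are summed

print(total)
```

Because the gradients are summed, the reduced result matches what a single system would compute on the whole batch, which is why the number of systems does not change the training arithmetic in this sketch.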

You develop code and submit training and evaluation jobs from a user node. A user node is a CPU node that is not part of the cluster but is connected to it through the management server, as shown in Fig. 1. The management server handles all resource scheduling, so you only need to specify how many CS-2 systems you want to use for training or evaluation.

Fig. 1 Topology of the Cerebras Wafer-Scale cluster

Important

For documentation on the installation and administration of the Cerebras Wafer-Scale cluster, see the Cerebras deployment documentation.

Cerebras job scheduling and monitoring

Weight Streaming and Pipelined Execution