Glossary

Appliance – a term often used for a Wafer-Scale Cluster together with the Cerebras software, which lets users interact with the CS-2 system(s) and all supporting CPU nodes as a single entity: an appliance.

Cerebras System, CS System – a network-attached accelerator, packaged into a 15RU system. Each CS system contains one Wafer-Scale Engine (WSE) processor and powers, cools, and delivers data to the WSE.

CPU cluster – each CS system is deployed together with a supporting CPU cluster. Depending on the type of installation (Wafer-Scale Cluster or Original Cerebras Installation), this supporting CPU cluster has different components. A CPU cluster runs Cerebras software and is responsible for interacting with the CS system(s). ML users interact directly with one of the CPU nodes in the CPU cluster.

CS-2 – a second-generation CS system, which contains WSE-2, a second-generation Cerebras Wafer-Scale Engine.

MemoryX – a large-capacity off-chip memory service used to store model weights, gradients, optimizer states, etc. during Weight Streaming execution on a Cerebras Wafer-Scale Cluster.

Original Cerebras Installation – an installation designed for a single CS-2 deployment. It supports only models below 1B parameters, run with Pipelined execution. It consists of a CS-2 system and a CPU cluster whose CPU nodes act as a coordinator and input workers.

Pipelined Execution – an execution mode in which all the model weights are stored in on-chip SRAM for the whole duration of a job. This execution mode relies on model parallelism (both within each layer and across layers, as a layer pipeline) to distribute a training job across all of the AI cores of a WSE. This mode is best for models below 1B parameters, which can fit into the WSE on-chip SRAM. This mode does not support distributed training across multiple CS-2 systems. It is supported on both Original Cerebras Installations and Wafer-Scale Clusters.

SwarmX – a broadcast/reduce fabric used to connect MemoryX and the CS-2 system(s) in Wafer-Scale Clusters.

Wafer-Scale Cluster – an installation designed to support large-scale models (up to and well beyond 1 billion parameters) and large-scale inputs. It can contain one or more CS-2 systems, with the ability to distribute jobs across all or a subset of the CS-2 systems in the cluster. The supporting CPU cluster in this installation consists of MemoryX, SwarmX, management, and input worker nodes. This installation supports both Pipelined execution for models below 1 billion parameters and Weight Streaming execution for models up to and above 1 billion parameters.

Weight Streaming Execution – an execution mode that stores all the model weights externally and streams them into the CS-2 systems in the cluster without suffering the traditional penalty associated with off-chip memory. Weight Streaming enables the training of models above 1 billion parameters on a single CS-2 system, as well as data-parallel distribution across multiple CS-2 systems. Weight Streaming requires the Cerebras Wafer-Scale Cluster installation, with MemoryX for single-CS-2 clusters and both MemoryX and SwarmX for multi-CS-2 clusters.
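
The relationship between the two execution modes and the two installation types can be summarized in a short sketch. This is only an illustration of the rules stated in this glossary, not part of any Cerebras software: the function `supported_plans`, the `Plan` class, and the installation labels `"original"` and `"wafer_scale_cluster"` are hypothetical names, and treating 1 billion parameters as a hard cutoff for Pipelined execution is a simplifying assumption (the glossary only says the mode is best below that size).

```python
from dataclasses import dataclass

ONE_BILLION = 1_000_000_000  # assumed hard cutoff; the glossary says "best for" below 1B


@dataclass(frozen=True)
class Plan:
    """Hypothetical record: an execution mode plus the cluster services it needs."""
    mode: str
    components: tuple


def supported_plans(num_params, installation, num_cs2=1):
    """Return the execution plans this glossary permits for a job.

    num_params   -- number of model parameters
    installation -- "original" or "wafer_scale_cluster" (illustrative labels)
    num_cs2      -- number of CS-2 systems the job is distributed across
    """
    if installation not in ("original", "wafer_scale_cluster"):
        raise ValueError(f"unknown installation: {installation!r}")
    if num_cs2 < 1:
        raise ValueError("a job needs at least one CS-2 system")

    plans = []

    # Pipelined: weights stay in on-chip SRAM, so the model must be small,
    # and the mode cannot be distributed across multiple CS-2 systems.
    # It is available on both installation types.
    if num_params < ONE_BILLION and num_cs2 == 1:
        plans.append(Plan("pipelined", ()))

    # Weight Streaming: requires a Wafer-Scale Cluster. MemoryX is always
    # needed; SwarmX is added when more than one CS-2 system is used.
    if installation == "wafer_scale_cluster":
        components = ("MemoryX",) if num_cs2 == 1 else ("MemoryX", "SwarmX")
        plans.append(Plan("weight_streaming", components))

    return plans
```

For example, `supported_plans(2_000_000_000, "wafer_scale_cluster", num_cs2=4)` yields a single Weight Streaming plan that needs both MemoryX and SwarmX, while the same model on an Original Cerebras Installation yields no plan at all.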

WSE, Wafer-Scale Engine – the Cerebras processor; each CS system contains one WSE.

WSE-2 – a second-generation WSE.
