The cerebras.framework.torch interface

This section describes the methods of the cerebras.framework.torch module that you will use to convert your PyTorch code to the Cerebras workflow.

cerebras.framework.torch.initialize()

Initializes the Cerebras system and prepares it for the runtime. You can configure the runtime by passing configuration variables when calling this method.

Important

The cerebras.framework.torch.initialize() method must be called first, before any PyTorch code in your program.

Usage

import cerebras.framework.torch as cbtorch
# Initialize with default values
cbtorch.initialize(cs_ip="12.0.0.0")
# Initialize with configuration variables
cbtorch.initialize(cs_ip="12.0.0.0", autostart_timeout=35, shutdown_timeout=25)
# Initialize for compiling the model only (no system required)
cbtorch.initialize(compile_only=True, autostart_timeout=35, shutdown_timeout=25)
# Initialize with service workdir to configure streamer workers
cbtorch.initialize(cs_ip="12.0.0.0", service_workdir="./service", autostart_timeout=35, shutdown_timeout=25)
# Initialize with cbfloat16 enabled
cbtorch.initialize(cs_ip="12.0.0.0", use_cbfloat16=True, autostart_timeout=35, shutdown_timeout=25)

Parameters

cs_ip

  • The IP address of the Cerebras system.

  • If not provided, a CPU workflow is assumed and the backend is configured accordingly (see the sketch after this parameter list).

compile_only

  • If True, configures the Cerebras backend for a compile-only flow. No Cerebras system is required and no execution will occur.

service_workdir

  • Optional. A string specifying the path to the service working directory that the streamer workers will use. Default: ./cerebras_wse.

use_cbfloat16

  • If True, configures the run to use the Cerebras cbfloat16 data type, a Cerebras-specific 16-bit floating-point format optimized for performance.

autostart_timeout

  • Optional. An integer specifying the time in seconds that the process should wait for the auto-started services to finish starting. Default: 30 (seconds). You may need to increase this depending on the host machine’s CPU and networking performance.

shutdown_timeout

  • Optional. An integer specifying the time in seconds that the process should wait for the services to shut down before raising a timeout error. Default: 10 (seconds). You may need to increase this depending on the host machine’s CPU and networking performance.
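
As an illustration of the defaults described above, the following sketch initializes for a CPU workflow by simply omitting cs_ip, and separately for a compile-only flow with longer service timeouts on a slower host. The parameter values are illustrative only:

import cerebras.framework.torch as cbtorch

# No cs_ip given: the backend assumes a CPU workflow and configures accordingly
cbtorch.initialize()

# Compile-only flow on a slower host: allow the services extra time
cbtorch.initialize(compile_only=True, autostart_timeout=60, shutdown_timeout=30)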

cerebras.framework.torch.module()

Wrap your PyTorch model object, of the class torch.nn.Module, with this method as cerebras.framework.torch.module(Model). This enables the model object to be executed on the Cerebras system, and in general on any device type specified in the device_type parameter of the cerebras.framework.torch.initialize() method.

Usage

See the following example. Wrapping your PyTorch model MNIST creates a cbtorch module object named model. This model object is now ready to be loaded onto the Cerebras system:

import cerebras.framework.torch as cbtorch

...

class MNIST(nn.Module):
    def __init__(self):
        super(MNIST, self).__init__()
        ...
        ...
def main():
    # Initialize Cerebras backend and configure for run
    cbtorch.initialize(cs_ip=args.cs_ip)

    # prepare to move the model and dataloader onto the Cerebras engine
    model = cbtorch.module(MNIST())
    ...
    ...

Parameters

model

  • Required. The PyTorch model, of the class torch.nn.Module, that you intend to run on the Cerebras system, and in general on any device_type device you specify.
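
As a rough follow-on to the example above, the wrapped object can still be used like a regular torch.nn.Module. The optimizer setup below is illustrative and not part of this interface; it assumes the wrapped model exposes parameters() like the underlying module:

import torch
import cerebras.framework.torch as cbtorch

model = cbtorch.module(MNIST())

# Assumes the wrapped model exposes parameters() like the underlying nn.Module,
# so they can be handed to a standard PyTorch optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)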

cerebras.framework.torch.dataloader()

Wrap the torch.utils.data.DataLoader object in your PyTorch code with cerebras.framework.torch.dataloader. This enables the DataLoader object to be executed on the Cerebras system and on any device type specified in the device_type parameter of the cerebras.framework.torch.initialize() method.

Usage

See the following example. Wrapping the object returned by the get_train_dataloader() function creates a cbtorch dataloader object named train_dataloader. The resulting train_dataloader object is now compatible with the Cerebras system.

import cerebras.framework.torch as cbtorch

...

def main():
    # Initialize Cerebras backend and configure for run
    cbtorch.initialize(cs_ip=args.cs_ip)

    # prepare to move the model and dataloader onto the Cerebras engine
    model = cbtorch.module(MNIST())
    train_dataloader = cbtorch.dataloader(get_train_dataloader())

    ...

def get_train_dataloader():
    batch_size = 64

    ...

    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=batch_size,
        sampler=None,
        drop_last=True,
        shuffle=True,
        num_workers=0,
    )
    return train_loader

Parameters

loader

  • Required. The PyTorch DataLoader object, of the class torch.utils.data.DataLoader, that you intend to use to stream data to the Cerebras system, and in general to any device_type device you specify.

cerebras.framework.torch.Session()

A context manager that enables running PyTorch code on a Cerebras system.

Usage

See the following example:

import cerebras.framework.torch as cbtorch

...
train_dataloader = cbtorch.dataloader(get_train_dataloader())
...

with cbtorch.Session(train_dataloader, mode="train") as session:
    for epoch in range(num_epochs):
        cm.master_print(f"Epoch {epoch} train begin")

        for step, batch in enumerate(train_dataloader):
            ...

Parameters

dataloader

  • Required. The dataloader being used to generate the data to be sent to the Cerebras system.

mode

  • Specifies the type of run. Only train and eval are supported in this release; see the eval sketch below.
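
For an evaluation run, the same Session pattern applies with mode="eval". The sketch below assumes a hypothetical get_eval_dataloader() helper written in the same style as get_train_dataloader() shown earlier:

eval_dataloader = cbtorch.dataloader(get_eval_dataloader())

with cbtorch.Session(eval_dataloader, mode="eval") as session:
    for step, batch in enumerate(eval_dataloader):
        ...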