Kernel autogeneration with AutoGen#

Overview#

To ensure optimal model performance on Cerebras hardware, we combine handwritten kernels with automatically generated kernels, known as AutoGen kernels.

Handwritten kernels form a core library, covering many common operations. However, certain scenarios may require AutoGen kernels:

Missing Operations: When a model requires operations not available in the handwritten library, AutoGen kernels fill the gap.
Performance Optimization: Even for operations with handwritten implementations, AutoGen can often create specialized, fused kernels that outperform them. This is particularly beneficial for large compound operations like losses.

The Distributed Task Generator (DTG) seamlessly handles AutoGen kernel creation, ensuring efficient execution on Cerebras hardware.

Cerebras generates custom kernels on the fly for operations that are missing from the handwritten library or that would otherwise run on the CPU. By default, this uses the “medium” policy.

Note

In Release 2.1.1, the automatic kernel generation feature (AutoGen) is temporarily unavailable. If you were previously using it, set autogen_policy: disabled in the runconfig section of your parameters YAML file. If your model previously required AutoGen and no longer compiles, contact the Cerebras Support Team for assistance.

How to enable AutoGen#

Enable AutoGen by setting the autogen_policy flag to “medium” or “aggressive” in the runconfig section of the model’s parameters YAML file. For more details on the YAML file, see Cerebras Model Zoo YAML parameters.

runconfig:
  ...
  autogen_policy: "medium"
  ...

The autogen_policy flag can be one of the following:

disabled: do not autogenerate any kernels.
default, medium: try to autogenerate kernels on the wafer instead of executing them on the CPU host.
aggressive: autogenerate some kernels even if handwritten wafer kernels exist. This is primarily for debugging purposes.
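
The accepted values above can be checked before launching a run. The following is a minimal sketch, not part of the Cerebras API: the helper name resolve_autogen_policy is hypothetical, and it assumes that a missing autogen_policy key behaves like “default”, which the list above groups with “medium”.

```python
# Hypothetical helper (not a Cerebras API): validates the autogen_policy
# value taken from the runconfig section of a parameters file.
VALID_AUTOGEN_POLICIES = {"disabled", "default", "medium", "aggressive"}


def resolve_autogen_policy(runconfig: dict) -> str:
    """Return the effective policy for a runconfig dictionary.

    Assumption: a missing key is treated like "default", and "default"
    is reported as "medium" because the two are listed together.
    """
    policy = runconfig.get("autogen_policy", "default")
    if policy not in VALID_AUTOGEN_POLICIES:
        raise ValueError(
            f"Unknown autogen_policy {policy!r}; "
            f"expected one of {sorted(VALID_AUTOGEN_POLICIES)}"
        )
    return "medium" if policy == "default" else policy
```

Calling resolve_autogen_policy({"autogen_policy": "aggressive"}) returns the value unchanged, while a misspelled value raises a ValueError instead of silently falling through.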

Note

In most cases, set autogen_policy to medium.

Usage examples#

Autogenerate kernels for non-loss operations#

To modify a base model from the Cerebras Model Zoo, for example to replace GELU with LeakyReLU in GPT-3, make two changes in the parameters YAML file:

1. Update the nonlinearity: In the model section, set nonlinearity to “leaky_relu”.

2. Enable AutoGen: In the runconfig section, set autogen_policy to “medium” so that the required LeakyReLU kernel is generated.

model:
  nonlinearity: "leaky_relu"
  ...
runconfig:
  autogen_policy: "medium"
  ...
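
If the parameters are already loaded into a Python dictionary, the same two edits can be applied programmatically. This is only a sketch under stated assumptions: the helper with_leaky_relu is hypothetical, and the starting value "gelu" is illustrative rather than taken from any Model Zoo file.

```python
import copy


def with_leaky_relu(params: dict) -> dict:
    """Return a copy of params with the two edits described above applied.

    Hypothetical helper; the key names follow the YAML snippet above.
    """
    updated = copy.deepcopy(params)  # leave the caller's dictionary untouched
    updated.setdefault("model", {})["nonlinearity"] = "leaky_relu"
    updated.setdefault("runconfig", {})["autogen_policy"] = "medium"
    return updated


# Illustrative base parameters (assumed, not copied from a real config).
base_params = {"model": {"nonlinearity": "gelu"}, "runconfig": {}}
new_params = with_leaky_relu(base_params)
```

Working on a deep copy keeps the base configuration available for comparison runs.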

Autogenerate fused kernels for loss operations#

AutoGen improves the performance of PyTorch losses by creating autogenerated fused graphs for losses that were previously covered by primitive kernels.

Note

AutoGen is an experimental feature and may result in unexpected compilation failures, even for the list of supported losses.

To implement a PyTorch loss using AutoGen:

1. Import the PyTorch loss from our Model Zoo.
2. Set use_autogen=True. The default value of use_autogen is False.

from modelzoo.common.pytorch.layers import BCELoss

loss = BCELoss(reduction='mean', use_autogen=True)

List of supported losses:

BCELoss
CrossEntropyLoss
Loss.BCEWithLogitsLoss
Loss.GaussianNLLLoss
Loss.HingeEmbeddingLoss
Loss.HuberLoss
Loss.KLDivLoss
Loss.L1Loss
Loss.MarginRankingLoss
Loss.MSELoss
Loss.MultiLabelSoftMarginLoss
Loss.MultiMarginLoss
Loss.NLLLoss
Loss.PoissonNLLLoss
Loss.SmoothL1Loss
Loss.TripletMarginLoss
Loss.TripletMarginWithDistanceLoss

Unsupported losses:

Loss.CosineEmbeddingLoss (compiles to primitive kernels, so performance will be slower)

Note

To get the performance gains from AutoGen, verify both that autogen_policy is enabled in the parameters YAML file and that use_autogen is set to True for the specific loss. With use_autogen disabled, the loss reverts to a combination of primitive operations and may lose the performance benefit.
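
The two conditions in this note can be expressed as a small check. The function below is a hypothetical sketch, not a Cerebras API, and it assumes that any policy other than “disabled” counts as enabled.

```python
def fused_loss_kernel_expected(autogen_policy: str, use_autogen: bool) -> bool:
    """Return True only when both settings from the note are in place.

    Assumption: every policy except "disabled" counts as enabled.
    """
    return use_autogen and autogen_policy != "disabled"
```

For example, fused_loss_kernel_expected("medium", False) is False, which corresponds to the case where the loss falls back to primitive operations.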

Autogenerate kernels for customized losses#

Creating custom losses may result in compilation failure due to a graph mismatch. If this occurs, enable AutoGen for the customized loss by adding the AutoGen wrapper @autogen_loss as a decorator for the loss class. Once the custom loss is defined, follow the steps in Autogenerate fused kernels for loss operations to enable the generation of fused kernels.

import torch.nn as nn

from modelzoo.common.pytorch.layers.utils import autogen_loss

# Enable AutoGen for the custom loss.
@autogen_loss
class CustomLoss(nn.Module):
    def __init__(self, ...):

Implementation notes#

Release 2.1.1#

Release 1.9.1#

In Release 1.9.1, we enabled the following AutoGen capabilities:

Support for PyTorch operations (e.g., nonlinearities)

Improving the performance of PyTorch losses through fused kernels

Support for user-defined loss functions
