Kernel autogeneration with AutoGen#
Overview#
To ensure optimal model performance on Cerebras hardware, we leverage a combination of pre-written kernels and automatically generated kernels, known as AutoGen kernels.
Handwritten kernels form a core library, covering many common operations. However, certain scenarios may require AutoGen kernels:
- Missing Operations: When a model requires operations not available in the handwritten library, AutoGen kernels fill the gap.
- Performance Optimization: Even for operations with handwritten implementations, AutoGen can often create specialized, fused kernels that outperform them. This is particularly beneficial for large compound operations like losses.
The Distributed Task Generator (DTG) handles AutoGen kernel creation automatically, ensuring efficient execution on Cerebras hardware. In effect, the compiler crafts custom kernels on the fly, both for operations missing from the handwritten library and for operations that would otherwise fall back to the CPU host.
How to enable AutoGen#
Activate AutoGen by setting the `autogen_policy` flag to `"medium"` or `"aggressive"` within the `runconfig` section of the model's parameters YAML file. Explore YAML details further at Cerebras Model Zoo YAML parameters.

```yaml
runconfig:
    ...
    autogen_policy: "medium"
    ...
```
The `autogen_policy` flag can be one of the following:

- `disabled`: no AutoGen ops.
- `default`, `medium`: try to autogenerate kernels on the wafer instead of executing them on the CPU host.
- `aggressive`: autogenerate some kernels even if existing handwritten wafer kernels exist. This is primarily for debugging purposes.

Note

For most cases, use AutoGen with the `medium` `autogen_policy`.
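As a sketch, a configuration for a debugging run using the `aggressive` policy would look much like the `medium` example above, with only the flag value changed:

```yaml
runconfig:
    ...
    # Force autogenerated kernels even where handwritten wafer kernels
    # exist; intended for debugging, not production performance.
    autogen_policy: "aggressive"
    ...
```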
Usage examples#
Autogenerate kernels for non-loss operations#
To modify a base model within the Cerebras Model Zoo, for example, to tailor GPT-3 to your needs by replacing Gelu with LeakyRelu, make two key adjustments in the parameters YAML file:
1. Update Nonlinearity: Change the nonlinearity setting to “LeakyRelu” within the model configuration section.
2. Enable AutoGen: In the runconfig section, set “autogen_policy” to “medium” to ensure smooth generation of the necessary LeakyRelu kernel.
```yaml
model:
    nonlinearity: "leaky_relu"
    ...
runconfig:
    autogen_policy: "medium"
    ...
```
Autogenerate fused kernels for loss operations#
AutoGen improves the performance of PyTorch losses by creating autogenerated fused graphs for losses that were previously covered by primitive kernels.
Note
AutoGen is an experimental feature and may result in unexpected compilation failures, even for the list of supported losses.
To implement a PyTorch loss using AutoGen:

1. Import the PyTorch loss from our Model Zoo.
2. Set `use_autogen=True` (the default value of `use_autogen` is `False`).
```python
from modelzoo.common.pytorch.layers import BCELoss

loss = BCELoss(reduction='mean', use_autogen=True)
```
List of supported losses:
- BCELoss
- CrossEntropyLoss
- BCEWithLogitsLoss
- GaussianNLLLoss
- HingeEmbeddingLoss
- HuberLoss
- KLDivLoss
- L1Loss
- MarginRankingLoss
- MSELoss
- MultiLabelSoftMarginLoss
- MultiMarginLoss
- NLLLoss
- PoissonNLLLoss
- SmoothL1Loss
- TripletMarginLoss
- TripletMarginWithDistanceLoss
Unsupported losses:

- CosineEmbeddingLoss (will compile to primitive kernels and performance will be slower)
Note
To ensure optimal performance gains from AutoGen, verify that `autogen_policy` is enabled in the parameters YAML file and that `use_autogen=True` is set for the specific loss. Disabling `use_autogen` for a loss reverts it to a combination of primitive operations, potentially sacrificing performance benefits.
Autogenerate kernels for customized losses#
Creating custom losses may result in compilation failure due to a graph mismatch. If this occurs, enable AutoGen for the customized loss by adding the AutoGen wrapper `@autogen_loss` as a decorator for the loss class. Once the custom loss is defined, follow the steps in Autogenerate fused kernels for loss operations to enable the generation of fused kernels.
```python
from modelzoo.common.pytorch.layers.utils import autogen_loss

@autogen_loss
class CustomLoss(nn.Module):
    def __init__(self, ...):
```
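The wrapper itself follows the ordinary Python class-decorator pattern. The stand-in below is purely illustrative, not the real `autogen_loss` implementation, and it deliberately avoids any Model Zoo or PyTorch dependency: it only shows how a class decorator can tag a loss class so that later machinery (here imagined as a compiler pass) could route it to AutoGen.

```python
# Hypothetical stand-in for an @autogen_loss-style decorator.
# It marks the class with an attribute that downstream tooling
# could inspect; the real Cerebras decorator does more than this.
def autogen_loss_sketch(cls):
    cls.autogen = True  # marker a compiler pass could check
    return cls


@autogen_loss_sketch
class CustomLoss:
    """Toy scaled mean-squared-error loss over plain Python lists."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def __call__(self, pred, target):
        n = len(pred)
        return self.scale * sum((p - t) ** 2 for p, t in zip(pred, target)) / n


loss_fn = CustomLoss(scale=2.0)
result = loss_fn([1.0, 2.0], [1.0, 0.0])  # 2.0 * (0 + 4) / 2 = 4.0
```

Because the decorator returns the class unchanged apart from the marker, the loss is constructed and called exactly as it would be without AutoGen.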
Implementation notes#
Release 1.9.1#
In Release 1.9.1, we enabled the following AutoGen capabilities:

- Support for PyTorch operations (e.g., nonlinearities)
- Improved performance of PyTorch losses through fused kernels
- Support for user-defined loss functions