.. _cbtorch-limitations:

Limitations of PyTorch on Cerebras
==================================

Floating Point Precision
------------------------

Only mixed precision is supported on the Cerebras system. Weights are stored as
``float32``, but computations are performed in ``float16``. Casts are inserted
automatically, so there is no need to insert them manually.

This restriction follows from the architecture of the system itself, so there
are no plans to support other precision modes at this time.
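The following is a general PyTorch illustration, not Cerebras-specific code, of
a common motivation for keeping weights in ``float32`` under a mixed-precision
scheme: a small update that survives in ``float32`` can be rounded away
entirely in ``float16``. The weight and update values are arbitrary assumptions
chosen only to show the effect.

```python
import torch

# Arbitrary example values (assumptions for illustration only).
weight_fp32 = torch.tensor(1.0, dtype=torch.float32)
update = torch.tensor(1e-4, dtype=torch.float32)

# Applying the update to the float32 copy keeps the change.
updated_fp32 = weight_fp32 + update

# Rounding the updated value to float16 loses it: near 1.0 the spacing
# between representable float16 values is 2**-10 (about 9.8e-4), so
# 1.0001 rounds back to exactly 1.0.
updated_fp16 = updated_fp32.to(torch.float16)

print(updated_fp32.item() != 1.0)  # True
print(updated_fp16.item() == 1.0)  # True
```

On the Cerebras system the ``float16`` casts used for computation are inserted
for you, so model code does not need explicit calls such as ``.half()``.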


Static Graphs
-------------

As of the 1.7.0 software release, it is not permitted to reprogram the fabric after initial
programming. This means that multiple compiles are not supported and therefore the
PyTorch compute graph must not change between iterations.

This places a number of restrictions on how the training loop can be
constructed, all of which are already handled in our custom PyTorch runner
classes. Refer to our implementations of the various hooks described in
:ref:`pytorch-custom-runner`.
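As a general illustration in plain PyTorch (not taken from the Cerebras runner
classes), one common way a graph changes between iterations is Python control
flow that depends on the step number. The function names and the
every-10-steps regularization term below are hypothetical. The static variant
runs the same operations on every iteration, takes the step as a tensor, and
uses a 0/1 mask tensor to switch the extra term on or off.

```python
import torch


def loss_dynamic(pred, target, weights, step):
    # The regularization term only exists on every 10th step, so the
    # traced compute graph differs from one iteration to the next.
    loss = torch.mean((pred - target) ** 2)
    if step % 10 == 0:
        loss = loss + 0.01 * weights.pow(2).sum()
    return loss


def loss_static(pred, target, weights, global_step):
    # global_step is a tensor, and the same operations run on every
    # iteration; only the value of the 0/1 mask changes, not the graph.
    loss = torch.mean((pred - target) ** 2)
    use_reg = (global_step % 10 == 0).to(loss.dtype)
    return loss + use_reg * 0.01 * weights.pow(2).sum()


pred, target = torch.randn(4), torch.randn(4)
weights = torch.randn(3)
for step in (3, 10):
    # Both formulations produce the same loss value.
    assert torch.allclose(
        loss_dynamic(pred, target, weights, step),
        loss_static(pred, target, weights, torch.tensor(step)),
    )
```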


Modes
^^^^^

Only ``mode = "train"`` and ``mode = "eval"`` are supported in this release.
``mode = "train_and_eval"`` is not supported because the fabric cannot be
reprogrammed after its initial programming: once it has been programmed for
``train``, it cannot be reprogrammed for ``eval`` in the same run.

.. note::

   Even if Cerebras supports reprogramming the fabric after initial
   programming in the future, it will still be best to avoid recompiling
   where possible, because recompiling and reprogramming the fabric can take
   a very long time.


Learning Rate Scheduler
-----------------------

Currently, we do not support the typical PyTorch learning rate scheduler
paradigm. A typical PyTorch learning rate scheduler computes a scalar learning
rate and writes it into the optimizer's parameter groups each time it is
stepped. Because the system currently requires static graphs, we cannot
support this behaviour.

The supported PyTorch learning rate schedulers are listed in
:ref:`supported-pt-learning-rate-schedulers`.

There are two different ways of scheduling a learning rate, depending on
whether you are running in pipeline mode or in weight streaming mode.

Pipeline mode
^^^^^^^^^^^^^

We provide custom LR scheduler classes that pass the scheduler information
directly to the system's communication manager, so that the schedule can be
programmed into the fabric.

See :code:`modelzoo.common.pytorch.optim.lr_scheduler` for more details.

Weight Streaming mode
^^^^^^^^^^^^^^^^^^^^^

For weight streaming mode, the entire learning rate schedule must be specified
as a function of the global step. This means that the learning rate is no
longer a scalar value but a tensor whose value depends on the global step. See
:code:`modelzoo.common.pytorch.optim.lr_scheduler` for examples.
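A minimal sketch of what a schedule expressed as a function of the global step
can look like, assuming the global step is available as a tensor. The function
name, the linear warmup followed by linear decay, and the default
hyperparameters are illustrative assumptions, not the ``modelzoo``
implementation. Both branches are computed and ``torch.where`` selects between
them, so the same operations run at every step.

```python
import torch


def warmup_then_linear_decay(global_step, base_lr=1e-3,
                             warmup_steps=100, total_steps=1000):
    # Hypothetical schedule: linear warmup to base_lr, then linear decay
    # to zero at total_steps. The result is a tensor, not a Python float.
    step = global_step.to(torch.float32)
    warmup = base_lr * step / warmup_steps
    decay = base_lr * torch.clamp(
        (total_steps - step) / (total_steps - warmup_steps), min=0.0
    )
    # Both branches are always computed; torch.where picks one per step,
    # so the graph does not depend on the value of global_step.
    return torch.where(step < warmup_steps, warmup, decay)


for s in (0, 50, 100, 550, 1000):
    lr = warmup_then_linear_decay(torch.tensor(s))
    print(s, lr.item())
```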

This also means that any optimizer you use must be written so that the
learning rate is treated as a tensor rather than a scalar value. See
:code:`modelzoo.common.pytorch.optim.AdamBase` for an example.
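The sketch below shows the general idea with a plain SGD update rather than
Adam; the ``sgd_step`` function is hypothetical and is not the ``AdamBase``
API. The learning rate is a 0-d tensor that is multiplied into the update,
rather than a Python float folded into the computation, so a step-dependent
schedule tensor can be passed in without changing the graph.

```python
import torch


@torch.no_grad()
def sgd_step(params, lr):
    # lr is a 0-d tensor (for example, the output of a schedule evaluated
    # at the current global step), so it is used as a tensor operand.
    for p in params:
        if p.grad is not None:
            p.sub_(lr * p.grad)


w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()  # gradient is 2 * w = [2.0, 4.0]
loss.backward()
sgd_step([w], lr=torch.tensor(0.1))
# w is now approximately [0.8, 1.6]
```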


Eval metrics
------------

Eval metrics may return only a single result value. Therefore, only the final
metric value should be returned; intermediate state or values cannot be
retrieved at this time.
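A minimal sketch of a metric shaped to fit this restriction, written in plain
PyTorch; the ``AccuracyMetric`` class and its ``update`` and ``compute``
methods are hypothetical and not a Cerebras API. Running counts are kept
internally as tensors, and ``compute`` returns only the final value.

```python
import torch


class AccuracyMetric:
    """Hypothetical metric that exposes only its final value."""

    def __init__(self):
        # Internal running state; never returned to the caller.
        self.correct = torch.zeros((), dtype=torch.long)
        self.total = torch.zeros((), dtype=torch.long)

    def update(self, logits, labels):
        preds = logits.argmax(dim=-1)
        self.correct += (preds == labels).sum()
        self.total += labels.numel()

    def compute(self):
        # Single result value: the overall accuracy as a 0-d tensor.
        return self.correct.float() / self.total.clamp(min=1).float()


metric = AccuracyMetric()
metric.update(torch.tensor([[2.0, 1.0], [0.0, 3.0]]), torch.tensor([0, 1]))
metric.update(torch.tensor([[5.0, 0.0]]), torch.tensor([1]))
accuracy = metric.compute()  # 2 of 3 predictions correct
```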