.. _using-cerebrasestimator:

Using the CerebrasEstimator
===========================

The ``CerebrasEstimator`` is a critical part of your main Python program when running on the CS system. It is the ``CerebrasEstimator`` that launches the Cerebras Graph Compiler (CGC) when its methods such as ``compile``, or ``train`` are called while providing the IP address of the CS system with ``cs_ip``. See also :ref:`interface-cerebras-estimator`.

In this section, an example ``run.py`` template is used to show how the ``CerebrasEstimator`` interacts with the key code segments of your Python program.

.. note::

	For a detailed description of the example ``run.py`` template, see :ref:`run-py-template`.

Shown below is a highly simplified ``run.py`` example code that is used for neural network training:

.. code-block:: python
   :linenos:

    # Example run.py script for neural network training
    from cerebras.models.common.estimator.tf.cs_estimator import CerebrasEstimator
    from cerebras.models.common.estimator.tf.run_config import CSRunConfig
    from cerebras.tf.cs_slurm_cluster_resolver import CSSlurmClusterResolver

    def model_fn(features, labels, mode, params):

        ...

        return spec

    def input_fn(params):

        ...

        return dataset

    config = CSRunConfig(
        cs_ip=ip,
        save_checkpoints_steps=1000,
        log_step_count_steps=10000,
        "use_cbfloat16": True )
    params ={
        "batch_size":32,
        "lr":0.1,
        "use_cbfloat16": True
    }

    est = CerebrasEstimator(
        model_fn,
        config=config,
        params=params,
        model_dir='./out',
        use_cs=True
    )

    est.train(input_fn, steps=100000)


Calling the CerebrasEstimator
-----------------------------

In the ``est=CerebrasEstimator(...)`` call (line 29), the ``model_fn`` argument is a callback function. When the ``CerebrasEstimator`` receives this argument, the ``CerebrasEstimator`` API waits until one of its methods, ``train``, is invoked.

   .. note::

      The ``model_fn`` argument to the ``CerebrasEstimator`` interface is passed without the ``()``.


Callback input function
-----------------------

1. The ``est.train (input_fn, steps=100000)`` (line 37) is a ``train`` method call to the ``CerebrasEstimator`` with ``input_fn`` argument as a callback function. The ``CerebrasEstimator`` then calls the ``input_fn`` with the ``params`` argument.

   .. note::

      The ``input_fn`` argument to the ``train`` method is passed without the ``()``.

   Both the ``CerebrasEstimator`` and TensorFlow Estimator API expect the input function to:

   - Accept a standard group of input parameters with the argument ``params`` and
   - Returns a ``tf.data.Dataset`` that yields tensor pairs in the predefined format: tensor with features and tensor with labeles.

2. Any ``params`` passed to the ``CerebrasEstimator`` are passed on to the ``input_fn`` and to the ``model_fn``. when the ``CerebrasEstimator`` calls the ``input_fn``.

   The ``input_fn`` should return a ``tf.data.Dataset`` (see `Dataset API <https://www.tensorflow.org/api_docs/python/tf/data/Dataset>`__ for
   documentation).

3. The input function builds the input pipeline and yields the batched data in the form of ``(features, labels)`` pairs, where:

   - ``features`` can be a tensor or dictionary of tensors, and
   - ``labels`` can be a tensor, a dictionary of tensors or None.


Example
~~~~~~~

.. code-block:: python

    def input_fn(params):
        ...
        ds = ds.shuffle(buffer_size)
        ds = ds.repeat()
        ds = ds.batch(batch_size, drop_remainder=True)
        ds = ds.prefetch(buffer_size)
        return ds


Callback model function
-----------------------

The model function ``model_fn`` is used to generate the graph for your neural network model.

4. The ``features`` and ``labels``, the two arguments returned from the ``input_fn``, are the handles to the batched data that your model will use. When these two arguments, ``features`` and ``labels``, are returned from the ``input_fn``, the ``CerebrasEstimator`` will then call the ``model_fn`` by passing the following arguments to the ``model_fn``:

   - The ``mode`` argument that indicates whether the caller is requesting training.
   - The ``params`` object that was passed in the ``est=CerebrasEstimator(...)`` call.

   .. important::

     The functions ``input_fn`` and the ``model_fn`` are called by the ``CerebrasEstimator`` as these two are passed to the ``CerebrasEstimator`` as callback functions. You should not directly call either of these two functions in your TensorFlow code.


Both the ``CerebrasEstimator`` and TensorFlow Estimator API expect the model function to accept a standard group of input parameters and return a standard group of output values.

Currently, the ``CerebrasEstimator`` supports usage of the Tensorflow `Keras Layers API <https://www.tensorflow.org/api_docs/python/tf/keras/layers/>`__ in
the model function. However, the Tensorflow `Metrics API <https://www.tensorflow.org/api_docs/python/tf/metrics>`__ is not supported.

Syntax
~~~~~~

.. code-block:: python

    def model_fn(
        features,  # This is batch_features from input_fn
        labels,    # This is batch_labels from input_fn
        mode,      # An instance of tf.estimator.ModeKeys
        params     # Additional configuration
        ):


Example
~~~~~~~

See below an example of ``model_fn`` definition.

.. code-block:: python

    def model_fn(features, labels, mode=tf.estimator.ModeKeys.TRAIN, params=None):
        """ Model definition """
        logits = build_model(features, params)
        learning_rate = tf.constant(params["lr"])
        if mode in (tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL):
            loss_op = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits) )
            train_op = tf.train.GradientDescentOptimizer(learning_rate=learning_rate ).minimize(loss_op, global_step=tf.train.get_global_step())
            spec = tf.estimator.EstimatorSpec(mode=mode, loss=loss_op, train_op=train_op)
            return spec


.. _cs-est-config:

Setting the runtime configuration
---------------------------------

Runtime and environment options can be set. Usually this is the information that is not captured in the ``model_fn`` and ``input_fn``. Use the ``CSRunConfig`` object to set these Cerebras-specific options. These options are an extension of `TensorFlow RunConfig <https://www.tensorflow.org/api_docs/python/tf/estimator/RunConfig>`__.

.. important::

   Make sure to add the following ``import`` statement to your Slurm-orchestrated TensorFlow code so that Slurm cluster resolving is done automatically.

   .. code-block:: python

       from cerebras.tf.cs_slurm_cluster_resolver import CSSlurmClusterResolver

.. _csestimator-csrunconfig:

CSRunConfig
~~~~~~~~~~~

The Cerebras ``CSRunConfig`` class inherits from the standard TensorFlow ``RunConfig`` class. You can pass to the ``CSRunConfig`` the same parameters as those of the Tensorflow ``RunConfig``, and also pass additional parameters that specify the configurations for a ``CerebrasEstimator`` run, including the IP address of the CS system. Such additional parameters include:

-  ``cs_ip``: IP address of the CS system, provided by Cerebras.
- ``system_name``: Name of the CS system.

The full list of options for TensorFlow ``RunConfig`` can be found
`here <https://www.tensorflow.org/api_docs/python/tf/estimator/RunConfig>`__.

Example
^^^^^^^

.. code-block:: python

    from cerebras.models.common.estimator.tf.run_config import CSRunConfig
    from cerebras.tf.cs_slurm_cluster_resolver import CSSlurmClusterResolver

    config = CSRunConfig(
                cs_ip=ip,
                save_checkpoints_steps=1000,
                log_step_count_steps=10000,
                save_summary_steps=1000
             )