.. _porting-pytorch-to-cs:

Porting PyTorch Model to CS
===========================

Option 1 (Easiest): Modify reference models in Cerebras Model Zoo git repository
---------------------------------------------------------------------------------

The `Cerebras Model Zoo repository `_ contains reference implementations in PyTorch of popular neural networks such as BERT, GPT-2, and T5. These implementations are modularized to separate data preprocessing, model implementation, and additional functions for execution.

If your primary goal is to use one of these models, even with some model or data preprocessing changes, we recommend starting from the Cerebras Model Zoo Repository and adding the changes you need.

Example 1: Changing the data loader
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For this example, we work with the PyTorch implementation of FC_MNIST in the Cerebras Model Zoo. We create a synthetic dataloader to evaluate the performance of the network with respect to different input sizes and numbers of classes.

In :code:`data.py`, we create a function called :code:`get_random_dataloader` that creates random images and labels. We instrument the function so that the number of examples, the batch size, the seed, the image size, and the number of classes of this dataset can be specified in the `params.yaml` file.

.. code-block:: python

    import torch
    import numpy as np

    def get_random_dataloader(input_params, shuffle, num_classes):
        num_examples = input_params.get("num_examples")
        batch_size = input_params.get("batch_size")
        seed = input_params.get("seed", 1)
        image_size = input_params.get("image_size", [1, 28, 28])

        # Fix the seed so that every run generates the same synthetic data.
        np.random.seed(seed)
        image = np.random.random(
            size=[num_examples] + image_size
        ).astype(np.float32)
        label = np.random.randint(
            low=0, high=num_classes, size=num_examples
        ).astype(np.int32)

        dataset = torch.utils.data.TensorDataset(
            torch.from_numpy(image),
            torch.from_numpy(label),
        )

        return torch.utils.data.DataLoader(
            dataset,
            batch_size=batch_size,
            shuffle=shuffle,
            num_workers=input_params.get("num_workers", 0),
        )

    def get_train_dataloader(params):
        return get_random_dataloader(
            params["train_input"],
            params["train_input"].get("shuffle"),
            params["model"].get("num_classes"),
        )

    def get_eval_dataloader(params):
        return get_random_dataloader(
            params["eval_input"],
            False,
            params["model"].get("num_classes"),
        )

In :code:`model.py`, we change the fixed number of classes to a parameter in the `params.yaml` file.

.. code-block:: python

    class MNIST(nn.Module):
        def __init__(self, model_params):
            super().__init__()
            self.loss_fn = nn.NLLLoss()
            self.fc_layers = []
            input_size = model_params.get("input_size", 784)
            num_classes = model_params.get("num_classes", 10)
            ...
            self.last_layer = nn.Linear(input_size, num_classes)
        ...

In `configs/params.yaml`, we add the additional fields used in the dataloader and the model definition.

.. code-block:: yaml

    train_input:
        batch_size: 128
        drop_last_batch: True
        num_examples: 1000
        seed: 123
        image_size: [1,28,28]
        shuffle: True

    eval_input:
        data_dir: "./data/mnist/val"
        batch_size: 128
        num_examples: 1000
        drop_last_batch: True
        seed: 1234
        image_size: [1,28,28]

    model:
        name: "fc_mnist"
        mixed_precision: True
        input_size: 784 # 1*28*28
        num_classes: 10
    ...
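Before launching a full run, it can be useful to sanity-check the synthetic dataloader on its own. The following is a minimal local sketch, assuming the functions above live in :code:`data.py` and using a hand-built :code:`params` dictionary that mirrors the YAML fields; it is not part of the Model Zoo workflow itself.

.. code-block:: python

    # Minimal local check of the synthetic dataloader defined above.
    # The params dictionary mirrors the fields in configs/params.yaml.
    from data import get_train_dataloader

    params = {
        "train_input": {
            "num_examples": 1000,
            "batch_size": 128,
            "seed": 123,
            "image_size": [1, 28, 28],
            "shuffle": True,
        },
        "model": {"num_classes": 10},
    }

    loader = get_train_dataloader(params)
    images, labels = next(iter(loader))
    print(images.shape)  # expected: torch.Size([128, 1, 28, 28])
    print(labels.shape)  # expected: torch.Size([128])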
Option 2 (Easy): Create new models leveraging Cerebras run function available in Cerebras Model Zoo
---------------------------------------------------------------------------------------------------

All PyTorch implementations in the Cerebras Model Zoo Repository use a common harness to manage execution on a CS system and on other hardware. This harness implements the code changes necessary to compile a model for a Cerebras system, run a compiled model on a Cerebras system, or run the model on CPU/GPU. It therefore provides a training/evaluation interface into which models and data preprocessing scripts can be plugged, without worrying about line-by-line modifications to make the code Cerebras-friendly.

If your **primary goal is to develop new model and data preprocessing scripts**, we suggest starting from the common backbone in the Cerebras Model Zoo Repository, the :code:`run` function.

Prerequisites
~~~~~~~~~~~~~

To use the ``run`` function, you must have a copy of the Cerebras Model Zoo Repository compatible with the release installed in the target CS system. The :code:`run` function can be imported as

.. code-block:: python

    from modelzoo.common.pytorch.run_utils import run

All the code related to the :code:`run` function lives inside the Cerebras Model Zoo Repository and can be found in the `common/pytorch `_ folder.

How to use the ``run`` function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :code:`run` function modularizes the model implementation, the data loaders, the hyperparameters, and the execution. To use the :code:`run` function you need:

1. A params YAML file. This file is used at runtime.

2. An implementation that includes the following:

   a. Model definition

   b. Data loaders for training and evaluation

Code Skeleton
^^^^^^^^^^^^^

.. code-block:: python

    import os
    import sys
    import torch

    # Append path to parent directory of Cerebras Model Zoo Repository
    sys.path.append(os.path.join(os.path.dirname(__file__), ".."))

    from modelzoo.common.pytorch.run_utils import run
    from modelzoo.common.pytorch.PyTorchBaseModel import PyTorchBaseModel

    # Step 1: Define Model
    # Step 1.1: Define Module
    class Model(torch.nn.Module):
        def __init__(self, params):
            ...

        def forward(self, inputs):
            ...
            return outputs

    # Step 1.2: Define PyTorchBaseModel
    class BaseModel(PyTorchBaseModel):
        def __init__(self, params, device=None):
            self.model = Model(params)
            self.loss_fn = ...
            ...
            super().__init__(params=params, model=self.model, device=device)

        def __call__(self, data):
            inputs, targets = data
            outputs = self.model(inputs)
            loss = self.loss_fn(outputs, targets)
            return loss

    # Step 2: Define dataloaders
    def get_train_dataloader(params):
        ...
        loader = torch.utils.data.DataLoader(...)
        return loader

    def get_eval_dataloader(params):
        ...
        loader = torch.utils.data.DataLoader(...)
        return loader

    # Step 3: Set up run function
    def main():
        run(BaseModel, get_train_dataloader, get_eval_dataloader)

    if __name__ == '__main__':
        main()

Step 1: Define Model
^^^^^^^^^^^^^^^^^^^^

To define the model architecture, the :code:`run` function requires a callable (either a class or a function) that takes as input a dictionary of params and returns a :code:`PyTorchBaseModel`. To construct this callable you can:

1. First, define the model architecture with :code:`torch.nn.Module`.

2. Then, wrap it by defining a :code:`PyTorchBaseModel`. This class also takes care of defining the optimizer.
To customize the model, the :code:`run` function creates a dictionary of params from the params YAML file such that:

1. The :code:`model` section defines the architecture hyperparameters. (optional)

2. The :code:`optimizer` section defines learning rates and optimizer details. (required)

Creating a PyTorchBaseModel
"""""""""""""""""""""""""""

A :code:`PyTorchBaseModel` object is a light wrapper around a :code:`torch.nn.Module`. The :code:`PyTorchBaseModel` class configures the optimization parameters defined in the params YAML file. This class can be imported from the Cerebras Model Zoo git repository by

.. code-block:: python

    from modelzoo.common.pytorch.PyTorchBaseModel import PyTorchBaseModel

The implementation can be found `here `_. Initializing a :code:`PyTorchBaseModel` requires:

+----------------+-------------------------+------------------------------------------------------------+
|                | Type                    | Notes                                                      |
+================+=========================+============================================================+
| :code:`params` | :code:`dict`            | Dictionary constructed from the params YAML file.          |
+----------------+-------------------------+------------------------------------------------------------+
| :code:`model`  | :code:`torch.nn.Module` | Definition of the model architecture.                      |
+----------------+-------------------------+------------------------------------------------------------+
| :code:`device` | :code:`torch.device`    | The default value is :code:`device: torch.device = None`.  |
|                |                         | In this case, the runner code inside the :code:`run`       |
|                |                         | function figures out the proper device for the run.       |
+----------------+-------------------------+------------------------------------------------------------+

In addition, any child class of :code:`PyTorchBaseModel` must implement the :code:`__call__` function. Given one iteration of a dataloader as input, the :code:`__call__` function should return the loss associated with one forward pass of that batch.

Step 2: Define dataloaders
^^^^^^^^^^^^^^^^^^^^^^^^^^

To define the data loaders, the :code:`run` function requires a callable (either a class or a function) that takes as input a dictionary of params and returns a :code:`torch.utils.data.DataLoader`. When running training, the :code:`train_data_fn` must be provided. When running evaluation, the :code:`eval_data_fn` must be provided.

Step 3: Set up the ``run`` function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :code:`run` function must be imported from `utils.py `_. Always remember to append the parent directory of the Cerebras Model Zoo git repository to the Python path.

All the input parameters of the :code:`run` function are callables that take as input a dictionary, called :code:`params`. :code:`params` is a dictionary containing all of the model and data parameters specified by the params YAML file of the model.

+---------------------------+-------------------------------------------------------+-----------------------------------------------------+
| Parameter                 | Type                                                  | Notes                                               |
+===========================+=======================================================+=====================================================+
| :code:`model_fn`          | :code:`Callable[[dict], PyTorchBaseModel]`            | Required. A callable that takes in a dictionary of  |
|                           |                                                       | parameters. Returns a :code:`PyTorchBaseModel`.     |
+---------------------------+-------------------------------------------------------+-----------------------------------------------------+
| :code:`train_data_fn`     | :code:`Callable[[dict], torch.utils.data.DataLoader]` | Required during a training run.                     |
+---------------------------+-------------------------------------------------------+-----------------------------------------------------+
| :code:`eval_data_fn`      | :code:`Callable[[dict], torch.utils.data.DataLoader]` | Required during an evaluation run.                  |
+---------------------------+-------------------------------------------------------+-----------------------------------------------------+
| :code:`default_params_fn` | :code:`Callable[[dict], Optional[dict]]`              | Optional. A callable that takes in a dictionary of  |
|                           |                                                       | parameters. Sets default parameters.                |
+---------------------------+-------------------------------------------------------+-----------------------------------------------------+

Manage common params for multiple experiments
"""""""""""""""""""""""""""""""""""""""""""""

To avoid params replication between multiple similar experiments, the :code:`run` function has an optional input parameter called :code:`default_params_fn`. This parameter modifies the dictionary created from the params YAML file, adding default values for unspecified params. Setting up a :code:`default_params_fn` can be beneficial if you plan multiple experiments in which only a small subset of the params YAML file changes. The :code:`default_params_fn` sets up the values shared by all of the experiments, and you can create different configuration YAML files that address only the changes between experiments.

The :code:`default_params_fn` should be a callable that takes in the :code:`params` dictionary and returns a new dictionary. If the :code:`default_params_fn` is omitted, the :code:`params` dictionary is used as is.
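As an illustration, a :code:`default_params_fn` might look like the following sketch. The specific keys filled in here are arbitrary examples, and the call at the bottom assumes :code:`default_params_fn` is passed as the fourth argument of :code:`run`, per the table above.

.. code-block:: python

    def default_params_fn(params):
        # Fill in values shared across all experiments; per-experiment
        # YAML files then only need to state what actually changes.
        params["model"].setdefault("dropout", 0.0)
        params["train_input"].setdefault("num_workers", 0)
        params["optimizer"].setdefault("loss_scaling_factor", 1.0)
        return params

    def main():
        run(BaseModel, get_train_dataloader, get_eval_dataloader, default_params_fn)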
Step 4: Create params YAML file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At runtime, the :code:`run` function requires a separate params YAML file. This file is specified during execution with the :code:`--params` flag in the command line.

Parameters skeleton:

.. code-block:: yaml

    train_input:
        ...

    eval_input:
        ...

    model:
        ...

    optimizer:
        optimizer_type: ...
        learning_rate: ...
        loss_scaling_factor: ...

    runconfig:
        max_steps: ...
        checkpoint_steps: ...
        log_steps: ...
        seed: ...
        save_losses: ...

The params YAML file has the following sections:

+---------------------+----------+-------------------------------------------------------------------+
| Section             | Required | Notes                                                             |
+=====================+==========+===================================================================+
| :code:`runconfig`   | Yes      | Used by :code:`run` to set up logging and execution.              |
|                     |          | It expects fields: :code:`max_steps`, :code:`checkpoint_steps`,   |
|                     |          | :code:`log_steps`, :code:`save_losses`.                           |
+---------------------+----------+-------------------------------------------------------------------+
| :code:`optimizer`   | Yes      | Used by :code:`PyTorchBaseModel` to set up the optimizer.         |
|                     |          | It expects fields: :code:`optimizer_type`, :code:`learning_rate`, |
|                     |          | :code:`loss_scaling_factor`.                                      |
+---------------------+----------+-------------------------------------------------------------------+
| :code:`model`       | No       | By convention, it is used to customize the model architecture    |
|                     |          | in :code:`nn.Module`. Fields are tailored to the needs inside     |
|                     |          | the model.                                                        |
+---------------------+----------+-------------------------------------------------------------------+
| :code:`train_input` | No       | By convention, it is used to customize :code:`train_data_fn`.     |
|                     |          | Fields are tailored to the needs inside :code:`train_data_fn`.    |
+---------------------+----------+-------------------------------------------------------------------+
| :code:`eval_input`  | No       | By convention, it is used to customize :code:`eval_data_fn`.      |
|                     |          | Fields are tailored to the needs inside :code:`eval_data_fn`.     |
+---------------------+----------+-------------------------------------------------------------------+
Optimizer
"""""""""

There are a number of optimizer parameters that can be used to configure the optimizer for the run. Currently, the only supported optimizers are :code:`SGD` and :code:`AdamW`. The optimizer type can be specified via the :code:`optimizer_type` subparameter. Below are the required and optional params that can be used to configure them.

+------------------------+-----------------------------------+--------------------------------------------------+
| :code:`optimizer_type` | Parameters                        | Descriptions                                     |
+========================+===================================+==================================================+
| :code:`SGD`            | :code:`learning_rate`             | See the "Learning Rate Scheduler" subsection.    |
|                        +-----------------------------------+--------------------------------------------------+
|                        | :code:`momentum`                  | The momentum factor.                             |
|                        +-----------------------------------+--------------------------------------------------+
|                        | :code:`weight_decay_rate`         | Optional. Weight decay (L2 penalty).             |
|                        |                                   | (Default: 0.0)                                   |
+------------------------+-----------------------------------+--------------------------------------------------+
| :code:`AdamW`          | :code:`learning_rate`             | See the "Learning Rate Scheduler" subsection.    |
|                        +-----------------------------------+--------------------------------------------------+
|                        | :code:`beta1`                     | Optional. Adam's first beta parameter.           |
|                        |                                   | (Default: 0.9)                                   |
|                        +-----------------------------------+--------------------------------------------------+
|                        | :code:`beta2`                     | Optional. Adam's second beta parameter.          |
|                        |                                   | (Default: 0.999)                                 |
|                        +-----------------------------------+--------------------------------------------------+
|                        | :code:`correct_bias`              | Optional. Whether or not to correct bias in Adam.|
|                        |                                   | (Default: False)                                 |
|                        +-----------------------------------+--------------------------------------------------+
|                        | :code:`exclude_from_weight_decay` | Parameters to exclude from weight decay.         |
+------------------------+-----------------------------------+--------------------------------------------------+

All of the above parameters are subparameters of the top-level :code:`optimizer` parameter. Refer to the Cerebras Model Zoo git repository for examples of how to configure the optimizer.
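For instance, an :code:`optimizer` section selecting :code:`AdamW` with the parameters from the table might look like the following sketch (the numeric values are illustrative only):

.. code-block:: yaml

    optimizer:
        optimizer_type: "AdamW"
        learning_rate: 0.0001
        beta1: 0.9
        beta2: 0.999
        correct_bias: True
        loss_scaling_factor: 1.0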
Learning Rate Scheduler
"""""""""""""""""""""""

We also support various learning rate schedulers. They are configurable using the :code:`learning_rate` subparameter. Valid configurations include the following:

+---------------------------+-------------------------------+----------------------------------------------------+
| :code:`learning_rate`     | Parameters                    | Descriptions                                       |
+===========================+===============================+====================================================+
| Constant                  |                               | A floating point number specifying the learning    |
|                           |                               | rate to be used throughout the run.                |
+---------------------------+-------------------------------+----------------------------------------------------+
| :code:`PieceWiseConstant` | :code:`values`                | The constant values to use.                        |
|                           +-------------------------------+----------------------------------------------------+
|                           | :code:`boundaries`            | The steps on which to change the learning          |
|                           |                               | rate values.                                       |
+---------------------------+-------------------------------+----------------------------------------------------+
| :code:`Linear`            | :code:`initial_learning_rate` | The starting learning rate value.                  |
|                           +-------------------------------+----------------------------------------------------+
|                           | :code:`end_learning_rate`     | The final learning rate value.                     |
|                           +-------------------------------+----------------------------------------------------+
|                           | :code:`steps`                 | The number of steps over which to transition       |
|                           |                               | from the starting learning rate to the final       |
|                           |                               | learning rate.                                     |
+---------------------------+-------------------------------+----------------------------------------------------+
| :code:`Exponential`       | :code:`initial_learning_rate` | The starting learning rate value.                  |
|                           +-------------------------------+----------------------------------------------------+
|                           | :code:`decay_steps`           | The number of steps to decay the learning rate.    |
|                           +-------------------------------+----------------------------------------------------+
|                           | :code:`decay_rate`            | The rate at which to decay the learning rate.      |
+---------------------------+-------------------------------+----------------------------------------------------+

Loss scaling
""""""""""""

We support static and dynamic loss scaling, which are configurable through the :code:`optimizer`'s subparameters:

+-----------------------------+--------------------------------------------------------------------------------+
| :code:`loss_scaling_factor` | A constant scalar value configures static loss scaling.                       |
|                             | Passing in the string :code:`"dynamic"` configures dynamic loss scaling.      |
|                             | (Default: :code:`1`, which does not configure any loss scaling.)              |
+-----------------------------+--------------------------------------------------------------------------------+
| :code:`initial_loss_scale`  | The initial loss scale value if :code:`loss_scaling_factor == "dynamic"`.     |
|                             | (Default: :code:`2e15`)                                                        |
+-----------------------------+--------------------------------------------------------------------------------+
| :code:`steps_per_increase`  | The number of steps after which to increase the loss scaling condition.       |
|                             | (Default: :code:`2000`)                                                        |
+-----------------------------+--------------------------------------------------------------------------------+
| :code:`min_loss_scale`      | The minimum loss scale value that can be chosen by dynamic loss scaling.      |
|                             | (Default: :code:`2e-14`)                                                       |
+-----------------------------+--------------------------------------------------------------------------------+
| :code:`max_loss_scale`      | The maximum loss scale value that can be chosen by dynamic loss scaling.      |
|                             | (Default: :code:`2e15`)                                                        |
+-----------------------------+--------------------------------------------------------------------------------+
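As an illustration, combining a piecewise-constant schedule with dynamic loss scaling might look like the sketch below. The individual knobs come from the tables above; the exact nesting of the scheduler name (here a :code:`scheduler` key under :code:`learning_rate`) is an assumption that should be checked against the configs in the Cerebras Model Zoo.

.. code-block:: yaml

    optimizer:
        optimizer_type: "SGD"
        momentum: 0.9
        # 0.01 until step 1000, 0.001 until step 2000, then 0.0001.
        # The "scheduler" key is an assumed spelling; see Model Zoo configs.
        learning_rate:
            scheduler: "PieceWiseConstant"
            values: [0.01, 0.001, 0.0001]
            boundaries: [1000, 2000]
        # Dynamic loss scaling with an explicitly chosen initial scale.
        loss_scaling_factor: "dynamic"
        initial_loss_scale: 32768.0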
Global Gradient Clipping
""""""""""""""""""""""""

We support global gradient clipping by `value `_ or by the `normalized value `_. They are configurable through the :code:`optimizer`'s subparameters:

+----------------------------+-----------------------------+
| :code:`max_gradient_norm`  | Max norm of the gradients.  |
+----------------------------+-----------------------------+
| :code:`max_gradient_value` | Max value of the gradients. |
+----------------------------+-----------------------------+

.. note::
    The above subparameters are mutually exclusive. They cannot both be specified at the same time.

Step 5: Execute script with run function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :code:`run` function is instrumented to parse the command line arguments.

Required arguments:

+--------------------------------------------+---------------------------------------------------------------+
| :code:`-p PARAMS`,                         | Path to the .yaml file with model parameters.                 |
|                                            | These parameters usually define the architecture of the       |
| :code:`--params PARAMS`                    | model, details of data preprocessing, and logging             |
|                                            | frequencies.                                                  |
+--------------------------------------------+---------------------------------------------------------------+
| :code:`-m {train,eval,train_and_eval}`,    | Execution mode: train, eval, or train_and_eval.               |
|                                            | Depending on this value, the runner exercises either the      |
| :code:`--mode {train,eval,train_and_eval}` | training loop or the evaluation loop.                         |
+--------------------------------------------+---------------------------------------------------------------+

Optional arguments:

+-------------------------------------------+----------------------------------------------------------------+
| :code:`-h`, :code:`--help`                | Shows a help message and exits.                                |
+-------------------------------------------+----------------------------------------------------------------+
| :code:`-cs CS_IP`,                        | IP address of the Cerebras system.                             |
|                                           | This argument specifies the IP address of the Cerebras         |
| :code:`--cs_ip CS_IP`                     | system and the port that the connection manager is             |
|                                           | listening on.                                                  |
|                                           |                                                                |
|                                           | If this parameter is not provided, the runner checks whether   |
|                                           | a GPU is available and automatically uses it unless the        |
|                                           | :code:`--cpu` argument is also provided.                       |
+-------------------------------------------+----------------------------------------------------------------+
| :code:`--compile_only`                    | Enables the compile-only workflow.                             |
|                                           |                                                                |
|                                           | This exercises the compile-only path. This workflow compiles   |
|                                           | the model and creates executables for the Cerebras system,     |
|                                           | but it does not launch a job on the system.                    |
|                                           |                                                                |
|                                           | This workflow is particularly useful to verify whether the     |
|                                           | model compiles without using system resources.                 |
+-------------------------------------------+----------------------------------------------------------------+
| :code:`-o MODEL_DIR`,                     | Model directory where checkpoints are written.                 |
|                                           |                                                                |
| :code:`--model_dir MODEL_DIR`             | This specifies the path to the model directory. This is the    |
|                                           | directory where all of the logs and artifacts generated by     |
|                                           | the compile and execution (including checkpoints) are stored.  |
+-------------------------------------------+----------------------------------------------------------------+
| :code:`--checkpoint_path CHECKPOINT_PATH` | Checkpoint to initialize weights from.                         |
|                                           |                                                                |
|                                           | This is useful to execute evaluation or to continue training   |
|                                           | from pretrained weights.                                       |
+-------------------------------------------+----------------------------------------------------------------+
| :code:`--is_pretrained_checkpoint`        | Flag indicating that the provided checkpoint is from a         |
|                                           | pre-training run. If set, training begins from step 0          |
|                                           | after loading the matching weights from the checkpoint,        |
|                                           | ignoring the optimizer state if present in the checkpoint.     |
+-------------------------------------------+----------------------------------------------------------------+
| :code:`--logging LOGGING`                 | Specifies the default logging level. Defaults to INFO.         |
+-------------------------------------------+----------------------------------------------------------------+
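Putting these flags together, a CPU/GPU invocation might look like the following sketch; the paths and checkpoint name are placeholders:

.. code-block:: shell

    # Train on a GPU-equipped machine (no --cs_ip and no --cpu flag needed).
    python run.py --mode train --params configs/params.yaml --model_dir model_dir

    # Evaluate on CPU, initializing weights from an existing checkpoint.
    python run.py --mode eval --params configs/params.yaml --cpu \
        --checkpoint_path <path-to-checkpoint>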
Execution in CS system
~~~~~~~~~~~~~~~~~~~~~~

To execute the model on a CS system, we can use the instrumentation inside the ``run`` function.

Compile:

.. code-block:: shell

    csrun_cpu python-pt run.py --mode <mode> --params params.yaml --compile_only --cs_ip <CS_IP>

Execute:

.. code-block:: shell

    csrun_wse python-pt run.py --mode <mode> --params params.yaml --cs_ip <CS_IP>

Execution in different hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using the :code:`run` function enables the execution of the same code on a Cerebras system and on CPU/GPU without any changes. You can use the command line arguments to specify the type of device used for training and evaluation. If you are interested in:

+-------------+---------------------------------------------------------+-----------------------------+
| Execution   | Devices Available                                       | Flags Needed                |
+=============+=========================================================+=============================+
| Compilation | Compilation is done on CPU and is only required        | :code:`--compile_only`      |
|             | when executing code on a Cerebras system.              | :code:`--cs_ip CS_IP`       |
|             |                                                         | :code:`--mode {train,eval}` |
+-------------+---------------------------------------------------------+-----------------------------+
| Training    | Cerebras system                                         | :code:`--cs_ip CS_IP`       |
|             |                                                         | :code:`--mode train`        |
|             +---------------------------------------------------------+-----------------------------+
|             | CPU                                                     | :code:`--cpu`               |
|             |                                                         | :code:`--mode train`        |
|             +---------------------------------------------------------+-----------------------------+
|             | GPU                                                     | :code:`--mode train`        |
+-------------+---------------------------------------------------------+-----------------------------+
| Evaluation  | Cerebras system                                         | :code:`--cs_ip CS_IP`       |
|             |                                                         | :code:`--mode eval`         |
|             +---------------------------------------------------------+-----------------------------+
|             | CPU                                                     | :code:`--cpu`               |
|             |                                                         | :code:`--mode eval`         |
|             +---------------------------------------------------------+-----------------------------+
|             | GPU                                                     | :code:`--mode eval`         |
+-------------+---------------------------------------------------------+-----------------------------+

Example using :code:`run` with FC_MNIST
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this example, we port a PyTorch implementation of a fully connected dense neural network for the MNIST dataset to CS-friendly code using the :code:`run` function.

Step 1: Define model
^^^^^^^^^^^^^^^^^^^^

In this example, we construct an FC_MNIST implementation given the depth and the hidden size of the network.
We assume that the input size is 784 and the last output dimension is 10. We use :code:`ReLU` as the nonlinearity and a negative log likelihood loss.

In `fc_mnist.py`, we define a child class of :code:`torch.nn.Module` called :code:`MNIST`.

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MNIST(nn.Module):
        def __init__(self, model_params):
            super().__init__()
            self.fc_layers = []
            input_size = 784

            # Depth is len(hidden_sizes)
            model_params["hidden_sizes"] = [
                model_params["hidden_size"]
            ] * model_params["depth"]

            for hidden_size in model_params["hidden_sizes"]:
                fc_layer = nn.Linear(input_size, hidden_size)
                self.fc_layers.append(fc_layer)
                input_size = hidden_size
            self.fc_layers = nn.ModuleList(self.fc_layers)
            self.last_layer = nn.Linear(input_size, 10)

            self.nonlin = nn.ReLU()
            self.dropout = nn.Dropout(model_params["dropout"])

        def forward(self, inputs):
            x = torch.flatten(inputs, 1)
            for fc_layer in self.fc_layers:
                x = fc_layer(x)
                x = self.nonlin(x)
                x = self.dropout(x)

            pred_logits = self.last_layer(x)
            outputs = F.log_softmax(pred_logits, dim=1)
            return outputs

Then, in `model.py`, we create a child class of :code:`PyTorchBaseModel` called :code:`MNISTModel` that wraps the module :code:`MNIST`. In :code:`MNISTModel`, in addition to initialization, we implement two functions: :code:`build_model`, to create a :code:`MNIST` object, and :code:`__call__`, to return the loss associated with one forward pass of a given dataloader iteration.

.. code-block:: python

    import torch
    import torch.nn as nn

    from modelzoo.common.pytorch.PyTorchBaseModel import PyTorchBaseModel
    from fc_mnist import MNIST

    class MNISTModel(PyTorchBaseModel):
        def __init__(self, params, device=None):
            self.params = params
            model_params = params["model"].copy()
            self.model = self.build_model(model_params)
            self.loss_fn = nn.NLLLoss()

            super().__init__(params=params, model=self.model, device=device)

        def build_model(self, model_params):
            dtype = torch.float32
            model = MNIST(model_params)
            model.to(dtype)
            return model

        def __call__(self, data):
            inputs, labels = data
            outputs = self.model(inputs)
            loss = self.loss_fn(outputs, labels)
            return loss
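Before plugging :code:`MNISTModel` into :code:`run`, a quick CPU check of the bare module can catch shape mistakes early. This is a local sanity sketch only, not part of the Model Zoo flow:

.. code-block:: python

    import torch
    from fc_mnist import MNIST

    model = MNIST({"depth": 10, "hidden_size": 50, "dropout": 0.0})

    batch = torch.randn(8, 1, 28, 28)  # batch of 8 random "images"
    log_probs = model(batch)
    print(log_probs.shape)  # expected: torch.Size([8, 10])

    # NLLLoss over the log-probabilities, with random integer targets.
    loss = torch.nn.NLLLoss()(log_probs, torch.randint(0, 10, (8,)))
    print(loss.item())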
Step 2: Define dataloaders
^^^^^^^^^^^^^^^^^^^^^^^^^^

In this example, we create two different functions for training and evaluation. We use the :code:`torchvision.datasets` functionality to download the MNIST dataset. Each of these functions returns a :code:`torch.utils.data.DataLoader`.

In `data.py`:

.. code-block:: python

    import torch
    from torchvision import datasets, transforms

    def get_train_dataloader(params):
        input_params = params["train_input"]

        batch_size = input_params.get("batch_size")
        dtype = torch.float16 if input_params["to_float16"] else torch.float32
        shuffle = input_params["shuffle"]

        train_dataset = datasets.MNIST(
            input_params["data_dir"],
            train=True,
            download=True,
            transform=transforms.Compose(
                [
                    transforms.ToTensor(),
                    transforms.Normalize((0.1307,), (0.3081,)),
                    transforms.Lambda(
                        lambda x: torch.as_tensor(x, dtype=dtype)
                    ),
                ]
            ),
            target_transform=transforms.Lambda(
                lambda x: torch.as_tensor(x, dtype=torch.int32)
            ),
        )

        train_loader = torch.utils.data.DataLoader(
            train_dataset,
            batch_size=batch_size,
            drop_last=input_params["drop_last_batch"],
            shuffle=shuffle,
            num_workers=input_params.get("num_workers", 0),
        )
        return train_loader

    def get_eval_dataloader(params):
        input_params = params["eval_input"]

        batch_size = input_params.get("batch_size")
        dtype = torch.float16 if input_params["to_float16"] else torch.float32

        eval_dataset = datasets.MNIST(
            input_params["data_dir"],
            train=False,
            download=True,
            transform=transforms.Compose(
                [
                    transforms.ToTensor(),
                    transforms.Normalize((0.1307,), (0.3081,)),
                    transforms.Lambda(
                        lambda x: torch.as_tensor(x, dtype=dtype)
                    ),
                ]
            ),
            target_transform=transforms.Lambda(
                lambda x: torch.as_tensor(x, dtype=torch.int32)
            ),
        )

        eval_loader = torch.utils.data.DataLoader(
            eval_dataset,
            batch_size=batch_size,
            drop_last=input_params["drop_last_batch"],
            shuffle=False,
            num_workers=input_params.get("num_workers", 0),
        )
        return eval_loader

Step 3: Set up the run function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With all of the elements in place, we now import the :code:`run` function from :code:`modelzoo.common.pytorch.run_utils`. We append the parent directory of the Cerebras Model Zoo Repository.

In `run.py`:

.. code-block:: python

    import os
    import sys

    # Append path to parent directory of Cerebras Model Zoo Repository
    sys.path.append(os.path.join(os.path.dirname(__file__), ".."))

    from modelzoo.common.pytorch.run_utils import run

    from data import (
        get_train_dataloader,
        get_eval_dataloader,
    )
    from model import MNISTModel

    def main():
        run(MNISTModel, get_train_dataloader, get_eval_dataloader)

    if __name__ == '__main__':
        main()

Step 4: Set up the params YAML file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We customize the fields in :code:`train_input`, :code:`eval_input`, and :code:`model` to be used inside :code:`get_train_dataloader`, :code:`get_eval_dataloader`, and :code:`MNISTModel`. We also specify the required :code:`optimizer` and :code:`runconfig` params.

In `params.yaml`:

.. code-block:: yaml

    train_input:
        data_dir: "./data/mnist/train"
        batch_size: 128
        drop_last_batch: True
        shuffle: True
        to_float16: True

    eval_input:
        data_dir: "./data/mnist/val"
        batch_size: 128
        drop_last_batch: True
        to_float16: True

    model:
        name: "fc_mnist"
        mixed_precision: True
        depth: 10
        hidden_size: 50
        dropout: 0.0
        activation_fn: "relu"

    optimizer:
        optimizer_type: "SGD"
        learning_rate: 0.001
        momentum: 0.9
        loss_scaling_factor: 1.0

    runconfig:
        max_steps: 10000
        checkpoint_steps: 2000
        log_steps: 50
        seed: 1
        save_losses: True
Additional functionality
~~~~~~~~~~~~~~~~~~~~~~~~

Logging
^^^^^^^

By default, the :code:`run` function logs training information to the console and to TensorBoard.

- **Console logging**: With the frequency defined by :code:`log_steps` in the :code:`runconfig` section of the params YAML file, training progress is displayed with the step number, the current loss, the number of samples per second, and the current time. As an example:

  .. code-block:: text

      | Train Device=xla:0 Step=2 Loss=0.00000 Rate=361.86 GlobalRate=361.53 Time=08:29:00

- **TensorBoard logging**: A TensorBoard :code:`SummaryWriter` is created. It contains information such as the loss and samples per second. This information is stored inside the `model_dir/{mode}` directory.

Evaluation metrics
^^^^^^^^^^^^^^^^^^

The Cerebras Model Zoo git repository uses a base class called :code:`CBMetric` to compute evaluation metrics. Metrics already defined in the Model Zoo git repository can be imported as:

.. code-block:: python

    from modelzoo.common.pytorch.metrics import (
        AccuracyMetric,
        FBetaScoreMetric,
        PerplexityMetric,
        RougeScoreMetric,
    )

As an example, the BERT implementation in PyTorch (`modelzoo/transformers/pytorch/bert/model.py`) uses some of these metrics.

How to use evaluation metrics
"""""""""""""""""""""""""""""

1. **Registration**: All metrics must be registered with the corresponding :code:`PyTorchBaseModel` class. This is done automatically when the :code:`CBMetric` object is constructed. That is, to register a metric to a :code:`PyTorchBaseModel` class, construct the metric object in the :code:`PyTorchBaseModel` class' constructor.

2. **Update**: The metrics are stateful. This means that every call to the metric object with the appropriate arguments automatically computes the latest metric value and saves it in the metric's internal state.

3. **Logging**: At the very end of the run, the final metric values are computed and then logged both to the console and to the TensorBoard :code:`SummaryWriter`.

More on evaluation metrics
""""""""""""""""""""""""""

The implementation of the :code:`CBMetric` class can be found `here `_. The :code:`CBMetric` class is a base class for creating metrics on CS devices. Subclasses must override methods to provide the full functionality of the metric. These methods are meant to split the computation graph into two portions:

1. :code:`update_on_device`: Compiles and runs on the device (i.e., the CS system).

2. :code:`update_on_host`: Runs on the host (i.e., the CPU).

These metrics also support running on CPU and GPU.
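As an illustrative sketch of the registration and update steps above, the FC_MNIST wrapper could construct and call an accuracy metric as follows. The :code:`name`, :code:`labels`, and :code:`predictions` arguments are assumptions modeled on the BERT reference model; check the metric implementations in the Model Zoo for the exact signature.

.. code-block:: python

    import torch.nn as nn

    from modelzoo.common.pytorch.PyTorchBaseModel import PyTorchBaseModel
    from modelzoo.common.pytorch.metrics import AccuracyMetric
    from fc_mnist import MNIST

    class MNISTModelWithMetrics(PyTorchBaseModel):
        def __init__(self, params, device=None):
            self.model = MNIST(params["model"].copy())
            self.loss_fn = nn.NLLLoss()
            # Constructing the metric inside the constructor registers it
            # with the base model automatically (registration step).
            self.accuracy_metric = AccuracyMetric(name="eval/accuracy")
            super().__init__(params=params, model=self.model, device=device)

        def __call__(self, data):
            inputs, labels = data
            outputs = self.model(inputs)
            # Calling the metric updates its internal state (update step).
            # The keyword arguments are assumptions; see the Model Zoo
            # metric implementations for the exact signature.
            self.accuracy_metric(
                labels=labels, predictions=outputs.argmax(-1).int()
            )
            return self.loss_fn(outputs, labels)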