Submodules

module

Base class for TensorFlow models.


Bases: abc.ABC

Base class for TensorFlow models. Provides a general model API consisting of the following methods, which must be implemented by child classes:

  • build_model: builds the model.

  • build_total_loss: builds the total loss, given the model outputs returned by build_model.

  • build_train_ops: sets up an optimizer and returns the associated train ops.

  • build_eval_metric_ops: builds evaluation metric ops.

The __call__ method wraps build_model.

All TF models must inherit from TFBaseModel and implement __init__ (containing the call to TFBaseModel’s __init__), build_model, build_total_loss, build_train_ops, and build_eval_metric_ops methods.


mixed_precision (bool) – Enable mixed precision, if True.
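To make this contract concrete, here is a minimal, framework-free sketch. TFBaseModelSketch is a hypothetical stand-in for TFBaseModel that keeps only the documented abstract methods and a __call__ that delegates to build_model; the real class builds TensorFlow graphs. ToyModel and its plain-float arithmetic are purely illustrative assumptions.

```python
from abc import ABC, abstractmethod


class TFBaseModelSketch(ABC):
    """Hypothetical stand-in mirroring the documented TFBaseModel API."""

    def __init__(self, mixed_precision=False):
        self.mixed_precision = mixed_precision

    @abstractmethod
    def build_model(self, features, mode):
        ...

    @abstractmethod
    def build_total_loss(self, model_outputs, features, labels, mode):
        ...

    @abstractmethod
    def build_train_ops(self, total_loss):
        ...

    @abstractmethod
    def build_eval_metric_ops(self, model_outputs, labels, features=None):
        ...

    def __call__(self, features, mode):
        # __call__ simply wraps build_model.
        return self.build_model(features, mode)


class ToyModel(TFBaseModelSketch):
    """Illustrative child class: a scalar linear model y = w * x."""

    def __init__(self, weight=2.0, mixed_precision=False):
        # Child classes must call the base class __init__.
        super().__init__(mixed_precision=mixed_precision)
        self.weight = weight

    def build_model(self, features, mode):
        return [self.weight * x for x in features]

    def build_total_loss(self, model_outputs, features, labels, mode):
        # Mean squared error over the batch.
        errors = [(p - y) ** 2 for p, y in zip(model_outputs, labels)]
        return sum(errors) / len(errors)

    def build_train_ops(self, total_loss):
        # A real model would create an optimizer here; the sketch only
        # returns a description of the op it would run.
        return {"minimize": total_loss}

    def build_eval_metric_ops(self, model_outputs, labels, features=None):
        # Fraction of predictions that match their labels.
        correct = sum(abs(p - y) < 1e-6 for p, y in zip(model_outputs, labels))
        return {"accuracy": correct / len(labels)}
```

Because the base class is an ABC, a subclass that forgets any of the four methods cannot be instantiated, which is how the "must implement" requirement is enforced.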

abstract build_eval_metric_ops(model_outputs, labels, features=None)

Build eval metric ops.

  • model_outputs – Model outputs returned by build_model.

  • labels – Labels.

  • features – Input features (optional).


Returns: eval metric ops.

abstract build_model(features, mode)

Build model.

  • features – Input features.

  • mode (tf.estimator.ModeKeys) – Mode (TRAIN, EVAL).


Returns: model outputs.

abstract build_total_loss(model_outputs, features, labels, mode)

Build loss given model outputs.

  • model_outputs – Model outputs returned by build_model.

  • features – Input features.

  • labels – Labels.

  • mode (tf.estimator.ModeKeys) – Mode (TRAIN, EVAL).


Returns: total loss tensor.

abstract build_train_ops(total_loss)

Sets up the optimizer and builds train ops.


total_loss (Tensor) – The total loss returned by __call__.


Returns: train ops.

module

Helper utilities for running on a Cerebras appliance cluster.


Bases: object

Represents Cerebras execution strategies.
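The pattern behind this class (string constants pipeline and weight_streaming, a classmethod listing them, and a conversion from a string key to a typed enum) can be sketched as follows. Everything here is hypothetical: ApplianceStrategy stands in for the appliance-side typed enum, whose real name and values are not documented here, and choose_strategy illustrates the run helper's documented rule that the first entry of supported_strategies is the default; rejecting unsupported keys is an added assumption.

```python
import enum


class ApplianceStrategy(enum.Enum):
    """Hypothetical typed enum standing in for the appliance-side type."""
    PIPELINE = "pipeline"
    WEIGHT_STREAMING = "weight_streaming"


class ExecutionStrategySketch:
    """Hypothetical sketch of string-keyed execution strategies."""

    pipeline = "pipeline"
    weight_streaming = "weight_streaming"

    @classmethod
    def strategies(cls):
        # All available strategy keys.
        return [cls.pipeline, cls.weight_streaming]

    @classmethod
    def as_appliance_key(cls, key: str):
        # Reject unknown keys before converting to the typed enum.
        if key not in cls.strategies():
            raise ValueError(
                f"Unknown strategy {key!r}; expected one of {cls.strategies()}"
            )
        return ApplianceStrategy(key)


def choose_strategy(requested, supported_strategies):
    """Hypothetical helper: default to the first supported strategy."""
    if requested is None:
        return supported_strategies[0]
    if requested not in supported_strategies:
        raise ValueError(f"Strategy {requested!r} is not supported by this model")
    return requested


print(ExecutionStrategySketch.as_appliance_key("weight_streaming"))
# ApplianceStrategy.WEIGHT_STREAMING
```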

classmethod as_appliance_key(key: str)

Transform strategy string key to a typed enum.

pipeline = 'pipeline'
classmethod strategies()

Returns all available strategies.

weight_streaming = 'weight_streaming'

…, debug_ini_path)

Appliance mode DebugArgs.

…, debug_args)

Appliance mode: debug_mgr-related handling.

…: str, set_default_params: Optional[Callable] = None) -> dict

Parses command-line arguments and returns the params.

  • run_dir – The root directory in which to create the model_dir.

  • set_default_params – A callable that updates params with some defaults specific to this model. Defaults to None.


Returns: params parsed from command-line arguments and the params file.

…: Callable, train_input_fn: Callable, eval_input_fn: Callable, supported_strategies: List[str], default_params_fn: Optional[Callable] = None, stack_params_fn: Optional[Callable] = None, enable_cs_summaries: bool = False)

Helper method for running models locally or on CS-X systems.

  • model_fn – A callable for creating the model.

  • train_input_fn – A callable for creating a data input pipeline for train.

  • eval_input_fn – A callable for creating a data input pipeline for eval.

  • supported_strategies – List of supported execution strategies. If a strategy is not explicitly selected in cmdline args, the default strategy chosen is the first item in this list.

  • default_params_fn – A callable that takes in the parsed params and sets defaults for missing params.

  • stack_params_fn – A callable that takes in the parsed params and sets Cerebras-specific config for stack compilation.

  • enable_cs_summaries – Enable summaries when running on CS-X hardware.

…: str, logging_dir: Optional[str] = None)

Sets up the logging verbosity level.

  • level – The logging level string.

  • logging_dir – Where to store logs for archival purposes.

…, stack_params_fn: Callable[[dict], dict], params: dict) -> None

Gets stack params and encodes them in the given debug args.

  • debug_args – The debug args in which to inject the stack params.

  • stack_params_fn – A callable that takes in params and returns a dict of stack params for the model.

  • params – The parsed model params.

module

Runtime utilities for the estimator workflow, for device-specific execution.

Key functions include:

  • is_cs: checks whether this is a CS1 runtime environment.

  • get_gpu_distribution_strategy: sets up GPU distributed training.

  • save_params: saves params in YAML format in the model directory.

  • update_params_from_args: updates params with command-line arguments.

  • save_predictions: saves predictions from estimator.predict into NumPy files.


Bases: enum.Enum

An enumeration.

OutsideCerebras = 3
Pipeline = 2
WeightStreaming = 1

Bases: object

Class to easily load weights from a checkpoint by name or as iterator.

property var_names

Variable names contained in the checkpoint.

Performs basic checks on the parameters and the environment.


params (dict) – runconfig dict to validate.

…, exclude_string=None)

Creates warm start settings for estimator.

Does not load any weights whose names include exclude_string. This is useful when fine-tuning pretrained models.
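One way to express this exclusion is a negative-lookahead regular expression over variable names, since tf.estimator.WarmStartSettings accepts a regex string in its vars_to_warm_start argument. The sketch below is an assumption about how such filtering could be done, not this module's implementation; build_warm_start_regex is a hypothetical helper, and the TensorFlow call is left as a comment so the block runs without TensorFlow.

```python
import re


def build_warm_start_regex(exclude_string=None):
    """Hypothetical helper: regex matching every variable name that does NOT
    contain exclude_string (or all names when exclude_string is None)."""
    if not exclude_string:
        return ".*"
    # Negative lookahead: reject any name with exclude_string anywhere in it.
    return "^(?!.*" + re.escape(exclude_string) + ").*"


pattern = build_warm_start_regex("classifier")
names = ["bert/encoder/layer_0/kernel", "classifier/dense/kernel"]
warm_started = [n for n in names if re.match(pattern, n)]
print(warm_started)  # ['bert/encoder/layer_0/kernel']

# With TensorFlow Estimators, the pattern could then be passed as:
# tf.estimator.WarmStartSettings(ckpt_to_initialize_from=checkpoint_path,
#                                vars_to_warm_start=pattern)
```

Escaping exclude_string keeps characters such as "." or "/" from being interpreted as regex syntax.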

  • runconfig_params (dict) – runconfig params

  • exclude_string (str) – any weights with this string in the name will be initialized from scratch instead of coming from the checkpoint.


Returns: a WarmStartSettings object (or None if no checkpoint_path is provided) to be passed into the estimator’s warm_start_from field.

…, checkpoint_name)

Saves a dictionary of weight values into a TensorFlow Saver-style checkpoint.

  • state_dict – (Dict[str, np.ndarray]) Collection of weights.

  • checkpoint_name – (str) Name of the checkpoint file to create.


Returns: (str) the path to the saved checkpoint.

Returns a CSConfig proto.

Gets the correct input checkpoint steps to run.

model_dir (str) – Model directory to fetch input checkpoint steps from.


Returns: an integer specifying the number of iterations of the data loader to skip.

Gets the predict directory within the given model_dir, if it exists.


model_dir (string) – Directory we want to write to

Reads a TensorFlow checkpoint from the specified path and returns the corresponding model’s parameters as a dictionary mapping variable names to NumPy arrays.

ckpt_path (str) – Path to the TensorFlow checkpoint (prefix).

Returns: dictionary of variable names to NumPy arrays with the corresponding model parameters.

Checks whether the runtime environment is that of a Cerebras system. Returns True if so, otherwise False.

For the legacy k8s flow, the user does not need to specify cs_ip, since the k8s scheduler internally determines which CS system to use. When a CS system is needed for the run, K8S_CS_IP is set; this function returns True if K8S_CS_IP is set.
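A minimal sketch of this check follows. Only the K8S_CS_IP environment variable comes from the description above; the assumption that an explicit CS address lives under params["runconfig"]["cs_ip"] is illustrative, and is_cs_sketch is a hypothetical name.

```python
import os


def is_cs_sketch(params):
    """Hypothetical re-implementation of the documented is_cs check."""
    # Assumed layout: params["runconfig"]["cs_ip"] holds the CS system address.
    cs_ip = params.get("runconfig", {}).get("cs_ip")
    # Legacy k8s flow: the scheduler sets K8S_CS_IP when a CS system is needed.
    return bool(cs_ip) or "K8S_CS_IP" in os.environ


print(is_cs_sketch({"runconfig": {"cs_ip": "10.0.0.1:9000"}}))  # True
```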


params (dict) – runconfig dict providing the parameters for the check.

…, model_dir, fname='params.yaml')

Writes and saves a dictionary to a file in the model_dir.

  • params (dict) – dict we want to write to a file in model_dir

  • model_dir (string) – Directory we want to write to

  • fname (string) – Name of the file in model_dir to save to.

…, outputs, name='outputs.npz')

Saves outputs under the given name in model_dir, initializing the predict directory within model_dir first.

  • model_dir (string) – Directory we want to write to

  • outputs (list) – List of dictionaries returned by estimator.predict

  • name (string) – Name of the output file (defaults to 'outputs.npz').
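A rough sketch of this save step with NumPy, assuming each element of outputs is a dictionary with the same keys and that the predict directory is literally named predict (both assumptions). Per-key values are stacked into one array each and written with numpy.savez; save_predictions_sketch is a hypothetical name.

```python
import os
import tempfile

import numpy as np


def save_predictions_sketch(model_dir, outputs, name="outputs.npz"):
    """Hypothetical sketch: stack estimator.predict dicts and save as .npz."""
    # Assumed directory layout: <model_dir>/predict
    predict_dir = os.path.join(model_dir, "predict")
    os.makedirs(predict_dir, exist_ok=True)
    # Turn a list of per-example dicts into one array per output key.
    stacked = {key: np.stack([out[key] for out in outputs]) for key in outputs[0]}
    path = os.path.join(predict_dir, name)
    np.savez(path, **stacked)
    return path


with tempfile.TemporaryDirectory() as tmp:
    preds = [{"logits": np.array([0.1, 0.9])}, {"logits": np.array([0.8, 0.2])}]
    path = save_predictions_sketch(tmp, preds)
    with np.load(path) as data:
        print(data["logits"].shape)  # (2, 2)
```

Stacking per key keeps related predictions together, so a single np.load call recovers every output as a batch-first array.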

Sets up the environment for deterministic, reproducible runs if tf_random_seed is set.


params (dict) – Parameters for execution
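The general technique can be sketched as follows, assuming the seed lives at params["runconfig"]["tf_random_seed"] (an assumed key path) and that determinism is requested through TensorFlow's TF_DETERMINISTIC_OPS environment variable plus Python and NumPy seeding. This illustrates the idea rather than reproducing the module's code; set_defaults_for_determinism_sketch is hypothetical, and with TensorFlow installed one would also call tf.random.set_seed(seed).

```python
import os
import random

import numpy as np


def set_defaults_for_determinism_sketch(params):
    """Hypothetical sketch: seed RNGs and request deterministic TF ops."""
    seed = params.get("runconfig", {}).get("tf_random_seed")
    if seed is None:
        return False  # nothing to do when no seed is configured
    # Ask TensorFlow (2.1+) to select deterministic GPU kernels.
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    random.seed(seed)
    np.random.seed(seed)
    # With TensorFlow available, tf.random.set_seed(seed) would go here too.
    return True
```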

Updates the correct input checkpoint steps to run.

params (dict) – Parameters for execution.


Returns: the parameter dictionary, modified with the correct number of input steps to skip during execution.

…, params)

Copies command-line arguments from args into params.

  • args (argparse namespace) – Command line arguments

  • params (dict) – runconfig dict we want to update
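A minimal sketch of how argparse values might be merged, assuming non-None arguments overwrite entries in params["runconfig"] (both the target section and the skip-None rule are assumptions). update_params_from_args_sketch and the --model_dir and --mode flags are hypothetical.

```python
import argparse


def update_params_from_args_sketch(args, params):
    """Hypothetical sketch: copy explicitly set CLI args into the runconfig."""
    runconfig = params.setdefault("runconfig", {})
    for key, value in vars(args).items():
        # Only override when the flag was actually provided (non-None).
        if value is not None:
            runconfig[key] = value
    return params


parser = argparse.ArgumentParser()
parser.add_argument("--model_dir")
parser.add_argument("--mode")
args = parser.parse_args(["--mode", "train"])
params = {"runconfig": {"model_dir": "./model_dir", "mode": "eval"}}
update_params_from_args_sketch(args, params)
print(params["runconfig"])  # {'model_dir': './model_dir', 'mode': 'train'}
```

Skipping None values lets the params file supply defaults while explicit command-line flags take precedence.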

Module contents