common.tf package#

Subpackages#

Submodules#

common.tf.TFBaseModel module#

Base class for TensorFlow models.

class common.tf.TFBaseModel.TFBaseModel#

Bases: abc.ABC

Base class for TensorFlow models. Provides a general model API, consisting of the following methods that must be implemented by child classes:

  • build_model: builds the model.

  • build_total_loss: builds the total loss, given the model outputs returned by build_model.

  • build_train_ops: sets up an optimizer and returns the associated train ops.

  • build_eval_metric_ops: builds the evaluation metric ops.

The __call__ function wraps around build_model.

All TF models must inherit from TFBaseModel and implement __init__ (containing the call to TFBaseModel’s __init__), build_model, build_total_loss, build_train_ops, and build_eval_metric_ops methods.

Parameters

mixed_precision (bool) – Enable mixed precision, if True.
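The contract described above can be illustrated without TensorFlow. The following is a minimal sketch, not the actual TFBaseModel source: `SketchBaseModel` and `ToyModel` are hypothetical stand-ins, plain Python lists replace tensors, and the `__call__(features, mode)` signature is an assumption based on the statement that `__call__` wraps build_model.

```python
from abc import ABC, abstractmethod


class SketchBaseModel(ABC):
    """Hypothetical stand-in for TFBaseModel (not the real class)."""

    def __init__(self, mixed_precision=False):
        self.mixed_precision = mixed_precision

    @abstractmethod
    def build_model(self, features, mode):
        """Return model outputs for the given features."""

    @abstractmethod
    def build_total_loss(self, model_outputs, features, labels, mode):
        """Return the total loss for the given model outputs."""

    @abstractmethod
    def build_train_ops(self, total_loss):
        """Set up an optimizer and return train ops."""

    @abstractmethod
    def build_eval_metric_ops(self, model_outputs, labels, features=None):
        """Return evaluation metric ops."""

    def __call__(self, features, mode):
        # Assumed signature: __call__ simply forwards to build_model.
        return self.build_model(features, mode)


class ToyModel(SketchBaseModel):
    """Toy child class: scales each input value by a constant."""

    def __init__(self, params):
        # Child classes call the base __init__ from their own __init__.
        super().__init__(mixed_precision=params.get("mixed_precision", False))
        self.scale = params.get("scale", 2.0)

    def build_model(self, features, mode):
        return [self.scale * x for x in features]

    def build_total_loss(self, model_outputs, features, labels, mode):
        # Sum of squared errors over plain Python floats.
        return sum((o - y) ** 2 for o, y in zip(model_outputs, labels))

    def build_train_ops(self, total_loss):
        # Placeholder: a real model would create optimizer ops here.
        return {"loss": total_loss}

    def build_eval_metric_ops(self, model_outputs, labels, features=None):
        n = len(labels)
        return {"mse": sum((o - y) ** 2 for o, y in zip(model_outputs, labels)) / n}
```

Because every method in the API is abstract, instantiating the base class directly raises a TypeError, while a child class that implements all four methods can be called like a function.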

__init__(mixed_precision=False)#
abstract build_eval_metric_ops(model_outputs, labels, features=None)#

Build eval metric ops.

Parameters
  • model_outputs – Model outputs returned by build_model.

  • labels – Labels.

  • features – Input features, optional

Returns

Eval ops.

abstract build_model(features, mode)#

Build model.

Parameters
  • features – Input features.

  • mode (tf.estimator.ModeKeys) – Mode (TRAIN, EVAL).

Returns

Model outputs

abstract build_total_loss(model_outputs, features, labels, mode)#

Build loss given model outputs.

Parameters
  • model_outputs – Model outputs returned by build_model.

  • features – Input features.

  • labels – Labels.

  • mode (tf.estimator.ModeKeys) – Mode (TRAIN, EVAL).

Returns

Total loss tensor.

abstract build_train_ops(total_loss)#

Set up the optimizer and build train ops.

Parameters

total_loss (Tensor) – The total loss returned by __call__

Returns

Train ops

common.tf.appliance_utils module#

Helper utilities for running on a Cerebras appliance cluster.

class common.tf.appliance_utils.ExecutionStrategy#

Bases: object

Represents Cerebras execution strategies.

classmethod as_appliance_key(key: str)#

Transform strategy string key to a typed enum.

pipeline = 'pipeline'#
classmethod strategies()#

Returns all available strategies.

weight_streaming = 'weight_streaming'#
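
The shape of this class can be sketched as follows. This is an illustrative stand-in, not the library source: `SketchExecutionStrategy` and `ApplianceStrategy` are hypothetical names, the enum that as_appliance_key really returns is not documented here, and both the list return type of strategies() and the ValueError for unknown keys are assumptions.

```python
from enum import Enum


class ApplianceStrategy(Enum):
    """Hypothetical typed enum; the real target type is not documented here."""

    PIPELINE = "pipeline"
    WEIGHT_STREAMING = "weight_streaming"


class SketchExecutionStrategy:
    """Hypothetical stand-in for ExecutionStrategy."""

    # String keys as listed in the class attributes.
    pipeline = "pipeline"
    weight_streaming = "weight_streaming"

    @classmethod
    def strategies(cls):
        # Assumption: the available strategies are returned as a list of keys.
        return [cls.pipeline, cls.weight_streaming]

    @classmethod
    def as_appliance_key(cls, key: str):
        # Assumption: unknown keys are rejected with a ValueError.
        if key not in cls.strategies():
            raise ValueError(f"Unknown execution strategy: {key!r}")
        return ApplianceStrategy(key)
```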
common.tf.appliance_utils.get_debug_args(debug_args_path, debug_ini_path)#

Appliance mode DebugArgs.

common.tf.appliance_utils.get_debug_mgr_args(debug_ini_fp, debug_args)#

Gets the debug_mgr-related arguments for appliance mode.

common.tf.appliance_utils.parse_args_and_params(run_dir: str, set_default_params: Optional[Callable] = None) dict#

Parses commandline arguments and returns the params.

Parameters
  • run_dir – The root directory in which to create the model_dir.

  • set_default_params – A callable that updates params with some defaults specific to this model. Defaults to None.

Returns

Params parsed from cmdline arguments and the params file.
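
A set_default_params callable might look like the sketch below. The key names (`runconfig`, `save_checkpoints_steps`, `model`, `mixed_precision`) are illustrative assumptions rather than a documented schema, and whether the real callable is expected to mutate params in place or return them is not stated here, so the sketch does both.

```python
def set_default_params(params):
    """Hypothetical model-specific defaults; key names are illustrative only."""
    runconfig = params.setdefault("runconfig", {})
    # setdefault keeps any value already supplied by the user.
    runconfig.setdefault("save_checkpoints_steps", 1000)

    model = params.setdefault("model", {})
    model.setdefault("mixed_precision", False)
    return params


# Params as they might look after parsing, before defaults are applied.
parsed = {"runconfig": {"save_checkpoints_steps": 500}}
set_default_params(parsed)
```

Such a callable would be passed as the set_default_params argument of parse_args_and_params. Using setdefault means user-provided values always take precedence over the model's defaults.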

common.tf.appliance_utils.run_appliance(model_fn: Callable, train_input_fn: Callable, eval_input_fn: Callable, supported_strategies: List[str], default_params_fn: Optional[Callable] = None, stack_params_fn: Optional[Callable] = None, enable_cs_summaries: bool = False)#

Helper method for running models locally or on CS-X Systems.

Parameters
  • model_fn – A callable for creating the model.

  • train_input_fn – A callable for creating a data input pipeline for train.

  • eval_input_fn – A callable for creating a data input pipeline for eval.

  • supported_strategies – List of supported execution strategies. If a strategy is not explicitly selected in cmdline args, the default strategy chosen is the first item in this list.

  • default_params_fn – A callable that takes in the parsed params and sets defaults for missing params.

  • stack_params_fn – A callable that takes in the parsed params and sets Cerebras-specific config for stack compilation.

  • enable_cs_summaries – Enable summaries when running on CS-X hardware.
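
The default-strategy rule described for supported_strategies can be sketched in isolation. `select_strategy` is a hypothetical helper, not part of the module, and raising ValueError for an unsupported request is an assumption about error handling.

```python
from typing import List, Optional


def select_strategy(requested: Optional[str], supported_strategies: List[str]) -> str:
    """Pick the execution strategy, defaulting to the first supported one."""
    if not supported_strategies:
        raise ValueError("At least one supported strategy is required.")
    if requested is None:
        # No strategy given on the command line: use the first list item.
        return supported_strategies[0]
    if requested not in supported_strategies:
        # Assumption: an unsupported explicit choice is an error.
        raise ValueError(
            f"Strategy {requested!r} is not one of {supported_strategies}."
        )
    return requested
```

Under this rule, the order of supported_strategies matters: a model that lists weight streaming first will use it whenever the user does not choose explicitly.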

common.tf.appliance_utils.setup_logging(level: str, logging_dir: Optional[str] = None)#

Sets up the logging verbosity level.

Parameters
  • level – The logging level string.

  • logging_dir – Where to store logs for archival purposes.
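
A function with this signature could be built on the standard logging module, as in the sketch below. `setup_logging_sketch`, the logger name `appliance_sketch`, and the file name `run.log` are all hypothetical; the real function's handlers and file layout are not documented here.

```python
import logging
import os
from typing import Optional


def setup_logging_sketch(level: str, logging_dir: Optional[str] = None) -> logging.Logger:
    """Configure a logger from a level string, optionally archiving to a file."""
    # Translate a string such as "info" into the numeric logging level.
    numeric_level = getattr(logging, level.upper(), None)
    if not isinstance(numeric_level, int):
        raise ValueError(f"Invalid logging level: {level!r}")

    logger = logging.getLogger("appliance_sketch")
    logger.setLevel(numeric_level)

    if logging_dir is not None:
        # Keep a copy of the log output on disk for later inspection.
        os.makedirs(logging_dir, exist_ok=True)
        handler = logging.FileHandler(os.path.join(logging_dir, "run.log"))
        handler.setLevel(numeric_level)
        logger.addHandler(handler)
    return logger
```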

common.tf.appliance_utils.update_debug_args_from_stack_params(debug_args, stack_params_fn: Callable[[dict], dict], params: dict) None#

Gets stack params and encodes them in the given debug args.

Parameters
  • debug_args – The debug args in which to inject the stack params.

  • stack_params_fn – A callable that takes in params and returns a dict of stack params for the model.

  • params – The parsed model params.

common.tf.run_utils module#

Defines runtime utilities for the estimator workflow for device-specific execution.

Key functions include:

  • is_cs: checks whether this is a CS1 runtime environment

  • get_gpu_distribution_strategy: set up GPU distributed training

  • save_params: save params in yaml format in model directory

  • update_params_from_args: update command line arguments into params

  • save_predictions: save predictions from estimator.predict into npy files

class common.tf.run_utils.ExecutionMode#

Bases: enum.Enum

An enumeration.

OutsideCerebras = 3#
Pipeline = 2#
WeightStreaming = 1#
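
For illustration, the enumeration can be reproduced with the standard enum module using the member values listed above. The `describe` helper and its wording are hypothetical additions showing how calling code might branch on the mode.

```python
from enum import Enum


class ExecutionMode(Enum):
    # Member names and values copied from the listing above.
    WeightStreaming = 1
    Pipeline = 2
    OutsideCerebras = 3


def describe(mode: ExecutionMode) -> str:
    """Hypothetical helper: turn a mode into a short description."""
    if mode is ExecutionMode.OutsideCerebras:
        return "running outside a Cerebras system"
    return f"running on a Cerebras system in {mode.name} mode"
```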
class common.tf.run_utils.GetWeights#

Bases: object

Class to easily load weights from a checkpoint by name or as iterator.

__init__(ckpt_path)#
property var_names#

Variable names contained in the checkpoint
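
The "by name or as iterator" access pattern can be sketched with a dict-backed stand-in. `WeightsView` is hypothetical: it wraps an in-memory dictionary instead of reading a checkpoint, and both the indexing syntax and the choice to yield (name, value) pairs during iteration are assumptions about the real class.

```python
class WeightsView:
    """Hypothetical stand-in for GetWeights, backed by a plain dict."""

    def __init__(self, weights):
        self._weights = dict(weights)

    @property
    def var_names(self):
        # Names of all variables available in the (pretend) checkpoint.
        return list(self._weights)

    def __getitem__(self, name):
        # Access by name.
        return self._weights[name]

    def __iter__(self):
        # Access as iterator; assumed to yield (name, value) pairs.
        return iter(self._weights.items())


weights = WeightsView({"dense/kernel": [[1.0, 2.0]], "dense/bias": [0.0]})
```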

common.tf.run_utils.check_env(params)#

Performs basic checks on the parameters and environment.

Parameters

params (dict) – runconfig dict we want to validate

common.tf.run_utils.create_warm_start_settings(runconfig_params, exclude_string=None)#

Creates warm start settings for estimator.

Does not load any weights whose names include exclude_string. This is useful when fine-tuning pretrained models.

Parameters
  • runconfig_params (dict) – runconfig params

  • exclude_string (str) – any weights with this string in the name will be initialized from scratch instead of coming from the checkpoint.

Returns

a WarmStartSettings object (or None if no checkpoint_path is provided) to be passed into estimator’s warm_start_from field.
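
The effect of exclude_string can be shown on a plain list of variable names. This sketch covers only the filtering idea; `vars_to_warm_start` is a hypothetical helper, and how the real function passes the selection to WarmStartSettings is not described here.

```python
def vars_to_warm_start(var_names, exclude_string=None):
    """Return the variable names that would be loaded from the checkpoint."""
    if exclude_string is None:
        # Nothing excluded: every variable is warm-started.
        return list(var_names)
    # Variables whose name contains exclude_string start from scratch.
    return [name for name in var_names if exclude_string not in name]


names = ["encoder/layer_0/kernel", "encoder/layer_0/bias", "classifier/kernel"]
loaded = vars_to_warm_start(names, exclude_string="classifier")
```

In a fine-tuning setup, this keeps the pretrained encoder weights while the task-specific head is initialized fresh.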

common.tf.run_utils.dict_to_checkpoint(state_dict, checkpoint_name)#

Saves a dictionary of weight values into a tf Saver style checkpoint.

Parameters
  • state_dict – (Dict[str, np.ndarray]) Collection of weights.

  • checkpoint_name – (str) Name of the checkpoint file to create.

Returns

(str) The path to the saved checkpoint.

common.tf.run_utils.get_csconfig(params)#

Returns CSConfig proto.

common.tf.run_utils.get_csrunconfig_dict(params)#
common.tf.run_utils.get_execution_mode()#
common.tf.run_utils.get_input_checkpoint_steps(model_dir)#

Get the correct input checkpoint steps to run.

Parameters

model_dir (str) – Model directory to fetch input checkpoint steps from

Returns

An integer specifying the number of iterations of the data loader to skip.

common.tf.run_utils.get_params(params_file)#
common.tf.run_utils.get_predict_directory(model_dir)#

Gets the predict directory within the given model_dir if it exists

Parameters

model_dir (string) – Directory we want to write to

common.tf.run_utils.get_weight_dict(ckpt_path)#

Reads TensorFlow checkpoint from specified path and returns the corresponding model’s parameters as a dictionary of variable names to numpy arrays.

Parameters

ckpt_path (str) – Path to TensorFlow checkpoint (prefix).

Returns

(dict)

Dictionary of variable names to numpy arrays with corresponding model parameters.

common.tf.run_utils.is_cs(params)#

Checks whether the runtime environment is that of a Cerebras System. Returns True if so, otherwise False.

For the legacy k8s flow, the user does not need to specify cs_ip, since the k8s scheduler determines which CS system to use internally. When a CS is needed for the run, K8S_CS_IP will be set. This function returns True if K8S_CS_IP is set.

Parameters

params (dict) – runconfig dict to provide parameters for check
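
The check described above can be sketched as follows. `is_cs_sketch` is a hypothetical stand-in: treating cs_ip as a top-level key of params and K8S_CS_IP as an environment variable are assumptions drawn from the description, and the `environ` argument exists only to make the sketch easy to exercise.

```python
import os


def is_cs_sketch(params, environ=None):
    """Return True if the run appears to target a Cerebras System."""
    environ = os.environ if environ is None else environ
    # Assumption: an explicitly configured cs_ip means a CS run.
    if params.get("cs_ip"):
        return True
    # Legacy k8s flow: the scheduler sets K8S_CS_IP when a CS is needed.
    return "K8S_CS_IP" in environ
```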

common.tf.run_utils.save_params(params, model_dir, fname='params.yaml')#

Writes and saves a dictionary to a file in the model_dir.

Parameters
  • params (dict) – dict we want to write to a file in model_dir

  • model_dir (string) – Directory we want to write to

  • fname (string) – Name of file in model_dir we want to save to.

common.tf.run_utils.save_predictions(model_dir, outputs, name='outputs.npz')#

Saves outputs in the given model_dir under the given name, initializing the predict directory within model_dir.

Parameters
  • model_dir (string) – Directory we want to write to

  • outputs (list) – List of dictionaries returned by estimator.predict

  • name (string) – Name of output, generally in .npy format

common.tf.run_utils.setup_environment(params)#

Set environment to have determinism and reproducible runs if tf_random_seed is set.

Parameters

params (dict) – Parameters for execution
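
The seed-gated behavior can be sketched with the standard library alone. `setup_environment_sketch` is a hypothetical stand-in: reading tf_random_seed from the top level of params, setting TF_DETERMINISTIC_OPS (an environment variable honored by some TensorFlow versions), and seeding Python's random module are assumptions about what such a function might do, not a description of the actual implementation.

```python
import os
import random


def setup_environment_sketch(params):
    """Enable reproducibility settings only when a seed is configured."""
    seed = params.get("tf_random_seed")
    if seed is None:
        # No seed requested: leave the environment untouched.
        return False
    # Assumption: request deterministic op implementations from TensorFlow.
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    # Seed Python's own RNG; a real implementation would presumably
    # also seed TensorFlow and NumPy, which this sketch does not do.
    random.seed(seed)
    return True
```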

common.tf.run_utils.update_input_checkpoint_steps(params)#

Update the correct input checkpoint steps to run.

Parameters

params (dict) – Parameters for execution

Returns

The parameter dictionary modified with the correct number of input steps to skip during execution

common.tf.run_utils.update_params_from_args(args, params)#

Sets command line arguments from args into params.

Parameters
  • args (argparse namespace) – Command line arguments

  • params (dict) – runconfig dict we want to update
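
One plausible way to merge an argparse namespace into a params dict is sketched below. `update_params_from_args_sketch` is hypothetical: skipping arguments whose value is None, so that unset flags do not overwrite values from the params file, is an assumption, and the flag names `--mode` and `--model_dir` are illustrative.

```python
import argparse


def update_params_from_args_sketch(args, params):
    """Copy command line arguments that were actually set into params."""
    for key, value in vars(args).items():
        # Assumption: None means the flag was not given on the command line.
        if value is not None:
            params[key] = value
    return params


parser = argparse.ArgumentParser()
parser.add_argument("--mode")
parser.add_argument("--model_dir")
args = parser.parse_args(["--mode", "train"])

params = update_params_from_args_sketch(
    args, {"mode": "eval", "model_dir": "./model_dir"}
)
```

Here the explicit `--mode train` overrides the value from the params dict, while model_dir keeps its original value because the flag was not passed.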

Module contents#