common.tf package#
Subpackages#
- common.tf.estimator package
- common.tf.hooks package
- common.tf.input package
- common.tf.layers package
- Submodules
- common.tf.layers.AbstractRecomputeWrapper module
- common.tf.layers.ActivationLayer module
- common.tf.layers.AddLayer module
- common.tf.layers.AttentionLayer module
- common.tf.layers.BaseLayer module
- common.tf.layers.Conv2DLayer module
- common.tf.layers.Conv2DTransposeLayer module
- common.tf.layers.CrossEntropyFromLogitsLayer module
- common.tf.layers.DenseLayer module
- common.tf.layers.DropoutLayer module
- common.tf.layers.EmbeddingLayer module
- common.tf.layers.FeedForwardNetwork module
- common.tf.layers.FeedForwardNetworkV2 module
- common.tf.layers.Input module
- common.tf.layers.LayerNormalizationLayer module
- common.tf.layers.MaxPool2DLayer module
- common.tf.layers.PoolerLayer module
- common.tf.layers.PoolerLayerV2 module
- common.tf.layers.PositionEmbeddingLayer module
- common.tf.layers.PrePostProcessWrapper module
- common.tf.layers.ReshapeLayer module
- common.tf.layers.SegmentEmbeddingLayer module
- common.tf.layers.SharedWeightsDenseLayer module
- common.tf.layers.SoftmaxLayer module
- common.tf.layers.SquaredErrorLayer module
- common.tf.layers.utils module
- Module contents
- common.tf.metrics package
- Submodules
- common.tf.metrics.accuracy module
- common.tf.metrics.bits_per_x module
- common.tf.metrics.dice_coefficient module
- common.tf.metrics.ece_loss_metric module
- common.tf.metrics.f1_score module
- common.tf.metrics.fbeta_score module
- common.tf.metrics.mcc module
- common.tf.metrics.perplexity module
- common.tf.metrics.rouge_score module
- common.tf.metrics.utils module
- Module contents
- common.tf.model_utils package
- common.tf.optimizers package
Submodules#
common.tf.TFBaseModel module#
Base class for TensorFlow models.
- class common.tf.TFBaseModel.TFBaseModel#
Bases:
abc.ABC
Base class for TensorFlow models. Provides a general model API, consisting of the following methods that must be implemented by child classes:
build_model: builds the model.
build_total_loss: builds the total loss, given the model outputs returned by build_model.
build_train_ops: sets up an optimizer and returns the associated train ops.
build_eval_metric_ops: builds the evaluation metric ops.
The __call__ function wraps around build_model.
All TF models must inherit from TFBaseModel and implement __init__ (containing the call to TFBaseModel’s __init__), build_model, build_total_loss, build_train_ops, and build_eval_metric_ops methods.
- Parameters
mixed_precision (bool) – Enable mixed precision, if True.
- __init__(mixed_precision=False)#
- abstract build_eval_metric_ops(model_outputs, labels, features=None)#
Build eval metric ops.
- Parameters
model_outputs – Model outputs returned by build_model.
labels – Labels.
features – Input features, optional
- Returns
Eval ops.
- abstract build_model(features, mode)#
Build model.
- Parameters
features – Input features.
mode (tf.estimator.ModeKeys) – Mode (TRAIN, EVAL).
- Returns
Model outputs
- abstract build_total_loss(model_outputs, features, labels, mode)#
Build loss given model outputs.
- Parameters
model_outputs – Model outputs returned by build_model.
features – Input features.
labels – Labels.
mode (tf.estimator.ModeKeys) – Mode (TRAIN, EVAL).
- Returns
Total loss tensor.
- abstract build_train_ops(total_loss)#
Setup optimizer and build train ops.
- Parameters
total_loss (Tensor) – The total loss returned by __call__
- Returns
Train ops
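As a rough illustration of the contract described above, the following self-contained sketch mirrors the four abstract methods and the `__call__` wrapper in plain Python. `SketchBaseModel` and `ToyModel` are hypothetical stand-ins, not the real `TFBaseModel`: the real class operates on TensorFlow tensors and ops, whereas this sketch uses plain floats and dicts so that it runs without TensorFlow.

```python
import abc


class SketchBaseModel(abc.ABC):
    """Hypothetical stand-in that mirrors the TFBaseModel method names."""

    def __init__(self, mixed_precision=False):
        self.mixed_precision = mixed_precision

    def __call__(self, features, mode):
        # Per the description above, __call__ wraps build_model.
        return self.build_model(features, mode)

    @abc.abstractmethod
    def build_model(self, features, mode):
        """Return model outputs."""

    @abc.abstractmethod
    def build_total_loss(self, model_outputs, features, labels, mode):
        """Return the total loss."""

    @abc.abstractmethod
    def build_train_ops(self, total_loss):
        """Return train ops."""

    @abc.abstractmethod
    def build_eval_metric_ops(self, model_outputs, labels, features=None):
        """Return eval metric ops."""


class ToyModel(SketchBaseModel):
    """Toy child class: scales inputs by a fixed weight (illustration only)."""

    def __init__(self, mixed_precision=False):
        # Child classes call the base class __init__ from their own __init__.
        super().__init__(mixed_precision=mixed_precision)
        self.weight = 2.0

    def build_model(self, features, mode):
        return [self.weight * x for x in features]

    def build_total_loss(self, model_outputs, features, labels, mode):
        # Mean squared error over plain Python floats.
        errors = [(o - y) ** 2 for o, y in zip(model_outputs, labels)]
        return sum(errors) / len(errors)

    def build_train_ops(self, total_loss):
        # A real model would set up an optimizer here; this is a placeholder.
        return {"minimize": total_loss}

    def build_eval_metric_ops(self, model_outputs, labels, features=None):
        matches = sum(1 for o, y in zip(model_outputs, labels) if o == y)
        return {"eval/accuracy": matches / len(labels)}


model = ToyModel()
outputs = model([1.0, 2.0], mode="train")
loss = model.build_total_loss(outputs, [1.0, 2.0], [2.0, 5.0], mode="train")
```

Because the base class declares its methods with `abc.abstractmethod`, instantiating it directly, or a child that omits one of the four methods, raises `TypeError`.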
common.tf.appliance_utils module#
Helper utilities for running on a Cerebras appliance cluster.
- class common.tf.appliance_utils.ExecutionStrategy#
Bases:
object
Represent Cerebras Execution Strategies
- classmethod as_appliance_key(key: str)#
Transform strategy string key to a typed enum.
- pipeline = 'pipeline'#
- classmethod strategies()#
Returns all available strategies.
- weight_streaming = 'weight_streaming'#
- common.tf.appliance_utils.get_debug_args(debug_args_path, debug_ini_path)#
Appliance mode DebugArgs.
- common.tf.appliance_utils.get_debug_mgr_args(debug_ini_fp, debug_args)#
Handles debug_mgr-related arguments in appliance mode.
- common.tf.appliance_utils.parse_args_and_params(run_dir: str, set_default_params: Optional[Callable] = None) dict #
Parses commandline arguments and returns the params.
- Parameters
run_dir – The root directory where to create the model_dir in.
set_default_params – A callable that updates params with some defaults specific to this model. Defaults to None.
- Returns
Params parsed from cmdline arguments and the params file.
- common.tf.appliance_utils.run_appliance(model_fn: Callable, train_input_fn: Callable, eval_input_fn: Callable, supported_strategies: List[str], default_params_fn: Optional[Callable] = None, stack_params_fn: Optional[Callable] = None, enable_cs_summaries: bool = False)#
Helper method for running models locally or on CS-X Systems.
- Parameters
model_fn – A callable for creating the model.
train_input_fn – A callable for creating a data input pipeline for train.
eval_input_fn – A callable for creating a data input pipeline for eval.
supported_strategies – List of supported execution strategies. If a strategy is not explicitly selected in cmdline args, the default strategy chosen is the first item in this list.
default_params_fn – A callable that takes in the parsed params and sets defaults for missing params.
stack_params_fn – A callable that takes in the parsed params and sets Cerebras-specific config for stack compilation.
enable_cs_summaries – Enable summaries when running on CS-X hardware.
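The default-strategy rule described for supported_strategies can be sketched as follows. `select_strategy` is a hypothetical helper written for illustration, not part of `appliance_utils`, and raising `ValueError` for an unsupported strategy is an assumption about how such a case might be handled.

```python
from typing import List, Optional


def select_strategy(requested: Optional[str], supported_strategies: List[str]) -> str:
    """Hypothetical helper: choose the execution strategy for a run."""
    if not supported_strategies:
        raise ValueError("supported_strategies must not be empty")
    if requested is None:
        # No strategy given on the command line: fall back to the first entry.
        return supported_strategies[0]
    if requested not in supported_strategies:
        # Assumption: an unsupported strategy is treated as an error.
        raise ValueError(
            f"Strategy {requested!r} is not one of {supported_strategies}"
        )
    return requested


supported = ["weight_streaming", "pipeline"]
default_choice = select_strategy(None, supported)
explicit_choice = select_strategy("pipeline", supported)
```

Under this rule, the order of supported_strategies matters: a model that lists weight_streaming first runs in that strategy unless the user selects another one.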
- common.tf.appliance_utils.setup_logging(level: str, logging_dir: Optional[str] = None)#
Sets up the logging verbosity level.
- Parameters
level – The logging level string.
logging_dir – Where to store logs for archival purposes.
- common.tf.appliance_utils.update_debug_args_from_stack_params(debug_args, stack_params_fn: Callable[[dict], dict], params: dict) None #
Gets stack params and encodes them in the given debug args.
- Parameters
debug_args – The debug args in which to inject the stack params.
stack_params_fn – A callable that takes in params and returns a dict of stack params for the model.
params – The parsed model params.
common.tf.run_utils module#
Run-time utilities for the estimator workflow, for device-specific execution.
- Key functions include:
is_cs: checks whether this is a CS1 runtime environment
get_gpu_distribution_strategy: set up GPU distributed training
save_params: save params in yaml format in model directory
update_params_from_args: update command line arguments into params
save_predictions: save predictions from estimator.predict into npy files
- class common.tf.run_utils.ExecutionMode#
Bases:
enum.Enum
An enumeration.
- OutsideCerebras = 3#
- Pipeline = 2#
- WeightStreaming = 1#
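For reference, the members listed above behave like any standard `enum.Enum`. The snippet below declares a local mirror of the enum with the documented names and values so that it runs without the library installed; in real code the class would be imported from `common.tf.run_utils` instead.

```python
import enum


class ExecutionMode(enum.Enum):
    # Local mirror of the documented members, for illustration only.
    WeightStreaming = 1
    Pipeline = 2
    OutsideCerebras = 3


by_value = ExecutionMode(2)            # lookup by value
by_name = ExecutionMode["Pipeline"]    # lookup by name
all_names = [member.name for member in ExecutionMode]
```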
- class common.tf.run_utils.GetWeights#
Bases:
object
Class to easily load weights from a checkpoint by name or as iterator.
- __init__(ckpt_path)#
- property var_names#
Variable names contained in the checkpoint
- common.tf.run_utils.check_env(params)#
Perform basic checks for parameters and env
- Parameters
params (dict) – runconfig dict we want to validate
- common.tf.run_utils.create_warm_start_settings(runconfig_params, exclude_string=None)#
Creates warm start settings for estimator.
Does not load any weights whose names include exclude_string. This is useful when fine-tuning pretrained models.
- Parameters
runconfig_params (dict) – runconfig params
exclude_string (str) – any weights with this string in the name will be initialized from scratch instead of coming from the checkpoint.
- Returns
a WarmStartSettings object (or None if no checkpoint_path is provided) to be passed into estimator’s warm_start_from field.
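One common way to express "every variable except those containing a given string" is a regular expression with a negative lookahead, which is the kind of pattern that `tf.estimator.WarmStartSettings` accepts in its `vars_to_warm_start` argument. The sketch below shows the filtering effect on plain variable names. Whether `create_warm_start_settings` builds its pattern this way is an assumption, and both helper names are hypothetical.

```python
import re
from typing import List, Optional


def warm_start_pattern(exclude_string: Optional[str] = None) -> str:
    """Hypothetical helper: regex matching names that lack exclude_string."""
    if exclude_string is None:
        return ".*"
    # Negative lookahead: no position in the name may start exclude_string.
    return rf"^((?!{re.escape(exclude_string)}).)*$"


def names_to_warm_start(
    var_names: List[str], exclude_string: Optional[str] = None
) -> List[str]:
    """Hypothetical helper: names that would be loaded from the checkpoint."""
    pattern = re.compile(warm_start_pattern(exclude_string))
    return [name for name in var_names if pattern.match(name)]


# Example variable names are made up for illustration.
names = ["bert/encoder/kernel", "bert/cls_head/kernel", "bert/cls_head/bias"]
kept = names_to_warm_start(names, exclude_string="cls_head")
```

In a fine-tuning run, the excluded variables (here, the hypothetical classification head) would be initialized from scratch while the remaining weights come from the checkpoint.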
- common.tf.run_utils.dict_to_checkpoint(state_dict, checkpoint_name)#
Saves a dictionary of weight values into a tf Saver style checkpoint.
- Parameters
state_dict – (Dict[str, np.ndarray]) Collection of weights.
checkpoint_name – (str) Name of the checkpoint file to create.
- Returns
(str) The path to the saved checkpoint.
- common.tf.run_utils.get_csconfig(params)#
Returns CSConfig proto.
- common.tf.run_utils.get_csrunconfig_dict(params)#
- common.tf.run_utils.get_execution_mode()#
- common.tf.run_utils.get_input_checkpoint_steps(model_dir)#
Get the correct input checkpoint steps to run.
- Parameters
model_dir (str) – Model directory to fetch input checkpoint steps from
- Returns
An integer specifying the number of iterations of the data loader to skip.
- common.tf.run_utils.get_params(params_file)#
- common.tf.run_utils.get_predict_directory(model_dir)#
Gets the predict directory within the given model_dir if it exists
- Parameters
model_dir (string) – Directory we want to write to
- common.tf.run_utils.get_weight_dict(ckpt_path)#
Reads TensorFlow checkpoint from specified path and returns the corresponding model’s parameters as a dictionary of variable names to numpy arrays.
- Parameters
ckpt_path (str) – Path to TensorFlow checkpoint (prefix).
- Returns
- (dict)
Dictionary of variable names to numpy arrays with corresponding model parameters.
- common.tf.run_utils.is_cs(params)#
Check if the runtime environment is that of a Cerebras System. If yes, return True, else False
For the legacy k8s flow, the user does not need to specify cs_ip, since the k8s scheduler determines which CS system to use internally. When a CS is needed for the run, K8S_CS_IP will be set. This function returns True if K8S_CS_IP is set.
- Parameters
params (dict) – runconfig dict to provide parameters for check
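The check described above can be sketched roughly as follows. `is_cs_sketch` is a hypothetical reimplementation for illustration only. Reading `cs_ip` as a key of the params dict and treating `K8S_CS_IP` as an environment variable are assumptions based on the description, not confirmed details of the real function.

```python
import os


def is_cs_sketch(params: dict) -> bool:
    """Hypothetical sketch: is a Cerebras System targeted for this run?"""
    # Assumption: an explicit cs_ip in the runconfig params selects a CS system.
    if params.get("cs_ip"):
        return True
    # Legacy k8s flow (assumed to use an environment variable): the scheduler
    # sets K8S_CS_IP when a CS is needed for the run.
    return bool(os.environ.get("K8S_CS_IP"))


# Start from a clean environment so the example is deterministic.
os.environ.pop("K8S_CS_IP", None)
without_cs = is_cs_sketch({})
with_cs_ip = is_cs_sketch({"cs_ip": "10.0.0.1"})  # made-up address

os.environ["K8S_CS_IP"] = "10.0.0.2"  # made-up address
with_k8s_env = is_cs_sketch({})
os.environ.pop("K8S_CS_IP", None)
```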
- common.tf.run_utils.save_params(params, model_dir, fname='params.yaml')#
Writes and saves a dictionary to a file in the model_dir.
- Parameters
params (dict) – dict we want to write to a file in model_dir
model_dir (string) – Directory we want to write to
fname (string) – Name of file in model_dir we want to save to.
- common.tf.run_utils.save_predictions(model_dir, outputs, name='outputs.npz')#
Save outputs in the given model_dir under the given name, initializing the predict dir within model_dir.
- Parameters
model_dir (string) – Directory we want to write to
outputs (list) – List of dictionaries returned by estimator.predict
name (string) – Name of output, generally in .npy format
- common.tf.run_utils.setup_environment(params)#
Set environment to have determinism and reproducible runs if tf_random_seed is set.
- Parameters
params (dict) – Parameters for execution
- common.tf.run_utils.update_input_checkpoint_steps(params)#
Update the correct input checkpoint steps to run.
- Parameters
params (dict) – Parameters for execution
- Returns
The parameter dictionary modified with the correct number of input steps to skip during execution
- common.tf.run_utils.update_params_from_args(args, params)#
Sets command line arguments from args into params.
- Parameters
args (argparse namespace) – Command line arguments
params (dict) – runconfig dict we want to update
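A minimal sketch of merging command line arguments into a params dict is shown below. `update_params_from_args_sketch` is a hypothetical stand-in. It assumes that arguments left unset (value `None`) should not overwrite values from the params file and that the dict is updated in place; the real function may differ on both points. The argument names `--mode` and `--max_steps` are made up for the example.

```python
import argparse


def update_params_from_args_sketch(args: argparse.Namespace, params: dict) -> None:
    """Hypothetical sketch: copy command line values into params in place."""
    for key, value in vars(args).items():
        # Assumption: unset arguments (None) keep the value already in params.
        if value is not None:
            params[key] = value


parser = argparse.ArgumentParser()
parser.add_argument("--mode")
parser.add_argument("--max_steps", type=int)

# Only --mode is given, so max_steps keeps its value from the params dict.
args = parser.parse_args(["--mode", "train"])
params = {"mode": "eval", "max_steps": 100}
update_params_from_args_sketch(args, params)
```

After the call, `params["mode"]` holds the command line value while `params["max_steps"]` is unchanged, which matches the usual precedence of command line flags over file-based configuration.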