common.tf.model_utils package#
Submodules#
common.tf.model_utils.create_initializer module#
- common.tf.model_utils.create_initializer.create_initializer(spec, seed=None)#
Creates the specified initializer.
- Parameters
spec (dict/str) – either a string indicating the name of the initializer or a dict that includes the name + other params if relevant.
seed (int) – random seed for the initializer or None to run unseeded.
Note: The seed argument is currently ignored pending a TF upgrade; any seed passed in a call to create_initializer has no effect.
- Returns
Initializer that can be passed to layers.
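A minimal sketch of how the spec handling might look, assuming a hypothetical registry of initializer names; this is illustrative only, not the actual implementation, and it mirrors the documented behavior of ignoring the seed:

```python
import numpy as np

def create_initializer_sketch(spec, seed=None):
    """Sketch: spec is either a string naming the initializer, or a dict
    whose "name" key names it and whose remaining keys are parameters."""
    if isinstance(spec, str):
        name, params = spec, {}
    else:
        params = dict(spec)
        name = params.pop("name")

    rng = np.random.default_rng()  # unseeded: seed is currently ignored

    # Hypothetical registry for illustration only.
    registry = {
        "zeros": lambda shape, **kw: np.zeros(shape),
        "normal": lambda shape, stddev=0.02, **kw: rng.normal(0.0, stddev, shape),
    }
    init_fn = registry[name]
    return lambda shape: init_fn(shape, **params)
```

A string spec selects an initializer with defaults, while a dict spec carries extra parameters, e.g. `create_initializer_sketch({"name": "normal", "stddev": 0.1})`.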
common.tf.model_utils.gather module#
- common.tf.model_utils.gather.gather_last_tokens(hidden_states, input_seq_lens)#
Assuming hidden_states has dimensions [B, S, H], gather the hidden states by slicing the sequence dimension at input_seq_lens - 1, yielding the hidden state vector after the last time step.
We have to reshape/gather/reshape because there is an issue with using gather_nd through LAIR.
- Parameters
hidden_states (Tensor) – 3-d tensor (B,S,H).
input_seq_lens (Tensor) – 1-d tensor of sequence lengths.
- Returns
2-d Tensor of last hidden states (B, H).
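The reshape/gather/reshape pattern described above can be sketched in NumPy; this mirrors the documented workaround (flatten batch and sequence dimensions, gather flat row indices), not the exact TF implementation:

```python
import numpy as np

def gather_last_tokens_sketch(hidden_states, input_seq_lens):
    # hidden_states: (B, S, H); input_seq_lens: (B,)
    B, S, H = hidden_states.shape
    flat = hidden_states.reshape(B * S, H)
    # Row b * S + (len_b - 1) holds batch b's last real time step.
    idx = np.arange(B) * S + (input_seq_lens - 1)
    return flat[idx]  # (B, H)
```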
common.tf.model_utils.reshape_gather module#
- common.tf.model_utils.reshape_gather.reshape_gather(inputs, masked_lm_positions, do_reshape=True)#
Gather elements from the inputs tensor at the positions provided.
- Parameters
inputs (Tensor) – Input to gather elements from.
masked_lm_positions (Tensor) – Indices passed to tf.gather to select elements.
do_reshape (bool) – If True, the output is a 3-d tensor of shape [batch_size, length, hidden_size]; otherwise a 2-d tensor of shape [batch_size * length, hidden_size] is returned.
- Returns
Tensor with gathered elements from inputs tensor.
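A NumPy sketch of the gathering semantics, assuming inputs of shape (B, S, H) and per-batch position indices of shape (B, L); this illustrates the two output shapes, not the actual TF code:

```python
import numpy as np

def reshape_gather_sketch(inputs, masked_lm_positions, do_reshape=True):
    # inputs: (B, S, H); masked_lm_positions: (B, L)
    B, S, H = inputs.shape
    L = masked_lm_positions.shape[1]
    # Gather each batch's positions along the sequence dimension.
    gathered = inputs[np.arange(B)[:, None], masked_lm_positions]  # (B, L, H)
    if do_reshape:
        return gathered
    return gathered.reshape(B * L, H)
```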
common.tf.model_utils.scale_loss_value module#
- common.tf.model_utils.scale_loss_value.scale_loss_value(loss, label_weights, scale_type, batch_size, output_type=tensorflow.float32)#
Performs different types of scaling of the loss value.
- Parameters
loss (Tensor) – The loss value to scale.
label_weights (Tensor) – The mask of labels to use for scale types num_masked or num_cls.
scale_type (str) – One of num_masked, num_cls, batch_size, or None.
batch_size (int) – Required if scale type is batch_size.
output_type (tf.dtype) – Type of the output. If None is specified no type casting is performed.
- Returns
The scaled loss value.
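A hedged NumPy sketch of the scaling modes; the exact divisors used for num_masked and num_cls are assumptions (dividing by the number of non-zero label weights), and np.float32 stands in for the output_type cast:

```python
import numpy as np

def scale_loss_value_sketch(loss, label_weights, scale_type, batch_size=None):
    # Assumption: num_masked / num_cls divide by the label-weight mass.
    if scale_type in ("num_masked", "num_cls"):
        loss = loss / np.maximum(np.sum(label_weights), 1.0)
    elif scale_type == "batch_size":
        loss = loss / batch_size  # batch_size required for this mode
    elif scale_type is not None:
        raise ValueError(f"unknown scale_type: {scale_type}")
    return np.float32(loss)  # stand-in for the output_type cast
```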
common.tf.model_utils.shard_dataset module#
- common.tf.model_utils.shard_dataset.shard_dataset(dataset, use_multiple_workers, input_context=None)#
Shard a dataset based on whether we are running in a multi-GPU setting or on a Cerebras System with multiple workers. For a single-worker scenario on the Cerebras System, it is best to pass use_multiple_workers as False.
- Parameters
dataset (tf.data.Dataset) – TF dataset to shard
use_multiple_workers (bool) – Specifies whether multiple workers are used with the Cerebras System.
input_context (dict) – Given by distributed strategy for training
- Returns
The sharded dataset if either input_context or use_multiple_workers is passed; otherwise the dataset unchanged.
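The sharding decision can be sketched in pure Python with a list standing in for the dataset; num_workers and worker_id are hypothetical stand-ins for values the real function would obtain from the input_context or the Cerebras worker environment:

```python
def shard_dataset_sketch(dataset, use_multiple_workers, input_context=None,
                         num_workers=1, worker_id=0):
    """Sketch of the sharding decision; `dataset` is a plain list here."""
    if input_context is not None:
        # Multi-GPU: shard by the distribution strategy's input pipeline.
        n = input_context["num_input_pipelines"]
        i = input_context["input_pipeline_id"]
        return dataset[i::n]
    if use_multiple_workers:
        # Cerebras System multi-worker: one shard per worker.
        return dataset[worker_id::num_workers]
    return dataset  # single worker: no sharding
```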
common.tf.model_utils.vocab_utils module#
- common.tf.model_utils.vocab_utils.get_vocab_size(vocab_file, vocab_size=None)#
Function to get the vocabulary size and validate it against the vocabulary file.
- Parameters
vocab_file (str) – Path to vocabulary file.
vocab_size (int) – Size of the vocabulary file.
- Returns
Integer value indicating the size of the vocabulary file.
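A minimal sketch of the count-and-validate logic, assuming one token per line in the vocabulary file; the validation behavior (raising on mismatch) is an assumption, not confirmed by the source:

```python
def get_vocab_size_sketch(vocab_file, vocab_size=None):
    # Count non-empty lines as tokens; one token per line is assumed.
    with open(vocab_file) as f:
        n = sum(1 for line in f if line.strip())
    # Assumption: a provided vocab_size is validated against the file.
    if vocab_size is not None and vocab_size != n:
        raise ValueError(
            f"vocab_size ({vocab_size}) does not match file ({n})")
    return n
```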