common.tf.model_utils package#

Submodules#

common.tf.model_utils.create_initializer module#

common.tf.model_utils.create_initializer.create_initializer(spec, seed=None)#

Creates the specified initializer.

Parameters
  • spec (dict/str) – either a string naming the initializer, or a dict containing the name plus any other relevant parameters.

  • seed (int) – random seed for the initializer or None to run unseeded.

Note: The seed is currently ignored, pending a TensorFlow upgrade; any seed passed to create_initializer has no effect.

Returns

An initializer that can be passed to layers.
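The spec format accepts either form described above. A minimal sketch of how such a spec might be parsed; the function and key names here are illustrative assumptions, not the library's actual implementation:

```python
# Hypothetical sketch of spec parsing; names are assumptions, not library code.
def parse_initializer_spec(spec):
    # A bare string is just the initializer name with no extra parameters.
    if isinstance(spec, str):
        return spec, {}
    # A dict carries the name under "name" plus any extra keyword parameters.
    params = dict(spec)
    name = params.pop("name")
    return name, params

print(parse_initializer_spec("glorot_uniform"))
print(parse_initializer_spec({"name": "truncated_normal", "stddev": 0.02}))
```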

common.tf.model_utils.gather module#

common.tf.model_utils.gather.gather_last_tokens(hidden_states, input_seq_lens)#

Assuming hidden_states has dimensions [B, S, H], gather the hidden state at index input_seq_len - 1 along the sequence dimension, i.e. the hidden state vector after the last time step.

We have to reshape/gather/reshape because there is an issue with using gather_nd through LAIR.

Parameters
  • hidden_states (Tensor) – 3-d tensor (B,S,H).

  • input_seq_lens (Tensor) – 1-d tensor sequence lengths.

Returns

2-d Tensor of last hidden states (B, H).
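The reshape/gather/reshape approach mentioned above can be sketched in NumPy (an illustrative re-implementation, not the library's code): flatten to (B*S, H), compute the flat index b*S + (len_b - 1) for each batch element, and gather.

```python
import numpy as np

def gather_last_np(hidden_states, input_seq_lens):
    # hidden_states: (B, S, H); input_seq_lens: (B,) of valid lengths.
    B, S, H = hidden_states.shape
    flat = hidden_states.reshape(B * S, H)           # reshape to (B*S, H)
    idx = np.arange(B) * S + (input_seq_lens - 1)    # flat index of last step
    return flat[idx]                                 # gather -> (B, H)

h = np.arange(2 * 3 * 4).reshape(2, 3, 4).astype(float)
out = gather_last_np(h, np.array([2, 3]))
print(out.shape)  # (2, 4)
```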

common.tf.model_utils.reshape_gather module#

common.tf.model_utils.reshape_gather.reshape_gather(inputs, masked_lm_positions, do_reshape=True)#

Gather elements from the inputs tensor based on the mask provided.

Parameters
  • inputs (Tensor) – Input to gather elements from.

  • masked_lm_positions (Tensor) – Indices passed to tf.gather to select elements.

  • do_reshape (bool) – If True, the output is a 3-D tensor of shape [batch_size, length, hidden_size]; otherwise, a 2-D tensor of shape [batch_size * length, hidden_size] is returned.

Returns

Tensor with gathered elements from inputs tensor.
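A NumPy sketch of the gather-by-position pattern (an illustrative re-implementation under the assumption that masked_lm_positions holds per-example indices into the sequence axis): offset each example's positions into the flattened (B*S, H) view, gather once, then optionally restore the batch dimension.

```python
import numpy as np

def reshape_gather_np(inputs, masked_lm_positions, do_reshape=True):
    # inputs: (B, S, H); masked_lm_positions: (B, L) indices into the S axis.
    B, S, H = inputs.shape
    L = masked_lm_positions.shape[1]
    # Offset each example's positions into the flattened (B*S, H) view,
    # then gather all masked positions in one indexing operation.
    flat_positions = (masked_lm_positions + np.arange(B)[:, None] * S).reshape(-1)
    gathered = inputs.reshape(B * S, H)[flat_positions]
    return gathered.reshape(B, L, H) if do_reshape else gathered

x = np.arange(2 * 4 * 3).reshape(2, 4, 3)
pos = np.array([[0, 2], [1, 3]])
print(reshape_gather_np(x, pos).shape)         # (2, 2, 3)
print(reshape_gather_np(x, pos, False).shape)  # (4, 3)
```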

common.tf.model_utils.scale_loss_value module#

common.tf.model_utils.scale_loss_value.scale_loss_value(loss, label_weights, scale_type, batch_size, output_type=tensorflow.float32)#

Performs different types of scaling of the loss value.

Parameters
  • loss (Tensor) – The loss value to scale.

  • label_weights (Tensor) – The mask of labels to use for modes num_masked or num_cls.

  • scale_type (str) – One of num_masked, num_cls, batch_size, or None.

  • batch_size (int) – Required if scale_type is batch_size.

  • output_type (tf.dtype) – Type of the output. If None is specified, no type casting is performed.

Returns

The scaled loss value.
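A plain-Python sketch of the scaling modes described above. The exact semantics of each mode are an assumption here (e.g. that num_masked and num_cls both normalize by the number of active entries in the weight mask); names and behavior are illustrative, not the library's code:

```python
import numpy as np

def scale_loss_np(loss, label_weights, scale_type, batch_size=None):
    # Assumed behavior: divide the loss by a mode-dependent count.
    if scale_type in ("num_masked", "num_cls"):
        # Normalize by the number of active labels in the weight mask.
        return loss / max(float(np.sum(label_weights)), 1.0)
    if scale_type == "batch_size":
        if batch_size is None:
            raise ValueError("batch_size is required for scale_type='batch_size'")
        return loss / float(batch_size)
    if scale_type is None:
        return loss
    raise ValueError(f"unknown scale_type: {scale_type}")

print(scale_loss_np(10.0, np.array([1, 1, 0, 0]), "num_masked"))  # 5.0
print(scale_loss_np(8.0, None, "batch_size", batch_size=4))       # 2.0
```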

common.tf.model_utils.shard_dataset module#

common.tf.model_utils.shard_dataset.shard_dataset(dataset, use_multiple_workers, input_context=None)#

Shard a dataset based on whether we are using a multi-GPU setting or a Cerebras System with multiple workers. For a single-worker scenario on the Cerebras System, it's best to pass use_multiple_workers as False.

Parameters
  • dataset (tf.data.Dataset) – TF dataset to shard

  • use_multiple_workers (bool) – Specifies whether using multiple_workers with the Cerebras System or not

  • input_context (dict) – Given by distributed strategy for training

Returns

The sharded dataset if either input_context or use_multiple_workers is passed; otherwise, the dataset is returned unchanged.
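The sharding decision can be sketched over a plain list standing in for a tf.data.Dataset (each shard takes every n-th element, as tf.data.Dataset.shard does). The input_context key names and the source of the worker id are assumptions for illustration:

```python
# Illustrative sketch only; input_context keys and worker-id plumbing are
# assumptions, not the library's actual implementation.
def shard_sketch(records, use_multiple_workers, input_context=None,
                 worker_id=0, num_workers=1):
    if input_context is not None:
        # Multi-GPU: the distribution strategy supplies shard count and index.
        n = input_context["num_input_pipelines"]
        i = input_context["input_pipeline_id"]
        return records[i::n]
    if use_multiple_workers:
        # Cerebras multi-worker: shard by this worker's id (assumed to come
        # from the environment in the real implementation).
        return records[worker_id::num_workers]
    return records  # single worker: no sharding

data = list(range(8))
print(shard_sketch(data, False))                             # [0, 1, ..., 7]
print(shard_sketch(data, True, worker_id=1, num_workers=2))  # [1, 3, 5, 7]
```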

common.tf.model_utils.vocab_utils module#

common.tf.model_utils.vocab_utils.get_vocab_size(vocab_file, vocab_size=None)#

Function to get the vocabulary size and validate it against the vocabulary file.

Parameters
  • vocab_file (str) – Path to vocabulary file.

  • vocab_size (int) – Size of the vocabulary.

Returns

Integer value indicating the size of the vocabulary file.
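A minimal sketch of this kind of helper, assuming a one-token-per-line vocabulary file; the function body is illustrative, not the library's code:

```python
import os
import tempfile

def get_vocab_size_sketch(vocab_file, vocab_size=None):
    # Count non-empty lines in the vocab file (one token per line assumed).
    with open(vocab_file, encoding="utf-8") as f:
        file_size = sum(1 for line in f if line.strip())
    # Validate a caller-supplied size against the file, if one was given.
    if vocab_size is not None and vocab_size != file_size:
        raise ValueError(
            f"vocab_size {vocab_size} does not match file ({file_size} tokens)"
        )
    return file_size

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "vocab.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("[PAD]\n[UNK]\nhello\nworld\n")
    print(get_vocab_size_sketch(path))     # 4
    print(get_vocab_size_sketch(path, 4))  # 4
```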

Module contents#