Submodules#

create_initializer(spec, seed=None)#

Creates the specified initializer.

  • spec (dict/str) – either a string indicating the name of the initializer or a dict that includes the name + other params if relevant.

  • seed (int) – random seed for the initializer or None to run unseeded.
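The spec handling can be sketched as a small dispatcher. This is a hypothetical illustration, not the actual implementation; the real utility returns a TF initializer object, and the initializer names shown are assumptions:

```python
def create_initializer(spec, seed=None):
    """Resolve `spec` (str or dict) into an initializer name and params.

    Hypothetical sketch of the dispatch logic described above.
    """
    if isinstance(spec, str):
        name, params = spec, {}
    elif isinstance(spec, dict):
        params = dict(spec)        # copy so the caller's dict is untouched
        name = params.pop("name")  # remaining keys are initializer kwargs
    else:
        raise TypeError("spec must be a str or a dict with a 'name' key")
    # Per the note below, `seed` is currently ignored.
    return name, params
```

For example, `create_initializer({"name": "truncated_normal", "stddev": 0.02})` would resolve to the `truncated_normal` initializer with `stddev=0.02`.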

Note: seed is currently unused pending a TF upgrade; a seed passed in a call to create_initializer is ignored.

Returns: an initializer that can be passed to layers.

…(hidden_states, input_seq_lens)#

Assuming hidden_states has dimensions [B, S, H], gather along the sequence dimension at index input_seq_lens - 1, so that the result is the hidden state vector after the last time step.

We have to reshape/gather/reshape because there is an issue with using gather_nd through LAIR.

  • hidden_states (Tensor) – 3-d tensor of shape (B, S, H).

  • input_seq_lens (Tensor) – 1-d tensor of sequence lengths.
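The reshape/gather/reshape trick described above can be sketched in NumPy (the function name is illustrative, and the real utility operates on TF tensors):

```python
import numpy as np

def last_hidden_states(hidden_states, input_seq_lens):
    # hidden_states: (B, S, H); input_seq_lens: (B,)
    B, S, H = hidden_states.shape
    flat = hidden_states.reshape(B * S, H)          # reshape to 2-d
    idx = np.arange(B) * S + (input_seq_lens - 1)   # flat index of each last step
    return flat[idx]                                # gather -> (B, H)
```

With `hidden_states` of shape (2, 3, 2) and `input_seq_lens = [2, 3]`, this picks sequence index 1 from the first batch element and index 2 from the second.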


2-d Tensor of last hidden states (B, H).

…(inputs, masked_lm_positions, do_reshape=True)#

Gather elements from the inputs tensor based on the mask provided.

  • inputs (Tensor) – Input to get gathered elements from.

  • masked_lm_positions (Tensor) – Indices passed to tf.gather to select elements.

  • do_reshape (bool) – If True, the output is a 3-D tensor of shape [batch_size, length, hidden_size]; otherwise, a 2-D tensor of shape [batch_size * length, hidden_size] is returned.
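The gather and the do_reshape behavior can be sketched in NumPy (an illustrative stand-in for the TF implementation; the function name is an assumption):

```python
import numpy as np

def gather_masked(inputs, masked_lm_positions, do_reshape=True):
    # inputs: (B, S, H); masked_lm_positions: (B, L)
    B, S, H = inputs.shape
    L = masked_lm_positions.shape[1]
    flat = inputs.reshape(B * S, H)
    # Offset each batch's positions into the flattened (B * S) axis.
    idx = (np.arange(B)[:, None] * S + masked_lm_positions).reshape(-1)
    gathered = flat[idx]                              # (B * L, H)
    return gathered.reshape(B, L, H) if do_reshape else gathered
```

With `do_reshape=True` the output keeps a separate batch axis; with `do_reshape=False` the batch and position axes are merged.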


Tensor with gathered elements from the inputs tensor.

…(loss, label_weights, scale_type, batch_size, output_type=tensorflow.float32)#

Performs different types of scaling of the loss value.

  • loss (Tensor) – The loss value to scale.

  • label_weights (Tensor) – The mask of labels to use for modes num_masked or num_cls.

  • scale_type (str) – Scale type: one of num_masked, num_cls, batch_size, or None.

  • batch_size (int) – Required if scale type is batch_size.

  • output_type (tf.dtype) – Type of the output. If None is specified no type casting is performed.
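The scaling modes can be sketched as follows. This is a hypothetical NumPy illustration of the dispatch described above; the real utility operates on TF tensors and applies the output_type cast, which is omitted here:

```python
import numpy as np

def scale_loss(loss, label_weights, scale_type, batch_size=None):
    if scale_type in ("num_masked", "num_cls"):
        # Divide by the number of active labels in the mask.
        return loss / max(float(np.sum(label_weights)), 1.0)
    if scale_type == "batch_size":
        if batch_size is None:
            raise ValueError("batch_size is required when scale_type='batch_size'")
        return loss / batch_size
    if scale_type is None:
        return loss
    raise ValueError(f"unknown scale_type: {scale_type}")
```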


The scaled loss value.

…(dataset, use_multiple_workers, input_context=None)#

Shard a dataset depending on whether we are in a multi-GPU setting or using a Cerebras System with multiple workers. For a single-worker scenario on the Cerebras System, it is best to pass use_multiple_workers as False.

  • dataset (tf.data.Dataset) – TF dataset to shard.

  • use_multiple_workers (bool) – Specifies whether multiple workers are used with the Cerebras System.

  • input_context (dict) – Given by the distributed strategy for training.
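The sharding decision can be sketched as below. A plain list stands in for the tf.data.Dataset, and the input_context keys (num_input_pipelines, input_pipeline_id) are assumptions about what the distributed strategy provides:

```python
def shard_dataset(dataset, use_multiple_workers, input_context=None):
    """Hypothetical sketch; `dataset` is a list standing in for tf.data.Dataset."""
    if input_context is not None:
        num_shards = input_context["num_input_pipelines"]
        shard_index = input_context["input_pipeline_id"]
        # Equivalent of dataset.shard(num_shards, shard_index):
        # keep every num_shards-th element starting at shard_index.
        return dataset[shard_index::num_shards]
    if use_multiple_workers:
        # The real utility would derive the shard count/index from the
        # Cerebras multi-worker environment; left as a pass-through here.
        return dataset
    return dataset
```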

Returns: the dataset, sharded if either input_context or use_multiple_workers is passed; otherwise the dataset is returned unchanged.

…(vocab_file, vocab_size=None)#

Function to get the vocab size and validate it against the vocabulary file.

  • vocab_file (str) – Path to vocabulary file.

  • vocab_size (int) – Expected size of the vocabulary.

Returns: integer value indicating the size of the vocabulary.

Module contents#