common.tf.layers package#
Submodules#
common.tf.layers.AbstractRecomputeWrapper module#
- class common.tf.layers.AbstractRecomputeWrapper.AbstractRecomputeWrapper#
Bases:
abc.ABC
Utility functions for the tf.custom_gradient decorator, when used in training.
An abstract class that handles the many small requirements of using the tf.custom_gradient decorator. This class is used to recompute the activations during the backward propagation part of a training step, and acts as a backbone for recompute wrappers and reversible layers.
The following utility functions are designed to make the recomputation easy to implement:
_set_recomputed_tensor and _check_get_recomputed_tensor. These functions attach the recomputed tensors to the corresponding forward-pass tensors. They are useful for passing the recomputed tensors between, for example, reversible layers, so that we do not need to save any tensors.
_block_recompute_and_gradients. This function takes a forward block of the computation, recomputes the block, and then calculates and returns the gradients associated with the block.
Scope handling functions.
tf.custom_gradient names the scopes of the gradients. However, this naming is based on the IdentityN ops it attaches to the portion of the graph for which the user would like to add a custom gradient, which is not always convenient. Moreover, tf.custom_gradient does not track the appropriate control flow contexts for the variables used in that portion of the graph. The scope handling functions in this class are helpful here:
_get_clean_grad_scope. This function cleans the named scope for clean graphs.
_update_variables_for_context. This function finds the correct variable tensors for the control flow contexts (for example, to use recomputation inside a while-loop).
The basic structure for a recompute layer is as follows:
Define a custom gradient function using tf.custom_gradient inside the __call__ function of a recompute layer.
Inside the __call__ function, call the forward propagation of the layer and define the recompute+gradient function. We recommend you use the _block_recompute_and_gradients function. A minimal sketch follows.
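The sketch below is a hypothetical illustration of this structure in TF2 style; the names recompute_call and forward_block are placeholders, not part of this package, and it assumes forward_block uses trainable variables (so tf.custom_gradient passes the variables keyword to the gradient function):

```python
import tensorflow as tf

def recompute_call(forward_block, x):
    """Wrap forward_block so activations are recomputed in the backward pass."""

    @tf.custom_gradient
    def _wrapped(x):
        y = forward_block(x)  # forward pass; intermediates need not be saved

        def _grad(dy, variables=None):
            # Recompute the forward pass under a tape, then differentiate
            # the recomputed graph instead of saved activations.
            with tf.GradientTape() as tape:
                tape.watch(x)
                y_recomputed = forward_block(x)
            grads = tape.gradient(
                y_recomputed, [x] + list(variables or []), output_gradients=dy
            )
            return grads[0], grads[1:]

        return y, _grad

    return _wrapped(x)
```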
- CtrlFlowWarnedOnce = False#
- abstract call(*args, **kwargs)#
The call function for layers that use recomputation during the backward phase.
This function is wrapped by the __call__ function of this abstract recompute wrapper, and it must be overridden by a child class to implement the forward computation of the layer.
- static is_in_while_loop(graph=None)#
Returns True if the specified, or current if unspecified, graph corresponds to a while loop in the forward, backward, or cond graph.
- Returns
True if the specified, or current if unspecified, graph corresponds to a while loop in the forward, backward, or cond graph.
- Return type
bool
common.tf.layers.ActivationLayer module#
- class common.tf.layers.ActivationLayer.ActivationLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras activation layer.
Also supports activation="GeLU", activation="relu6", and activation="hard_swish", which are currently missing in keras.layers.ActivationLayer v2.2.
- Parameters
activation (Union[str, Callable]) – The function to be applied. This can be either a callable, the string name of a TensorFlow built-in activation, or one of "gelu", "lrelu" (lrelu denotes LeakyReLU), "relu6", or "hard_swish".
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(activation, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the activation layer.
- Parameters
inputs – Arbitrary tensor.
- Returns
A tensor of the same shape as the input.
- Return type
Tensor
- static gelu(x)#
- static hard_swish(x)#
- static relu6(x)#
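A minimal usage sketch (the import path is assumed from the Bases entries above; the helper formulas in the comments follow the standard definitions):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.ActivationLayer import ActivationLayer

x = tf.random.uniform([2, 128, 768])
act = ActivationLayer(activation="hard_swish")
y = act(x)  # same shape as x

# The static helpers follow the standard definitions:
#   relu6(x)      = min(max(x, 0), 6)
#   hard_swish(x) = x * relu6(x + 3) / 6
```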
common.tf.layers.AddLayer module#
- class common.tf.layers.AddLayer.AddLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras Add layer. Sums a list of inputs.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the AddLayer to sum up a list of inputs.
- Parameters
inputs – List of input tensors (at least 2).
- Returns
A tensor containing the sum of inputs.
- Return type
Tensor
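A short usage sketch (import path assumed as above; the input tensors here are placeholders):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.AddLayer import AddLayer

t1, t2, t3 = (tf.ones([2, 4]) for _ in range(3))
add = AddLayer()
out = add([t1, t2, t3])  # elementwise sum; same shape as each input
```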
common.tf.layers.AttentionLayer module#
- class common.tf.layers.AttentionLayer.AttentionLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Multi-head attention layer. Based on MLCommons model.
- Parameters
hidden_size (int) – Number of units in each projection output.
num_heads (int) – Number of attention heads.
use_projection_bias (bool) – Whether to use bias in the key, query, and value projections.
use_ffn_bias (bool) – Whether to use bias in the output projection.
initializer (str) – Projection kernel initializer. Defaults to glorot_uniform.
query_layer_initializer (initializer) – Query kernel initializer. Defaults to None, in which case initializer will be used.
key_layer_initializer (initializer) – Key kernel initializer. Defaults to None, in which case initializer will be used.
value_layer_initializer (initializer) – Value kernel initializer. Defaults to None, in which case initializer will be used.
relative_attention_bias_weight_initializer (initializer) – Relative attention bias weight initializer. Defaults to None, in which case initializer will be used.
output_layer_initializer (str or initializer) – If not None, use this initializer for the output transform layer. Defaults to None.
kernel_regularizer (Optional[Callable]) – Projection kernel regularizer. Defaults to None.
bias_regularizer (Optional[Callable]) – Projection bias regularizer. Defaults to None.
attention_type (str) – The attention variant to execute. Currently accepts dot_product and scaled_dot_product. Defaults to scaled_dot_product.
dropout_rate (float) – Dropout rate for key-query weights. Defaults to 0.0.
dropout_seed (int) – Seed with which to initialize the dropout layer. Defaults to None.
use_relative_attention_bias (bool) – Whether to use relative position bias when calculating attention.
relative_attention_bias (Tensor) – Tensor with relative attention weights. Shape: [num_relative_attention_buckets, num_heads]. Defaults to None.
num_relative_attention_buckets (int) – Used to calculate relative position bias when use_relative_attention_bias is set to True.
bidirectional_relative_attention (bool) – Whether attention is bidirectional.
softmax_dtype_fp32 (bool) – If True, casts the query-key logits to FP32 and performs the softmax calculation in FP32.
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(hidden_size, num_heads, output_projection_size=None, use_projection_bias=False, use_ffn_bias=False, initializer='glorot_uniform', query_layer_initializer=None, key_layer_initializer=None, value_layer_initializer=None, relative_attention_bias_weight_initializer=None, output_layer_initializer=None, kernel_regularizer=None, bias_regularizer=None, attention_type='scaled_dot_product', dropout_rate=0.0, dropout_seed=None, use_relative_attention_bias=False, relative_attention_bias=None, num_relative_attention_buckets=32, bidirectional_relative_attention=False, softmax_dtype_fp32=True, boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(q, v, mask=None, past_kv=None, cache_present_kv=False, training=True, position_bias=None, cache_position_bias=False)#
Applies the attention mechanism to queries q and values v. Keys will be set to be the same as v.
- Parameters
q (Tensor) – Queries, shape [batch_size, seq_length, hidden_size].
v (Tensor) – Values, shape [batch_size, seq_length, hidden_size].
mask (Tensor) – Attention mask. Can be 2D of shape [batch_size, seq_length], or 3D of shape [batch, query_length, seq_length].
past_kv (Tensor) – Past keys and values. Has shape [2, batch_size, num_heads, seq_length, hidden_size / num_heads]. The tensors in [0,:,:,:,:] and [1,:,:,:,:] contain the past keys and values, respectively. Defaults to None.
cache_present_kv (bool) – Specifies if the present keys and values must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
training (bool) – Training the model if True. Needed to call the dropout (after softmax) in the appropriate mode.
position_bias (Tensor) – Tensor containing the position bias to apply in attention.
cache_position_bias (bool) – Specifies if the position bias must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
- Returns
When cache_present_kv is True and cache_position_bias is True, returns a tuple, where the 0th entry contains the attention output, the 1st entry contains a tensor of the keys and values computed at the current application of the attention layer, and the 2nd entry contains a tensor of the position bias computed at the current application of the attention layer.
If cache_present_kv is False, no entry for present keys and values is provided.
If cache_position_bias is False, no entry for position bias is provided.
If both cache_present_kv and cache_position_bias are set to False, returns a tensor of shape equal to the shape of past_kv (see above).
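A single-call usage sketch (import path assumed as above, shapes chosen for illustration):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.AttentionLayer import AttentionLayer

batch_size, seq_length, hidden_size = 2, 128, 512
q = tf.random.uniform([batch_size, seq_length, hidden_size])
v = tf.random.uniform([batch_size, seq_length, hidden_size])
mask = tf.ones([batch_size, seq_length])  # 2D attention mask

attn = AttentionLayer(hidden_size=hidden_size, num_heads=8)
# With cache_present_kv and cache_position_bias left False (the defaults),
# a single attention-output tensor is returned.
out = attn(q, v, mask=mask, training=True)
```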
- class common.tf.layers.AttentionLayer.SelfAttentionLayer#
Bases:
common.tf.layers.AttentionLayer.AttentionLayer
Multi-head self-attention layer.
- call(x, mask=None, past_kv=None, cache_present_kv=False, training=True, position_bias=None, cache_position_bias=False)#
Applies the attention mechanism to queries q and values v. Keys will be set to be the same as v.
- Parameters
q (Tensor) – Queries, shape [batch_size, seq_length, hidden_size].
v (Tensor) – Values, shape [batch_size, seq_length, hidden_size].
mask (Tensor) – Attention mask. Can be 2D of shape [batch_size, seq_length], or 3D of shape [batch, query_length, seq_length].
past_kv (Tensor) – Past keys and values. Has shape [2, batch_size, num_heads, seq_length, hidden_size / num_heads]. The tensors in [0,:,:,:,:] and [1,:,:,:,:] contain the past keys and values, respectively. Defaults to None.
cache_present_kv (bool) – Specifies if the present keys and values must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
training (bool) – Training the model if True. Needed to call the dropout (after softmax) in the appropriate mode.
position_bias (Tensor) – Tensor containing the position bias to apply in attention.
cache_position_bias (bool) – Specifies if the position bias must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
- Returns
When cache_present_kv is True and cache_position_bias is True, returns a tuple, where the 0th entry contains the attention output, the 1st entry contains a tensor of the keys and values computed at the current application of the attention layer, and the 2nd entry contains a tensor of the position bias computed at the current application of the attention layer.
If cache_present_kv is False, no entry for present keys and values is provided.
If cache_position_bias is False, no entry for position bias is provided.
If both cache_present_kv and cache_position_bias are set to False, returns a tensor of shape equal to the shape of past_kv (see above).
common.tf.layers.BaseLayer module#
- class common.tf.layers.BaseLayer.BaseLayer#
Bases:
tensorflow.keras.layers.Layer
Base layer for the reference models.
- Parameters
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call()#
common.tf.layers.Conv2DLayer module#
- class common.tf.layers.Conv2DLayer.Conv2DLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras 2D convolution layer.
- __init__(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the 2D convolution layer.
- Parameters
inputs – A 4D tensor with shape (samples, channels, rows, cols) if data_format='channels_first', or a 4D tensor with shape (samples, rows, cols, channels) if data_format='channels_last'.
- Returns
A 4D tensor with shape (samples, filters, new_rows, new_cols) if data_format='channels_first', or a 4D tensor with shape (samples, new_rows, new_cols, filters) if data_format='channels_last'. Note that the rows and cols values might have changed due to padding.
- Return type
Tensor
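A usage sketch (import path assumed as above; channels_last input):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.Conv2DLayer import Conv2DLayer

x = tf.random.uniform([8, 32, 32, 3])  # (samples, rows, cols, channels)
conv = Conv2DLayer(filters=64, kernel_size=(3, 3), strides=(2, 2), padding="same")
y = conv(x)  # (8, 16, 16, 64); rows/cols change with strides and padding
```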
common.tf.layers.Conv2DTransposeLayer module#
- class common.tf.layers.Conv2DTransposeLayer.Conv2DTransposeLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras 2D transposed convolution layer.
- __init__(filters, kernel_size, strides=(1, 1), padding='valid', output_padding=None, data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the 2D transposed convolution layer.
- Parameters
inputs – A 4D tensor with shape (samples, channels, rows, cols) if data_format='channels_first', or a 4D tensor with shape (samples, rows, cols, channels) if data_format='channels_last'.
- Returns
A 4D tensor with shape (samples, filters, new_rows, new_cols) if data_format='channels_first', or a 4D tensor with shape (samples, new_rows, new_cols, filters) if data_format='channels_last'. Note that the rows and cols values might have changed due to padding.
- Return type
Tensor
common.tf.layers.CrossEntropyFromLogitsLayer module#
- class common.tf.layers.CrossEntropyFromLogitsLayer.CrossEntropyFromLogitsLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Cross entropy loss, given logits. Compares logits against labels.
- Parameters
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call(labels, logits)#
Calculates the cross entropy over the logits.
- Parameters
labels (Tensor) – Label indices.
logits (Tensor) – Logits (non-normalized).
- Returns
A tensor of the same shape as labels and of the same type as logits with the softmax cross entropy loss.
- Return type
Tensor
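For intuition, the computation corresponds to the standard sparse softmax cross entropy; a sketch in plain TensorFlow (not this package's wrapper), with hypothetical shapes:

```python
import tensorflow as tf

labels = tf.constant([[2, 0], [1, 3]])  # [batch, seq] integer label indices
logits = tf.random.uniform([2, 2, 8])   # [batch, seq, vocab], non-normalized
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits
)  # [batch, seq]; same shape as labels
```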
common.tf.layers.DenseLayer module#
- class common.tf.layers.DenseLayer.DenseLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras densely-connected layer. Provides support for "gelu" activation.
- Parameters
units (int) – Number of units in the layer output.
activation (Optional[Union[str, Callable]]) – If not None, an activation function to be applied after the dense layer. The activation function can be either a callable, the string name of a TensorFlow built-in activation, or "gelu".
use_bias (bool) – Whether to use bias.
kernel_initializer (str) – Kernel initializer. Defaults to "glorot_uniform".
bias_initializer (str) – Bias initializer. Defaults to "zeros".
kernel_regularizer (Optional[Callable]) – Kernel regularizer. Defaults to None.
bias_regularizer (Optional[Callable]) – Bias regularizer. Defaults to None.
activity_regularizer (Optional[Callable]) – Activity (output activation) regularizer. Defaults to None.
kernel_constraint (Optional[Callable]) – Kernel constraint. Defaults to None.
bias_constraint (Optional[Callable]) – Bias constraint. Defaults to None.
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the densely-connected layer.
- Parameters
inputs (Tensor) – An N-D tensor with shape (batch_size, ..., input_dim).
- Returns
An N-D tensor with shape (batch_size, ..., units).
- Return type
Tensor
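A usage sketch (import path assumed as above):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.DenseLayer import DenseLayer

x = tf.random.uniform([2, 128, 768])
dense = DenseLayer(units=3072, activation="gelu")
y = dense(x)  # (2, 128, 3072)
```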
common.tf.layers.DropoutLayer module#
- class common.tf.layers.DropoutLayer.DropoutLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras dropout layer.
- __init__(rate, noise_shape=None, seed=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, training=True, **kwargs)#
Performs the dropout.
- Parameters
inputs (Tensor) – Arbitrary tensor.
training (bool) – Training mode if set to True.
- Returns
A tensor of the same shape as the input.
- Return type
Tensor
common.tf.layers.EmbeddingLayer module#
- class common.tf.layers.EmbeddingLayer.EmbeddingLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Embedding layer. Built on top of the Keras Embedding layer.
- __init__(input_dim, output_dim, embeddings_initializer='uniform', bias_initializer='zeros', embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None, use_bias=False, weight_name='embedding_weights', boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(inputs, pad_id=-1, scale=1)#
Get token embeddings of inputs.
- Parameters
inputs (Tensor) – A tensor with shape [batch_size, length].
pad_id – Integer specifying which input ID corresponds to padding rather than a real token. It does not need to be a legal vocabulary entry. Any inputs elements equal to this value are not looked up; zeros are output directly instead. On the Wafer Scale Engine, this indicates the presence of variable sequence length.
scale – Scaling of the embedding (in MLPerf, hidden_size**0.5 is used).
- Returns
A tensor of embeddings with shape [batch_size, length, hidden_size]. Padded positions are filled with zeros.
- Return type
embeddings (Tensor)
- embedding_table()#
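A usage sketch illustrating the pad_id behavior (import path assumed as above):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.EmbeddingLayer import EmbeddingLayer

emb = EmbeddingLayer(input_dim=32000, output_dim=768)
ids = tf.constant([[5, 17, 0, 0]])  # 0 is used as the padding ID here
vectors = emb(ids, pad_id=0)        # [1, 4, 768]; the two padded positions are zeros
```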
common.tf.layers.FeedForwardNetwork module#
- class common.tf.layers.FeedForwardNetwork.FeedForwardNetwork#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
A feed forward network that consists of a stack of fully connected layers.
- Parameters
layers_units (list of int) – List of units for each layer.
layers_activation (list of str) – List of activation types (str) for each layer.
layers_dropout_rates (list of float) – List of dropout rates (float) for each layer.
use_bias (bool) – If True, use bias throughout all layers.
kernel_initializer (string) – Kernel initializer. Defaults to "glorot_uniform".
bias_initializer (callable) – Bias initializer. Defaults to "zeros".
output_layer_initializer – If not None, initialize the last projection layer with this initializer. Defaults to None.
kernel_regularizer (callable) – Kernel regularizer.
bias_regularizer (callable) – Bias regularizer.
dropout_seed (int) – Seed with which to initialize the dropout layer. Defaults to None.
Initialize the FFN object instance.
- __init__(layers_units, layers_activation=None, layers_dropout_rates=None, use_bias=False, kernel_initializer='glorot_uniform', bias_initializer='zeros', output_layer_initializer=None, kernel_regularizer=None, bias_regularizer=None, dropout_seed=None, boundary_casting=False, tf_summary=False, **kwargs)#
Initialize the FFN object instance.
- call(inputs, training=True, **kwargs)#
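A usage sketch (import path assumed as above; the per-layer lists line up positionally):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.FeedForwardNetwork import FeedForwardNetwork

x = tf.random.uniform([2, 128, 768])
# Two layers: expand to 3072 units with GeLU and dropout, project back to 768.
ffn = FeedForwardNetwork(
    layers_units=[3072, 768],
    layers_activation=["gelu", None],
    layers_dropout_rates=[0.1, 0.0],
)
y = ffn(x, training=True)  # (2, 128, 768)
```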
common.tf.layers.FeedForwardNetworkV2 module#
- class common.tf.layers.FeedForwardNetworkV2.FeedForwardNetworkV2#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Implement a feed forward network as used in the T5 model.
Set up the FFN components.
- Parameters
d_ff (int) – The hidden dimension of the feed forward network, i.e. the output dimension of the first layer.
d_model (int) – The output dimension of the feed forward network.
activation (string) – The name of the activation to apply after the first dense layer.
dropout_rate (float) – Dropout rate applied after the first dense layer.
use_bias (bool) – Whether or not to use bias in the dense layers of the feed forward network.
input_layer_initializer (initializer) – A string or initializer to use to initialize the weights of the first dense layer.
output_layer_initializer (initializer) – A string or initializer to use to initialize the weights of the second dense layer.
dropout_seed (int) – The seed to make the dropout layer deterministic.
**kwargs –
Keyword arguments to be passed into BaseLayer.
- __init__(d_ff, d_model, activation='relu', dropout_rate=0.0, use_bias=False, input_layer_initializer='glorot_uniform', output_layer_initializer='glorot_uniform', dropout_seed=None, **kwargs)#
Set up the FFN components.
- Parameters
d_ff (int) – The hidden dimension of the feed forward network, i.e. the output dimension of the first layer.
d_model (int) – The output dimension of the feed forward network.
activation (string) – The name of the activation to apply after the first dense layer.
dropout_rate (float) – Dropout rate applied after the first dense layer.
use_bias (bool) – Whether or not to use bias in the dense layers of the feed forward network.
input_layer_initializer (initializer) – A string or initializer to use to initialize the weights of the first dense layer.
output_layer_initializer (initializer) – A string or initializer to use to initialize the weights of the second dense layer.
dropout_seed (int) – The seed to make the dropout layer deterministic.
**kwargs –
Keyword arguments to be passed into BaseLayer.
- call(inputs, training=True, **kwargs)#
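A usage sketch (import path assumed as above):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.FeedForwardNetworkV2 import FeedForwardNetworkV2

x = tf.random.uniform([2, 128, 512])
# dense d_model -> d_ff, activation, dropout, then dense d_ff -> d_model.
ffn = FeedForwardNetworkV2(d_ff=2048, d_model=512, activation="relu", dropout_rate=0.1)
y = ffn(x, training=True)  # (2, 128, 512)
```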
common.tf.layers.Input module#
- common.tf.layers.Input.SetupInputTensor(features, tf_summary=False)#
Adds a tensor summary to the model’s input features and their gradients, if tf_summary is set to True.
- Parameters
features – The input features.
common.tf.layers.LayerNormalizationLayer module#
- class common.tf.layers.LayerNormalizationLayer.LayerNormalizationLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras layer normalization. Reference: Layer Normalization.
- __init__(axis=-1, epsilon=1e-08, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None, trainable=True, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the layer normalization.
- Parameters
inputs (Tensor) – Arbitrary tensor.
- Returns
A normalized tensor of the same shape as the input. NOTE: While **kwargs are passed, the training arg is never used.
- Return type
Tensor
common.tf.layers.MaxPool2DLayer module#
- class common.tf.layers.MaxPool2DLayer.MaxPool2DLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras 2D max pooling layer.
- __init__(pool_size=(2, 2), strides=None, padding='valid', data_format=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Applies the 2D max pooling layer.
- Parameters
inputs (Tensor) – A 4D tensor with shape (samples, channels, rows, cols) if data_format='channels_first', or a 4D tensor with shape (samples, rows, cols, channels) if data_format='channels_last'.
- Returns
A 4D tensor with shape (batch_size, channels, pooled_rows, pooled_cols) if data_format='channels_first', or a 4D tensor with shape (batch_size, pooled_rows, pooled_cols, channels) if data_format='channels_last'.
- Return type
Tensor
common.tf.layers.PoolerLayer module#
- class common.tf.layers.PoolerLayer.PoolerLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
The pooler layer.
Currently supports the following pooler types:
"mean": Mean reduction.
"max": Max reduction.
"first": First slice in the axis dimension.
"last": Last slice in the axis dimension.
"sum": Takes the sum over the axis dimension. Defaults to the entire Tensor.
None: No pooling (output=input).
- __init__(pooler_type='mean', axis=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, padding_mask=None, **kwargs)#
Apply pooler of a given type.
Takes in a padding mask with 1s for tokens and 0s for padding.
common.tf.layers.PoolerLayerV2 module#
- class common.tf.layers.PoolerLayerV2.PoolerLayerV2#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
The pooler layer. Usually used for pooling or summarizing the sequence data.
This layer is added as a workaround to the existing pooler layer for additional masking support. The plan is to use this layer for kernel matching and integration bring-up. After we have full support for this layer, we should deprecate the old PoolerLayer.
- Parameters
pooler_type (str) – Type of pooling. Currently supports the following pooler types:
"mean": Mean reduction.
"max": Max reduction.
"first": First slice in the axis dimension.
"last": Last slice in the axis dimension (not yet supported).
"sum": Takes the sum over the axis dimension. Defaults to the entire Tensor.
axis (int) – The dimensions to reduce. If None (the default), reduces all dimensions.
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(pooler_type, axis=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, padding_mask=None)#
Apply pooling with optional masking.
- Parameters
inputs (Tensor) – Input tensor.
padding_mask (Tensor) – The padding mask tensor. Assumed to be 1-based, i.e., has 1 in the non-padded positions and 0 elsewhere. If the input tensor is of the shape [d0, d1, ..., d_{k-1}, d_{axis}, d_{k+1}, ..., d_n], then the padding_mask must have the shape [d0, d1, ..., d_{k-1}, axis] or [d0, d1, ..., d_{k-1}, axis, 1, ..., 1]. If None (the default), a padding mask of all 1’s is used.
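For intuition, a sketch of the masked "mean" pooling this layer describes, in plain TensorFlow (not this package's wrapper), for the common [batch, seq, hidden] case:

```python
import tensorflow as tf

def masked_mean_pool(inputs, padding_mask):
    """Mean over the sequence axis, counting only positions where the mask is 1.

    inputs:       [batch, seq, hidden]
    padding_mask: [batch, seq] with 1 for tokens and 0 for padding
    """
    mask = tf.cast(padding_mask, inputs.dtype)
    masked = inputs * mask[:, :, tf.newaxis]  # zero out padded positions
    return tf.reduce_sum(masked, axis=1) / tf.reduce_sum(mask, axis=1, keepdims=True)
```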
common.tf.layers.PositionEmbeddingLayer module#
- class common.tf.layers.PositionEmbeddingLayer.PositionEmbeddingLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Implementation of the position embedding layer.
Adds positional information to the token embedding provided as input. Supports 'fixed' and 'learned' positional embeddings.
- Parameters
max_position_embeddings (int) – Maximum sequence length to train using the model. If None, set to the input sequence length.
embedding_type (str) – Options are 'learned' or 'fixed'.
Learned: Trainable weights for embeddings.
Fixed: Fixed weights for embeddings.
embeddings_initializer (callable) – Embeddings initializer.
embeddings_regularizer (callable) – Embeddings regularizer.
boundary_casting (bool) – See the documentation for BaseLayer.
tf_summary – See the documentation for BaseLayer.
**kwargs – Additional keyword arguments for BaseLayer.
- __init__(max_position_embeddings=None, embedding_type='fixed', embeddings_initializer='uniform', embeddings_regularizer=None, boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(inputs, position_ids=None)#
Add position embeddings to the inputs.
- Parameters
inputs (Tensor) – Input of the size [batch_size, seq_len, embedding_size].
position_ids (Tensor) – Position IDs of the inputs. A 1D tensor of size seq_len. If None (default), the position IDs are assumed to correspond to [0, 1, ..., seq_len-1].
- setup_fixed_position_embedding(length, channels, min_timescale=1.0, max_timescale=10000.0)#
Adds several sinusoids of different frequencies to a Tensor.
Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase.
This allows the attention to learn to use absolute and relative positions. Timing signals should be added to some precursors of both the query and the memory inputs to the attention.
The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x), and cos(x).
Specifically, this function uses a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to channels / 2. For each timescale, this function generates the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All these sinusoids are concatenated in the channels dimension.
- Parameters
min_timescale (float) –
max_timescale (float) –
- Returns
A tensor of the shape [length, channels]. Based on _get_timing_signal_1d.
- Return type
Tensor
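A sketch of the signal described above, following the _get_timing_signal_1d recipe the docstring references (the function name timing_signal is a placeholder):

```python
import numpy as np
import tensorflow as tf

def timing_signal(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    """channels//2 geometric timescales; sin and cos concatenated along channels."""
    position = tf.cast(tf.range(length), tf.float32)
    num_timescales = channels // 2
    log_timescale_increment = np.log(max_timescale / min_timescale) / max(
        num_timescales - 1, 1
    )
    inv_timescales = min_timescale * tf.exp(
        tf.cast(tf.range(num_timescales), tf.float32) * -log_timescale_increment
    )
    scaled_time = position[:, tf.newaxis] * inv_timescales[tf.newaxis, :]
    # sin(timestep/timescale) then cos(timestep/timescale): [length, channels]
    return tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=1)
```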
common.tf.layers.PrePostProcessWrapper module#
common.tf.layers.ReshapeLayer module#
- class common.tf.layers.ReshapeLayer.ReshapeLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras layer that reshapes the input.
- __init__(target_shape, boundary_casting=False, tf_summary=False, **kwargs)#
- call(input, **kwargs)#
Apply the reshape layer to an input.
- Parameters
inputs (Tensor) – A tensor.
- Returns
The tensor after reshape.
- Return type
Tensor
common.tf.layers.SegmentEmbeddingLayer module#
- class common.tf.layers.SegmentEmbeddingLayer.SegmentEmbeddingLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Segment embedding layer. Adds segment information to the token embedding provided as input: for example, which sentence a token belongs to when an input sequence contains multiple sentences, such as two in the case of the BERT model.
- Parameters
num_segments (int) – Number of encoded segments.
embeddings_regularizer (callable) – Embeddings regularizer.
- __init__(num_segments=2, embeddings_initializer='uniform', embeddings_regularizer=None, boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(inputs, segment_ids)#
Add segment embedding to inputs.
- Parameters
inputs – Tensor of input embeddings.
segment_ids – Segment IDs.
common.tf.layers.SoftmaxLayer module#
- class common.tf.layers.SoftmaxLayer.SoftmaxLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras softmax layer.
- __init__(axis=-1, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Performs the softmax.
- Parameters
inputs – Arbitrary tensor.
- Returns
A tensor of the same shape as the input.
- Return type
Tensor
common.tf.layers.SquaredErrorLayer module#
- class common.tf.layers.SquaredErrorLayer.SquaredErrorLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Squared error between prediction and labels.
- Parameters
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call(labels, pred)#
Calculates the squared error between prediction and labels.
- Parameters
labels (Tensor) – Labels.
pred (Tensor) – Predictions (same shape as labels).
- Returns
Loss tensor of the same shape and type as pred.
- Return type
Tensor
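For intuition, the computation reduces to an elementwise squared difference; a sketch in plain TensorFlow (not this package's wrapper):

```python
import tensorflow as tf

labels = tf.constant([[1.0, 2.0], [3.0, 4.0]])
pred = tf.constant([[1.5, 1.0], [2.0, 4.5]])
loss = tf.square(labels - pred)  # elementwise; same shape and type as pred
```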