common.tf.layers package#
Submodules#
common.tf.layers.AbstractRecomputeWrapper module#
- class common.tf.layers.AbstractRecomputeWrapper.AbstractRecomputeWrapper#
Bases:
abc.ABC
Utility functions for the tf.custom_gradient decorator, when used in training.
An abstract class that handles the many small requirements of using the tf.custom_gradient decorator. This class is used to recompute the activations during the backward propagation part of a training step, and acts as a backbone for recompute wrappers and reversible layers.
The following utility functions are designed to make the recomputation easy to implement:
_set_recomputed_tensor and _check_get_recomputed_tensor. These functions attach the recomputed tensors to the corresponding forward-pass tensors. They are useful for passing the recomputed tensors between, for example, reversible layers, so that we do not need to save any tensors.
_block_recompute_and_gradients. This function takes a forward block of the computation, recomputes the block, and then calculates and returns the gradients associated with the block.
Scope handling functions.
tf.custom_gradient names the scopes of the gradients. However, this naming is based on the IdentityN ops it attaches to the portion of the graph for which the user would like to add a custom gradient, which is not always convenient. Moreover, tf.custom_gradient does not track the appropriate control flow contexts for the variables used in that portion of the graph. The scope handling functions in this class are helpful here:
_get_clean_grad_scope. This function cleans the named scope for clean graphs.
_update_variables_for_context. This function finds the correct variable tensors for the control flow contexts (for example, to use recomputation inside a while-loop).
The basic structure for a recompute layer is as follows:
Define a custom gradient function using tf.custom_gradient inside the __call__ function of a recompute layer.
Inside the __call__ function, call the forward propagation of the layer and define the recompute+gradient function. We recommend you use the _block_recompute_and_gradients function. A minimal sketch follows.
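The sketch below is a hypothetical illustration of this structure in TF2 style; the names recompute_call and forward_block are placeholders, not part of this package, and it assumes forward_block uses trainable variables (so tf.custom_gradient passes the variables keyword to the gradient function):

```python
import tensorflow as tf

def recompute_call(forward_block, x):
    """Wrap forward_block so activations are recomputed in the backward pass."""

    @tf.custom_gradient
    def _wrapped(x):
        y = forward_block(x)  # forward pass; intermediates need not be saved

        def _grad(dy, variables=None):
            # Recompute the forward pass under a tape, then differentiate
            # the recomputed graph instead of saved activations.
            with tf.GradientTape() as tape:
                tape.watch(x)
                y_recomputed = forward_block(x)
            grads = tape.gradient(
                y_recomputed, [x] + list(variables or []), output_gradients=dy
            )
            return grads[0], grads[1:]

        return y, _grad

    return _wrapped(x)
```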
- CtrlFlowWarnedOnce = False#
- abstract call(*args, **kwargs)#
The call function for layers that use recomputation during the backward phase.
This function is wrapped by the __call__ function of this abstract recompute wrapper, and it must be overridden by a child class to implement the forward computation of the layer.
- static is_in_while_loop(graph=None)#
Returns True if the specified, or current if unspecified, graph corresponds to a while loop in the forward, backward, or cond graph.
- Returns
True if the specified, or current if unspecified, graph corresponds to a while loop in the forward, backward, or cond graph.
- Return type
bool
common.tf.layers.ActivationLayer module#
- class common.tf.layers.ActivationLayer.ActivationLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras activation layer.
Also supports activation="GeLU", activation="relu6", and activation="hard_swish", which are currently missing in keras.layers.ActivationLayer v2.2.
- Parameters
activation (Union[str, Callable]) – The function to be applied. This can be either a callable, the string name of a TensorFlow built-in activation, or one of "gelu", "lrelu" (lrelu denotes LeakyReLU), "relu6", or "hard_swish".
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(activation, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the activation layer.
- Parameters
inputs – Arbitrary tensor.
- Returns
A tensor of the same shape as the input.
- Return type
Tensor
- static gelu(x)#
- static hard_swish(x)#
- static relu6(x)#
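A minimal usage sketch (the import path is assumed from the Bases entries above; the helper formulas in the comments follow the standard definitions):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.ActivationLayer import ActivationLayer

x = tf.random.uniform([2, 128, 768])
act = ActivationLayer(activation="hard_swish")
y = act(x)  # same shape as x

# The static helpers follow the standard definitions:
#   relu6(x)      = min(max(x, 0), 6)
#   hard_swish(x) = x * relu6(x + 3) / 6
```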
common.tf.layers.AddLayer module#
- class common.tf.layers.AddLayer.AddLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras Add layer. Sums a list of inputs.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the AddLayer to sum up a list of inputs.
- Parameters
inputs – List of input tensors (at least 2).
- Returns
A tensor containing the sum of inputs.
- Return type
Tensor
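A short usage sketch (import path assumed as above; the input tensors here are placeholders):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.AddLayer import AddLayer

t1, t2, t3 = (tf.ones([2, 4]) for _ in range(3))
add = AddLayer()
out = add([t1, t2, t3])  # elementwise sum; same shape as each input
```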
common.tf.layers.AttentionLayer module#
- class common.tf.layers.AttentionLayer.AttentionLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Multi-head attention layer. Based on MLCommons model.
- Parameters
hidden_size (int) – Number of units in each projection output.
num_heads (int) – Number of attention heads.
use_projection_bias (bool) – Whether to use bias in the key, query, and value projections.
use_ffn_bias (bool) – Whether to use bias in the output projection.
initializer (str) – Projection kernel initializer. Defaults to glorot_uniform.
query_layer_initializer (initializer) – Query kernel initializer. Defaults to None, in which case initializer will be used.
key_layer_initializer (initializer) – Key kernel initializer. Defaults to None, in which case initializer will be used.
value_layer_initializer (initializer) – Value kernel initializer. Defaults to None, in which case initializer will be used.
relative_attention_bias_weight_initializer (initializer) – Relative attention bias weight initializer. Defaults to None, in which case initializer will be used.
output_layer_initializer (str or initializer) – If not None, use this initializer for the output transform layer. Defaults to None.
kernel_regularizer (Optional[Callable]) – Projection kernel regularizer. Defaults to None.
bias_regularizer (Optional[Callable]) – Projection bias regularizer. Defaults to None.
attention_type (str) – The attention variant to execute. Currently accepts dot_product and scaled_dot_product. Defaults to scaled_dot_product.
dropout_rate (float) – Dropout rate for key-query weights. Defaults to 0.0.
dropout_seed (int) – Seed with which to initialize the dropout layer. Defaults to None.
use_relative_attention_bias (bool) – Whether to use relative position bias when calculating attention.
relative_attention_bias (Tensor) – Tensor with relative attention weights. Shape: [num_relative_attention_buckets, num_heads]. Defaults to None.
num_relative_attention_buckets (int) – Used to calculate relative position bias when use_relative_attention_bias is set to True.
bidirectional_relative_attention (bool) – Whether attention is bidirectional.
softmax_dtype_fp32 (bool) – If True, casts the query-key logits to FP32 and performs the softmax calculation in FP32.
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(hidden_size, num_heads, output_projection_size=None, use_projection_bias=False, use_ffn_bias=False, initializer='glorot_uniform', query_layer_initializer=None, key_layer_initializer=None, value_layer_initializer=None, relative_attention_bias_weight_initializer=None, output_layer_initializer=None, kernel_regularizer=None, bias_regularizer=None, attention_type='scaled_dot_product', dropout_rate=0.0, dropout_seed=None, use_relative_attention_bias=False, relative_attention_bias=None, num_relative_attention_buckets=32, bidirectional_relative_attention=False, softmax_dtype_fp32=True, boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(q, v, mask=None, past_kv=None, cache_present_kv=False, training=True, position_bias=None, cache_position_bias=False)#
Applies the attention mechanism to queries q and values v. Keys will be set to be the same as v.
- Parameters
q (Tensor) – Queries, shape [batch_size, seq_length, hidden_size].
v (Tensor) – Values, shape [batch_size, seq_length, hidden_size].
mask (Tensor) – Attention mask. Can be 2D of shape [batch_size, seq_length], or 3D of shape [batch, query_length, seq_length].
past_kv (Tensor) – Past keys and values. Has shape [2, batch_size, num_heads, seq_length, hidden_size / num_heads]. The tensors in [0,:,:,:,:] and [1,:,:,:,:] contain the past keys and values, respectively. Defaults to None.
cache_present_kv (bool) – Specifies if the present keys and values must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
training (bool) – Training the model if True. Needed to call the dropout (after softmax) in the appropriate mode.
position_bias (Tensor) – Tensor containing the position bias to apply in attention.
cache_position_bias (bool) – Specifies if the position bias must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
- Returns
When cache_present_kv is True and cache_position_bias is True, returns a tuple, where the 0th entry contains the attention output, the 1st entry contains a tensor of the keys and values computed at the current application of the attention layer, and the 2nd entry contains a tensor of the position bias computed at the current application of the attention layer.
If cache_present_kv is False, no entry for present keys and values is provided.
If cache_position_bias is False, no entry for position bias is provided.
If both cache_present_kv and cache_position_bias are set to False, returns a tensor of shape equal to the shape of past_kv (see above).
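A single-call usage sketch (import path assumed as above, shapes chosen for illustration):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.AttentionLayer import AttentionLayer

batch_size, seq_length, hidden_size = 2, 128, 512
q = tf.random.uniform([batch_size, seq_length, hidden_size])
v = tf.random.uniform([batch_size, seq_length, hidden_size])
mask = tf.ones([batch_size, seq_length])  # 2D attention mask

attn = AttentionLayer(hidden_size=hidden_size, num_heads=8)
# With cache_present_kv and cache_position_bias left False (the defaults),
# a single attention-output tensor is returned.
out = attn(q, v, mask=mask, training=True)
```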
- class common.tf.layers.AttentionLayer.SelfAttentionLayer#
Bases:
common.tf.layers.AttentionLayer.AttentionLayer
Multi-head self-attention layer.
- call(x, mask=None, past_kv=None, cache_present_kv=False, training=True, position_bias=None, cache_position_bias=False)#
Applies the attention mechanism to queries q and values v. Keys will be set to be the same as v.
- Parameters
q (Tensor) – Queries, shape [batch_size, seq_length, hidden_size].
v (Tensor) – Values, shape [batch_size, seq_length, hidden_size].
mask (Tensor) – Attention mask. Can be 2D of shape [batch_size, seq_length], or 3D of shape [batch, query_length, seq_length].
past_kv (Tensor) – Past keys and values. Has shape [2, batch_size, num_heads, seq_length, hidden_size / num_heads]. The tensors in [0,:,:,:,:] and [1,:,:,:,:] contain the past keys and values, respectively. Defaults to None.
cache_present_kv (bool) – Specifies if the present keys and values must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
training (bool) – Training the model if True. Needed to call the dropout (after softmax) in the appropriate mode.
position_bias (Tensor) – Tensor containing the position bias to apply in attention.
cache_position_bias (bool) – Specifies if the position bias must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
- Returns
When cache_present_kv is True and cache_position_bias is True, returns a tuple, where the 0th entry contains the attention output, the 1st entry contains a tensor of the keys and values computed at the current application of the attention layer, and the 2nd entry contains a tensor of the position bias computed at the current application of the attention layer.
If cache_present_kv is False, no entry for present keys and values is provided.
If cache_position_bias is False, no entry for position bias is provided.
If both cache_present_kv and cache_position_bias are set to False, returns a tensor of shape equal to the shape of past_kv (see above).
common.tf.layers.BaseLayer module#
- class common.tf.layers.BaseLayer.BaseLayer#
Bases:
tensorflow.keras.layers.Layer
Base layer for the reference models.
- Parameters
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call()#
common.tf.layers.Conv2DLayer module#
- class common.tf.layers.Conv2DLayer.Conv2DLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras 2D convolution layer.
- __init__(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the 2D convolution layer.
- Parameters
inputs – A 4D tensor with shape (samples, channels, rows, cols) if data_format='channels_first', or a 4D tensor with shape (samples, rows, cols, channels) if data_format='channels_last'.
- Returns
A 4D tensor with shape (samples, filters, new_rows, new_cols) if data_format='channels_first', or a 4D tensor with shape (samples, new_rows, new_cols, filters) if data_format='channels_last'. Note that the rows and cols values might have changed due to padding.
- Return type
Tensor
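A usage sketch (import path assumed as above; channels_last input):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.Conv2DLayer import Conv2DLayer

x = tf.random.uniform([8, 32, 32, 3])  # (samples, rows, cols, channels)
conv = Conv2DLayer(filters=64, kernel_size=(3, 3), strides=(2, 2), padding="same")
y = conv(x)  # (8, 16, 16, 64); rows/cols change with strides and padding
```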
common.tf.layers.Conv2DTransposeLayer module#
- class common.tf.layers.Conv2DTransposeLayer.Conv2DTransposeLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras 2D transposed convolution layer.
- __init__(filters, kernel_size, strides=(1, 1), padding='valid', output_padding=None, data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the 2D transposed convolution layer.
- Parameters
inputs – A 4D tensor with shape (samples, channels, rows, cols) if data_format='channels_first', or a 4D tensor with shape (samples, rows, cols, channels) if data_format='channels_last'.
- Returns
A 4D tensor with shape (samples, filters, new_rows, new_cols) if data_format='channels_first', or a 4D tensor with shape (samples, new_rows, new_cols, filters) if data_format='channels_last'. Note that the rows and cols values might have changed due to padding.
- Return type
Tensor
common.tf.layers.CrossEntropyFromLogitsLayer module#
- class common.tf.layers.CrossEntropyFromLogitsLayer.CrossEntropyFromLogitsLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Cross entropy loss, given logits. Compares logits against labels.
- Parameters
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call(labels, logits)#
Calculates the cross entropy over the logits.
- Parameters
labels (Tensor) – Label indices.
logits (Tensor) – Logits (non-normalized).
- Returns
A tensor of the same shape as labels and of the same type as logits with the softmax cross entropy loss.
- Return type
Tensor
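For intuition, the computation corresponds to the standard sparse softmax cross entropy; a sketch in plain TensorFlow (not this package's wrapper), with hypothetical shapes:

```python
import tensorflow as tf

labels = tf.constant([[2, 0], [1, 3]])  # [batch, seq] integer label indices
logits = tf.random.uniform([2, 2, 8])   # [batch, seq, vocab], non-normalized
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits
)  # [batch, seq]; same shape as labels
```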
common.tf.layers.DenseLayer module#
- class common.tf.layers.DenseLayer.DenseLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras densely-connected layer. Provides support for "gelu" activation.
- Parameters
units (int) – Number of units in the layer output.
activation (Optional[Union[str, Callable]]) – If not None, an activation function to be applied after the dense layer. The activation function can be either a callable, the string name of a TensorFlow built-in activation, or "gelu".
use_bias (bool) – Whether to use bias.
kernel_initializer (str) – Kernel initializer. Defaults to "glorot_uniform".
bias_initializer (str) – Bias initializer. Defaults to "zeros".
kernel_regularizer (Optional[Callable]) – Kernel regularizer. Defaults to None.
bias_regularizer (Optional[Callable]) – Bias regularizer. Defaults to None.
activity_regularizer (Optional[Callable]) – Activity (output activation) regularizer. Defaults to None.
kernel_constraint (Optional[Callable]) – Kernel constraint. Defaults to None.
bias_constraint (Optional[Callable]) – Bias constraint. Defaults to None.
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the densely-connected layer.
- Parameters
inputs (Tensor) – An N-D tensor with shape (batch_size, ..., input_dim).
- Returns
An N-D tensor with shape (batch_size, ..., units).
- Return type
Tensor
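A usage sketch (import path assumed as above):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.DenseLayer import DenseLayer

x = tf.random.uniform([2, 128, 768])
dense = DenseLayer(units=3072, activation="gelu")
y = dense(x)  # (2, 128, 3072)
```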
common.tf.layers.DropoutLayer module#
- class common.tf.layers.DropoutLayer.DropoutLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras dropout layer.
- __init__(rate, noise_shape=None, seed=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, training=True, **kwargs)#
Performs the dropout.
- Parameters
inputs (Tensor) – Arbitrary tensor.
training (bool) – Training mode if set to True.
- Returns
A tensor of the same shape as the input.
- Return type
Tensor
common.tf.layers.EmbeddingLayer module#
- class common.tf.layers.EmbeddingLayer.EmbeddingLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Embedding layer. Built on top of the Keras Embedding layer.
- __init__(input_dim, output_dim, embeddings_initializer='uniform', bias_initializer='zeros', embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None, use_bias=False, weight_name='embedding_weights', boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(inputs, pad_id=-1, scale=1)#
Get token embeddings of inputs.
- Parameters
inputs (Tensor) – A tensor with shape [batch_size, length].
pad_id – Integer specifying which input ID corresponds to padding rather than a real token. It does not need to be a legal vocabulary entry. Any inputs elements equal to this value are not looked up; zeros are output directly instead. On the Wafer Scale Engine, this indicates the presence of variable sequence length.
scale – Scaling of the embedding (in MLPerf, hidden_size**0.5 is used).
- Returns
A tensor of embeddings with shape [batch_size, length, hidden_size]. Padded positions are filled with zeros.
- Return type
embeddings (Tensor)
- embedding_table()#
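A usage sketch illustrating the pad_id behavior (import path assumed as above):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.EmbeddingLayer import EmbeddingLayer

emb = EmbeddingLayer(input_dim=32000, output_dim=768)
ids = tf.constant([[5, 17, 0, 0]])  # 0 is used as the padding ID here
vectors = emb(ids, pad_id=0)        # [1, 4, 768]; the two padded positions are zeros
```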
common.tf.layers.FeedForwardNetwork module#
- class common.tf.layers.FeedForwardNetwork.FeedForwardNetwork#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
A feed forward network that consists of a stack of fully connected layers.
- Parameters
layers_units (list of int) – List of units for each layer.
layers_activation (list of str) – List of activation types (str) for each layer.
layers_dropout_rates (list of float) – List of dropout rates (float) for each layer.
use_bias (bool) – If True, use bias throughout all layers.
kernel_initializer (string) – Kernel initializer. Defaults to "glorot_uniform".
bias_initializer (callable) – Bias initializer. Defaults to "zeros".
output_layer_initializer – If not None, initialize the last projection layer with this initializer. Defaults to None.
kernel_regularizer (callable) – Kernel regularizer.
bias_regularizer (callable) – Bias regularizer.
dropout_seed (int) – Seed with which to initialize the dropout layer. Defaults to None.
Initialize the FFN object instance.
- __init__(layers_units, layers_activation=None, layers_dropout_rates=None, use_bias=False, kernel_initializer='glorot_uniform', bias_initializer='zeros', output_layer_initializer=None, kernel_regularizer=None, bias_regularizer=None, dropout_seed=None, boundary_casting=False, tf_summary=False, **kwargs)#
Initialize the FFN object instance.
- call(inputs, training=True, **kwargs)#
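A usage sketch (import path assumed as above; the per-layer lists line up positionally):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.FeedForwardNetwork import FeedForwardNetwork

x = tf.random.uniform([2, 128, 768])
# Two layers: expand to 3072 units with GeLU and dropout, project back to 768.
ffn = FeedForwardNetwork(
    layers_units=[3072, 768],
    layers_activation=["gelu", None],
    layers_dropout_rates=[0.1, 0.0],
)
y = ffn(x, training=True)  # (2, 128, 768)
```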
common.tf.layers.FeedForwardNetworkV2 module#
- class common.tf.layers.FeedForwardNetworkV2.FeedForwardNetworkV2#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Implement a feed forward network as used in the T5 model.
Set up the FFN components.
- Parameters
d_ff (int) – The hidden dimension of the feed forward network, i.e. the output dimension of the first layer.
d_model (int) – The output dimension of the feed forward network.
activation (string) – The name of the activation to apply after the first dense layer.
dropout_rate (float) – Dropout rate applied after the first dense layer.
use_bias (bool) – Whether or not to use bias in the dense layers of the feed forward network.
input_layer_initializer (initializer) – A string or initializer to use to initialize the weights of the first dense layer.
output_layer_initializer (initializer) – A string or initializer to use to initialize the weights of the second dense layer.
dropout_seed (int) – The seed to make the dropout layer deterministic.
**kwargs –
Keyword arguments to be passed into BaseLayer.
- __init__(d_ff, d_model, activation='relu', dropout_rate=0.0, use_bias=False, input_layer_initializer='glorot_uniform', output_layer_initializer='glorot_uniform', dropout_seed=None, **kwargs)#
Set up the FFN components.
- Parameters
d_ff (int) – The hidden dimension of the feed forward network, i.e. the output dimension of the first layer.
d_model (int) – The output dimension of the feed forward network.
activation (string) – The name of the activation to apply after the first dense layer.
dropout_rate (float) – Dropout rate applied after the first dense layer.
use_bias (bool) – Whether or not to use bias in the dense layers of the feed forward network.
input_layer_initializer (initializer) – A string or initializer to use to initialize the weights of the first dense layer.
output_layer_initializer (initializer) – A string or initializer to use to initialize the weights of the second dense layer.
dropout_seed (int) – The seed to make the dropout layer deterministic.
**kwargs –
Keyword arguments to be passed into BaseLayer.
- call(inputs, training=True, **kwargs)#
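A usage sketch (import path assumed as above):

```python
import tensorflow as tf
from modelzoo.common.tf.layers.FeedForwardNetworkV2 import FeedForwardNetworkV2

x = tf.random.uniform([2, 128, 512])
# dense d_model -> d_ff, activation, dropout, then dense d_ff -> d_model.
ffn = FeedForwardNetworkV2(d_ff=2048, d_model=512, activation="relu", dropout_rate=0.1)
y = ffn(x, training=True)  # (2, 128, 512)
```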
common.tf.layers.Input module#
- common.tf.layers.Input.SetupInputTensor(features, tf_summary=False)#
Adds a tensor summary to the model’s input features and their gradients, if tf_summary is set to True.
- Parameters
features – The input features.
common.tf.layers.LayerNormalizationLayer module#
- class common.tf.layers.LayerNormalizationLayer.LayerNormalizationLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras layer normalization. Reference: Layer Normalization.
- __init__(axis=-1, epsilon=1e-08, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None, trainable=True, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the layer normalization.
- Parameters
inputs (Tensor) – Arbitrary tensor.
- Returns
A normalized tensor of the same shape as the input. NOTE: While **kwargs are passed, the training arg is never used.
- Return type
Tensor
common.tf.layers.MaxPool2DLayer module#
- class common.tf.layers.MaxPool2DLayer.MaxPool2DLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras 2D max pooling layer.
- __init__(pool_size=(2, 2), strides=None, padding='valid', data_format=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Applies the 2D max pooling layer.
- Parameters
inputs (Tensor) – A 4D tensor with shape (samples, channels, rows, cols) if data_format='channels_first', or a 4D tensor with shape (samples, rows, cols, channels) if data_format='channels_last'.
- Returns
A 4D tensor with shape (batch_size, channels, pooled_rows, pooled_cols) if data_format='channels_first', or a 4D tensor with shape (batch_size, pooled_rows, pooled_cols, channels) if data_format='channels_last'.
- Return type
Tensor
common.tf.layers.PoolerLayer module#
- class common.tf.layers.PoolerLayer.PoolerLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
The pooler layer.
Currently supports the following pooler types:
"mean": Mean reduction.
"max": Max reduction.
"first": First slice in the axis dimension.
"last": Last slice in the axis dimension.
"sum": Takes the sum over the axis dimension. Defaults to the entire Tensor.
None: No pooling (output=input).
- __init__(pooler_type='mean', axis=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, padding_mask=None, **kwargs)#
Apply pooler of a given type.
Takes in a padding mask with 1s for tokens and 0s for padding.
common.tf.layers.PoolerLayerV2 module#
- class common.tf.layers.PoolerLayerV2.PoolerLayerV2#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
The pooler layer. Usually used for pooling or summarizing the sequence data.
This layer is added as a workaround to the existing pooler layer for additional masking support. The plan is to use this layer for kernel matching and integration bring-up. After we have full support for this layer, we should deprecate the old PoolerLayer.
- Parameters
pooler_type (str) – Type of pooling. Currently supports the following pooler types:
"mean": Mean reduction.
"max": Max reduction.
"first": First slice in the axis dimension.
"last": Last slice in the axis dimension (not yet supported).
"sum": Takes the sum over the axis dimension. Defaults to the entire Tensor.
axis (int) – The dimensions to reduce. If None (the default), reduces all dimensions.
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(pooler_type, axis=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, padding_mask=None)#
Apply pooling with optional masking.
- Parameters
inputs (Tensor) – Input tensor.
padding_mask (Tensor) – The padding mask tensor. Assumed to be 1-based, i.e., has 1 in the non-padded positions and 0 elsewhere. If the input tensor is of the shape [d0, d1, ..., d_{k-1}, d_{axis}, d_{k+1}, ..., d_n], then the padding_mask must have the shape [d0, d1, ..., d_{k-1}, axis] or [d0, d1, ..., d_{k-1}, axis, 1, ..., 1]. If None (the default), a padding mask of all 1’s is used.
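For intuition, a sketch of the masked "mean" pooling this layer describes, in plain TensorFlow (not this package's wrapper), for the common [batch, seq, hidden] case:

```python
import tensorflow as tf

def masked_mean_pool(inputs, padding_mask):
    """Mean over the sequence axis, counting only positions where the mask is 1.

    inputs:       [batch, seq, hidden]
    padding_mask: [batch, seq] with 1 for tokens and 0 for padding
    """
    mask = tf.cast(padding_mask, inputs.dtype)
    masked = inputs * mask[:, :, tf.newaxis]  # zero out padded positions
    return tf.reduce_sum(masked, axis=1) / tf.reduce_sum(mask, axis=1, keepdims=True)
```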
common.tf.layers.PositionEmbeddingLayer module#
- class common.tf.layers.PositionEmbeddingLayer.PositionEmbeddingLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Implementation of the position embedding layer.
Adds positional information to the token embedding provided as input. Supports 'fixed' and 'learned' positional embeddings.
- Parameters
max_position_embeddings (int) – Maximum sequence length to train using the model. If None, set to the input sequence length.
embedding_type (str) – Options are 'learned' or 'fixed'.
Learned: Trainable weights for embeddings.
Fixed: Fixed weights for embeddings.
embeddings_initializer (callable) – Embeddings initializer.
embeddings_regularizer (callable) – Embeddings regularizer.
boundary_casting (bool) – See the documentation for BaseLayer.
tf_summary – See the documentation for BaseLayer.
**kwargs – Additional keyword arguments for BaseLayer.
- __init__(max_position_embeddings=None, embedding_type='fixed', embeddings_initializer='uniform', embeddings_regularizer=None, boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(inputs, position_ids=None)#
Add position embeddings to the inputs.
- Parameters
inputs (Tensor) – Input of the size [batch_size, seq_len, embedding_size].
position_ids (Tensor) – Position IDs of the inputs. A 1D tensor of size seq_len. If None (default), the position IDs are assumed to correspond to [0, 1, ..., seq_len-1].
- setup_fixed_position_embedding(length, channels, min_timescale=1.0, max_timescale=10000.0)#
Adds several sinusoids of different frequencies to a Tensor.
Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase.
This allows the attention to learn to use absolute and relative positions. Timing signals should be added to some precursors of both the query and the memory inputs to the attention.
The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x), and cos(x).
Specifically, this function uses a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to channels / 2. For each timescale, this function generates the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All these sinusoids are concatenated in the channels dimension.
- Parameters
min_timescale (float) –
max_timescale (float) –
- Returns
A tensor of the shape [length, channels]. Based on _get_timing_signal_1d.
- Return type
Tensor
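A sketch of the signal described above, following the _get_timing_signal_1d recipe the docstring references (the function name timing_signal is a placeholder):

```python
import numpy as np
import tensorflow as tf

def timing_signal(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    """channels//2 geometric timescales; sin and cos concatenated along channels."""
    position = tf.cast(tf.range(length), tf.float32)
    num_timescales = channels // 2
    log_timescale_increment = np.log(max_timescale / min_timescale) / max(
        num_timescales - 1, 1
    )
    inv_timescales = min_timescale * tf.exp(
        tf.cast(tf.range(num_timescales), tf.float32) * -log_timescale_increment
    )
    scaled_time = position[:, tf.newaxis] * inv_timescales[tf.newaxis, :]
    # sin(timestep/timescale) then cos(timestep/timescale): [length, channels]
    return tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=1)
```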
common.tf.layers.PrePostProcessWrapper module#
common.tf.layers.ReshapeLayer module#
- class common.tf.layers.ReshapeLayer.ReshapeLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras layer that reshapes the input.
- __init__(target_shape, boundary_casting=False, tf_summary=False, **kwargs)#
- call(input, **kwargs)#
Apply the reshape layer to an input.
- Parameters
inputs (Tensor) – A tensor.
- Returns
The tensor after reshape.
- Return type
Tensor
common.tf.layers.SegmentEmbeddingLayer module#
- class common.tf.layers.SegmentEmbeddingLayer.SegmentEmbeddingLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Segment embedding layer. Adds segment information to the token embedding provided as input: for example, which sentence a token belongs to when an input sequence contains multiple sentences, such as two in the case of the BERT model.
- Parameters
num_segments (int) – Number of encoded segments.
embeddings_regularizer (callable) – Embeddings regularizer.
- __init__(num_segments=2, embeddings_initializer='uniform', embeddings_regularizer=None, boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(inputs, segment_ids)#
Add segment embedding to inputs.
- Parameters
inputs – Tensor of input embeddings.
segment_ids – Segment IDs.
common.tf.layers.SoftmaxLayer module#
- class common.tf.layers.SoftmaxLayer.SoftmaxLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras softmax layer.
- __init__(axis=-1, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Performs the softmax.
- Parameters
inputs – Arbitrary tensor.
- Returns
A tensor of the same shape as the input.
- Return type
Tensor
common.tf.layers.SquaredErrorLayer module#
- class common.tf.layers.SquaredErrorLayer.SquaredErrorLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Squared error between prediction and labels.
- Parameters
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call(labels, pred)#
Calculates the squared error between prediction and labels.
- Parameters
labels (Tensor) – Labels.
pred (Tensor) – Predictions (same shape as labels).
- Returns
Loss tensor of the same shape and type as pred.
- Return type
Tensor
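For intuition, the computation reduces to an elementwise squared difference; a sketch in plain TensorFlow (not this package's wrapper):

```python
import tensorflow as tf

labels = tf.constant([[1.0, 2.0], [3.0, 4.0]])
pred = tf.constant([[1.5, 1.0], [2.0, 4.5]])
loss = tf.square(labels - pred)  # elementwise; same shape and type as pred
```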