tf.layers.PositionEmbeddingLayer module

class tf.layers.PositionEmbeddingLayer.PositionEmbeddingLayer(*args: Any, **kwargs: Any)


Implementation of the position embedding layer.

Adds positional information to the token embedding provided as input. Supports 'fixed' and 'learned' positional embeddings.

  • max_position_embeddings (int) – Maximum sequence length the model is trained with. If None, set to the input sequence length.

  • embedding_type (str) –

    Type of positional embedding. Options are 'learned' or 'fixed'.

    • Learned: Trainable weights for embeddings.

    • Fixed: Non-trainable sinusoidal weights for embeddings (see setup_fixed_position_embedding).

  • embeddings_initializer (callable) – Embeddings initializer.

  • embeddings_regularizer (callable) – Embeddings regularizer.

  • boundary_casting (bool) – See the documentation for BaseLayer.

  • tf_summary – See the documentation for BaseLayer.

  • **kwargs – Additional keyword arguments for BaseLayer.

call(inputs, position_ids=None)

Add position embeddings to the inputs.

  • inputs (Tensor) – Input of the size [batch_size, seq_len, embedding_size].

  • position_ids (Tensor) – Position IDs of the inputs. A 1D tensor of size seq_len. If None (default), the position IDs are assumed to be [0, 1, ..., seq_len-1].
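
The effect of call can be illustrated with a minimal pure-Python sketch. It is not the layer's implementation: the function name add_position_embeddings and the position_table argument are hypothetical, nested lists stand in for tensors, and the table is assumed to hold one row of embedding_size values per position.

```python
def add_position_embeddings(inputs, position_table, position_ids=None):
    """Add one row of position_table to every token embedding.

    inputs:         nested list of shape [batch_size, seq_len, embedding_size]
    position_table: nested list of shape [max_positions, embedding_size]
                    (hypothetical stand-in for the layer's embedding weights)
    position_ids:   list of seq_len row indices into position_table, or None
    """
    seq_len = len(inputs[0])
    if position_ids is None:
        # Default described in the docs: positions 0, 1, ..., seq_len - 1.
        position_ids = list(range(seq_len))
    return [
        [
            [value + offset for value, offset in zip(token, position_table[pos])]
            for token, pos in zip(sequence, position_ids)
        ]
        for sequence in inputs
    ]


if __name__ == "__main__":
    tokens = [[[1.0, 2.0], [3.0, 4.0]]]            # [1, 2, 2]
    table = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
    print(add_position_embeddings(tokens, table))
    print(add_position_embeddings(tokens, table, position_ids=[2, 0]))
```

The same position_ids are applied to every sequence in the batch, which matches the documented 1D shape of size seq_len.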

setup_fixed_position_embedding(length, channels, min_timescale=1.0, max_timescale=10000.0)

Adds several sinusoids of different frequencies to a Tensor.

Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase.

This allows the attention to learn to use absolute and relative positions. Timing signals should be added to some precursors of both the query and the memory inputs to the attention.

The use of relative position is possible because, for a fixed offset y, sin(x+y) and cos(x+y) are linear combinations of sin(x) and cos(x) whose coefficients depend only on y.
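
For reference, the standard angle-addition identities behind this statement are written out below, with x as the scaled position and y as the scaled offset (symbols as in the sentence above):

```latex
\begin{aligned}
\sin(x + y) &= \sin(x)\cos(y) + \cos(x)\sin(y) \\
\cos(x + y) &= \cos(x)\cos(y) - \sin(x)\sin(y)
\end{aligned}
```

For a given timescale, the pair of signals at position x+y is therefore a rotation of the pair at position x, and the rotation depends only on y.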

Specifically, this function uses a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to channels / 2. For each timescale, this function generates the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All these sinusoids are concatenated along the channels dimension.
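
The construction described above can be sketched in plain Python. This is an illustrative approximation, not the layer's code: the function name fixed_position_embedding is hypothetical, nested lists stand in for tensors, and two details are assumptions not stated in this documentation: all sines are placed before all cosines, and an odd channels value is zero-padded.

```python
import math


def fixed_position_embedding(length, channels,
                             min_timescale=1.0, max_timescale=10000.0):
    """Return a [length, channels] table of sinusoids as nested lists."""
    num_timescales = channels // 2
    # Geometric sequence from min_timescale to max_timescale (inclusive).
    log_increment = (math.log(max_timescale / min_timescale)
                     / max(num_timescales - 1, 1))
    timescales = [min_timescale * math.exp(i * log_increment)
                  for i in range(num_timescales)]

    table = []
    for timestep in range(length):
        sines = [math.sin(timestep / t) for t in timescales]
        cosines = [math.cos(timestep / t) for t in timescales]
        # Assumption: sines first, then cosines, along the channels dimension.
        row = sines + cosines
        # Assumption: zero-pad when channels is odd.
        row += [0.0] * (channels - len(row))
        table.append(row)
    return table


if __name__ == "__main__":
    for row in fixed_position_embedding(length=4, channels=6):
        print(["%.4f" % value for value in row])
```

With channels=6 and the default timescales, the three timescales are approximately 1, 100 and 10000, so the first sine and cosine channels vary quickly with position and the last ones vary very slowly.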

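  • min_timescale (float) – Smallest timescale; the first term of the geometric sequence. Defaults to 1.0.

  • max_timescale (float) – Largest timescale; the last term of the geometric sequence. Defaults to 10000.0.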


Returns a tensor of shape [length, channels]. Based on _get_timing_signal_1d.

Return type