.. _pytorch-ops-torch.nn.transformer-encoder-layer:

modelzoo.common.pytorch.layers.TransformerEncoderLayer
=======================================================

import path: ``modelzoo.common.pytorch.layers.TransformerEncoderLayer``

**TransformerEncoderLayer** (d_model, nhead, dim_feedforward=2048, dropout=0.1, activation="gelu", layer_norm_eps=1e-05, batch_first=True, norm_first=False, device=None, attention_dropout_rate=None, attention_type="scaled_dot_product", use_projection_bias_in_attention=False, use_ffn_bias_in_attention=False, use_ffn_bias=False, attention_initializer="xavier_uniform", ffn_initializer="xavier_uniform"):

- **d_model**: the number of expected features in the input (required).
- **nhead**: the number of heads in the multi-head attention models (required).
- **dim_feedforward**: the dimension of the feedforward network model (``default=2048``).
- **dropout**: the dropout value (``default=0.1``).
- **activation**: the activation function of the intermediate layer; can be a string (``relu`` or ``gelu``) or a unary callable. Default: ``gelu``.
- **layer_norm_eps**: the eps value in layer normalization components (``default=1e-5``).
- **batch_first**: if ``True``, the input and output tensors are provided as (``batch``, ``seq``, ``feature``); otherwise as (``seq``, ``batch``, ``feature``). Default: ``True``. Only ``batch_first=True`` is currently supported.
- **norm_first**: if ``True``, layer norm is done prior to the attention and feedforward operations, respectively; otherwise it is done after. Default: ``False`` (after).
- **attention_dropout_rate**: attention dropout rate. If ``None``, defaults to ``dropout``.
- **attention_type**: should be one of [``scaled_dot_product``, ``dot_product``].
- **use_projection_bias_in_attention**: add bias to the Q, K, V projections in the attention layer. Defaults to ``False``.
- **use_ffn_bias_in_attention**: add bias in the concluding FFN in the attention layer. Defaults to ``False``.
- **use_ffn_bias**: add bias in all dense layers of the encoder's FFN sublayer.
- **attention_initializer**: attention layer initializer. Defaults to ``xavier_uniform``.
- **ffn_initializer**: FFN layer initializer. Defaults to ``xavier_uniform``.

**forward** (src, mask=None, src_key_padding_mask=None):

- **src** (Tensor): the sequence to the encoder layer (required). Shape ``[batch_size, src_seq_length, embed_dim]``.
- **mask** (Tensor): the mask for the src sequence (optional). Shape ``[src_seq_length, src_seq_length]``.
- **src_key_padding_mask** (Tensor): the mask for the src keys per batch (optional). Shape ``[batch_size, src_seq_length]``.
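
A minimal usage sketch is shown below. It assumes the import path above resolves in your installed modelzoo package and that the mask semantics mirror ``torch.nn.TransformerEncoderLayer`` (an additive float ``mask`` and a boolean ``src_key_padding_mask``); the tensor shapes follow the ``forward`` documentation above.

.. code-block:: python

    import torch
    from modelzoo.common.pytorch.layers import TransformerEncoderLayer

    batch_size, src_seq_length, d_model, nhead = 2, 16, 512, 8

    encoder_layer = TransformerEncoderLayer(
        d_model=d_model,
        nhead=nhead,
        dim_feedforward=2048,
        dropout=0.1,
        activation="gelu",
        batch_first=True,   # only batch_first=True is supported
        norm_first=False,
    )

    # src: [batch_size, src_seq_length, embed_dim] since batch_first=True
    src = torch.rand(batch_size, src_seq_length, d_model)

    # mask: [src_seq_length, src_seq_length]; here an additive causal mask,
    # assuming additive semantics as in torch.nn.TransformerEncoderLayer
    mask = torch.triu(
        torch.full((src_seq_length, src_seq_length), float("-inf")), diagonal=1
    )

    # src_key_padding_mask: [batch_size, src_seq_length]; assuming True marks
    # padded positions (no padding in this toy example)
    src_key_padding_mask = torch.zeros(batch_size, src_seq_length, dtype=torch.bool)

    out = encoder_layer(src, mask=mask, src_key_padding_mask=src_key_padding_mask)
    print(out.shape)  # expected: torch.Size([2, 16, 512])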