.. _pytorch-ops-torch.nn.transformer-encoder-layer:

modelzoo.common.pytorch.layers.TransformerEncoderLayer
=======================================================

import path: ``modelzoo.common.pytorch.layers.TransformerEncoderLayer``

**TransformerEncoderLayer** (d_model, nhead, dim_feedforward=2048, dropout=0.1, activation="gelu", layer_norm_eps=1e-05, batch_first=True, norm_first=False, device=None, attention_dropout_rate=None, attention_type="scaled_dot_product", use_projection_bias_in_attention=False, use_ffn_bias_in_attention=False, use_ffn_bias=False, attention_initializer="xavier_uniform", ffn_initializer="xavier_uniform"):

- **d_model**: the number of expected features in the input (required).
- **nhead**: the number of heads in the multi-head attention models (required).
- **dim_feedforward**: the dimension of the feedforward network model (``default=2048``).
- **dropout**: the dropout value (``default=0.1``).
- **activation**: the activation function of the intermediate layer; can be a string (``relu`` or ``gelu``) or a unary callable. Default: ``gelu``.
- **layer_norm_eps**: the eps value in layer normalization components (``default=1e-5``).
- **batch_first**: if ``True``, the input and output tensors are provided as (``batch``, ``seq``, ``feature``); otherwise as (``seq``, ``batch``, ``feature``). Default: ``True``. Only ``batch_first=True`` is currently supported.
- **norm_first**: if ``True``, layer norm is done prior to the attention and feedforward operations, respectively; otherwise it is done after. Default: ``False`` (after).
- **attention_dropout_rate**: attention dropout rate. If ``None``, defaults to ``dropout``.
- **attention_type**: should be one of [``scaled_dot_product``, ``dot_product``].
- **use_projection_bias_in_attention**: add bias to the Q, K, V projections in the attention layer. Defaults to ``False``.
- **use_ffn_bias_in_attention**: add bias in the concluding FFN in the attention layer. Defaults to ``False``.
- **use_ffn_bias**: add bias in all dense layers of the encoder's FFN sublayer.
- **attention_initializer**: attention layer initializer. Defaults to ``xavier_uniform``.
- **ffn_initializer**: FFN layer initializer. Defaults to ``xavier_uniform``.

**forward** (src, mask=None, src_key_padding_mask=None):

- **src** (Tensor): the sequence to the encoder layer (required). Shape ``[batch_size, src_seq_length, embed_dim]``.
- **mask** (Tensor): the mask for the src sequence (optional). Shape ``[src_seq_length, src_seq_length]``.
- **src_key_padding_mask** (Tensor): the mask for the src keys per batch (optional). Shape ``[batch_size, src_seq_length]``.
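
A minimal usage sketch is shown below. It assumes the import path above resolves in your installed modelzoo package and that the mask semantics mirror ``torch.nn.TransformerEncoderLayer`` (an additive float ``mask`` and a boolean ``src_key_padding_mask``); the tensor shapes follow the ``forward`` documentation above.

.. code-block:: python

    import torch
    from modelzoo.common.pytorch.layers import TransformerEncoderLayer

    batch_size, src_seq_length, d_model, nhead = 2, 16, 512, 8

    encoder_layer = TransformerEncoderLayer(
        d_model=d_model,
        nhead=nhead,
        dim_feedforward=2048,
        dropout=0.1,
        activation="gelu",
        batch_first=True,   # only batch_first=True is supported
        norm_first=False,
    )

    # src: [batch_size, src_seq_length, embed_dim] since batch_first=True
    src = torch.rand(batch_size, src_seq_length, d_model)

    # mask: [src_seq_length, src_seq_length]; here an additive causal mask,
    # assuming additive semantics as in torch.nn.TransformerEncoderLayer
    mask = torch.triu(
        torch.full((src_seq_length, src_seq_length), float("-inf")), diagonal=1
    )

    # src_key_padding_mask: [batch_size, src_seq_length]; assuming True marks
    # padded positions (no padding in this toy example)
    src_key_padding_mask = torch.zeros(batch_size, src_seq_length, dtype=torch.bool)

    out = encoder_layer(src, mask=mask, src_key_padding_mask=src_key_padding_mask)
    print(out.shape)  # expected: torch.Size([2, 16, 512])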