modelzoo.transformers.pytorch.bert.bert_pretrain_models.BertPretrainModel#

class modelzoo.transformers.pytorch.bert.bert_pretrain_models.BertPretrainModel[source]#

Bases: torch.nn.Module

BERT model with two heads on top, as used during pretraining: a masked language modeling head and a next sentence prediction (classification) head. Follows the paper: https://arxiv.org/abs/1810.04805.

Parameters
  • disable_nsp (bool, optional, defaults to False) – Whether to disable next sentence prediction and use only the masked language model head.

  • mlm_loss_weight (float, optional, defaults to 1.0) – The scaling factor for the masked language model loss.

  • label_smoothing (float, optional, defaults to 0.0) – The label smoothing factor used during training.
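
A minimal instantiation sketch, assuming the class is importable from the module path shown above and that the remaining constructor arguments (documented under __init__ below) are left at their defaults:

    import torch
    from modelzoo.transformers.pytorch.bert.bert_pretrain_models import (
        BertPretrainModel,
    )

    # MLM-only pretraining: disable the NSP head and keep the default
    # MLM loss weight, with a small amount of label smoothing.
    model = BertPretrainModel(
        disable_nsp=True,      # skip next sentence prediction
        mlm_loss_weight=1.0,   # default scaling of the MLM loss
        label_smoothing=0.1,   # smooth the MLM targets slightly
    )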

Methods

  • forward – input_ids: the IDs of the input tokens; can be of shape [batch_size, seq_length]. See forward() below for the full parameter list.

  • get_input_embeddings

  • get_output_embeddings

  • reset_parameters

  • tie_weights
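
Continuing with the model built above, a hedged sketch of the embedding helpers; their return types and the exact tying behavior are not documented on this page and are assumed to follow the usual PyTorch convention:

    # Assumed to return the token embedding module and the MLM output
    # projection, respectively.
    input_emb = model.get_input_embeddings()
    output_emb = model.get_output_embeddings()

    # With share_embedding_weights=True (the default), tie_weights() is
    # presumably what makes the output projection reuse the input
    # embedding matrix.
    model.tie_weights()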

__call__(*args: Any, **kwargs: Any) → Any#

Call self as a function.

__init__(disable_nsp=False, mlm_loss_weight=1.0, label_smoothing=0.0, num_classes=2, mlm_nonlinearity=None, vocab_size=50257, max_position_embeddings=1024, position_embedding_type='learned', embedding_pad_token_id=0, mask_padding_in_positional_embed=False, hidden_size=768, share_embedding_weights=True, num_hidden_layers=12, layer_norm_epsilon=1e-05, num_heads=12, attention_module='aiayn_attention', extra_attention_params={}, attention_type='scaled_dot_product', attention_softmax_fp32=True, dropout_rate=0.1, nonlinearity='gelu', pooler_nonlinearity=None, attention_dropout_rate=0.1, use_projection_bias_in_attention=True, use_ffn_bias_in_attention=True, filter_size=3072, use_ffn_bias=True, use_ffn_bias_in_mlm=True, use_output_bias_in_mlm=True, initializer_range=0.02, num_segments=2)[source]#
Parameters
  • disable_nsp (bool, optional, defaults to False) – Whether to disable next sentence prediction and use only the masked language model head.

  • mlm_loss_weight (float, optional, defaults to 1.0) – The scaling factor for the masked language model loss.

  • label_smoothing (float, optional, defaults to 0.0) – The label smoothing factor used during training.
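
A fuller configuration sketch; values mirror the documented defaults except where noted, and the semantics of parameters beyond the three documented above are inferred from their names (e.g. filter_size is taken to be the feed-forward width), so treat this as illustrative rather than authoritative:

    model = BertPretrainModel(
        vocab_size=30522,             # e.g. the original BERT WordPiece vocab
        hidden_size=768,
        num_hidden_layers=12,
        num_heads=12,
        filter_size=3072,             # feed-forward width (assumed)
        max_position_embeddings=512,
        num_segments=2,               # segment (token_type) vocabulary size
        dropout_rate=0.1,
        attention_dropout_rate=0.1,
    )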

static __new__(cls, *args: Any, **kwargs: Any) → Any#
forward(input_ids=None, attention_mask=None, position_ids=None, token_type_ids=None, masked_lm_positions=None, should_gather_mlm_labels=False)[source]#
Parameters
  • input_ids (Tensor) – The IDs of the input tokens. Can be of shape [batch_size, seq_length].

  • attention_mask (Tensor) – Can be 2D of shape [batch_size, seq_length], 3D of shape [batch, query_length, seq_length], or 4D of shape [batch, num_heads, query_length, seq_length].

  • position_ids (Tensor) – The position IDs of the input tokens. Can be of shape [batch_size, seq_length].

  • token_type_ids (Tensor) – The segment IDs of the input tokens, indicating which sequence each token belongs to. Can be of shape [batch_size, seq_length].

  • masked_lm_positions (Tensor) – The positions of the masked (MLM) tokens. Shape [batch_size, max_predictions_per_seq].
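
A sketch of a forward pass with dummy inputs, continuing with the model configured above; the return structure is not documented on this page, so the result is left unpacked, and the interaction of should_gather_mlm_labels with masked_lm_positions is an assumption:

    batch_size, seq_length, max_preds = 2, 128, 20

    input_ids = torch.randint(0, 30522, (batch_size, seq_length))
    attention_mask = torch.ones(batch_size, seq_length, dtype=torch.long)
    token_type_ids = torch.zeros(batch_size, seq_length, dtype=torch.long)
    masked_lm_positions = torch.randint(0, seq_length, (batch_size, max_preds))

    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        masked_lm_positions=masked_lm_positions,
        should_gather_mlm_labels=True,  # assumed to gather MLM logits
    )                                   # at masked_lm_positions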