modelzoo.transformers.pytorch.bert.bert_pretrain_models.BertPretrainModel#

class modelzoo.transformers.pytorch.bert.bert_pretrain_models.BertPretrainModel[source]#

Bases: torch.nn.Module

BERT model with two heads on top, as used during pretraining: a masked language modeling head and a next sentence prediction (classification) head. Follows the paper: https://arxiv.org/abs/1810.04805.

Parameters
  • disable_nsp (bool, optional, defaults to False) – Whether to disable next sentence prediction and use only the masked language model head.

  • mlm_loss_weight (float, optional, defaults to 1.0) – The scaling factor for the masked language model loss.

  • label_smoothing (float, optional, defaults to 0.0) – The label smoothing factor used during training.
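
A minimal instantiation sketch, assuming the class is importable from the module path shown above and that the remaining constructor arguments (documented under __init__ below) are left at their defaults:

    import torch
    from modelzoo.transformers.pytorch.bert.bert_pretrain_models import (
        BertPretrainModel,
    )

    # MLM-only pretraining: disable the NSP head and keep the default
    # MLM loss weight, with a small amount of label smoothing.
    model = BertPretrainModel(
        disable_nsp=True,      # skip next sentence prediction
        mlm_loss_weight=1.0,   # default scaling of the MLM loss
        label_smoothing=0.1,   # smooth the MLM targets slightly
    )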

Methods

  • forward – input_ids: the IDs of the input tokens; can be of shape [batch_size, seq_length]. See forward() below for the full parameter list.

  • get_input_embeddings

  • get_output_embeddings

  • reset_parameters

  • tie_weights
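
Continuing with the model built above, a hedged sketch of the embedding helpers; their return types and the exact tying behavior are not documented on this page and are assumed to follow the usual PyTorch convention:

    # Assumed to return the token embedding module and the MLM output
    # projection, respectively.
    input_emb = model.get_input_embeddings()
    output_emb = model.get_output_embeddings()

    # With share_embedding_weights=True (the default), tie_weights() is
    # presumably what makes the output projection reuse the input
    # embedding matrix.
    model.tie_weights()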

__call__(*args: Any, **kwargs: Any) → Any#

Call self as a function.

__init__(disable_nsp=False, mlm_loss_weight=1.0, label_smoothing=0.0, num_classes=2, mlm_nonlinearity=None, vocab_size=50257, max_position_embeddings=1024, position_embedding_type='learned', embedding_pad_token_id=0, mask_padding_in_positional_embed=False, hidden_size=768, share_embedding_weights=True, num_hidden_layers=12, layer_norm_epsilon=1e-05, num_heads=12, attention_module='aiayn_attention', extra_attention_params={}, attention_type='scaled_dot_product', attention_softmax_fp32=True, dropout_rate=0.1, nonlinearity='gelu', pooler_nonlinearity=None, attention_dropout_rate=0.1, use_projection_bias_in_attention=True, use_ffn_bias_in_attention=True, filter_size=3072, use_ffn_bias=True, use_ffn_bias_in_mlm=True, use_output_bias_in_mlm=True, initializer_range=0.02, num_segments=2)[source]#
Parameters
  • disable_nsp (bool, optional, defaults to False) – Whether to disable next sentence prediction and use only the masked language model head.

  • mlm_loss_weight (float, optional, defaults to 1.0) – The scaling factor for the masked language model loss.

  • label_smoothing (float, optional, defaults to 0.0) – The label smoothing factor used during training.
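
A fuller configuration sketch; values mirror the documented defaults except where noted, and the semantics of parameters beyond the three documented above are inferred from their names (e.g. filter_size is taken to be the feed-forward width), so treat this as illustrative rather than authoritative:

    model = BertPretrainModel(
        vocab_size=30522,             # e.g. the original BERT WordPiece vocab
        hidden_size=768,
        num_hidden_layers=12,
        num_heads=12,
        filter_size=3072,             # feed-forward width (assumed)
        max_position_embeddings=512,
        num_segments=2,               # segment (token_type) vocabulary size
        dropout_rate=0.1,
        attention_dropout_rate=0.1,
    )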

static __new__(cls, *args: Any, **kwargs: Any) → Any#
forward(input_ids=None, attention_mask=None, position_ids=None, token_type_ids=None, masked_lm_positions=None, should_gather_mlm_labels=False)[source]#
Parameters
  • input_ids (Tensor) – The IDs of the input tokens. Can be of shape [batch_size, seq_length].

  • attention_mask (Tensor) – Can be 2D of shape [batch_size, seq_length], 3D of shape [batch, query_length, seq_length], or 4D of shape [batch, num_heads, query_length, seq_length].

  • position_ids (Tensor) – The position IDs of the input tokens. Can be of shape [batch_size, seq_length].

  • token_type_ids (Tensor) – The segment IDs of the input tokens, indicating which sequence each token belongs to. Can be of shape [batch_size, seq_length].

  • masked_lm_positions (Tensor) – The positions of the masked (MLM) tokens. Shape [batch_size, max_predictions_per_seq].
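
A sketch of a forward pass with dummy inputs, continuing with the model configured above; the return structure is not documented on this page, so the result is left unpacked, and the interaction of should_gather_mlm_labels with masked_lm_positions is an assumption:

    batch_size, seq_length, max_preds = 2, 128, 20

    input_ids = torch.randint(0, 30522, (batch_size, seq_length))
    attention_mask = torch.ones(batch_size, seq_length, dtype=torch.long)
    token_type_ids = torch.zeros(batch_size, seq_length, dtype=torch.long)
    masked_lm_positions = torch.randint(0, seq_length, (batch_size, max_preds))

    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        masked_lm_positions=masked_lm_positions,
        should_gather_mlm_labels=True,  # assumed to gather MLM logits
    )                                   # at masked_lm_positions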