common.pytorch.model_utils package#

Subpackages#

Submodules#

common.pytorch.model_utils.BertPretrainModelLoss module#

class common.pytorch.model_utils.BertPretrainModelLoss.BertPretrainModelLoss#

Bases: torch.nn.Module

__init__(disable_nsp=False, mlm_loss_weight=1.0, label_smoothing=0.0)#
forward(mlm_logits, vocab_size, mlm_labels, nsp_logits, nsp_labels, mlm_weights, mlm_loss_scale=None)#
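
Example

A minimal usage sketch (illustrative only): the tensor shapes, dtypes, and the vocabulary size of 100 below are assumptions about what forward expects, not guarantees from the signature.

>>> import torch
>>> from common.pytorch.model_utils.BertPretrainModelLoss import BertPretrainModelLoss
>>> loss_fn = BertPretrainModelLoss(disable_nsp=False, mlm_loss_weight=1.0)
>>> mlm_logits = torch.randn(2, 8, 100)           # assumed [batch, seq, vocab]
>>> mlm_labels = torch.randint(0, 100, (2, 8))
>>> mlm_weights = torch.ones(2, 8)                # assumed 1.0 at masked positions, 0.0 elsewhere
>>> nsp_logits = torch.randn(2, 2)
>>> nsp_labels = torch.randint(0, 2, (2,))
>>> loss = loss_fn(mlm_logits, 100, mlm_labels, nsp_logits, nsp_labels, mlm_weights)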

common.pytorch.model_utils.GPTLMHeadModelLoss module#

class common.pytorch.model_utils.GPTLMHeadModelLoss.GPTLMHeadModelLoss#

Bases: torch.nn.Module

__init__(vocab_size, loss_scaling, loss_weight)#
forward(lm_logits, labels, attention_mask)#
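
Example

A minimal usage sketch (illustrative only): the values passed for loss_scaling and loss_weight are placeholders, and the tensor shapes are assumptions about what forward expects.

>>> import torch
>>> from common.pytorch.model_utils.GPTLMHeadModelLoss import GPTLMHeadModelLoss
>>> loss_fn = GPTLMHeadModelLoss(vocab_size=100, loss_scaling="num_tokens", loss_weight=1.0)  # placeholder arguments
>>> lm_logits = torch.randn(2, 8, 100)            # assumed [batch, seq, vocab]
>>> labels = torch.randint(0, 100, (2, 8))
>>> attention_mask = torch.ones(2, 8)
>>> loss = loss_fn(lm_logits, labels, attention_mask)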

common.pytorch.model_utils.RotaryPositionEmbeddingHelper module#

class common.pytorch.model_utils.RotaryPositionEmbeddingHelper.RotaryPositionEmbeddingHelper#

Bases: object

__init__(max_position_embeddings, rotary_dim)#
create_fixed_pos_emb(device, dtype)#
rotate_tensor(x, real_seq_length, offset=0)#
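
Example

A minimal usage sketch (illustrative only): the input layout shown ([batch, heads, seq, head_dim]) and the choice of rotary_dim smaller than the head dimension are assumptions about how rotate_tensor is meant to be used.

>>> import torch
>>> from common.pytorch.model_utils.RotaryPositionEmbeddingHelper import RotaryPositionEmbeddingHelper
>>> rope = RotaryPositionEmbeddingHelper(max_position_embeddings=128, rotary_dim=16)
>>> x = torch.randn(2, 4, 8, 64)                  # assumed [batch, heads, seq, head_dim]
>>> rotated = rope.rotate_tensor(x, real_seq_length=8, offset=0)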

common.pytorch.model_utils.T5ForConditionalGenerationLoss module#

class common.pytorch.model_utils.T5ForConditionalGenerationLoss.T5ForConditionalGenerationLoss#

Bases: torch.nn.Module

__init__(lm_loss_weight, mlm_loss_scaling, label_smoothing=0.0)#
forward(lm_logits, labels, decoder_attention_mask, loss_weight=None)#
Per-token loss is averaged across the batch by
  1. Summing across all tokens in the batch

  2. Dividing by the batch size

  3. Multiplying by the provided loss weight (expected to be roughly equal to batch_size / num_tokens_in_batch)

The user can either specify this loss weight once and reuse it for every batch (by setting self.global_loss_weight and not passing loss_weight to the forward function), or use a different weight for every batch (by passing loss_weight to the forward function).
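
As a numeric illustration of the three steps above (hypothetical numbers, not from the source): with a summed per-token loss of 5.0 over 10 tokens in a batch of 2, a loss weight of batch_size / num_tokens recovers the per-token mean.

>>> token_loss_sum = 5.0                        # step 1: sum over all tokens in the batch
>>> batch_size, num_tokens = 2, 10
>>> loss_weight = batch_size / num_tokens       # roughly batch_size / num_tokens_in_batch
>>> token_loss_sum / batch_size * loss_weight   # steps 2 and 3: equals the per-token mean
0.5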

common.pytorch.model_utils.T5ForConditionalGenerationLoss.smooth_loss(prediction_scores, loss, label_smoothing, classes)#

common.pytorch.model_utils.activations module#

common.pytorch.model_utils.convert_checkpoint module#

class common.pytorch.model_utils.convert_checkpoint.CheckpointConverterCLI#

Bases: object

__init__()#
common.pytorch.model_utils.convert_checkpoint.convert_checkpoint(model, src_fmt, tgt_fmt, checkpoint, config, drop_unmatched_keys=False, no_progress_bar=True, debug=False)#
common.pytorch.model_utils.convert_checkpoint.convert_checkpoint_from_file(model, src_fmt, tgt_fmt, checkpoint_file, config_file, outputdir=None, export_h5_checkpoint=False, drop_unmatched_keys=False, no_progress_bar=True, debug=False)#
common.pytorch.model_utils.convert_checkpoint.convert_config(model, src_fmt, tgt_fmt, config, drop_unmatched_keys=False, no_progress_bar=True, debug=False)#
common.pytorch.model_utils.convert_checkpoint.convert_config_from_file(model, src_fmt, tgt_fmt, config_file, outputdir=None, drop_unmatched_keys=False, no_progress_bar=True, debug=False)#
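
Example

A hypothetical invocation of convert_checkpoint_from_file (illustrative only): the model name, format identifiers, and file names below are placeholders; use the values the checkpoint converter actually supports for your model.

>>> from common.pytorch.model_utils.convert_checkpoint import convert_checkpoint_from_file
>>> convert_checkpoint_from_file(
...     model="gpt2",                        # placeholder model name
...     src_fmt="hf",                        # placeholder source format
...     tgt_fmt="cs",                        # placeholder target format
...     checkpoint_file="pytorch_model.bin",
...     config_file="config.json",
...     outputdir="converted/",
... )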

common.pytorch.model_utils.create_initializer module#

common.pytorch.model_utils.create_initializer.create_initializer(spec)#

Creates the specified initializer.

Parameters
  • spec (dict/str) – either a string naming the initializer, or a dict containing the name plus any other relevant parameters.

  • seed (int) – random seed for the initializer or None to run unseeded.

Returns

initializer that can be passed to layers
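
Example

A minimal usage sketch (illustrative only): the "name"/"std" keys in the dict spec and the convention of applying the returned callable in-place to a parameter tensor are assumptions, not guaranteed by this reference.

>>> import torch
>>> import torch.nn as nn
>>> from common.pytorch.model_utils.create_initializer import create_initializer
>>> init = create_initializer({"name": "truncated_normal", "std": 0.02})  # assumed spec keys
>>> layer = nn.Linear(16, 16)
>>> with torch.no_grad():
...     init(layer.weight)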

common.pytorch.model_utils.weight_initializers module#

common.pytorch.model_utils.weight_initializers.lecun_normal_(tensor)#

Adapted from TensorFlow’s initializations https://www.tensorflow.org/api_docs/python/tf/keras/initializers/LecunNormal

Parameters

tensor (torch.Tensor) – an n-dimensional torch.Tensor

Examples

>>> w = torch.empty(3, 3)
>>> lecun_normal_(w)
common.pytorch.model_utils.weight_initializers.lecun_uniform_(tensor)#

Adapted from TensorFlow’s initializations https://www.tensorflow.org/api_docs/python/tf/keras/initializers/LecunUniform

Parameters

tensor (torch.Tensor) – an n-dimensional torch.Tensor

Examples

>>> w = torch.empty(3, 3)
>>> lecun_uniform_(w)
common.pytorch.model_utils.weight_initializers.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0)#

Fills the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\) with values outside \([a, b]\) redrawn until they are within the bounds. The method used for generating the random values works best when \(a \leq \text{mean} \leq b\).

Parameters
  • tensor (torch.Tensor) – an n-dimensional torch.Tensor

  • mean (float) – the mean of the normal distribution. Defaults to 0.0

  • std (float) – the standard deviation of the normal distribution. Defaults to 1.0

  • a (float) – the minimum cutoff value. Defaults to -2.0

  • b (float) – the maximum cutoff value. Defaults to 2.0

Examples

>>> w = torch.empty(3, 3)
>>> trunc_normal_(w)
common.pytorch.model_utils.weight_initializers.variance_scaling_(tensor, scale=1.0, mode='fan_in', distribution='truncated_normal')#

Adapted from TensorFlow’s initializations https://www.tensorflow.org/api_docs/python/tf/keras/initializers/VarianceScaling

Fills the input Tensor with values according to the given scale, mode, and distribution.

Parameters
  • tensor (torch.Tensor) – an n-dimensional torch.Tensor

  • scale (float) – scaling factor (positive float)

  • mode (str) – mode of weight initialization. Defaults to fan_in

  • distribution (str) – distribution to initialize tensors with. Defaults to truncated_normal

Examples

>>> w = torch.empty(3, 3)
>>> variance_scaling_(w)

Module contents#