common.pytorch.model_utils package#
Subpackages#
- common.pytorch.model_utils.checkpoint_converters package
- Submodules
- common.pytorch.model_utils.checkpoint_converters.base_converter module
- common.pytorch.model_utils.checkpoint_converters.bert module
- common.pytorch.model_utils.checkpoint_converters.bert_finetune module
- common.pytorch.model_utils.checkpoint_converters.gpt2_hf_cs module
- common.pytorch.model_utils.checkpoint_converters.gpt_neox_hf_cs module
- common.pytorch.model_utils.checkpoint_converters.gptj_hf_cs module
- common.pytorch.model_utils.checkpoint_converters.salesforce_codegen_hf_cs module
- common.pytorch.model_utils.checkpoint_converters.t5 module
- Module contents
Submodules#
common.pytorch.model_utils.BertPretrainModelLoss module#
common.pytorch.model_utils.GPTLMHeadModelLoss module#
common.pytorch.model_utils.RotaryPositionEmbeddingHelper module#
common.pytorch.model_utils.T5ForConditionalGenerationLoss module#
- class common.pytorch.model_utils.T5ForConditionalGenerationLoss.T5ForConditionalGenerationLoss#
Bases: torch.nn.Module
- __init__(lm_loss_weight, mlm_loss_scaling, label_smoothing=0.0)#
- forward(lm_logits, labels, decoder_attention_mask, loss_weight=None)#
Per-token loss is averaged across the batch by:
- Summing across all tokens in the batch
- Dividing by the batch size
- Multiplying by the provided loss weight (expected to be roughly equal to batch_size / num_tokens_in_batch)
The user can either specify this loss weight once and use the same weight for every batch (by setting self.global_loss_weight and not passing loss_weight to the forward function), or use a different weight for every batch (by passing loss_weight to the forward function).
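The weighting scheme can be illustrated with a short sketch. Only the sum / batch-size / loss-weight arithmetic below comes from the description above; the use of cross-entropy and of decoder_attention_mask to zero out padded positions are assumptions for illustration::

    import torch
    import torch.nn.functional as F

    def sketch_forward(lm_logits, labels, decoder_attention_mask, loss_weight):
        # Per-token cross-entropy (assumed loss), kept unreduced so the
        # reduction can be done manually as described above.
        per_token = F.cross_entropy(
            lm_logits.view(-1, lm_logits.shape[-1]),
            labels.view(-1),
            reduction="none",
        )
        # Assumed: zero out padded decoder positions via the attention mask.
        per_token = per_token * decoder_attention_mask.view(-1).to(per_token.dtype)
        # Sum across all tokens, divide by batch size, then scale by
        # loss_weight (roughly batch_size / num_tokens_in_batch).
        batch_size = lm_logits.shape[0]
        return per_token.sum() / batch_size * loss_weight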
- common.pytorch.model_utils.T5ForConditionalGenerationLoss.smooth_loss(prediction_scores, loss, label_smoothing, classes)#
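As a rough illustration of what a label-smoothing helper with this signature might compute, here is the standard formulation that blends the unsmoothed loss with a loss against the uniform distribution over classes labels; the exact blending used by this module is not documented here and may differ::

    import torch

    def sketch_smooth_loss(prediction_scores, loss, label_smoothing, classes):
        log_probs = torch.log_softmax(prediction_scores, dim=-1)
        # Loss if the target were the uniform distribution over all classes.
        smooth_term = -log_probs.sum(dim=-1)
        eps_i = label_smoothing / classes
        # Blend the original loss with the smooth term.
        return (1.0 - label_smoothing) * loss + eps_i * smooth_term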
common.pytorch.model_utils.activations module#
common.pytorch.model_utils.convert_checkpoint module#
- class common.pytorch.model_utils.convert_checkpoint.CheckpointConverterCLI#
Bases: object
- __init__()#
- common.pytorch.model_utils.convert_checkpoint.convert_checkpoint(model, src_fmt, tgt_fmt, checkpoint, config, drop_unmatched_keys=False, no_progress_bar=True, debug=False)#
- common.pytorch.model_utils.convert_checkpoint.convert_checkpoint_from_file(model, src_fmt, tgt_fmt, checkpoint_file, config_file, outputdir=None, export_h5_checkpoint=False, drop_unmatched_keys=False, no_progress_bar=True, debug=False)#
- common.pytorch.model_utils.convert_checkpoint.convert_config(model, src_fmt, tgt_fmt, config, drop_unmatched_keys=False, no_progress_bar=True, debug=False)#
- common.pytorch.model_utils.convert_checkpoint.convert_config_from_file(model, src_fmt, tgt_fmt, config_file, outputdir=None, drop_unmatched_keys=False, no_progress_bar=True, debug=False)#
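A usage sketch based only on the signatures above; the model name, format identifiers, and file paths are hypothetical placeholders, not documented values::

    from common.pytorch.model_utils.convert_checkpoint import (
        convert_checkpoint_from_file,
    )

    # Hypothetical: convert a GPT-2 checkpoint and config between two
    # formats. "gpt2", "hf", and "cs" are placeholder identifiers.
    convert_checkpoint_from_file(
        model="gpt2",
        src_fmt="hf",
        tgt_fmt="cs",
        checkpoint_file="pytorch_model.bin",
        config_file="config.json",
        outputdir="converted/",
    )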
common.pytorch.model_utils.create_initializer module#
- common.pytorch.model_utils.create_initializer.create_initializer(spec)#
Creates the specified initializer.
- Parameters
spec (dict/str) – either a string naming the initializer, or a dict that includes the name plus any other relevant parameters.
seed (int) – random seed for the initializer, or None to run unseeded.
- Returns
initializer that can be passed to layers
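A usage sketch based on the parameter description above; the dict keys shown ("name", "std") and the pattern of applying the returned initializer directly to a weight tensor are assumptions for illustration::

    import torch
    from common.pytorch.model_utils.create_initializer import create_initializer

    # Spec as a plain string naming the initializer.
    init_fn = create_initializer("xavier_uniform")

    # Spec as a dict with the name plus other relevant parameters
    # (the exact keys accepted are assumptions).
    init_fn = create_initializer({"name": "truncated_normal", "std": 0.02})

    # Apply the returned initializer to a layer's weights.
    layer = torch.nn.Linear(128, 128)
    init_fn(layer.weight)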
common.pytorch.model_utils.weight_initializers module#
- common.pytorch.model_utils.weight_initializers.lecun_normal_(tensor)#
Adapted from TensorFlow’s initializers https://www.tensorflow.org/api_docs/python/tf/keras/initializers/LecunNormal
- Parameters
tensor (torch.Tensor) – an n-dimensional torch.Tensor
Examples
>>> w = torch.empty(3, 3)
>>> lecun_normal_(w)
- common.pytorch.model_utils.weight_initializers.lecun_uniform_(tensor)#
Adapted from TensorFlow’s initializers https://www.tensorflow.org/api_docs/python/tf/keras/initializers/LecunUniform
- Parameters
tensor (torch.Tensor) – an n-dimensional torch.Tensor
Examples
>>> w = torch.empty(3, 3)
>>> lecun_uniform_(w)
- common.pytorch.model_utils.weight_initializers.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0)#
Fills the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\) with values outside \([a, b]\) redrawn until they are within the bounds. The method used for generating the random values works best when \(a \leq \text{mean} \leq b\).
- Parameters
tensor (torch.Tensor) – an n-dimensional torch.Tensor
mean (float) – the mean of the normal distribution. Defaults to 0.0
std (float) – the standard deviation of the normal distribution. Defaults to 1.0
a (float) – the minimum cutoff value. Defaults to -2.0
b (float) – the maximum cutoff value. Defaults to 2.0
Examples
>>> w = torch.empty(3, 3)
>>> trunc_normal_(w)
- common.pytorch.model_utils.weight_initializers.variance_scaling_(tensor, scale=1.0, mode='fan_in', distribution='truncated_normal')#
Adapted from TensorFlow’s initializers https://www.tensorflow.org/api_docs/python/tf/keras/initializers/VarianceScaling
Fills the input Tensor with values according to the given scale, mode, and distribution.
- Parameters
tensor (torch.Tensor) – an n-dimensional torch.Tensor
scale (float) – scaling factor (positive float)
mode (str) – mode of weight initialization. Defaults to fan_in
distribution (str) – distributino to initialize tensors with. Defaults to truncated_normal
Examples
>>> w = torch.empty(3, 3)
>>> variance_scaling_(w)
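For reference, TensorFlow’s documented VarianceScaling rule draws values with variance scale / fan, where fan is chosen by mode. The sketch below follows that documented rule (including TF’s truncation correction constant) and is an illustration of the technique, not this module’s implementation::

    import math
    import torch

    def sketch_variance_scaling_(tensor, scale=1.0, mode="fan_in",
                                 distribution="truncated_normal"):
        fan_in, fan_out = torch.nn.init._calculate_fan_in_and_fan_out(tensor)
        fan = {"fan_in": fan_in,
               "fan_out": fan_out,
               "fan_avg": (fan_in + fan_out) / 2}[mode]
        if distribution == "truncated_normal":
            # TF truncates at 2 stddevs; the constant corrects the
            # variance lost to truncation.
            std = math.sqrt(scale / fan) / 0.87962566103423978
            return torch.nn.init.trunc_normal_(tensor, 0.0, std,
                                               -2 * std, 2 * std)
        if distribution == "uniform":
            limit = math.sqrt(3.0 * scale / fan)
            with torch.no_grad():
                return tensor.uniform_(-limit, limit)
        # "untruncated_normal"
        std = math.sqrt(scale / fan)
        with torch.no_grad():
            return tensor.normal_(0.0, std)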