modelzoo.common.pytorch.model_utils.DPOLoss
Classes
DPO Loss.

:param beta: Temperature parameter for the DPO loss, typically in the range 0.1 to 0.5. As beta -> 0, the reference model is effectively ignored.
:param reference_free: If True, ignore the *provided* reference model and implicitly use a reference model that assigns equal probability to all responses.
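To make the two parameters concrete, here is a minimal pure-Python sketch of the standard DPO objective. It is not the modelzoo implementation: the function names `dpo_loss` and `log_sigmoid` are hypothetical, and the inputs are assumed to be summed per-sequence log-probabilities of the chosen and rejected responses under the policy and reference models. `beta` scales the difference of log-ratios, and `reference_free=True` replaces the reference log-ratio with 0, which is what a reference model assigning equal probability to all responses would produce.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logps: Sequence[float],
    policy_rejected_logps: Sequence[float],
    reference_chosen_logps: Sequence[float],
    reference_rejected_logps: Sequence[float],
    beta: float = 0.1,
    reference_free: bool = False,
) -> float:
    """Hypothetical sketch: mean DPO loss over a batch of preference pairs."""
    n = len(policy_chosen_logps)
    if n == 0 or not (
        n == len(policy_rejected_logps)
        == len(reference_chosen_logps)
        == len(reference_rejected_logps)
    ):
        raise ValueError("all inputs must be non-empty and the same length")

    losses = []
    for pc, pr, rc, rr in zip(
        policy_chosen_logps,
        policy_rejected_logps,
        reference_chosen_logps,
        reference_rejected_logps,
    ):
        # How much more the policy prefers the chosen response.
        pi_logratio = pc - pr
        # With reference_free, the implicit reference gives equal
        # probability to both responses, so its log-ratio is 0.
        ref_logratio = 0.0 if reference_free else rc - rr
        # Loss is -log sigmoid(beta * (policy log-ratio - reference log-ratio)).
        losses.append(-log_sigmoid(beta * (pi_logratio - ref_logratio)))
    return sum(losses) / n
```

When the policy's preference margin equals the reference's, the logit is 0 and the loss is log 2; the loss falls as the policy favors the chosen response more strongly than the reference does.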