mirror of
https://github.com/rasbt/LLMs-from-scratch.git
synced 2026-04-10 12:33:42 +00:00
Fix docstring parameter names in compute_dpo_loss function (#953)
This commit is contained in:
@@ -1880,8 +1880,8 @@
|
||||
" \"\"\"Compute the DPO loss for a batch of policy and reference model log probabilities.\n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" policy_chosen_logprobs: Log probabilities of the policy model for the chosen responses. Shape: (batch_size,)\n",
|
||||
" policy_rejected_logprobs: Log probabilities of the policy model for the rejected responses. Shape: (batch_size,)\n",
|
||||
" model_chosen_logprobs: Log probabilities of the policy model for the chosen responses. Shape: (batch_size,)\n",
|
||||
" model_rejected_logprobs: Log probabilities of the policy model for the rejected responses. Shape: (batch_size,)\n",
|
||||
" reference_chosen_logprobs: Log probabilities of the reference model for the chosen responses. Shape: (batch_size,)\n",
|
||||
" reference_rejected_logprobs: Log probabilities of the reference model for the rejected responses. Shape: (batch_size,)\n",
|
||||
" beta: Temperature parameter for the DPO loss; typically something in the range of 0.1 to 0.5. We ignore the reference model as beta -> 0.\n",
|
||||
|
||||
Reference in New Issue
Block a user