From 82010e2c7729c4582afd5cb155c9d654f62ba43a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dawid=20Wo=C5=BAniak?= <81079505+wozniakos10@users.noreply.github.com>
Date: Thu, 29 Jan 2026 23:51:17 +0100
Subject: [PATCH] Fix docstring parameter names in compute_dpo_loss function (#953)

---
 ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb b/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
index 3bab1c3..7411d34 100644
--- a/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
+++ b/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
@@ -1880,8 +1880,8 @@
 " \"\"\"Compute the DPO loss for a batch of policy and reference model log probabilities.\n",
 "\n",
 " Args:\n",
- " policy_chosen_logprobs: Log probabilities of the policy model for the chosen responses. Shape: (batch_size,)\n",
- " policy_rejected_logprobs: Log probabilities of the policy model for the rejected responses. Shape: (batch_size,)\n",
+ " model_chosen_logprobs: Log probabilities of the policy model for the chosen responses. Shape: (batch_size,)\n",
+ " model_rejected_logprobs: Log probabilities of the policy model for the rejected responses. Shape: (batch_size,)\n",
 " reference_chosen_logprobs: Log probabilities of the reference model for the chosen responses. Shape: (batch_size,)\n",
 " reference_rejected_logprobs: Log probabilities of the reference model for the rejected responses. Shape: (batch_size,)\n",
 " beta: Temperature parameter for the DPO loss; typically something in the range of 0.1 to 0.5. We ignore the reference model as beta -> 0.\n",
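
Note for reviewers: the body of compute_dpo_loss is outside this hunk, so the sketch below is only an illustration of how the four documented arguments and beta typically combine in a DPO loss. It is not the notebook's code. The name dpo_loss_sketch and the helper log_sigmoid are hypothetical, and plain Python floats stand in for the (batch_size,) tensors the docstring describes.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) for a single float (hypothetical helper)."""
    # log(sigmoid(x)) = -log(1 + exp(-x)); branch so exp() never overflows
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss_sketch(
    model_chosen_logprobs,
    model_rejected_logprobs,
    reference_chosen_logprobs,
    reference_rejected_logprobs,
    beta=0.1,
):
    """Mean DPO loss over a batch; argument names follow the corrected docstring.

    Assumption: each argument is a sequence of per-example summed log
    probabilities, one float per example.
    """
    losses = []
    for mc, mr, rc, rr in zip(
        model_chosen_logprobs,
        model_rejected_logprobs,
        reference_chosen_logprobs,
        reference_rejected_logprobs,
    ):
        model_logratio = mc - mr        # how strongly the policy prefers "chosen"
        reference_logratio = rc - rr    # same preference under the reference model
        logits = model_logratio - reference_logratio
        losses.append(-log_sigmoid(beta * logits))
    return sum(losses) / len(losses)
```

When the policy and reference log probabilities are identical, the logits are zero and the sketch returns log 2. The loss drops below that value once the policy prefers the chosen response more strongly than the reference does.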