From 2d8d6224ed38f05520d44800e5ce1ce90ae880bd Mon Sep 17 00:00:00 2001
From: Hayato Hongo <151999571+HayatoHongo@users.noreply.github.com>
Date: Wed, 3 Sep 2025 00:14:36 +0900
Subject: [PATCH] added brief explanations about 2 different ways of RoPE
 implementations (#802)

* added brief explanations about 2 different ways of RoPE implementations

* improve comment

---------

Co-authored-by: rasbt
---
 ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb b/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
index 0feaced..fa5af69 100644
--- a/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
+++ b/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
@@ -410,7 +410,10 @@
     "```\n",
     "\n",
     "- Unlike traditional absolute positional embeddings, Llama uses rotary position embeddings (RoPE), which enable it to capture both absolute and relative positional information simultaneously\n",
-    "- The reference paper for RoPE is [RoFormer: Enhanced Transformer with Rotary Position Embedding (2021)](https://arxiv.org/abs/2104.09864)"
+    "- The reference paper for RoPE is [RoFormer: Enhanced Transformer with Rotary Position Embedding (2021)](https://arxiv.org/abs/2104.09864)\n",
+    "- RoPE can be implemented in two equivalent ways: the *split-halves* version and the *interleaved (even/odd)* version; they are mathematically the same as long as we pair dimensions consistently and use the same cos/sin ordering (see [this](https://github.com/rasbt/LLMs-from-scratch/issues/751) GitHub discussion for more information)\n",
+    "- This code uses the RoPE *split-halves* approach, similar to Hugging Face transformers ([modeling_llama.py](https://github.com/huggingface/transformers/blob/e42587f596181396e1c4b63660abf0c736b10dae/src/transformers/models/llama/modeling_llama.py#L173-L188))\n",
+    "- The original RoPE paper and Meta's official Llama 2 repository, however, use the *interleaved (even/odd)* version ([llama/model.py](https://github.com/meta-llama/llama/blob/6c7fe276574e78057f917549435a2554000a876d/llama/model.py#L64-L74)); but as mentioned earlier, they are equivalent"
    ]
   },
  {
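As a side note on the equivalence claim in the added text, the following is a minimal NumPy sketch (not part of the patch; function names and the permutation helper are illustrative) showing that the *split-halves* and *interleaved* pairings produce the same rotation once the dimensions are paired consistently:

```python
import numpy as np

def rope_split_halves(x, pos, base=10000.0):
    # Split-halves pairing (Hugging Face style): dim i is paired with dim i + d/2
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # one frequency per pair, shape (d/2,)
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)
    x1, x2 = x[: d // 2], x[d // 2:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

def rope_interleaved(x, pos, base=10000.0):
    # Interleaved pairing (RoFormer paper / Meta Llama style): dim 2i is paired with dim 2i+1
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

d, pos = 8, 5
rng = np.random.default_rng(0)
x = rng.standard_normal(d)

# Permutation mapping the split-halves pair layout onto the interleaved one:
# split pair (i, i + d/2)  <->  interleaved pair (2i, 2i + 1)
perm = np.empty(d, dtype=int)
perm[0::2] = np.arange(d // 2)            # y[2i]   = x[i]
perm[1::2] = np.arange(d // 2) + d // 2   # y[2i+1] = x[i + d/2]

# The two variants agree once dimensions are paired consistently
a = rope_split_halves(x, pos)[perm]
b = rope_interleaved(x[perm], pos)
print(np.allclose(a, b))  # True
```

Since both variants apply the same 2D rotations (just to differently ordered pairs), query/key dot products, and hence attention scores, come out identical as long as one pairing is used consistently for both queries and keys.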