fix typos & inconsistent texts (#269)
Co-authored-by: TRAN <you@example.com>
@@ -1180,7 +1180,7 @@
 "- In the original GPT-2 paper, the researchers applied weight tying, which means that they reused the token embedding layer (`tok_emb`) as the output layer, which means setting `self.out_head.weight = self.tok_emb.weight`\n",
 "- The token embedding layer projects the 50,257-dimensional one-hot encoded input tokens to a 768-dimensional embedding representation\n",
 "- The output layer projects 768-dimensional embeddings back into a 50,257-dimensional representation so that we can convert these back into words (more about that in the next section)\n",
-"- So, the embedding and output layer have the same number of weight parameters, as we can see based on the shape of their weight matrices: the next chapter\n",
+"- So, the embedding and output layer have the same number of weight parameters, as we can see based on the shape of their weight matrices\n",
 "- However, a quick note about its size: we previously referred to it as a 124M parameter model; we can double check this number as follows:"
 ]
 },
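As a minimal sketch of the weight tying described in the hunk above, the snippet below ties the token embedding to the output head and counts the shared parameters. The `ToyGPT` class is an illustrative stand-in under assumed GPT-2-small dimensions (vocab_size=50257, emb_dim=768), not the notebook's full `GPTModel`:

```python
import torch
import torch.nn as nn

class ToyGPT(nn.Module):
    # Illustrative stand-in, not the notebook's actual GPTModel
    def __init__(self, vocab_size=50257, emb_dim=768):
        super().__init__()
        # Embedding weight has shape (vocab_size, emb_dim) = (50257, 768)
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        # nn.Linear stores its weight as (out_features, in_features) =
        # (50257, 768), the same shape as the embedding matrix, which is
        # what makes direct tying possible
        self.out_head = nn.Linear(emb_dim, vocab_size, bias=False)
        # Weight tying: reuse the token embedding as the output projection
        self.out_head.weight = self.tok_emb.weight

model = ToyGPT()
print(model.tok_emb.weight.shape)   # torch.Size([50257, 768])
print(model.out_head.weight.shape)  # torch.Size([50257, 768])

# nn.Module.parameters() deduplicates shared tensors, so the tied
# matrix is counted only once:
print(sum(p.numel() for p in model.parameters()))  # 38597376 = 50257 * 768
```

For scale, that single shared matrix holds 50,257 × 768 = 38,597,376 parameters. In the notebook's full `GPTModel`, the untied count comes out to roughly 163M, and subtracting the ~38.6M output-head parameters yields the commonly cited 124M figure that the last bullet's double check refers to.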