fix typos & inconsistent texts (#269)

Co-authored-by: TRAN <you@example.com>
This commit is contained in:
Thanh Tran
2024-07-17 21:34:51 +09:00
committed by GitHub
parent a33e89c12c
commit 070a69fc8b
3 changed files with 3 additions and 3 deletions


@@ -705,7 +705,7 @@
" - `[BOS]` (beginning of sequence) marks the beginning of text\n",
" - `[EOS]` (end of sequence) marks where the text ends (this is usually used to concatenate multiple unrelated texts, e.g., two different Wikipedia articles or two different books, and so on)\n",
" - `[PAD]` (padding) if we train LLMs with a batch size greater than 1 (we may include multiple texts with different lengths; with the padding token we pad the shorter texts to the longest length so that all texts have an equal length)\n",
"- `[UNK]` to represent works that are not included in the vocabulary\n",
"- `[UNK]` to represent words that are not included in the vocabulary\n",
"\n",
"- Note that GPT-2 does not need any of these tokens mentioned above but only uses an `<|endoftext|>` token to reduce complexity\n",
"- The `<|endoftext|>` is analogous to the `[EOS]` token mentioned above\n",