minor fixes (#246)

* removed duplicated whitespace

* Update ch07/01_main-chapter-code/ch07.ipynb

* Update ch07/05_dataset-generation/llama3-ollama.ipynb

* removed duplicated whitespace

* fixed title again

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
Authored by Daniel Kleine on 2024-06-26 00:30:30 +02:00
Committed by GitHub
parent 9a9b3530c9
commit 81c843bdc0
10 changed files with 19 additions and 19 deletions


@@ -710,7 +710,7 @@
"- `[UNK]` to represent works that are not included in the vocabulary\n",
"\n",
"- Note that GPT-2 does not need any of these tokens mentioned above but only uses an `<|endoftext|>` token to reduce complexity\n",
"- The `<|endoftext|>` is analogous to the `[EOS]` token mentioned above\n",
"- The `<|endoftext|>` is analogous to the `[EOS]` token mentioned above\n",
"- GPT also uses the `<|endoftext|>` for padding (since we typically use a mask when training on batched inputs, we would not attend padded tokens anyways, so it does not matter what these tokens are)\n",
"- GPT-2 does not use an `<UNK>` token for out-of-vocabulary words; instead, GPT-2 uses a byte-pair encoding (BPE) tokenizer, which breaks down words into subword units which we will discuss in a later section\n",
"\n"