Readability and code quality improvements (#959)

* Consistent dataset naming

* consistent section headers
Sebastian Raschka
2026-02-17 19:44:56 -05:00
committed by GitHub
parent 7b1f740f74
commit be5e2a3331
48 changed files with 419 additions and 297 deletions

View File

@@ -73,6 +73,7 @@
"id": "53fe99ab-0bcf-4778-a6b5-6db81fb826ef",
"metadata": {},
"source": [
" \n",
"## 4.1 Coding an LLM architecture"
]
},
@@ -323,6 +324,7 @@
"id": "f8332a00-98da-4eb4-b882-922776a89917",
"metadata": {},
"source": [
" \n",
"## 4.2 Normalizing activations with layer normalization"
]
},
@@ -606,6 +608,7 @@
"id": "11190e7d-8c29-4115-824a-e03702f9dd54",
"metadata": {},
"source": [
" \n",
"## 4.3 Implementing a feed forward network with GELU activations"
]
},
@@ -789,6 +792,7 @@
"id": "4ffcb905-53c7-4886-87d2-4464c5fecf89",
"metadata": {},
"source": [
" \n",
"## 4.4 Adding shortcut connections"
]
},
@@ -950,6 +954,7 @@
"id": "cae578ca-e564-42cf-8635-a2267047cdff",
"metadata": {},
"source": [
" \n",
"## 4.5 Connecting attention and linear layers in a transformer block"
]
},
@@ -1068,6 +1073,7 @@
"id": "46618527-15ac-4c32-ad85-6cfea83e006e",
"metadata": {},
"source": [
" \n",
"## 4.6 Coding the GPT model"
]
},
@@ -1332,6 +1338,7 @@
"id": "da5d9bc0-95ab-45d4-9378-417628d86e35",
"metadata": {},
"source": [
" \n",
"## 4.7 Generating text"
]
},
@@ -1519,11 +1526,20 @@
"id": "a35278b6-9e5c-480f-83e5-011a1173648f",
"metadata": {},
"source": [
" \n",
"## Summary and takeaways\n",
"\n",
"- See the [./gpt.py](./gpt.py) script, a self-contained script containing the GPT model we implement in this Jupyter notebook\n",
"- You can find the exercise solutions in [./exercise-solutions.ipynb](./exercise-solutions.ipynb)"
]
},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "4821ac83-ef84-42c4-a327-32bf2820a8e5",
+"metadata": {},
+"outputs": [],
+"source": []
+}
],
"metadata": {

View File

@@ -53,7 +53,8 @@
"id": "5fea8be3-30a1-4623-a6d7-b095c6c1092e",
"metadata": {},
"source": [
"# Exercise 4.1: Parameters in the feed forward versus attention module"
" \n",
"## Exercise 4.1: Parameters in the feed forward versus attention module"
]
},
{
@@ -182,7 +183,8 @@
"id": "0f7b7c7f-0fa1-4d30-ab44-e499edd55b6d",
"metadata": {},
"source": [
"# Exercise 4.2: Initialize larger GPT models"
" \n",
"## Exercise 4.2: Initialize larger GPT models"
]
},
{
@@ -329,7 +331,8 @@
"id": "f5f2306e-5dc8-498e-92ee-70ae7ec37ac1",
"metadata": {},
"source": [
"# Exercise 4.3: Using separate dropout parameters"
" \n",
"## Exercise 4.3: Using separate dropout parameters"
]
},
{
@@ -451,7 +454,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.16"
"version": "3.13.5"
}
},
"nbformat": 4,