diff --git a/README.md b/README.md index fabf0fa..e7fb85f 100644 --- a/README.md +++ b/README.md @@ -57,5 +57,5 @@ Alternatively, you can view this and other files on GitHub at [https://github.co Shown below is a mental model summarizing the contents covered in this book. - + diff --git a/ch02/01_main-chapter-code/ch02.ipynb b/ch02/01_main-chapter-code/ch02.ipynb index 82ad59b..6c6caaa 100644 --- a/ch02/01_main-chapter-code/ch02.ipynb +++ b/ch02/01_main-chapter-code/ch02.ipynb @@ -41,12 +41,20 @@ "print(\"tiktoken version:\", version(\"tiktoken\"))" ] }, + { + "cell_type": "markdown", + "id": "5a42fbfd-e3c2-43c2-bc12-f5f870a0b10a", + "metadata": {}, + "source": [ + "- This chapter covers data preparation and sampling to get input data \"ready\" for the LLM" + ] + }, { "cell_type": "markdown", "id": "628b2922-594d-4ff9-bd82-04f1ebdf41f5", "metadata": {}, "source": [ - "" + "" ] }, { @@ -57,14 +65,6 @@ "## 2.1 Understanding word embeddings" ] }, - { - "cell_type": "markdown", - "id": "ba08d16f-f237-4166-bf89-0e9fe703e7b4", - "metadata": {}, - "source": [ - "" - ] - }, { "cell_type": "markdown", "id": "0b6816ae-e927-43a9-b4dd-e47a9b0e1cf6", "metadata": {}, "source": [ "- No code in this section" ] }, + { + "cell_type": "markdown", + "id": "4f69dab7-a433-427a-9e5b-b981062d6296", + "metadata": {}, + "source": [ + "- There are many forms of embeddings; we focus on text embeddings in this book" + ] + }, + { + "cell_type": "markdown", + "id": "ba08d16f-f237-4166-bf89-0e9fe703e7b4", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "id": "288c4faf-b93a-4616-9276-7a4aa4b5e9ba", + "metadata": {}, + "source": [ + "- LLMs work with embeddings in high-dimensional spaces (i.e., thousands of dimensions)\n", + "- Since we can't visualize such high-dimensional spaces (we humans think in 1, 2, or 3 dimensions), the figure below illustrates a 2-dimensional embedding space" + ] + }, { "cell_type": "markdown", "id": "d6b80160-1f10-4aad-a85e-9c79444de9e6", "metadata": {}, "source": [ - "" + "" ] }, { @@ -89,12 +114,20 @@ "## 2.2 Tokenizing text" ] }, + { + "cell_type": "markdown", + "id": "f9c90731-7dc9-4cd3-8c4a-488e33b48e80", + "metadata": {}, + "source": [ + "- In this section, we tokenize text, which means breaking text into smaller units, such as individual words and punctuation characters" + ] + }, { "cell_type": "markdown", "id": "09872fdb-9d4e-40c4-949d-52a01a43ec4b", "metadata": {}, "source": [ - "" + "" ] }, { @@ -261,7 +294,7 @@ "id": "6cbe9330-b587-4262-be9f-497a84ec0e8a", "metadata": {}, "source": [ - "" + "" ] }, { @@ -318,12 +351,20 @@ "## 2.3 Converting tokens into token IDs" ] }, + { + "cell_type": "markdown", + "id": "a5204973-f414-4c0d-87b0-cfec1f06e6ff", + "metadata": {}, + "source": [ + "- Next, we convert the text tokens into token IDs that we can process via embedding layers later" + ] + }, { "cell_type": "markdown", "id": "177b041d-f739-43b8-bd81-0443ae3a7f8d", "metadata": {}, "source": [ - "" + "" ] }, { @@ -444,12 +485,20 @@ " break" ] }, + { + "cell_type": "markdown", + "id": "3b1dc314-351b-476a-9459-0ec9ddc29b19", + "metadata": {}, + "source": [ + "- Below, we illustrate the tokenization of a short sample text using a small vocabulary:" + ] + }, { "cell_type": "markdown", "id": "67407a9f-0202-4e7c-9ed7-1b3154191ebc", "metadata": {}, "source": [ - "" + "" ] }, { @@ -485,12 +534,21 @@ " return text" ] }, + { + "cell_type": "markdown", + "id": "dee7a1e5-b54f-4ca1-87ef-3d663c4ee1e7", + "metadata": {}, + "source": [ + "- The `encode` function turns text into token IDs\n", +
"- The `decode` function turns token IDs back into text" + ] + }, { "cell_type": "markdown", "id": "cc21d347-ec03-4823-b3d4-9d686e495617", "metadata": {}, "source": [ - "" + "" ] }, { @@ -582,12 +640,20 @@ "## 2.4 Adding special context tokens" ] }, + { + "cell_type": "markdown", + "id": "863d6d15-a3e2-44e0-b384-bb37f17cf443", + "metadata": {}, + "source": [ + "- It's useful to add some \"special\" tokens for unknown words and to denote the end of a text" + ] + }, { "cell_type": "markdown", "id": "aa7fc96c-e1fd-44fb-b7f5-229d7c7922a4", "metadata": {}, "source": [ - "" + "" ] }, { @@ -609,12 +675,20 @@ "\n" ] }, + { + "cell_type": "markdown", + "id": "a336b43b-7173-49e7-bd80-527ad4efb271", + "metadata": {}, + "source": [ + "- We use the `<|endoftext|>` tokens between two independent sources of text:" + ] + }, { "cell_type": "markdown", "id": "52442951-752c-4855-9752-b121a17fef55", "metadata": {}, "source": [ - "" + "" ] }, { @@ -953,12 +1027,20 @@ "print(strings)" ] }, + { + "cell_type": "markdown", + "id": "e8c2e7b4-6a22-42aa-8e4d-901f06378d4a", + "metadata": {}, + "source": [ + "- BPE tokenizers break down unknown words into subwords and individual characters:" + ] + }, { "cell_type": "markdown", "id": "c082d41f-33d7-4827-97d8-993d5a84bb3c", "metadata": {}, "source": [ - "" + "" ] }, { @@ -969,12 +1051,20 @@ "## 2.6 Data sampling with a sliding window" ] }, + { + "cell_type": "markdown", + "id": "509d9826-6384-462e-aa8a-a7c73cd6aad0", + "metadata": {}, + "source": [ + "- We train LLMs to generate one word at a time, so we want to prepare the training data accordingly where the next word in a sequence represents the target to predict:" + ] + }, { "cell_type": "markdown", "id": "39fb44f4-0c43-4a6a-9c2f-9cf31452354c", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1101,14 +1191,6 @@ " print(tokenizer.decode(context), \"---->\", tokenizer.decode([desired]))" ] }, - { - "cell_type": "markdown", - "id": "b59f90fe-fa73-4c2d-bd9b-ce7c2ce2ba00", - "metadata": {}, - "source": [ - "" - ] - }, { "cell_type": "markdown", "id": "210d2dd9-fc20-4927-8d3d-1466cf41aae1", @@ -1145,6 +1227,16 @@ "print(\"PyTorch version:\", torch.__version__)" ] }, + { + "cell_type": "markdown", + "id": "0c9a3d50-885b-49bc-b791-9f5cc8bc7b7c", + "metadata": {}, + "source": [ + "- We use a sliding window approach where we slide the window one word at a time (this is also known as `stride=1`):\n", + "\n", + "" + ] + }, { "cell_type": "markdown", "id": "92ac652d-7b38-4843-9fbd-494cdc8ec12c", @@ -1268,12 +1360,20 @@ "print(second_batch)" ] }, + { + "cell_type": "markdown", + "id": "b006212f-de45-468d-bdee-5806216d1679", + "metadata": {}, + "source": [ + "- An example using stride equal to the context length (here: 4) as shown below:" + ] + }, { "cell_type": "markdown", "id": "9cb467e0-bdcd-4dda-b9b0-a738c5d33ac3", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1349,7 +1449,7 @@ "id": "e85089aa-8671-4e5f-a2b3-ef252004ee4c", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1489,12 +1589,30 @@ "print(embedding_layer(input_ids))" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "be97ced4-bd13-42b7-866a-4d699a17e155", + "metadata": {}, + "outputs": [], + "source": [ + "- An embedding layer is essentially a look-up operation:" + ] + }, { "cell_type": "markdown", "id": "f33c2741-bf1b-4c60-b7fd-61409d556646", "metadata": {}, "source": [ - "" + "" + ] + }, + { + "cell_type": "markdown", + "id": "08218d9f-aa1a-4afb-a105-72ff96a54e73", + "metadata": {}, + "source": [ + "- **You may be interested in the bonus content 
comparing embedding layers with regular linear layers: [../02_bonus_efficient-multihead-attention](../02_bonus_efficient-multihead-attention)**" ] }, { @@ -1505,12 +1623,28 @@ "## 2.8 Encoding word positions" ] }, + { + "cell_type": "markdown", + "id": "24940068-1099-4698-bdc0-e798515e2902", + "metadata": {}, + "source": [ + "- Embedding layer convert IDs into identical vector representations regardless of where they are located in the input sequence:" + ] + }, { "cell_type": "markdown", "id": "9e0b14a2-f3f3-490e-b513-f262dbcf94fa", "metadata": {}, "source": [ - "" + "" + ] + }, + { + "cell_type": "markdown", + "id": "92a7d7fe-38a5-46e6-8db6-b688887b0430", + "metadata": {}, + "source": [ + "- Positional embeddings are combined with the token embedding vector to form the input embeddings for a large language model:" ] }, { @@ -1518,7 +1652,7 @@ "id": "48de37db-d54d-45c4-ab3e-88c0783ad2e4", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1679,12 +1813,21 @@ "print(input_embeddings.shape)" ] }, + { + "cell_type": "markdown", + "id": "1fbda581-6f9b-476f-8ea7-d244e6a4eaec", + "metadata": {}, + "source": [ + "- In the initial phase of the input processing workflow, the input text is segmented into separate tokens\n", + "- Following this segmentation, these tokens are transformed into token IDs based on a predefined vocabulary:" + ] + }, { "cell_type": "markdown", "id": "d1bb0f7e-460d-44db-b366-096adcd84fff", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1722,7 +1865,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.2" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/ch02/01_main-chapter-code/figures/1.webp b/ch02/01_main-chapter-code/figures/1.webp deleted file mode 100644 index 48a4df4..0000000 Binary files a/ch02/01_main-chapter-code/figures/1.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/10.webp b/ch02/01_main-chapter-code/figures/10.webp deleted file mode 100644 index 35dd7d5..0000000 Binary files a/ch02/01_main-chapter-code/figures/10.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/11.webp b/ch02/01_main-chapter-code/figures/11.webp deleted file mode 100644 index 444d265..0000000 Binary files a/ch02/01_main-chapter-code/figures/11.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/12.webp b/ch02/01_main-chapter-code/figures/12.webp deleted file mode 100644 index 2da9d68..0000000 Binary files a/ch02/01_main-chapter-code/figures/12.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/13.webp b/ch02/01_main-chapter-code/figures/13.webp deleted file mode 100644 index fe2e6f4..0000000 Binary files a/ch02/01_main-chapter-code/figures/13.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/14.webp b/ch02/01_main-chapter-code/figures/14.webp deleted file mode 100644 index d8f48e4..0000000 Binary files a/ch02/01_main-chapter-code/figures/14.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/15.webp b/ch02/01_main-chapter-code/figures/15.webp deleted file mode 100644 index 61a93ef..0000000 Binary files a/ch02/01_main-chapter-code/figures/15.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/16.webp b/ch02/01_main-chapter-code/figures/16.webp deleted file mode 100644 index 29d1609..0000000 Binary files a/ch02/01_main-chapter-code/figures/16.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/17.webp b/ch02/01_main-chapter-code/figures/17.webp deleted file mode 
100644 index c88db28..0000000 Binary files a/ch02/01_main-chapter-code/figures/17.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/18.webp b/ch02/01_main-chapter-code/figures/18.webp deleted file mode 100644 index a3ae32a..0000000 Binary files a/ch02/01_main-chapter-code/figures/18.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/19.webp b/ch02/01_main-chapter-code/figures/19.webp deleted file mode 100644 index b725c2d..0000000 Binary files a/ch02/01_main-chapter-code/figures/19.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/2.webp b/ch02/01_main-chapter-code/figures/2.webp deleted file mode 100644 index 8be4ae9..0000000 Binary files a/ch02/01_main-chapter-code/figures/2.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/3.webp b/ch02/01_main-chapter-code/figures/3.webp deleted file mode 100644 index b3b26ff..0000000 Binary files a/ch02/01_main-chapter-code/figures/3.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/4.webp b/ch02/01_main-chapter-code/figures/4.webp deleted file mode 100644 index 54f1a7f..0000000 Binary files a/ch02/01_main-chapter-code/figures/4.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/5.webp b/ch02/01_main-chapter-code/figures/5.webp deleted file mode 100644 index 10965b0..0000000 Binary files a/ch02/01_main-chapter-code/figures/5.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/6.webp b/ch02/01_main-chapter-code/figures/6.webp deleted file mode 100644 index d8f30f2..0000000 Binary files a/ch02/01_main-chapter-code/figures/6.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/7.webp b/ch02/01_main-chapter-code/figures/7.webp deleted file mode 100644 index ed07e3c..0000000 Binary files a/ch02/01_main-chapter-code/figures/7.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/8.webp b/ch02/01_main-chapter-code/figures/8.webp deleted file mode 100644 index 6cbed99..0000000 Binary files a/ch02/01_main-chapter-code/figures/8.webp and /dev/null differ diff --git a/ch02/01_main-chapter-code/figures/9.webp b/ch02/01_main-chapter-code/figures/9.webp deleted file mode 100644 index ac9d879..0000000 Binary files a/ch02/01_main-chapter-code/figures/9.webp and /dev/null differ diff --git a/ch03/01_main-chapter-code/ch03.ipynb b/ch03/01_main-chapter-code/ch03.ipynb index d332a25..1b5ba29 100644 --- a/ch03/01_main-chapter-code/ch03.ipynb +++ b/ch03/01_main-chapter-code/ch03.ipynb @@ -37,6 +37,22 @@ "print(\"torch version:\", version(\"torch\"))" ] }, + { + "cell_type": "markdown", + "id": "02a11208-d9d3-44b1-8e0d-0c8414110b93", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "id": "50e020fd-9690-4343-80df-da96678bef5e", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "ecc4dcee-34ea-4c05-9085-2f8887f70363", @@ -53,6 +69,22 @@ "- No code in this section" ] }, + { + "cell_type": "markdown", + "id": "55c0c433-aa4b-491e-848a-54905ebb05ad", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "id": "03d8df2c-c1c2-4df0-9977-ade9713088b2", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "3602c585-b87a-41c7-a324-c5e8298849df", @@ -69,6 +101,22 @@ "- No code in this section" ] }, + { + "cell_type": "markdown", + "id": "bc4f6293-8ab5-4aeb-a04c-50ee158485b1", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "id": 
"6565dc9f-b1be-4c78-b503-42ccc743296c", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "5efe05ff-b441-408e-8d66-cde4eb3397e3", @@ -103,6 +151,14 @@ " - In short, think of $z^{(2)}$ as a modified version of $x^{(2)}$ that also incorporates information about all other input elements that are relevant to a given task at hand." ] }, + { + "cell_type": "markdown", + "id": "fcc7c7a2-b6ab-478f-ae37-faa8eaa8049a", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "ff856c58-8382-44c7-827f-798040e6e697", @@ -141,14 +197,6 @@ " - The subscript \"21\" in $\\omega_{21}$ means that input sequence element 2 was used as a query against input sequence element 1." ] }, - { - "cell_type": "markdown", - "id": "2e29440f-9b77-4966-83aa-d1ff2e653b00", - "metadata": {}, - "source": [ - "" - ] - }, { "cell_type": "markdown", "id": "35e55f7a-f2d0-4f24-858b-228e4fe88fb3", @@ -176,6 +224,14 @@ ")" ] }, + { + "cell_type": "markdown", + "id": "5cb3453a-58fa-42c4-b225-86850bc856f8", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "77be52fb-82fd-4886-a4c8-f24a9c87af22", @@ -242,6 +298,14 @@ "print(torch.dot(inputs[0], query))" ] }, + { + "cell_type": "markdown", + "id": "dfd965d6-980c-476a-93d8-9efe603b1b3b", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "7d444d76-e19e-4e9a-a268-f315d966609b", @@ -346,6 +410,14 @@ "- **Step 3**: compute the context vector $z^{(2)}$ by multiplying the embedded input tokens, $x^{(i)}$ with the attention weights and sum the resulting vectors:" ] }, + { + "cell_type": "markdown", + "id": "f1c9f5ac-8d3d-4847-94e3-fd783b7d4d3d", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "code", "execution_count": 8, @@ -394,7 +466,15 @@ "id": "11c0fb55-394f-42f4-ba07-d01ae5c98ab4", "metadata": {}, "source": [ - "" + "" + ] + }, + { + "cell_type": "markdown", + "id": "d9bffe4b-56fe-4c37-9762-24bd924b7d3c", + "metadata": {}, + "source": [ + "" ] }, { @@ -594,6 +674,14 @@ "## 3.4 Implementing self-attention with trainable weights" ] }, + { + "cell_type": "markdown", + "id": "ac9492ba-6f66-4f65-bd1d-87cf16d59928", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "2b90a77e-d746-4704-9354-1ddad86e6298", @@ -617,6 +705,14 @@ " - These trainable weight matrices are crucial so that the model (specifically, the attention module inside the model) can learn to produce \"good\" context vectors." 
] }, + { + "cell_type": "markdown", + "id": "59db4093-93e8-4bee-be8f-c8fac8a08cdd", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "4d996671-87aa-45c9-b2e0-07a7bcc9060a", @@ -630,14 +726,6 @@ " - Value vector: $v^{(i)} = W_v \\,x^{(i)}$\n" ] }, - { - "cell_type": "markdown", - "id": "d3b29bc6-4bde-4924-9aff-0af1421803f5", - "metadata": {}, - "source": [ - "" - ] - }, { "cell_type": "markdown", "id": "9f334313-5fd0-477b-8728-04080a427049", @@ -755,7 +843,7 @@ "id": "8ed0a2b7-5c50-4ede-90cf-7ad74412b3aa", "metadata": {}, "source": [ - "" + "" ] }, { @@ -810,7 +898,7 @@ "id": "8622cf39-155f-4eb5-a0c0-82a03ce9b999", "metadata": {}, "source": [ - "" + "" ] }, { @@ -847,7 +935,7 @@ "id": "b8f61a28-b103-434a-aee1-ae7cbd821126", "metadata": {}, "source": [ - "" + "" ] }, { @@ -940,6 +1028,14 @@ "print(sa_v1(inputs))" ] }, + { + "cell_type": "markdown", + "id": "7ee1a024-84a5-425a-9567-54ab4e4ed445", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "048e0c16-d911-4ec8-b0bc-45ceec75c081", @@ -1010,6 +1106,14 @@ "## 3.5 Hiding future words with causal attention" ] }, + { + "cell_type": "markdown", + "id": "71e91bb5-5aae-4f05-8a95-973b3f988a35", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "82f405de-cd86-4e72-8f3c-9ea0354946ba", @@ -1031,10 +1135,10 @@ }, { "cell_type": "markdown", - "id": "71e91bb5-5aae-4f05-8a95-973b3f988a35", + "id": "57f99af3-32bc-48f5-8eb4-63504670ca0a", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1193,6 +1297,14 @@ "- So, instead of zeroing out attention weights above the diagonal and renormalizing the results, we can mask the unnormalized attention scores above the diagonal with negative infinity before they enter the softmax function:" ] }, + { + "cell_type": "markdown", + "id": "eb682900-8df2-4767-946c-a82bee260188", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "code", "execution_count": 29, @@ -1279,7 +1391,7 @@ "id": "ee799cf6-6175-45f2-827e-c174afedb722", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1460,6 +1572,14 @@ "- Note that dropout is only applied during training, not during inference." ] }, + { + "cell_type": "markdown", + "id": "a554cf47-558c-4f45-84cd-bf9b839a8d50", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "c8bef90f-cfd4-4289-b0e8-6a00dc9be44c", @@ -1485,11 +1605,11 @@ "\n", "- This is also called single-head attention:\n", "\n", - "\n", + "\n", "\n", "- We simply stack multiple single-head attention modules to obtain a multi-head attention module:\n", "\n", - "\n", + "\n", "\n", "- The main idea behind multi-head attention is to run the attention mechanism multiple times (in parallel) with different, learned linear projections. This allows the model to jointly attend to information from different representation subspaces at different positions." ] @@ -1678,6 +1798,14 @@ "- Note that in addition, we added a linear projection layer (`self.out_proj `) to the `MultiHeadAttention` class above. This is simply a linear transformation that doesn't change the dimensions. 
It's a standard convention to use such a projection layer in LLM implementation, but it's not strictly necessary (recent research has shown that it can be removed without affecting the modeling performance; see the further reading section at the end of this chapter)\n" ] }, + { + "cell_type": "markdown", + "id": "dbe5d396-c990-45dc-9908-2c621461f851", + "metadata": {}, + "source": [ + "" + ] + }, { "cell_type": "markdown", "id": "8b0ed78c-e8ac-4f8f-a479-a98242ae8f65", @@ -1802,7 +1930,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/ch03/01_main-chapter-code/figures/attention-matrix.png b/ch03/01_main-chapter-code/figures/attention-matrix.png deleted file mode 100644 index cfda9f6..0000000 Binary files a/ch03/01_main-chapter-code/figures/attention-matrix.png and /dev/null differ diff --git a/ch03/01_main-chapter-code/figures/attention.png b/ch03/01_main-chapter-code/figures/attention.png deleted file mode 100644 index 73edf26..0000000 Binary files a/ch03/01_main-chapter-code/figures/attention.png and /dev/null differ diff --git a/ch03/01_main-chapter-code/figures/dot-product.png b/ch03/01_main-chapter-code/figures/dot-product.png deleted file mode 100644 index b7f88e2..0000000 Binary files a/ch03/01_main-chapter-code/figures/dot-product.png and /dev/null differ diff --git a/ch03/01_main-chapter-code/figures/dropout.png b/ch03/01_main-chapter-code/figures/dropout.png deleted file mode 100644 index d139372..0000000 Binary files a/ch03/01_main-chapter-code/figures/dropout.png and /dev/null differ diff --git a/ch03/01_main-chapter-code/figures/masked.png b/ch03/01_main-chapter-code/figures/masked.png deleted file mode 100644 index 6b5d7ea..0000000 Binary files a/ch03/01_main-chapter-code/figures/masked.png and /dev/null differ diff --git a/ch03/01_main-chapter-code/figures/multi-head.png b/ch03/01_main-chapter-code/figures/multi-head.png deleted file mode 100644 index 53fe6c5..0000000 Binary files a/ch03/01_main-chapter-code/figures/multi-head.png and /dev/null differ diff --git a/ch03/01_main-chapter-code/figures/single-head.png b/ch03/01_main-chapter-code/figures/single-head.png deleted file mode 100644 index 8bb3367..0000000 Binary files a/ch03/01_main-chapter-code/figures/single-head.png and /dev/null differ diff --git a/ch03/01_main-chapter-code/figures/weight-selfattn-1.png b/ch03/01_main-chapter-code/figures/weight-selfattn-1.png deleted file mode 100644 index 54190e7..0000000 Binary files a/ch03/01_main-chapter-code/figures/weight-selfattn-1.png and /dev/null differ diff --git a/ch03/01_main-chapter-code/figures/weight-selfattn-2.png b/ch03/01_main-chapter-code/figures/weight-selfattn-2.png deleted file mode 100644 index e4db5bd..0000000 Binary files a/ch03/01_main-chapter-code/figures/weight-selfattn-2.png and /dev/null differ diff --git a/ch03/01_main-chapter-code/figures/weight-selfattn-3.png b/ch03/01_main-chapter-code/figures/weight-selfattn-3.png deleted file mode 100644 index 7f68655..0000000 Binary files a/ch03/01_main-chapter-code/figures/weight-selfattn-3.png and /dev/null differ diff --git a/ch03/01_main-chapter-code/figures/weight-selfattn-4.png b/ch03/01_main-chapter-code/figures/weight-selfattn-4.png deleted file mode 100644 index 40d538d..0000000 Binary files a/ch03/01_main-chapter-code/figures/weight-selfattn-4.png and /dev/null differ diff --git a/ch04/01_main-chapter-code/ch04.ipynb b/ch04/01_main-chapter-code/ch04.ipynb index ef4a388..4304e38 100644 --- 
a/ch04/01_main-chapter-code/ch04.ipynb +++ b/ch04/01_main-chapter-code/ch04.ipynb @@ -49,7 +49,7 @@ "id": "7d4f11e0-4434-4979-9dee-e1207df0eb01", "metadata": {}, "source": [ - "" + "" ] }, { @@ -76,7 +76,7 @@ "id": "5c5213e9-bd1c-437e-aee8-f5e8fb717251", "metadata": {}, "source": [ - "" + "" ] }, { @@ -136,7 +136,7 @@ "id": "4adce779-857b-4418-9501-12a7f3818d88", "metadata": {}, "source": [ - "" + "" ] }, { @@ -204,7 +204,7 @@ "id": "9665e8ab-20ca-4100-b9b9-50d9bdee33be", "metadata": {}, "source": [ - "" + "" ] }, { @@ -294,7 +294,7 @@ "id": "314ac47a-69cc-4597-beeb-65bed3b5910f", "metadata": {}, "source": [ - "" + "" ] }, { @@ -380,7 +380,7 @@ "id": "570db83a-205c-4f6f-b219-1f6195dde1a7", "metadata": {}, "source": [ - "" + "" ] }, { @@ -551,7 +551,7 @@ "id": "e136cfc4-7c89-492e-b120-758c272bca8c", "metadata": {}, "source": [ - "" + "" ] }, { @@ -696,7 +696,7 @@ "id": "fdcaacfa-3cfc-4c9e-b668-b71a2753145a", "metadata": {}, "source": [ - "" + "" ] }, { @@ -727,7 +727,15 @@ "id": "8f8756c5-6b04-443b-93d0-e555a316c377", "metadata": {}, "source": [ - "" + "" + ] + }, + { + "cell_type": "markdown", + "id": "e5da2a50-04f4-4388-af23-ad32e405a972", + "metadata": {}, + "source": [ + "" ] }, { @@ -749,7 +757,7 @@ "- This is achieved by adding the output of one layer to the output of a later layer, usually skipping one or more layers in between\n", "- Let's illustrate this idea with a small example network:\n", "\n", - "" + "" ] }, { @@ -957,7 +965,7 @@ "id": "36b64d16-94a6-4d13-8c85-9494c50478a9", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1000,7 +1008,7 @@ "id": "91f502e4-f3e4-40cb-8268-179eec002394", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1025,7 +1033,7 @@ "id": "9b7b362d-f8c5-48d2-8ebd-722480ac5073", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1288,7 +1296,7 @@ "id": "caade12a-fe97-480f-939c-87d24044edff", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1307,7 +1315,7 @@ "id": "7ee0f32c-c18c-445e-b294-a879de2aa187", "metadata": {}, "source": [ - "" + "" ] }, { @@ -1353,7 +1361,7 @@ "source": [ "- The `generate_text_simple` above implements an iterative process, where it creates one token at a time\n", "\n", - "" + "" ] }, { @@ -1506,7 +1514,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.4" + "version": "3.10.6" } }, "nbformat": 4, diff --git a/ch04/01_main-chapter-code/figures/chapter-steps.webp b/ch04/01_main-chapter-code/figures/chapter-steps.webp deleted file mode 100644 index aa3c17c..0000000 Binary files a/ch04/01_main-chapter-code/figures/chapter-steps.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/ffn.webp b/ch04/01_main-chapter-code/figures/ffn.webp deleted file mode 100644 index d686cb3..0000000 Binary files a/ch04/01_main-chapter-code/figures/ffn.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/generate-text.webp b/ch04/01_main-chapter-code/figures/generate-text.webp deleted file mode 100644 index d7f38e6..0000000 Binary files a/ch04/01_main-chapter-code/figures/generate-text.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/gpt-in-out.webp b/ch04/01_main-chapter-code/figures/gpt-in-out.webp deleted file mode 100644 index 9127f29..0000000 Binary files a/ch04/01_main-chapter-code/figures/gpt-in-out.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/gpt.webp b/ch04/01_main-chapter-code/figures/gpt.webp deleted file mode 100644 index f1a39fa..0000000 Binary files a/ch04/01_main-chapter-code/figures/gpt.webp and 
/dev/null differ diff --git a/ch04/01_main-chapter-code/figures/iterative-gen.webp b/ch04/01_main-chapter-code/figures/iterative-gen.webp deleted file mode 100644 index fa5fad2..0000000 Binary files a/ch04/01_main-chapter-code/figures/iterative-gen.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/iterative-generate.webp b/ch04/01_main-chapter-code/figures/iterative-generate.webp deleted file mode 100644 index 8a1eb1b..0000000 Binary files a/ch04/01_main-chapter-code/figures/iterative-generate.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/layernorm.webp b/ch04/01_main-chapter-code/figures/layernorm.webp deleted file mode 100644 index 38cf3a7..0000000 Binary files a/ch04/01_main-chapter-code/figures/layernorm.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/layernorm2.webp b/ch04/01_main-chapter-code/figures/layernorm2.webp deleted file mode 100644 index 55821a5..0000000 Binary files a/ch04/01_main-chapter-code/figures/layernorm2.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/mental-model-2.webp b/ch04/01_main-chapter-code/figures/mental-model-2.webp deleted file mode 100644 index 639175f..0000000 Binary files a/ch04/01_main-chapter-code/figures/mental-model-2.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/mental-model-3.webp b/ch04/01_main-chapter-code/figures/mental-model-3.webp deleted file mode 100644 index cbf41d6..0000000 Binary files a/ch04/01_main-chapter-code/figures/mental-model-3.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/mental-model-final.webp b/ch04/01_main-chapter-code/figures/mental-model-final.webp deleted file mode 100644 index 03817ec..0000000 Binary files a/ch04/01_main-chapter-code/figures/mental-model-final.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/mental-model.webp b/ch04/01_main-chapter-code/figures/mental-model.webp deleted file mode 100644 index 16a9c19..0000000 Binary files a/ch04/01_main-chapter-code/figures/mental-model.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/overview-after-ln.webp b/ch04/01_main-chapter-code/figures/overview-after-ln.webp deleted file mode 100644 index ed6530e..0000000 Binary files a/ch04/01_main-chapter-code/figures/overview-after-ln.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/shortcut-example.webp b/ch04/01_main-chapter-code/figures/shortcut-example.webp deleted file mode 100644 index 32e7018..0000000 Binary files a/ch04/01_main-chapter-code/figures/shortcut-example.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/transformer-block.webp b/ch04/01_main-chapter-code/figures/transformer-block.webp deleted file mode 100644 index 6dc41d0..0000000 Binary files a/ch04/01_main-chapter-code/figures/transformer-block.webp and /dev/null differ diff --git a/ch04/01_main-chapter-code/figures/use-gpt.webp b/ch04/01_main-chapter-code/figures/use-gpt.webp deleted file mode 100644 index b83a4b7..0000000 Binary files a/ch04/01_main-chapter-code/figures/use-gpt.webp and /dev/null differ
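As an aside on the notes added to ch02.ipynb above: the sliding-window sampling they describe (stride=1 versus a stride equal to the context length) can be sketched with a small, self-contained example. This is a simplified stand-in for the chapter's dataset code, and the helper name and toy token IDs below are hypothetical, not taken from the notebook.

def sliding_window_pairs(token_ids, context_length=4, stride=1):
    # Pair each input chunk with the chunk shifted by one position (the next-word targets);
    # `stride` controls how far the window moves between samples
    pairs = []
    for i in range(0, len(token_ids) - context_length, stride):
        input_chunk = token_ids[i:i + context_length]
        target_chunk = token_ids[i + 1:i + context_length + 1]
        pairs.append((input_chunk, target_chunk))
    return pairs

# stride=1 slides one token at a time; stride=4 (equal to the context length) avoids overlap
print(sliding_window_pairs(list(range(10)), context_length=4, stride=1))
print(sliding_window_pairs(list(range(10)), context_length=4, stride=4))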
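Likewise, the added note that an embedding layer is essentially a look-up operation can be illustrated with a minimal sketch; the layer sizes and token IDs here are hypothetical and chosen only for illustration.

import torch

torch.manual_seed(123)
# Hypothetical small vocabulary and embedding size
embedding_layer = torch.nn.Embedding(num_embeddings=6, embedding_dim=3)
token_ids = torch.tensor([2, 3, 5, 1])

# Calling the layer and indexing its weight matrix by token ID return the same rows,
# which is what "essentially a look-up operation" refers to
print(embedding_layer(token_ids))
print(embedding_layer.weight[token_ids])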