diff --git a/ch02/01_main-chapter-code/ch02.ipynb b/ch02/01_main-chapter-code/ch02.ipynb
index 461d920..82ad59b 100644
--- a/ch02/01_main-chapter-code/ch02.ipynb
+++ b/ch02/01_main-chapter-code/ch02.ipynb
@@ -867,7 +867,7 @@
     "- For instance, if GPT-2's vocabulary doesn't have the word \"unfamiliarword,\" it might tokenize it as [\"unfam\", \"iliar\", \"word\"] or some other subword breakdown, depending on its trained BPE merges\n",
     "- The original BPE tokenizer can be found here: [https://github.com/openai/gpt-2/blob/master/src/encoder.py](https://github.com/openai/gpt-2/blob/master/src/encoder.py)\n",
     "- In this chapter, we are using the BPE tokenizer from OpenAI's open-source [tiktoken](https://github.com/openai/tiktoken) library, which implements its core algorithms in Rust to improve computational performance\n",
-    "- I created a notebook in the [./bytepair_encoder](./bytepair_encoder) that compares these two implementations side-by-side (tiktoken was about 5x faster on the sample text)"
+    "- I created a notebook in the [./bytepair_encoder](../02_bonus_bytepair-encoder) that compares these two implementations side-by-side (tiktoken was about 5x faster on the sample text)"
    ]
   },
   {
@@ -1429,7 +1429,7 @@
    "id": "26fcf4f5-0801-4eb4-bb90-acce87935ac7",
    "metadata": {},
    "source": [
-    "- For those who are familiar with one-hot encoding, the embedding layer approach above is essentially just a more efficient way of implementing one-hot encoding followed by matrix multiplication in a fully-connected layer, which is described in the supplementary code in [./embedding_vs_matmul](./embedding_vs_matmul)\n",
+    "- For those who are familiar with one-hot encoding, the embedding layer approach above is essentially just a more efficient way of implementing one-hot encoding followed by matrix multiplication in a fully-connected layer, which is described in the supplementary code in [./embedding_vs_matmul](../03_bonus_embedding-vs-matmul)\n",
     "- Because the embedding layer is just a more efficient implementation that is equivalent to the one-hot encoding and matrix-multiplication approach it can be seen as a neural network layer that can be optimized via backpropagation"
    ]
   },
@@ -1722,7 +1722,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.12"
+   "version": "3.12.2"
   }
  },
  "nbformat": 4,
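
The bullet touched by the first hunk describes how GPT-2's BPE tokenizer falls back to subword pieces for out-of-vocabulary strings. A minimal sketch using the tiktoken library referenced there (the sample word is illustrative; the exact split depends on GPT-2's trained merges):

```python
# Sketch: BPE subword fallback for an out-of-vocabulary word, using tiktoken.
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")

# "unfamiliarword" is unlikely to be a single token in GPT-2's vocabulary,
# so BPE breaks it into known subword pieces.
ids = tokenizer.encode("unfamiliarword")

# Decode each token ID individually to inspect the subword breakdown;
# the actual pieces depend on the trained BPE merges.
print([tokenizer.decode([i]) for i in ids])
```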
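The bullet touched by the second hunk links to a bonus notebook demonstrating that an embedding lookup is equivalent to one-hot encoding followed by a matrix multiplication in a fully-connected layer. A minimal PyTorch sketch of that equivalence (the vocabulary size, embedding dimension, and token IDs below are illustrative, not taken from the bonus notebook):

```python
# Sketch: an embedding lookup equals one-hot encoding followed by a matmul
# with the same weight matrix -- the lookup is just the efficient version.
import torch

torch.manual_seed(123)
vocab_size, embed_dim = 6, 3
token_ids = torch.tensor([2, 3, 1])

embedding = torch.nn.Embedding(vocab_size, embed_dim)
lookup = embedding(token_ids)  # direct row lookup into the weight matrix

# One-hot rows select the same rows of embedding.weight via matrix multiply.
onehot = torch.nn.functional.one_hot(token_ids, num_classes=vocab_size).float()
matmul = onehot @ embedding.weight

print(torch.allclose(lookup, matmul))  # True
```

Because both paths share the same weight matrix, the embedding layer trains via backpropagation exactly like the fully-connected layer it replaces, just without materializing the one-hot vectors.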