diff --git a/ch06/01_main-chapter-code/ch06.ipynb b/ch06/01_main-chapter-code/ch06.ipynb
index 546fd66..fecff3f 100644
--- a/ch06/01_main-chapter-code/ch06.ipynb
+++ b/ch06/01_main-chapter-code/ch06.ipynb
@@ -1440,6 +1440,15 @@
     "print(\"Outputs dimensions:\", outputs.shape) # shape: (batch_size, num_tokens, num_classes)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "75430a01-ef9c-426a-aca0-664689c4f461",
+   "metadata": {},
+   "source": [
+    "- As discussed in previous chapters, for each input token, there's one output vector\n",
+    "- Since we fed the model a text sample with 4 input tokens, the output consists of 4 2-dimensional output vectors above"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "7df9144f-6817-4be4-8d4b-5d4dadfe4a9b",
    "metadata": {},
    "source": [
@@ -1453,11 +1462,9 @@
    "id": "e3bb8616-c791-4f5c-bac0-5302f663e46a",
    "metadata": {},
    "source": [
-    "- As discussed in previous chapters, for each input token, there's one output vector\n",
-    "- Since we fed the model a text sample with 6 input tokens, the output consists of 6 2-dimensional output vectors above\n",
     "- In chapter 3, we discussed the attention mechanism, which connects each input token to each other input token\n",
     "- In chapter 3, we then also introduced the causal attention mask that is used in GPT-like models; this causal mask lets a current token only attend to the current and previous token positions\n",
-    "- Based on this causal attention mechanism, the 6th (last) token above contains the most information among all tokens because it's the only token that includes information about all other tokens\n",
+    "- Based on this causal attention mechanism, the 4th (last) token contains the most information among all tokens because it's the only token that includes information about all other tokens\n",
     "- Hence, we are particularly interested in this last token, which we will finetune for the spam classification task"
    ]
   },
@@ -2265,7 +2272,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.6"
+   "version": "3.10.12"
   }
  },
  "nbformat": 4,
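
Note for reviewers of this patch: below is a minimal, self-contained Python sketch (not part of the diff) of the idea the edited markdown cells describe: the model emits one output vector per input token, and because the causal mask lets each token attend only to itself and earlier positions, only the 4th (last) token's output reflects the full sequence, so it is the one used for spam classification. The random tensor is stand-in data; in the notebook, `outputs` comes from the GPT model with its 2-class classification head.

    import torch

    torch.manual_seed(123)

    batch_size, num_tokens, num_classes = 1, 4, 2

    # Stand-in for the classifier-headed GPT output:
    # one num_classes-dimensional output vector per input token.
    outputs = torch.randn(batch_size, num_tokens, num_classes)
    print("Outputs dimensions:", outputs.shape)  # torch.Size([1, 4, 2])

    # Under the causal attention mask, token t attends only to positions <= t,
    # so only the 4th (last) token has "seen" the entire 4-token input.
    last_token_logits = outputs[:, -1, :]  # shape: (batch_size, num_classes)
    predicted_label = torch.argmax(last_token_logits, dim=-1)
    print("Last token logits:", last_token_logits)
    print("Predicted class:", predicted_label.item())

Selecting `outputs[:, -1, :]` rather than averaging over all token positions is the design choice the edited cells motivate: under causal attention, earlier positions carry strictly less context than the last one.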