Mirror of https://github.com/rasbt/LLMs-from-scratch.git (synced 2026-04-10 12:33:42 +00:00)
Use figure numbers in ch05-7 (#881)
Committed by GitHub
Parent: bf039ff3dc
Commit: b969b3ef7a
@@ -75,7 +75,7 @@
|
|||||||
"id": "efd27fcc-2886-47cb-b544-046c2c31f02a",
|
"id": "efd27fcc-2886-47cb-b544-046c2c31f02a",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/chapter-overview.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/01.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -91,7 +91,7 @@
|
|||||||
"id": "f67711d4-8391-4fee-aeef-07ea53dd5841",
|
"id": "f67711d4-8391-4fee-aeef-07ea53dd5841",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/mental-model--0.webp\" width=400px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/02.webp\" width=400px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -195,7 +195,7 @@
|
|||||||
"id": "741881f3-cee0-49ad-b11d-b9df3b3ac234",
|
"id": "741881f3-cee0-49ad-b11d-b9df3b3ac234",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/gpt-process.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/03.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -346,7 +346,7 @@
|
|||||||
"id": "384d86a9-0013-476c-bb6b-274fd5f20b29",
|
"id": "384d86a9-0013-476c-bb6b-274fd5f20b29",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/proba-to-text.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/04.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -440,7 +440,7 @@
|
|||||||
"id": "ad90592f-0d5d-4ec8-9ff5-e7675beab10e",
|
"id": "ad90592f-0d5d-4ec8-9ff5-e7675beab10e",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/proba-index.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/06.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -601,7 +601,7 @@
|
|||||||
"id": "5bd24b7f-b760-47ad-bc84-86d13794aa54",
|
"id": "5bd24b7f-b760-47ad-bc84-86d13794aa54",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/cross-entropy.webp?123\" width=400px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/07.webp\" width=400px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -945,7 +945,7 @@
|
|||||||
"id": "46bdaa07-ba96-4ac1-9d71-b3cc153910d9",
|
"id": "46bdaa07-ba96-4ac1-9d71-b3cc153910d9",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/batching.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/09.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1210,7 +1210,7 @@
|
|||||||
"id": "43875e95-190f-4b17-8f9a-35034ba649ec",
|
"id": "43875e95-190f-4b17-8f9a-35034ba649ec",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/mental-model-1.webp\" width=400px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/10.webp\" width=400px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1231,7 +1231,7 @@
|
|||||||
"- In this section, we finally implement the code for training the LLM\n",
|
"- In this section, we finally implement the code for training the LLM\n",
|
||||||
"- We focus on a simple training function (if you are interested in augmenting this training function with more advanced techniques, such as learning rate warmup, cosine annealing, and gradient clipping, please refer to [Appendix D](../../appendix-D/01_main-chapter-code))\n",
|
"- We focus on a simple training function (if you are interested in augmenting this training function with more advanced techniques, such as learning rate warmup, cosine annealing, and gradient clipping, please refer to [Appendix D](../../appendix-D/01_main-chapter-code))\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/train-steps.webp\" width=300px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/11.webp\" width=300px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1464,7 +1464,7 @@
|
|||||||
"id": "eb380c42-b31c-4ee1-b8b9-244094537272",
|
"id": "eb380c42-b31c-4ee1-b8b9-244094537272",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/mental-model-2.webp\" width=350px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/13.webp\" width=350px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1849,7 +1849,7 @@
|
|||||||
"id": "7ae6fffd-2730-4abe-a2d3-781fc4836f17",
|
"id": "7ae6fffd-2730-4abe-a2d3-781fc4836f17",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/topk.webp\" width=500px>\n",
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/15.webp\" width=500px>\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- (Please note that the numbers in this figure are truncated to two\n",
|
"- (Please note that the numbers in this figure are truncated to two\n",
|
||||||
"digits after the decimal point to reduce visual clutter. The values in the Softmax row should add up to 1.0.)"
|
"digits after the decimal point to reduce visual clutter. The values in the Softmax row should add up to 1.0.)"
|
||||||
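The context lines in this hunk describe the top-k figure and note that the Softmax row should add up to 1.0. That property follows from masking: top-k sampling keeps the k largest logits, sets all others to negative infinity, and applies softmax, so the discarded tokens get probability exactly 0 and the k survivors share the full probability mass. The following is a minimal pure-Python sketch under stated assumptions. It works on a plain list of logits rather than a PyTorch tensor, and the function name `top_k_sample` and its `temperature` and `rng` parameters are hypothetical illustrations, not the notebook's code.

```python
import math
import random


def top_k_sample(logits, k, temperature=1.0, rng=None):
    """Hypothetical helper: sample one token index from the k largest logits.

    Logits outside the top k are replaced by -inf, so after softmax
    they receive probability 0.0 and the remaining k values sum to 1.0.
    """
    rng = rng or random.Random()
    # Indices of the k largest logits (stable sort: ties keep lower index first)
    top_idx = set(sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k])
    # Scale kept logits by temperature, mask everything else with -inf
    masked = [logits[i] / temperature if i in top_idx else float("-inf")
              for i in range(len(logits))]
    # Numerically stable softmax; math.exp(-inf) evaluates to 0.0
    max_logit = max(masked)
    exps = [math.exp(x - max_logit) for x in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting distribution
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token, probs


token, probs = top_k_sample([4.5, 0.9, -1.9, 6.8, 1.6], k=3, rng=random.Random(123))
print(token, [round(p, 2) for p in probs])
```

With k=3, only indices 3, 0, and 4 can be sampled; indices 1 and 2 end up with probability 0.0. Masking with negative infinity before softmax, instead of renormalizing afterwards, keeps the computation a single softmax call.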
@@ -2060,7 +2060,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"- Training LLMs is computationally expensive, so it's crucial to be able to save and load LLM weights\n",
|
"- Training LLMs is computationally expensive, so it's crucial to be able to save and load LLM weights\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/mental-model-3.webp\" width=400px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/16.webp\" width=400px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -2393,7 +2393,7 @@
|
|||||||
"id": "20f19d32-5aae-4176-9f86-f391672c8f0d",
|
"id": "20f19d32-5aae-4176-9f86-f391672c8f0d",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/gpt-sizes.webp?timestamp=123\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/17.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -2627,7 +2627,7 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.10.16"
|
"version": "3.13.5"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
|
|||||||
@@ -76,7 +76,7 @@
|
|||||||
"id": "a445828a-ff10-4efa-9f60-a2e2aed4c87d",
|
"id": "a445828a-ff10-4efa-9f60-a2e2aed4c87d",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/chapter-overview.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/01.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -113,7 +113,7 @@
|
|||||||
"id": "6c29ef42-46d9-43d4-8bb4-94974e1665e4",
|
"id": "6c29ef42-46d9-43d4-8bb4-94974e1665e4",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/instructions.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/02.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -132,7 +132,7 @@
|
|||||||
"id": "0b37a0c4-0bb1-4061-b1fe-eaa4416d52c3",
|
"id": "0b37a0c4-0bb1-4061-b1fe-eaa4416d52c3",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/spam-non-spam.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/03.webp\" width=400px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -150,7 +150,7 @@
|
|||||||
"id": "5f628975-d2e8-4f7f-ab38-92bb868b7067",
|
"id": "5f628975-d2e8-4f7f-ab38-92bb868b7067",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-1.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/04.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -712,7 +712,7 @@
|
|||||||
"id": "0829f33f-1428-4f22-9886-7fee633b3666",
|
"id": "0829f33f-1428-4f22-9886-7fee633b3666",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/pad-input-sequences.webp?123\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/06.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -887,7 +887,7 @@
|
|||||||
"id": "64bcc349-205f-48f8-9655-95ff21f5e72f",
|
"id": "64bcc349-205f-48f8-9655-95ff21f5e72f",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/batch.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/07.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1019,7 +1019,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"- In this section, we initialize the pretrained model we worked with in the previous chapter\n",
|
"- In this section, we initialize the pretrained model we worked with in the previous chapter\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-2.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/08.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1217,7 +1217,7 @@
|
|||||||
"id": "d6e9d66f-76b2-40fc-9ec5-3f972a8db9c0",
|
"id": "d6e9d66f-76b2-40fc-9ec5-3f972a8db9c0",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/lm-head.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/09.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1550,7 +1550,7 @@
|
|||||||
"id": "0be7c1eb-c46c-4065-8525-eea1b8c66d10",
|
"id": "0be7c1eb-c46c-4065-8525-eea1b8c66d10",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/trainable.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/10.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1661,7 +1661,7 @@
|
|||||||
"id": "7df9144f-6817-4be4-8d4b-5d4dadfe4a9b",
|
"id": "7df9144f-6817-4be4-8d4b-5d4dadfe4a9b",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/input-and-output.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/11.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1704,7 +1704,7 @@
|
|||||||
"id": "8df08ae0-e664-4670-b7c5-8a2280d9b41b",
|
"id": "8df08ae0-e664-4670-b7c5-8a2280d9b41b",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/attention-mask.webp\" width=200px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/12.webp\" width=200px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1720,7 +1720,7 @@
|
|||||||
"id": "669e1fd1-ace8-44b4-b438-185ed0ba8b33",
|
"id": "669e1fd1-ace8-44b4-b438-185ed0ba8b33",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-3.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/13.webp\" width=300px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1736,7 +1736,7 @@
|
|||||||
"id": "557996dd-4c6b-49c4-ab83-f60ef7e1d69e",
|
"id": "557996dd-4c6b-49c4-ab83-f60ef7e1d69e",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/class-argmax.webp\" width=600px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/14.webp\" width=600px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -2053,7 +2053,7 @@
|
|||||||
"id": "979b6222-1dc2-4530-9d01-b6b04fe3de12",
|
"id": "979b6222-1dc2-4530-9d01-b6b04fe3de12",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/training-loop.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/15.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -2371,7 +2371,7 @@
|
|||||||
"id": "72ebcfa2-479e-408b-9cf0-7421f6144855",
|
"id": "72ebcfa2-479e-408b-9cf0-7421f6144855",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-4.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/18.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -2590,7 +2590,7 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.10.16"
|
"version": "3.13.5"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
|
|||||||
@@ -79,7 +79,7 @@
|
|||||||
"id": "264fca98-2f9a-4193-b435-2abfa3b4142f"
|
"id": "264fca98-2f9a-4193-b435-2abfa3b4142f"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/overview.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/01.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -111,7 +111,7 @@
|
|||||||
"id": "18dc0535-0904-44ed-beaf-9b678292ef35"
|
"id": "18dc0535-0904-44ed-beaf-9b678292ef35"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/instruction-following.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/02.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -123,7 +123,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"- The topics covered in this chapter are summarized in the figure below\n",
|
"- The topics covered in this chapter are summarized in the figure below\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/chapter-overview-1.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/03.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -312,7 +312,7 @@
|
|||||||
"id": "dffa4f70-44d4-4be4-89a9-2159f4885b10"
|
"id": "dffa4f70-44d4-4be4-89a9-2159f4885b10"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/prompt-style.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/04.webp?2\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -509,7 +509,7 @@
|
|||||||
"id": "233f63bd-9755-4d07-8884-5e2e5345cf27"
|
"id": "233f63bd-9755-4d07-8884-5e2e5345cf27"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/chapter-overview-2.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/05.webp?1\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -521,7 +521,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"- We tackle this dataset batching in several steps, as summarized in the figure below\n",
|
"- We tackle this dataset batching in several steps, as summarized in the figure below\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/detailed-batching.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/06.webp?1\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -533,7 +533,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"- First, we implement an `InstructionDataset` class that pre-tokenizes all inputs in the dataset, similar to the `SpamDataset` in chapter 6\n",
|
"- First, we implement an `InstructionDataset` class that pre-tokenizes all inputs in the dataset, similar to the `SpamDataset` in chapter 6\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/pretokenizing.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/07.webp?1\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -627,7 +627,7 @@
|
|||||||
"id": "65c4d943-4aa8-4a44-874e-05bc6831fbd3"
|
"id": "65c4d943-4aa8-4a44-874e-05bc6831fbd3"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/padding.webp\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/08.webp?1\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -710,12 +710,10 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"id": "c46832ab-39b7-45f8-b330-ac9adfa10d1b",
|
"id": "5673ade5-be4c-4a2c-9a9a-d5c63fb1c424",
|
||||||
"metadata": {
|
"metadata": {},
|
||||||
"id": "c46832ab-39b7-45f8-b330-ac9adfa10d1b"
|
|
||||||
},
|
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/batching-step-4.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/09.webp?1\" width=400px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -736,7 +734,7 @@
|
|||||||
"id": "0386b6fe-3455-4e70-becd-a5a4681ba2ef"
|
"id": "0386b6fe-3455-4e70-becd-a5a4681ba2ef"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/inputs-targets.webp?1\" width=400px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/10.webp?1\" width=400px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -819,7 +817,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"- Next, we introduce an `ignore_index` value to replace all padding token IDs with a new value; the purpose of this `ignore_index` is that we can ignore padding values in the loss function (more on that later)\n",
|
"- Next, we introduce an `ignore_index` value to replace all padding token IDs with a new value; the purpose of this `ignore_index` is that we can ignore padding values in the loss function (more on that later)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/batching-step-5.webp?1\" width=500px>\n",
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/11.webp?1\" width=400px>\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- Concretely, this means that we replace the token IDs corresponding to `50256` with `-100` as illustrated below"
|
"- Concretely, this means that we replace the token IDs corresponding to `50256` with `-100` as illustrated below"
|
||||||
]
|
]
|
||||||
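The context lines in this hunk explain that padding token IDs (`50256`) in the targets are replaced with `-100` so the loss function can skip them. PyTorch's `torch.nn.functional.cross_entropy` uses `ignore_index=-100` by default, which is why that value works. The sketch below illustrates the idea in pure Python. The helper names `collate_with_ignore_index` and `masked_cross_entropy` are hypothetical, and one design choice is an assumption: the first `50256` in each target row is kept (so the model still learns to emit an end-of-text token) and only the padding after it becomes `-100`.

```python
import math

PAD_TOKEN_ID = 50256  # GPT-2 <|endoftext|> ID, reused here as the padding token
IGNORE_INDEX = -100   # default ignore_index of PyTorch's cross_entropy


def collate_with_ignore_index(batch, pad_token_id=PAD_TOKEN_ID, ignore_index=IGNORE_INDEX):
    """Hypothetical collate step: pad, shift inputs/targets, mask extra padding."""
    # +1 so every sequence gets at least one end-of-text/pad token appended
    max_len = max(len(seq) for seq in batch) + 1
    inputs, targets = [], []
    for seq in batch:
        padded = seq + [pad_token_id] * (max_len - len(seq))
        inp = padded[:-1]  # inputs: all but the last token
        tgt = padded[1:]   # targets: inputs shifted one position to the right
        # Assumption: keep the first pad token, replace all later ones
        pad_positions = [i for i, tok in enumerate(tgt) if tok == pad_token_id]
        for i in pad_positions[1:]:
            tgt[i] = ignore_index
        inputs.append(inp)
        targets.append(tgt)
    return inputs, targets


def masked_cross_entropy(logit_rows, targets, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose target != ignore_index."""
    losses = []
    for row, tgt in zip(logit_rows, targets):
        if tgt == ignore_index:
            continue  # masked position: contributes nothing to the loss
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum_exp - row[tgt])
    return sum(losses) / len(losses)


inputs, targets = collate_with_ignore_index([[0, 1, 2, 3, 4], [5, 6], [7, 8, 9]])
print(inputs[1])   # [5, 6, 50256, 50256, 50256]
print(targets[1])  # [6, 50256, -100, -100, -100]
```

Because masked positions are skipped before averaging, appending any number of `-100` targets leaves the loss unchanged; only the real tokens (plus the one kept end-of-text token) are scored.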
@@ -831,7 +829,7 @@
|
|||||||
"id": "bd4bed33-956e-4b3f-a09c-586d8203109a"
|
"id": "bd4bed33-956e-4b3f-a09c-586d8203109a"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/ignore-index.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/12.webp?2\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1085,7 +1083,7 @@
|
|||||||
"id": "fab8f0ed-80e8-4fd9-bf84-e5d0e0bc0a39"
|
"id": "fab8f0ed-80e8-4fd9-bf84-e5d0e0bc0a39"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/mask-instructions.webp?1\" width=600px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/13.webp\" width=600px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1095,6 +1093,7 @@
|
|||||||
"id": "bccaf048-ec95-498c-9155-d5b3ccba6c96"
|
"id": "bccaf048-ec95-498c-9155-d5b3ccba6c96"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
|
" \n",
|
||||||
"## 7.4 Creating data loaders for an instruction dataset"
|
"## 7.4 Creating data loaders for an instruction dataset"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -1115,7 +1114,7 @@
|
|||||||
"id": "9fffe390-b226-4d5c-983f-9f4da773cb82"
|
"id": "9fffe390-b226-4d5c-983f-9f4da773cb82"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/chapter-overview-3.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/14.webp\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1515,7 +1514,7 @@
|
|||||||
"id": "8d1b438f-88af-413f-96a9-f059c6c55fc4"
|
"id": "8d1b438f-88af-413f-96a9-f059c6c55fc4"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/chapter-overview-4.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/15.webp?1\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -1746,7 +1745,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"- In this section, we finetune the model\n",
|
"- In this section, we finetune the model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/chapter-overview-5.webp?1\" width=500px>\n",
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/16.webp\" width=500px>\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- Note that we can reuse all the loss calculation and training functions that we used in previous chapters"
|
"- Note that we can reuse all the loss calculation and training functions that we used in previous chapters"
|
||||||
]
|
]
|
||||||
@@ -2015,7 +2014,7 @@
|
|||||||
"id": "5a25cc88-1758-4dd0-b8bf-c044cbf2dd49"
|
"id": "5a25cc88-1758-4dd0-b8bf-c044cbf2dd49"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/chapter-overview-6.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/18.webp?1\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -2271,7 +2270,7 @@
|
|||||||
"id": "805b9d30-7336-499f-abb5-4a21be3129f5"
|
"id": "805b9d30-7336-499f-abb5-4a21be3129f5"
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/chapter-overview-7.webp?1\" width=500px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/19.webp?1\" width=500px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -2309,7 +2308,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"- In general, before we can use ollama from the command line, we have to either start the ollama application or run `ollama serve` in a separate terminal\n",
|
"- In general, before we can use ollama from the command line, we have to either start the ollama application or run `ollama serve` in a separate terminal\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/ollama-run.webp?1\" width=700px>"
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/20.webp?1\" width=700px>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -2854,7 +2853,7 @@
|
|||||||
"- This marks the final chapter of this book\n",
|
"- This marks the final chapter of this book\n",
|
||||||
"- We covered the major steps of the LLM development cycle: implementing an LLM architecture, pretraining an LLM, and finetuning it\n",
|
"- We covered the major steps of the LLM development cycle: implementing an LLM architecture, pretraining an LLM, and finetuning it\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/final-overview.webp?1\" width=500px>\n",
|
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/21.webp?1\" width=500px>\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- An optional step that is sometimes followed after instruction finetuning, as described in this chapter, is preference finetuning\n",
|
"- An optional step that is sometimes followed after instruction finetuning, as described in this chapter, is preference finetuning\n",
|
||||||
"- Preference finetuning process can be particularly useful for customizing a model to better align with specific user preferences; see the [../04_preference-tuning-with-dpo](../04_preference-tuning-with-dpo) folder if you are interested in this\n",
|
"- Preference finetuning process can be particularly useful for customizing a model to better align with specific user preferences; see the [../04_preference-tuning-with-dpo](../04_preference-tuning-with-dpo) folder if you are interested in this\n",
|
||||||
@@ -2929,7 +2928,7 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.10.16"
|
"version": "3.13.5"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
|
|||||||