Add and link bonus material (#84)

This commit is contained in:
Sebastian Raschka
2024-03-23 07:27:43 -05:00
committed by GitHub
parent 35c6e12730
commit cf39abac04
12 changed files with 110 additions and 13 deletions


@@ -0,0 +1,5 @@
# Adding Bells and Whistles to the Training Loop
The main chapter used a relatively simple training function to keep the code readable and to fit Chapter 5 within the page limits. Optionally, we can add a linear learning-rate warm-up, a cosine decay schedule, and gradient clipping to improve training stability and convergence.
You can find the code for this more sophisticated training function in [Appendix D: Adding Bells and Whistles to the Training Loop](../../appendix-D/01_main-chapter-code/appendix-D.ipynb).
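The three additions can be sketched as follows. This is a minimal illustration, not the Appendix D implementation: the function names, default values, and the plain-Python clipping helper are assumptions for this sketch (in PyTorch, the clipping step corresponds to `torch.nn.utils.clip_grad_norm_`).

```python
import math


def lr_at_step(step, peak_lr=5e-4, min_lr=1e-5, warmup_steps=10, total_steps=100):
    """Illustrative schedule: linear warm-up followed by cosine decay."""
    if step < warmup_steps:
        # Linear warm-up: ramp the learning rate from ~0 up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay: anneal from peak_lr down toward min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient values so their global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        grads = [g * max_norm / norm for g in grads]
    return grads
```

Warm-up avoids large, destabilizing updates while the weights are still random, and the cosine decay lets the optimizer settle into a minimum; clipping caps the occasional exploding gradient.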


@@ -0,0 +1,10 @@
# Optimizing Hyperparameters for Pretraining
The [hparam_search.py](hparam_search.py) script, based on the extended training function in [Appendix D: Adding Bells and Whistles to the Training Loop](../appendix-D/01_main-chapter-code/appendix-D.ipynb), is designed to find optimal hyperparameters via grid search.
> [!NOTE]
> This script will take a long time to run. You may want to reduce the number of hyperparameter configurations explored in the `HPARAM_GRID` dictionary at the top.
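The grid-search idea can be sketched as follows. The grid keys and the `evaluate` stand-in below are illustrative assumptions, not the actual `HPARAM_GRID` keys or training loop in `hparam_search.py`; a real run would train and validate the model for each configuration.

```python
import itertools

# Illustrative grid, much smaller than the real HPARAM_GRID in hparam_search.py
HPARAM_GRID = {
    "peak_lr": [1e-4, 5e-4],
    "weight_decay": [0.0, 0.1],
    "batch_size": [4, 8],
}


def evaluate(config):
    """Stand-in for a full training run; returns a dummy validation loss."""
    return config["peak_lr"] * 1000 + config["weight_decay"] + 1 / config["batch_size"]


# Exhaustively try every combination and keep the best-scoring one
best_config, best_loss = None, float("inf")
keys = list(HPARAM_GRID)
for values in itertools.product(*HPARAM_GRID.values()):
    config = dict(zip(keys, values))
    loss = evaluate(config)
    if loss < best_loss:
        best_config, best_loss = config, loss
```

Because `itertools.product` enumerates the Cartesian product of all value lists, the number of runs grows multiplicatively with each hyperparameter added, which is why trimming the grid matters.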


@@ -3,4 +3,5 @@
- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code
- [02_alternative_weight_loading](02_alternative_weight_loading) contains code to load the GPT model weights from alternative places in case the model weights become unavailable from OpenAI
- [03_bonus_pretraining_on_gutenberg](03_bonus_pretraining_on_gutenberg) contains code to pretrain the LLM longer on the whole corpus of books from Project Gutenberg
- [04_learning_rate_schedulers](04_learning_rate_schedulers) contains code implementing a more sophisticated training function, including learning rate schedulers and gradient clipping
- [05_hparam_tuning](05_hparam_tuning) contains an optional hyperparameter tuning script