Add and link bonus material (#84)

This commit is contained in:
Sebastian Raschka
2024-03-23 07:27:43 -05:00
committed by GitHub
parent 35c6e12730
commit cf39abac04
12 changed files with 110 additions and 13 deletions


@@ -0,0 +1,5 @@
# Adding Bells and Whistles to the Training Loop
The main chapter used a relatively simple training function to keep the code readable and to fit Chapter 5 within the page limits. Optionally, we can add a linear learning-rate warm-up, a cosine decay schedule, and gradient clipping to improve training stability and convergence.
You can find the code for this more sophisticated training function in [Appendix D: Adding Bells and Whistles to the Training Loop](../../appendix-D/01_main-chapter-code/appendix-D.ipynb).
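The three additions can be sketched as follows. This is a minimal illustration, not the Appendix D implementation: the function names, default values, and the plain-Python clipping helper are assumptions for this sketch (in PyTorch, the clipping step corresponds to `torch.nn.utils.clip_grad_norm_`).

```python
import math


def lr_at_step(step, peak_lr=5e-4, min_lr=1e-5, warmup_steps=10, total_steps=100):
    """Illustrative schedule: linear warm-up followed by cosine decay."""
    if step < warmup_steps:
        # Linear warm-up: ramp the learning rate from ~0 up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay: anneal from peak_lr down toward min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient values so their global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        grads = [g * max_norm / norm for g in grads]
    return grads
```

Warm-up avoids large, destabilizing updates while the weights are still random, and the cosine decay lets the optimizer settle into a minimum; clipping caps the occasional exploding gradient.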


@@ -0,0 +1,10 @@
# Optimizing Hyperparameters for Pretraining
The [hparam_search.py](hparam_search.py) script, based on the extended training function in [Appendix D: Adding Bells and Whistles to the Training Loop](../appendix-D/01_main-chapter-code/appendix-D.ipynb), is designed to find optimal hyperparameters via grid search.
> [!NOTE]
> This script will take a long time to run. You may want to reduce the number of hyperparameter configurations explored in the `HPARAM_GRID` dictionary at the top.
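The grid-search idea can be sketched as follows. The grid keys and the `evaluate` stand-in below are illustrative assumptions, not the actual `HPARAM_GRID` keys or training loop in `hparam_search.py`; a real run would train and validate the model for each configuration.

```python
import itertools

# Illustrative grid, much smaller than the real HPARAM_GRID in hparam_search.py
HPARAM_GRID = {
    "peak_lr": [1e-4, 5e-4],
    "weight_decay": [0.0, 0.1],
    "batch_size": [4, 8],
}


def evaluate(config):
    """Stand-in for a full training run; returns a dummy validation loss."""
    return config["peak_lr"] * 1000 + config["weight_decay"] + 1 / config["batch_size"]


# Exhaustively try every combination and keep the best-scoring one
best_config, best_loss = None, float("inf")
keys = list(HPARAM_GRID)
for values in itertools.product(*HPARAM_GRID.values()):
    config = dict(zip(keys, values))
    loss = evaluate(config)
    if loss < best_loss:
        best_config, best_loss = config, loss
```

Because `itertools.product` enumerates the Cartesian product of all value lists, the number of runs grows multiplicatively with each hyperparameter added, which is why trimming the grid matters.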


@@ -3,4 +3,5 @@
- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code
- [02_alternative_weight_loading](02_alternative_weight_loading) contains code to load the GPT model weights from alternative places in case the model weights become unavailable from OpenAI
- [03_bonus_pretraining_on_gutenberg](03_bonus_pretraining_on_gutenberg) contains code to pretrain the LLM longer on the whole corpus of books from Project Gutenberg
- [04_learning_rate_schedulers](04_learning_rate_schedulers) contains code implementing a more sophisticated training function, including learning rate schedulers and gradient clipping
- [05_hparam_tuning](05_hparam_tuning) contains an optional hyperparameter tuning script