mirror of https://github.com/rasbt/LLMs-from-scratch.git
synced 2026-04-10 12:33:42 +00:00
Add and link bonus material (#84)
committed by GitHub
parent 35c6e12730
commit cf39abac04
5
ch05/04_learning_rate_schedulers/README.md
Normal file
@@ -0,0 +1,5 @@
# Adding Bells and Whistles to the Training Loop
The main chapter used a relatively simple training function to keep the code readable and fit Chapter 5 within the page limits. Optionally, we can add a linear warm-up, a cosine decay schedule, and gradient clipping to improve the training stability and convergence.
You can find the code for this more sophisticated training function in [Appendix D: Adding Bells and Whistles to the Training Loop](../../appendix-D/01_main-chapter-code/appendix-D.ipynb).
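The three additions named above can be sketched in a few lines. The following is a minimal illustrative sketch, not the Appendix D code itself: `get_lr` and `clip_by_global_norm` are helper names made up for this demo, and the hyperparameter values are placeholders.

```python
import math

def get_lr(step, *, warmup_steps, total_steps, peak_lr, min_lr):
    """Linear warm-up to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # linear warm-up: ramp from 0 up to peak_lr over warmup_steps steps
        return peak_lr * (step + 1) / warmup_steps
    # cosine decay: peak_lr -> min_lr over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their global L2 norm is at
    most max_norm -- the same idea torch.nn.utils.clip_grad_norm_ applies
    to a model's parameters."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads[:] = [g * scale for g in grads]
    return grads
```

In a PyTorch training loop, the scheduler would typically be applied by setting `param_group["lr"] = get_lr(...)` on each optimizer parameter group before `optimizer.step()`, and clipping by calling `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)` after `loss.backward()`.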
10
ch05/05_bonus_hparam_tuning/README.md
Normal file
@@ -0,0 +1,10 @@
# Optimizing Hyperparameters for Pretraining
The [hparam_search.py](hparam_search.py) script, based on the extended training function in [Appendix D: Adding Bells and Whistles to the Training Loop](../../appendix-D/01_main-chapter-code/appendix-D.ipynb), is designed to find optimal hyperparameters via grid search.
> [!NOTE]
> This script will take a long time to run. You may want to reduce the number of hyperparameter configurations explored in the `HPARAM_GRID` dictionary at the top.
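A grid search of this kind can be sketched as follows. This is an illustrative sketch only: the `HPARAM_GRID` keys and values below are placeholders patterned after the dictionary the note mentions, and the `evaluate` callback stands in for "train briefly and return the validation loss"; none of it is the actual `hparam_search.py` contents.

```python
import itertools

# Hypothetical grid, patterned after the HPARAM_GRID dictionary mentioned
# above; the keys and values are illustrative, not the script's own.
HPARAM_GRID = {
    "peak_lr": [1e-4, 5e-4, 1e-3],
    "weight_decay": [0.0, 0.1],
    "warmup_steps": [10, 100],
}

def grid_search(grid, evaluate):
    """Try every combination in `grid`; return the config with the lowest
    value of evaluate(config) (e.g. validation loss) and that value."""
    keys = list(grid)
    best_config, best_score = None, float("inf")
    # itertools.product enumerates the full Cartesian product of the grid
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Toy stand-in for a real train-and-evaluate step:
best, score = grid_search(HPARAM_GRID, lambda c: c["peak_lr"] + c["weight_decay"])
```

Note that the number of configurations is the product of the list lengths (here 3 × 2 × 2 = 12), which is why trimming the grid shortens the run so dramatically.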
@@ -3,4 +3,5 @@
- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code
- [02_alternative_weight_loading](02_alternative_weight_loading) contains code to load the GPT model weights from alternative places in case the model weights become unavailable from OpenAI
- [03_bonus_pretraining_on_gutenberg](03_bonus_pretraining_on_gutenberg) contains code to pretrain the LLM longer on the whole corpus of books from Project Gutenberg
- [04_hparam_tuning](04_hparam_tuning) contains an optional hyperparameter tuning script
- [04_learning_rate_schedulers](04_learning_rate_schedulers) contains code implementing a more sophisticated training function including learning rate schedulers and gradient clipping
- [05_hparam_tuning](05_hparam_tuning) contains an optional hyperparameter tuning script