Reduce Llama 3 RoPE memory requirements (#658)

* Llama3 from scratch improvements

* Fix Llama 3 expensive RoPE memory issue

* updates

* update package

* benchmark

* remove unused rescale_theta
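
The commit title refers to reducing RoPE memory cost. As context for the change, here is a minimal, dependency-free sketch of the general technique (all names are hypothetical, not taken from this commit): precompute the RoPE cos/sin tables once and share the same buffers across every transformer block, rather than letting each attention layer allocate its own copy, which multiplies the memory cost by the number of layers.

```python
import math

def precompute_rope_angles(head_dim, context_len, theta_base=500_000.0):
    # One inverse frequency per pair of head dimensions, as in the RoPE formulation.
    inv_freq = [theta_base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    # cos[p][i] / sin[p][i] rotate the i-th (q, k) pair at sequence position p.
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(context_len)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(context_len)]
    return cos, sin

# Build the tables once ...
cos, sin = precompute_rope_angles(head_dim=8, context_len=4)

# ... then hand the *same* objects to every layer instead of per-layer copies.
n_layers = 3
layers = [{"cos": cos, "sin": sin} for _ in range(n_layers)]
assert all(layer["cos"] is cos for layer in layers)
```

In a framework setting the same idea is typically expressed by registering the tables as non-persistent buffers on the model and passing them into each block, so only one copy lives in memory regardless of depth.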
Sebastian Raschka
2025-06-12 11:08:02 -05:00
committed by GitHub
parent c278745aff
commit c4cde1c21b
9 changed files with 405 additions and 2577 deletions

.gitignore

@@ -51,6 +51,9 @@ ch05/07_gpt_to_llama/Llama-3.2-3B-Instruct
 ch05/10_llm-training-speed/middlemarch.txt
 ch05/10_llm-training-speed/loss.pdf
 ch05/10_llm-training-speed/model.pth
+ch05/07_gpt_to_llama/Untitled.ipynb
+ch05/07_gpt_to_llama/llama3.2-1B-instruct.pth
+ch05/07_gpt_to_llama/tokenizer.model
 ch06/01_main-chapter-code/gpt2
 ch06/02_bonus_additional-experiments/gpt2