Reduce Llama 3 RoPE memory requirements (#658)

* Llama3 from scratch improvements

* Fix Llama 3 expensive RoPE memory issue

* updates

* update package

* benchmark

* remove unused rescale_theta
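
The commit title refers to reducing RoPE memory cost. As context for the change, here is a minimal, dependency-free sketch of the general technique (all names are hypothetical, not taken from this commit): precompute the RoPE cos/sin tables once and share the same buffers across every transformer block, rather than letting each attention layer allocate its own copy, which multiplies the memory cost by the number of layers.

```python
import math

def precompute_rope_angles(head_dim, context_len, theta_base=500_000.0):
    # One inverse frequency per pair of head dimensions, as in the RoPE formulation.
    inv_freq = [theta_base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    # cos[p][i] / sin[p][i] rotate the i-th (q, k) pair at sequence position p.
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(context_len)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(context_len)]
    return cos, sin

# Build the tables once ...
cos, sin = precompute_rope_angles(head_dim=8, context_len=4)

# ... then hand the *same* objects to every layer instead of per-layer copies.
n_layers = 3
layers = [{"cos": cos, "sin": sin} for _ in range(n_layers)]
assert all(layer["cos"] is cos for layer in layers)
```

In a framework setting the same idea is typically expressed by registering the tables as non-persistent buffers on the model and passing them into each block, so only one copy lives in memory regardless of depth.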
Sebastian Raschka
2025-06-12 11:08:02 -05:00
committed by GitHub
parent c278745aff
commit c4cde1c21b
9 changed files with 405 additions and 2577 deletions

.gitignore

@@ -51,6 +51,9 @@ ch05/07_gpt_to_llama/Llama-3.2-3B-Instruct
 ch05/10_llm-training-speed/middlemarch.txt
 ch05/10_llm-training-speed/loss.pdf
 ch05/10_llm-training-speed/model.pth
+ch05/07_gpt_to_llama/Untitled.ipynb
+ch05/07_gpt_to_llama/llama3.2-1B-instruct.pth
+ch05/07_gpt_to_llama/tokenizer.model
 ch06/01_main-chapter-code/gpt2
 ch06/02_bonus_additional-experiments/gpt2