Add alternative attention structure (#880)

Sebastian Raschka
2025-10-13 14:31:13 -05:00
committed by GitHub
parent 6eb6adfa33
commit bf039ff3dc
2 changed files with 17 additions and 3 deletions

@@ -168,9 +168,10 @@ Several folders contain optional materials as a bonus for interested readers:
 - **Chapter 4: Implementing a GPT model from scratch**
   - [FLOPS Analysis](ch04/02_performance-analysis/flops-analysis.ipynb)
   - [KV Cache](ch04/03_kv-cache)
-  - [Grouped-Query Attention](ch04/04_gqa)
-  - [Multi-Head Latent Attention](ch04/05_mla)
-  - [Sliding Window Attention](ch04/06_swa)
+  - [Attention alternatives](ch04/#attention-alternatives)
+    - [Grouped-Query Attention](ch04/04_gqa)
+    - [Multi-Head Latent Attention](ch04/05_mla)
+    - [Sliding Window Attention](ch04/06_swa)
 - **Chapter 5: Pretraining on unlabeled data:**
   - [Alternative Weight Loading Methods](ch05/02_alternative_weight_loading/)
   - [Pretraining GPT on the Project Gutenberg Dataset](ch05/03_bonus_pretraining_on_gutenberg)
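For context on the "Attention alternatives" entry this diff introduces, here is a minimal sketch of grouped-query attention (GQA), the first alternative in the list. It is illustrative only and not the code from ch04/04_gqa; the class and parameter names (`GroupedQueryAttention`, `num_kv_groups`, and so on) are chosen for this example, and it assumes PyTorch is available.

```python
# Illustrative GQA sketch; not the repository's ch04/04_gqa implementation.
import torch
import torch.nn as nn

class GroupedQueryAttention(nn.Module):
    def __init__(self, d_model, num_heads, num_kv_groups):
        super().__init__()
        assert num_heads % num_kv_groups == 0
        self.head_dim = d_model // num_heads
        self.num_heads = num_heads
        self.num_kv_groups = num_kv_groups
        self.W_q = nn.Linear(d_model, num_heads * self.head_dim, bias=False)
        # Fewer key/value heads than query heads: this is the core GQA idea.
        self.W_k = nn.Linear(d_model, num_kv_groups * self.head_dim, bias=False)
        self.W_v = nn.Linear(d_model, num_kv_groups * self.head_dim, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.W_q(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_k(x).view(b, t, self.num_kv_groups, self.head_dim).transpose(1, 2)
        v = self.W_v(x).view(b, t, self.num_kv_groups, self.head_dim).transpose(1, 2)
        # Repeat each key/value head so every group of query heads shares it.
        repeat = self.num_heads // self.num_kv_groups
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim**0.5
        # Causal mask: each position may only attend to itself and the past.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        attn = attn.masked_fill(mask, float("-inf")).softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(ctx)

# Usage: 8 query heads sharing 2 key/value groups
gqa = GroupedQueryAttention(d_model=64, num_heads=8, num_kv_groups=2)
print(gqa(torch.randn(1, 10, 64)).shape)  # torch.Size([1, 10, 64])
```

The point of the design is that `W_k` and `W_v` produce only `num_kv_groups` heads instead of `num_heads`, shrinking the KV cache by a factor of `num_heads / num_kv_groups` while queries keep their full head count.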