Mixture-of-Experts intro (#888)

Sebastian Raschka committed 2025-10-19 22:17:59 -05:00 (committed by GitHub)
parent 27b6dfab9e
commit 218221ab62
13 changed files with 1333 additions and 228 deletions


@@ -172,6 +172,7 @@ Several folders contain optional materials as a bonus for interested readers:
   - [Grouped-Query Attention](ch04/04_gqa)
   - [Multi-Head Latent Attention](ch04/05_mla)
   - [Sliding Window Attention](ch04/06_swa)
+  - [Mixture-of-Experts (MoE)](ch04/07_moe)
 - **Chapter 5: Pretraining on unlabeled data:**
   - [Alternative Weight Loading Methods](ch05/02_alternative_weight_loading/)
   - [Pretraining GPT on the Project Gutenberg Dataset](ch05/03_bonus_pretraining_on_gutenberg)
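The contents of the new `ch04/07_moe` folder are not shown in this diff. As a rough sketch of the technique the new bonus section covers, here is a minimal sparse Mixture-of-Experts feed-forward layer in PyTorch; the class name, the expert architecture, and the `num_experts`/`top_k` defaults are illustrative assumptions, not code taken from the repository.

```python
import torch
import torch.nn as nn


class MoEFeedForward(nn.Module):
    # Minimal sparse MoE layer (illustrative, not the repo's implementation):
    # a linear router scores all experts per token, the top-k experts are
    # selected, and their outputs are combined with softmax-normalized
    # router weights.
    def __init__(self, emb_dim, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(emb_dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(emb_dim, 4 * emb_dim),
                nn.GELU(),
                nn.Linear(4 * emb_dim, emb_dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, seq_len, emb_dim) -> flatten tokens for routing
        b, t, d = x.shape
        tokens = x.reshape(-1, d)

        scores = self.router(tokens)                         # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)  # per-token expert choices
        weights = torch.softmax(top_vals, dim=-1)            # renormalize over chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Process only the tokens routed to expert e
            token_ids, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += (
                weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
            )

        return out.reshape(b, t, d)


# Usage: a drop-in replacement for a dense feed-forward block
layer = MoEFeedForward(emb_dim=64)
x = torch.randn(2, 10, 64)
print(layer(x).shape)  # torch.Size([2, 10, 64])
```

The design point of MoE is that each token only activates `top_k` of the experts, so total parameter count grows with `num_experts` while per-token compute stays close to that of a single dense feed-forward block.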