Mirror of https://github.com/rasbt/LLMs-from-scratch.git (synced 2026-04-10 12:33:42 +00:00)
Add alternative attention structure (#880)
Committed by GitHub
parent 6eb6adfa33
commit bf039ff3dc
```diff
@@ -168,9 +168,10 @@ Several folders contain optional materials as a bonus for interested readers:
 - **Chapter 4: Implementing a GPT model from scratch**
   - [FLOPS Analysis](ch04/02_performance-analysis/flops-analysis.ipynb)
   - [KV Cache](ch04/03_kv-cache)
-  - [Grouped-Query Attention](ch04/04_gqa)
-  - [Multi-Head Latent Attention](ch04/05_mla)
-  - [Sliding Window Attention](ch04/06_swa)
+  - [Attention alternatives](ch04/#attention-alternatives)
+    - [Grouped-Query Attention](ch04/04_gqa)
+    - [Multi-Head Latent Attention](ch04/05_mla)
+    - [Sliding Window Attention](ch04/06_swa)
 - **Chapter 5: Pretraining on unlabeled data:**
   - [Alternative Weight Loading Methods](ch05/02_alternative_weight_loading/)
   - [Pretraining GPT on the Project Gutenberg Dataset](ch05/03_bonus_pretraining_on_gutenberg)
```
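Since the change itself only regroups links, a brief orientation on what the linked folders cover may help. Below is a minimal, illustrative PyTorch sketch of grouped-query attention (GQA): several query heads share each key/value projection, which shrinks the KV cache. The class name and all dimensions are assumptions for demonstration, not the book's ch04/04_gqa implementation.

```python
import torch
import torch.nn as nn

class GroupedQueryAttention(nn.Module):
    """Causal self-attention where groups of query heads share K/V heads."""

    def __init__(self, emb_dim, n_heads, n_kv_groups):
        super().__init__()
        assert n_heads % n_kv_groups == 0
        self.n_heads = n_heads
        self.n_kv_groups = n_kv_groups
        self.head_dim = emb_dim // n_heads
        # Full projection for queries; reduced projections for keys/values,
        # which is what shrinks the KV cache relative to multi-head attention
        self.W_q = nn.Linear(emb_dim, n_heads * self.head_dim, bias=False)
        self.W_k = nn.Linear(emb_dim, n_kv_groups * self.head_dim, bias=False)
        self.W_v = nn.Linear(emb_dim, n_kv_groups * self.head_dim, bias=False)
        self.out_proj = nn.Linear(n_heads * self.head_dim, emb_dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.W_q(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.W_k(x).view(b, t, self.n_kv_groups, self.head_dim).transpose(1, 2)
        v = self.W_v(x).view(b, t, self.n_kv_groups, self.head_dim).transpose(1, 2)
        # Duplicate each K/V group so every query head has a matching K/V head
        k = k.repeat_interleave(self.n_heads // self.n_kv_groups, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_groups, dim=1)
        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        weights = scores.masked_fill(causal, float("-inf")).softmax(dim=-1)
        out = (weights @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)

x = torch.randn(2, 8, 64)  # (batch, seq_len, emb_dim)
print(GroupedQueryAttention(emb_dim=64, n_heads=8, n_kv_groups=2)(x).shape)
# torch.Size([2, 8, 64])
```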
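Likewise, a minimal sketch of multi-head latent attention (MLA), the second linked variant: keys and values are reconstructed from a shared low-rank latent, so at inference only that small latent needs caching. Dimensions and names are again illustrative assumptions, and the decoupled-RoPE machinery of full MLA (and of the book's ch04/05_mla code) is omitted here.

```python
import torch
import torch.nn as nn

class MultiHeadLatentAttention(nn.Module):
    """Causal self-attention with K/V up-projected from a shared latent."""

    def __init__(self, emb_dim, n_heads, latent_dim):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = emb_dim // n_heads
        self.W_q = nn.Linear(emb_dim, emb_dim, bias=False)
        # Down-project once to a small latent; at inference only this latent
        # would be cached, instead of full per-head keys and values
        self.W_dkv = nn.Linear(emb_dim, latent_dim, bias=False)
        self.W_uk = nn.Linear(latent_dim, emb_dim, bias=False)
        self.W_uv = nn.Linear(latent_dim, emb_dim, bias=False)
        self.out_proj = nn.Linear(emb_dim, emb_dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        c_kv = self.W_dkv(x)  # (b, t, latent_dim): the cache-friendly part
        q = self.W_q(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.W_uk(c_kv).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.W_uv(c_kv).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        weights = scores.masked_fill(causal, float("-inf")).softmax(dim=-1)
        out = (weights @ v).transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)

x = torch.randn(2, 8, 64)  # (batch, seq_len, emb_dim)
print(MultiHeadLatentAttention(emb_dim=64, n_heads=8, latent_dim=16)(x).shape)
# torch.Size([2, 8, 64])
```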
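Finally, sliding-window attention (SWA), the third linked variant, can be summarized by its mask alone: each token attends only to itself and the previous window_size - 1 tokens. The function below is an illustrative sketch, not the book's ch04/06_swa code.

```python
import torch

def sliding_window_causal_mask(seq_len, window_size):
    """Return a boolean mask; True marks positions a query may NOT attend to."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions (rows)
    j = torch.arange(seq_len).unsqueeze(0)   # key positions (columns)
    future = j > i                    # standard causal constraint
    too_far = (i - j) >= window_size  # anything older than the window
    return future | too_far

print(sliding_window_causal_mask(5, window_size=3).int())
# tensor([[0, 1, 1, 1, 1],
#         [0, 0, 1, 1, 1],
#         [0, 0, 0, 1, 1],
#         [1, 0, 0, 0, 1],
#         [1, 1, 0, 0, 0]], dtype=torch.int32)
```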