4 Commits

Author SHA1 Message Date
Sebastian Raschka
28a8408d4d Update README wrt multi-query attention
Clarified the implications of using multi-query attention on modeling performance and memory usage.
2025-11-17 16:39:32 -06:00
Sebastian Raschka
9b9586688d Multi-Head Latent Attention (#876)
* Multi-Head Latent Attention

* update
2025-10-11 20:08:30 -05:00
Sebastian Raschka
bf27ad1485 Use GB instead of GiB consistently (#875)
2025-10-11 09:11:33 -05:00
Sebastian Raschka
c814814d72 Grouped-Query Attention memory (#874)
* GQA memory

* remove redundant code

* update links

* update
2025-10-11 08:44:19 -05:00