6 Commits

Author SHA1 Message Date
rasbt
4612d20fa8 Use argparse utils to show default args on command line 2026-03-01 20:15:21 -06:00
Sebastian Raschka
be5e2a3331 Readability and code quality improvements (#959)
* Consistent dataset naming

* consistent section headers
2026-02-17 18:44:56 -06:00
Sebastian Raschka
28a8408d4d Update README wrt multi-query attention
Clarified the implications of using multi-query attention on modeling performance and memory usage.
2025-11-17 16:39:32 -06:00
Sebastian Raschka
9b9586688d Multi-Head Latent Attention (#876)
* Multi-Head Latent Attention

* update
2025-10-11 20:08:30 -05:00
Sebastian Raschka
bf27ad1485 Use GB instead of GiB consistently (#875) 2025-10-11 09:11:33 -05:00
Sebastian Raschka
c814814d72 Grouped-Query Attention memory (#874)
* GQA memory

* remove redundant code

* update links

* update
2025-10-11 08:44:19 -05:00