11 Commits

Author SHA1 Message Date
casinca
9c4be478f8 Optional weight tying for Qwen3 and Llama3.2 pretraining (#949)
* optional weight tying for Qwen3 and Llama3.2

* typo
2026-01-14 09:07:04 -06:00
Sebastian Raschka
e742d8af2c Improve MoE implementation (#841) 2025-09-22 15:21:06 -05:00
Sebastian Raschka
32965e0edd remove redundant next_cache (#817) 2025-09-11 15:16:08 -05:00
Sebastian Raschka
c7a4362ca4 Add defensive context trimming for multiturn (#815)
* Add defensive context trimming for multiturn

* add all mods
2025-09-09 20:19:00 -05:00
Sebastian Raschka
f92b40e4ab Qwen3 Coder Flash & MoE from Scratch (#760)
* Qwen3 Coder Flash & MoE from Scratch

* update

* refinements

* updates

* update

* update

* update
2025-08-01 19:13:17 -05:00
Sebastian Raschka
3c9dc4807b Simplify KV cache usage (#728)
* Simplify KV cache usage

* Swap mark text with ghostwriter
2025-07-08 12:56:55 -05:00
Sebastian Raschka
c4ec55edac Support different Qwen3 sizes in pkg (#714) 2025-06-28 08:00:23 -05:00
Sebastian Raschka
81eda38d3b Improve KV cache code for torch.compile (#705)
* Improve KV cache code for torch.compile

* cleanup

* cleanup
2025-06-23 18:08:49 -05:00
Sebastian Raschka
0a2e8c39c4 Qwen3 KV cache (#688) 2025-06-21 17:34:39 -05:00
Sebastian Raschka
fdc3e1b701 Add GPT-2 KV cache to pkg (#687) 2025-06-21 12:29:04 -05:00
Sebastian Raschka
3be0f3202a Llama 3 KV Cache (#685)
* Llama 3 KV Cache

* skip expensive tests on Gh actions

* Update __init__.py
2025-06-21 10:55:20 -05:00