112 Commits

Author SHA1 Message Date
rasbt
4612d20fa8 Use argparse utils to show default args on command line 2026-03-01 20:15:21 -06:00
Sebastian Raschka
2d600ccb5b Use correct input in layernorm example (#960)
* Update CI

* Use correct example in layernorm section

* update
2026-02-18 21:35:57 -06:00
Sebastian Raschka
be5e2a3331 Readability and code quality improvements (#959)
* Consistent dataset naming

* consistent section headers
2026-02-17 18:44:56 -06:00
Sebastian Raschka
57430d2a13 Gated DeltaNet updates (#926) 2025-12-18 20:28:53 -06:00
talentJay-ux
d7f178d28b Sliding window KV Cache bug fix (#925)
1. Fix bug caused by the KV cache and GPT's ptr pointer not being reset when window_size > context_length
2. Fix bug caused by the KV cache and GPT's ptr pointer not being reset
3. Fix KV Cache import issue for gpt_with_kv_cache_optimized
2025-12-15 18:47:01 -06:00
Sebastian Raschka
a11965fbd9 Remove persistent flag from cache buffers (#916) 2025-11-24 20:10:02 -06:00
Sebastian Raschka
28a8408d4d Update README wrt multi-query attention
Clarified the implications of using multi-query attention on modeling performance and memory usage.
2025-11-17 16:39:32 -06:00
casinca
7d92267170 fix(GatedDeltaNet): Init param A from log of a uniform distrib (#906) 2025-11-09 14:22:52 -06:00
Sebastian Raschka
bcc73f731d n_heads × d_head -> d_head × d_head in DeltaNet (#903)
Clarified the explanation of the memory size calculation for `KV_cache_DeltaNet` and updated the quadratic term from `n_heads × d_head` to `d_head × d_head`.
2025-11-05 18:28:37 -06:00
Sebastian Raschka
488bef7e3f Image resizing 2025-11-02 21:05:38 -06:00
Sebastian Raschka
c6b8332a59 Gated DeltaNet write-up (#901)
* Gated DeltaNet write-up

* Add copyright and source information to script

Added copyright notice and source information.

* Remove unused import of Path in plot_memory_estimates

* Fix url
2025-11-02 21:03:42 -06:00
Sebastian Raschka
d6c3990c57 Training on MPS in PyTorch 2.9 (#900)
* Training on MPS in PyTorch 2.9

* update
2025-11-01 16:55:09 -05:00
Sebastian Raschka
0adb5b8c65 Fix ffn link (#892)
* Fix ffn link

* Apply suggestion from @rasbt

* Apply suggestion from @rasbt
2025-10-21 21:19:44 -05:00
Sebastian Raschka
7ca7c47e4a Make quote style consistent (#891) 2025-10-21 19:42:33 -05:00
casinca
9276edbc37 - docs(moe): correct arXiv link for DeepSeekMoE (#890)
- docs(moe): correct paper name for 2022
2025-10-20 19:29:06 -05:00
Sebastian Raschka
218221ab62 Mixture-of-Experts intro (#888) 2025-10-19 22:17:59 -05:00
Sebastian Raschka
7fe4874dda Update the compression rate comment in MLA (#883)
* compression comment

* update
2025-10-14 11:10:06 -05:00
Sebastian Raschka
bf039ff3dc Add alternative attention structure (#880) 2025-10-13 14:31:13 -05:00
Sebastian Raschka
6eb6adfa33 sliding window attention (#879) 2025-10-12 22:13:20 -05:00
rasbt
44eda5340a rm plot 2025-10-12 08:55:03 -05:00
Sebastian Raschka
9b9586688d Multi-Head Latent Attention (#876)
* Multi-Head Latent Attention

* update
2025-10-11 20:08:30 -05:00
Sebastian Raschka
bf27ad1485 Use GB instead of GiB consistently (#875) 2025-10-11 09:11:33 -05:00
Sebastian Raschka
c814814d72 Grouped-Query Attention memory (#874)
* GQA memory

* remove redundant code

* update links

* update
2025-10-11 08:44:19 -05:00
Sebastian Raschka
2f53bf5fe5 Link the other KV cache sections (#708) 2025-06-24 16:52:29 -05:00
Sebastian Raschka
81eda38d3b Improve KV cache code for torch.compile (#705)
* Improve KV cache code for torch.compile

* cleanup

* cleanup
2025-06-23 18:08:49 -05:00
Martin Ma
6522be94be Fix bug in masking when kv cache is used. (#697)
* Fix bug in masking when kv cache is used.

* add tests

* add tests

* upd

* add kv cache test to gh workflow

* explicit mask slicing

* upd

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-06-23 13:12:56 -05:00
Shamik
f5bc863752 Update README.md (#702)
Typo in kv cache readme
2025-06-23 07:21:51 -05:00
Sebastian Raschka
fdc3e1b701 Add GPT-2 KV cache to pkg (#687) 2025-06-21 12:29:04 -05:00
Sebastian Raschka
ece59ba587 Optimize KV cache (#673)
* Optimize KV cache

* style

* interpretable generate

* interpretable generate

* update readme
2025-06-16 16:00:50 -05:00
Sebastian Raschka
ba0370abd1 Optimized KV cache (#672)
* Optimized KV cache

* typo fix
2025-06-15 14:26:16 -05:00
Sebastian Raschka
2af686d70b Add KV cache (#671) 2025-06-15 09:58:08 -05:00
Sebastian Raschka
c21bfe4a23 Add PyPI package (#576)
* Add PyPI package

* fixes

* fixes
2025-03-23 19:28:49 -05:00
Sebastian Raschka
73f4342664 add ch04 code along video (#573) 2025-03-17 11:20:55 -05:00
Sebastian Raschka
a08d7aaa84 Uv workflow improvements (#531)
* Uv workflow improvements

* Uv workflow improvements

* linter improvements

* pyproject.toml fixes

* pyproject.toml fixes

* pyproject.toml fixes

* pyproject.toml fixes

* pyproject.toml fixes

* pyproject.toml fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix
2025-02-16 13:16:51 -06:00
Sebastian Raschka
68e2efe1c9 Mention small discrepancy due to Dropout non-reproducibility in PyTorch (#519)
* Mention small discrepancy due to Dropout non-reproducibility in PyTorch

* bump pytorch version
2025-02-06 14:59:52 -06:00
Sebastian Raschka
126adb7663 Include mathematical breakdown for exercise solution 4.1 (#483) 2025-01-14 19:23:00 -06:00
Sebastian Raschka
b6c4b2f9f1 Update bonus section formatting (#400) 2024-10-12 10:26:08 -05:00
rasbt
93d9dae95f update card 2024-10-11 12:15:01 -05:00
rasbt
1f4fca9f8e update reference numbers 2024-10-11 12:13:10 -05:00
Sebastian Raschka
6d0f59a49c Add MFU formula as reference material (#395)
* add MFU formula as reference material

* Update previous_chapters.py
2024-10-10 19:42:53 -05:00
rasbt
dc1b1a05b0 note about random numbers 2024-09-22 12:02:03 -05:00
Sebastian Raschka
222f7b16f8 update gpt-2 paper url 2024-09-20 07:00:06 -07:00
rasbt
8ad50a3315 update gpt-2 paper link 2024-09-09 06:31:28 -05:00
rasbt
1e48c13e89 update gpt-2 paper link 2024-09-08 15:49:44 -05:00
Sebastian Raschka
08040f024c Test code in pytorch 2.4 (#285)
* test code in pytorch 2.4

* update
2024-07-24 21:53:41 -05:00
Thanh Tran
070a69fc8b fix typos & inconsistent texts (#269)
Co-authored-by: TRAN <you@example.com>
2024-07-17 07:34:51 -05:00
Jeroen Van Goey
48bd72c890 fix typos, add codespell pre-commit hook (#264)
* fix typos, add codespell pre-commit hook

* Update .pre-commit-config.yaml

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-07-16 07:07:04 -05:00
rasbt
6ffd628bb6 add missing "be" to figure 2024-07-15 08:06:05 -05:00
rasbt
921e91a05f use correct chapter reference 2024-07-02 17:29:57 -05:00
rasbt
31806828d0 add links to summary sections 2024-06-29 07:33:26 -05:00