Commit Graph

95 Commits

Author SHA1 Message Date
Sebastian Raschka
bf039ff3dc Add alternative attention structure (#880) 2025-10-13 14:31:13 -05:00
Sebastian Raschka
6eb6adfa33 sliding window attention (#879) 2025-10-12 22:13:20 -05:00
rasbt
44eda5340a rm plot 2025-10-12 08:55:03 -05:00
Sebastian Raschka
9b9586688d Multi-Head Latent Attention (#876)
* Multi-Head Latent Attention

* update
2025-10-11 20:08:30 -05:00
Sebastian Raschka
bf27ad1485 Use GB instead of GiB consistently (#875) 2025-10-11 09:11:33 -05:00
Sebastian Raschka
c814814d72 Grouped-Query Attention memory (#874)
* GQA memory

* remove redundant code

* update links

* update
2025-10-11 08:44:19 -05:00
Sebastian Raschka
2f53bf5fe5 Link the other KV cache sections (#708) 2025-06-24 16:52:29 -05:00
Sebastian Raschka
81eda38d3b Improve KV cache code for torch.compile (#705)
* Improve KV cache code for torch.compile

* cleanup
2025-06-23 18:08:49 -05:00
Martin Ma
6522be94be Fix bug in masking when kv cache is used. (#697)
* Fix bug in masking when kv cache is used.

* add tests

* add tests

* upd

* add kv cache test to gh workflow

* explicit mask slicing

* upd

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-06-23 13:12:56 -05:00
Shamik
f5bc863752 Update README.md (#702)
Typo in kv cache readme
2025-06-23 07:21:51 -05:00
Sebastian Raschka
fdc3e1b701 Add GPT-2 KV cache to pkg (#687) 2025-06-21 12:29:04 -05:00
Sebastian Raschka
ece59ba587 Optimize KV cache (#673)
* Optimize KV cache

* style

* interpretable generate

* update readme
2025-06-16 16:00:50 -05:00
Sebastian Raschka
ba0370abd1 Optimized KV cache (#672)
* Optimized KV cache

* typo fix
2025-06-15 14:26:16 -05:00
Sebastian Raschka
2af686d70b Add KV cache (#671) 2025-06-15 09:58:08 -05:00
Sebastian Raschka
c21bfe4a23 Add PyPI package (#576)
* Add PyPI package

* fixes
2025-03-23 19:28:49 -05:00
Sebastian Raschka
73f4342664 add ch04 code along video (#573) 2025-03-17 11:20:55 -05:00
Sebastian Raschka
a08d7aaa84 Uv workflow improvements (#531)
* Uv workflow improvements

* linter improvements

* pyproject.toml fixes

* windows fixes

* win32 fix
2025-02-16 13:16:51 -06:00
Sebastian Raschka
68e2efe1c9 Mention small discrepancy due to Dropout non-reproducibility in PyTorch (#519)
* Mention small discrepancy due to Dropout non-reproducibility in PyTorch

* bump pytorch version
2025-02-06 14:59:52 -06:00
Sebastian Raschka
126adb7663 Include mathematical breakdown for exercise solution 4.1 (#483) 2025-01-14 19:23:00 -06:00
Sebastian Raschka
b6c4b2f9f1 Update bonus section formatting (#400) 2024-10-12 10:26:08 -05:00
rasbt
93d9dae95f update card 2024-10-11 12:15:01 -05:00
rasbt
1f4fca9f8e update reference numbers 2024-10-11 12:13:10 -05:00
Sebastian Raschka
6d0f59a49c Add MFU formula as reference material (#395)
* add MFU formula as reference material

* Update previous_chapters.py
2024-10-10 19:42:53 -05:00
rasbt
dc1b1a05b0 note about random numbers 2024-09-22 12:02:03 -05:00
Sebastian Raschka
222f7b16f8 update gpt-2 paper url 2024-09-20 07:00:06 -07:00
rasbt
8ad50a3315 update gpt-2 paper link 2024-09-09 06:31:28 -05:00
rasbt
1e48c13e89 update gpt-2 paper link 2024-09-08 15:49:44 -05:00
Sebastian Raschka
08040f024c Test code in pytorch 2.4 (#285)
* test code in pytorch 2.4

* update
2024-07-24 21:53:41 -05:00
Thanh Tran
070a69fc8b fix typos & inconsistent texts (#269)
Co-authored-by: TRAN <you@example.com>
2024-07-17 07:34:51 -05:00
Jeroen Van Goey
48bd72c890 fix typos, add codespell pre-commit hook (#264)
* fix typos, add codespell pre-commit hook

* Update .pre-commit-config.yaml

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-07-16 07:07:04 -05:00
rasbt
6ffd628bb6 add missing "be" to figure 2024-07-15 08:06:05 -05:00
rasbt
921e91a05f use correct chapter reference 2024-07-02 17:29:57 -05:00
rasbt
31806828d0 add links to summary sections 2024-06-29 07:33:26 -05:00
rasbt
796f0e2a30 add clarifying note about GELU 2024-06-29 07:14:36 -05:00
rasbt
ab23ca5b1b force refresh figure 2024-06-29 07:01:37 -05:00
rasbt
6a8acf5135 remove redundant plus sign 2024-06-29 06:59:36 -05:00
Daniel Kleine
81c843bdc0 minor fixes (#246)
* removed duplicated white spaces

* Update ch07/01_main-chapter-code/ch07.ipynb

* Update ch07/05_dataset-generation/llama3-ollama.ipynb

* removed duplicated white spaces

* fixed title again

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-06-25 17:30:30 -05:00
Sebastian Raschka
5944ab0678 Update README.md 2024-06-22 12:09:02 -05:00
rasbt
283397aaf2 add main and optional sections 2024-06-19 17:48:25 -05:00
Daniel Kleine
bbb2a0c3d5 fixed num_workers (#229)
* fixed num_workers

* ch06 & ch07: added num_workers to create_dataloader_v1
2024-06-19 17:36:46 -05:00
rasbt
e24fd98cdf distinguish better between main chapter code and bonus materials 2024-06-11 21:07:42 -05:00
Daniel Kleine
dcbdc1d2e5 fixes for code (#206)
* updated .gitignore

* removed unused GELU import

* fixed model_configs, fixed all tensors on same device

* removed unused tiktoken

* update

* update hparam search

* remove redundant tokenizer argument

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-06-11 20:59:48 -05:00
rasbt
39c4a887eb add allowed_special={"<|endoftext|>"} 2024-06-09 06:04:02 -05:00
Sebastian Raschka
72a073bbbf Remove leftover instances of self.tokenizer (#201)
* Remove leftover instances of self.tokenizer

* add endoftext token
2024-06-08 14:57:34 -05:00
rasbt
98d453b666 update formatting 2024-05-24 07:20:37 -05:00
rasbt
e5e6aaf9f1 flops analysis 2024-05-23 20:35:41 -05:00
rasbt
c735c21e87 fix swiglu acronym 2024-05-01 20:26:17 -05:00
Sebastian Raschka
97ed38116a Rename drop_resid to drop_shortcut (#136) 2024-04-28 14:31:27 -05:00
rasbt
d202cabdee update figures 2024-04-20 11:42:03 -05:00
Sebastian Raschka
dd51d4ad83 Make datasets and loaders compatible with multiprocessing (#118) 2024-04-13 13:57:56 -05:00