114 Commits

Author SHA1 Message Date
Sebastian Raschka
be5e2a3331 Readability and code quality improvements (#959)
* Consistent dataset naming

* consistent section headers
2026-02-17 18:44:56 -06:00
Sebastian Raschka
7b1f740f74 Fix flex attention in PyTorch 2.10 (#957) 2026-02-09 14:12:40 -06:00
Aviral Garg
27d52d6378 Fix MHAEinsum weight dimension bug when d_in != d_out (#857) (#893)
* Fix MHAEinsum weight dimension bug when d_in != d_out (#857)

Previously MHAEinsum initialized weight matrices with shape (d_out, d_in) and used inappropriate einsum notation, causing failures for non-square input-output dimensions. This commit corrects weight initialization to shape (d_in, d_out), updates einsum notation to 'bnd,do->bno', and adds three unit tests to verify parity across different d_in and d_out settings. All tests pass successfully.

* use pytest

* Update .gitignore

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-10-31 21:45:31 -05:00
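
Note on 27d52d6378: the weight shape and einsum notation named in that commit message can be checked with a minimal sketch. This is not the repository's MHAEinsum code. It uses NumPy's `einsum` instead of PyTorch, and the sizes and variable names (`b`, `n`, `d_in`, `d_out`, `x`, `W`, `W_bad`) are assumptions chosen for illustration. The last part only shows that a `(d_out, d_in)` weight does not fit the corrected notation when `d_in != d_out`; it does not reproduce the original bug.

```python
import numpy as np

# Hypothetical sizes for illustration; d_in != d_out on purpose.
b, n, d_in, d_out = 2, 4, 6, 8

rng = np.random.default_rng(0)
x = rng.standard_normal((b, n, d_in))   # (batch, tokens, d_in)
W = rng.standard_normal((d_in, d_out))  # weight shape (d_in, d_out), as in the commit message

# 'bnd,do->bno': contract over d (= d_in), keep o (= d_out).
q = np.einsum("bnd,do->bno", x, W)

# Reference result via ordinary batched matrix multiplication.
q_ref = x @ W

print(q.shape)                 # (2, 4, 8)
print(np.allclose(q, q_ref))   # True

# Illustration only: a (d_out, d_in) weight is incompatible with this
# notation when d_in != d_out, because index d would need to be 6 and 8.
W_bad = rng.standard_normal((d_out, d_in))
try:
    np.einsum("bnd,do->bno", x, W_bad)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(mismatch_raised)         # True
```

With `d_in == d_out` both weight layouts have the same shape, so a shape check alone would not catch a swapped layout. That is presumably why the commit adds tests that use different `d_in` and `d_out` settings.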
Sebastian Raschka
7084123d10 Note about output dimensions (#862) 2025-10-01 10:47:04 -05:00
casinca
00c240ff87 some typo fixes (#858)
* fix(typo): correct scaling

* fix(typo): correct comment for `instruct`
2025-09-30 11:18:02 -05:00
Synix
bfc6389fab fix code comment (#834) 2025-09-17 01:36:02 +00:00
Sebastian Raschka
fc101b710e Added Apple Silicon GPU device update (#820)
* Added Apple Silicon GPU device

* Added Apple Silicon GPU device

* delete: remove unused model.pth file from understanding-buffers

* update

* update

---------

Co-authored-by: missflash <missflash@gmail.com>
2025-09-13 12:48:06 -05:00
Jestine Paul
a3a62c509a Improve MHA einsum (#781)
Efficiency update for einsum as mentioned in #772
2025-08-22 15:12:26 -05:00
Sebastian Raschka
8c1f9ccf54 Improve MHA einsum (#775) 2025-08-19 10:38:15 -05:00
Sebastian Raschka
4e61dc4224 Fix d_out code comment in bonus materials (#715) 2025-06-28 10:07:16 -05:00
Sebastian Raschka
d37ddb668a Fix code comment: embed_dim -> d_out (#698) 2025-06-22 16:36:39 -05:00
Sebastian Raschka
3654571184 align formulas in notes with code (#605) 2025-04-06 16:46:53 -05:00
Sebastian Raschka
49330d0990 Fix link (#596) 2025-04-02 09:47:07 -05:00
Sebastian Raschka
4db0e826b7 Add chapter 3 coding along video link (#572) 2025-03-16 16:07:14 -05:00
Sebastian Raschka
96ca2fcb2f Update mha plot (#560) 2025-03-06 20:29:04 -06:00
Greg Gandenberger
b92c0dff89 Add note about context_length (#549)
* Add note about context_length

* update note

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-02-27 08:36:41 -06:00
Sebastian Raschka
a08d7aaa84 Uv workflow improvements (#531)
* Uv workflow improvements

* Uv workflow improvements

* linter improvements

* pytproject.toml fixes

* pytproject.toml fixes

* pytproject.toml fixes

* pytproject.toml fixes

* pytproject.toml fixes

* pytproject.toml fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix
2025-02-16 13:16:51 -06:00
rasbt
1183fd7837 add dropout scaling note 2024-11-06 05:52:47 -06:00
Daniel Kleine
5ff72c2850 fixed typos (#414)
* fixed typos

* fixed formatting

* Update ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb

* del weights after load into model

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-10-24 18:23:53 -05:00
Daniel Kleine
ef4018181e updates for PyTorch 2.5 (#408)
* updated Dockerfile

* updated MHA implementations for PT 2.5

* fixed typo

* update installation instruction

* Update setup/03_optional-docker-environment/.devcontainer/Dockerfile

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-10-22 20:23:31 -05:00
Sebastian Raschka
b6c4b2f9f1 Update bonus section formatting (#400) 2024-10-12 10:26:08 -05:00
Daniel Kleine
2ee3df622e nbviewer links / typo (#346)
* fixed typo

* removed remaining nbviewer links

* Update mha-implementations.ipynb

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-09-07 07:27:28 +02:00
Sebastian Raschka
ad12c8da06 Einsum multi-head attention (#345)
* Einsum multi-head attention

* update diff
2024-09-05 18:24:33 +02:00
Daniel Kleine
c65928f7dc added std error bars (#320)
* added std error bars

* fixed changes

* Update on A100

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-08-13 20:57:41 -05:00
Jeroen Van Goey
76e6910a1a Small typo fix (#313)
* typo fix

* Update ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb

* Update ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb

* Update ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-08-12 07:54:12 -05:00
Sebastian Raschka
3f6652d87e update attention benchmarks (#307) 2024-08-10 09:44:11 -05:00
Sebastian Raschka
f5a003744e Update README.md 2024-07-30 06:55:41 -05:00
rasbt
0dad0a3c04 add state_dict example 2024-07-28 14:15:32 -05:00
Sebastian Raschka
f4fc0ededd buffer tutorial 2024-07-27 17:06:16 -05:00
rasbt
7f1e071fff update 2024-07-27 07:12:42 -05:00
Sebastian Raschka
deea13e5c2 Understanding PyTorch Buffers (#288) 2024-07-26 08:45:36 -05:00
Sebastian Raschka
08040f024c Test code in pytorch 2.4 (#285)
* test code in pytorch 2.4

* update
2024-07-24 21:53:41 -05:00
Jeroen Van Goey
48bd72c890 fix typos, add codespell pre-commit hook (#264)
* fix typos, add codespell pre-commit hook

* Update .pre-commit-config.yaml

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-07-16 07:07:04 -05:00
rasbt
31806828d0 add links to summary sections 2024-06-29 07:33:26 -05:00
rasbt
c7f892550e add clarification about :num_tokens 2024-06-29 07:16:42 -05:00
rasbt
283397aaf2 add main and optional sections 2024-06-19 17:48:25 -05:00
rasbt
5d1fbbbfd2 update dotted line 2024-06-17 20:17:56 -05:00
rasbt
aaa54b10b3 dim-consistency 2024-06-12 19:43:25 -05:00
rasbt
e24fd98cdf distinguish better between main chapter code and bonus materials 2024-06-11 21:07:42 -05:00
Sebastian Raschka
72a073bbbf Remove leftover instances of self.tokenizer (#201)
* Remove leftover instances of self.tokenizer

* add endoftext token
2024-06-08 14:57:34 -05:00
Sebastian Raschka
c303a7f36d Explain value truncation in some figures (#199)
* clarify truncation

* typo fix
2024-06-08 13:24:37 -05:00
rasbt
1e12da90e6 clarify truncation 2024-06-08 13:13:43 -05:00
Sebastian Raschka
c577f52bfc Merge pull request #184 from rasbt/api-key-approach
Change API key retrieval approach
2024-05-27 08:47:04 -04:00
rasbt
71831890a0 update mha dim 2024-05-27 07:46:29 -05:00
rasbt
42af52fef4 revert unnecessary changes 2024-05-27 07:37:06 -05:00
rasbt
87c3e78dcb Revert "Revert "newline""
This reverts commit a53ca10508.
2024-05-27 07:32:45 -05:00
rasbt
a53ca10508 Revert "newline"
This reverts commit 23982ed3fa.
2024-05-27 07:32:22 -05:00
rasbt
23982ed3fa newline 2024-05-27 07:30:27 -05:00
rasbt
050c8b7b73 update pr 2024-05-26 15:38:35 -05:00
Kostyantyn Borysenko
76cdf5e299 Fix an incorrect input dimension 2024-05-26 13:05:07 -07:00