Commit Graph

119 Commits

Author SHA1 Message Date
Sebastian Raschka
c6b8332a59 Gated DeltaNet write-up (#901)
* Gated DeltaNet write-up

* Add copyright and source information to script

Added copyright notice and source information.

* Remove unused import of Path in plot_memory_estimates

* Fix url
2025-11-02 21:03:42 -06:00
Sebastian Raschka
218221ab62 Mixture-of-Experts intro (#888) 2025-10-19 22:17:59 -05:00
Sebastian Raschka
bf039ff3dc Add alternative attention structure (#880) 2025-10-13 14:31:13 -05:00
Sebastian Raschka
6eb6adfa33 sliding window attention (#879) 2025-10-12 22:13:20 -05:00
Sebastian Raschka
21f0617ea3 Add other appendices for completeness (#878)
* Add other appendices for completeness

* update

* update

* Update
2025-10-12 19:04:53 -05:00
Sebastian Raschka
9b9586688d Multi-Head Latent Attention (#876)
* Multi-Head Latent Attention

* update
2025-10-11 20:08:30 -05:00
Sebastian Raschka
c814814d72 Grouped-Query Attention memory (#874)
* GQA memory

* remove redundant code

* update links

* update
2025-10-11 08:44:19 -05:00
Sebastian Raschka
fecfdd16ff Add simpler BPE, and make previous BPE better (#870)
* Add simpler BPE, and make previous BPE better

* update

* Update README.md
2025-10-08 22:22:34 -05:00
Sebastian Raschka
1164cb3e8f Qwen3 and evaluation bonus materials (#869) 2025-10-08 18:22:19 -05:00
Sebastian Raschka
6d175a22df Fix IMDb spelling (#811)
* Add SSL instructions

* Fix IMDb spelling
2025-09-06 12:04:47 -05:00
Sebastian Raschka
a51ff65488 reasoning-from-scratch (#793) 2025-08-28 18:36:41 -05:00
Sebastian Raschka
a6b883c9f9 Gemma 3 270M From Scratch (#771)
* Gemma 3 270M From Scratch

* fix path

* update readme
2025-08-17 08:23:05 -05:00
Sebastian Raschka
f92b40e4ab Qwen3 Coder Flash & MoE from Scratch (#760)
* Qwen3 Coder Flash & MoE from Scratch

* update

* refinements

* updates

* update

* update

* update
2025-08-01 19:13:17 -05:00
Sebastian Raschka
7e9ce325de Add link to official video course (#741) 2025-07-13 10:35:12 -05:00
Sebastian Raschka
3c9dc4807b Simplify KV cache usage (#728)
* Simplify KV cache usage

* Swap mark text with ghostwriter
2025-07-08 12:56:55 -05:00
Sebastian Raschka
c8c6e7814a Update README.md 2025-07-06 17:58:33 -05:00
Sebastian Raschka
6103acbedb Add prerequisite section (#723) 2025-07-06 12:45:42 -05:00
Sebastian Raschka
47a750014d Add link to free exercise PDF (#706) 2025-06-24 08:24:02 -05:00
Sebastian Raschka
e719bd86ad Qwen3 From Scratch (#678)
* Qwen3 From Scratch

* rev other file

* upd

* upd

* upd

* url fixes
2025-06-19 18:44:38 -05:00
Sebastian Raschka
2af686d70b Add KV cache (#671) 2025-06-15 09:58:08 -05:00
Sebastian Raschka
3f93d73d6d Alt weight loading code via PyTorch (#585)
* Alt weight loading code via PyTorch

* commit additional files
2025-03-27 20:10:23 -05:00
Sebastian Raschka
f12b899d96 GitHub markdown updates (#545)
* GitHub markdown updates

* Apply suggestions from code review

* Apply suggestions from code review
2025-02-23 12:25:44 -06:00
Sebastian Raschka
67c226bf67 Badge url updates 2025-02-17 12:07:47 -06:00
rasbt
9ccecd13ae update badges 2025-02-17 12:02:06 -06:00
rasbt
24f78865df update badges 2025-02-17 12:00:46 -06:00
rasbt
2f67cbca0b update readme badges 2025-02-17 11:49:41 -06:00
Sebastian Raschka
bacb7aa90c Update README.md 2025-02-16 13:37:32 -06:00
Sebastian Raschka
908dd2f71e PyTorch tips for better training performance (#525)
* PyTorch tips for better training performance

* formatting

* pep 8
2025-02-12 16:10:34 -06:00
Sebastian Raschka
a22d612be6 Bonus material: extending tokenizers (#496)
* Bonus material: extending tokenizers

* small wording update
2025-01-22 09:26:54 -06:00
Sebastian Raschka
0d4967eda6 Implementing the BPE Tokenizer from Scratch (#487) 2025-01-17 12:22:00 -06:00
Sebastian Raschka
27a6a7e64a Add chapter names 2024-11-08 08:39:34 -06:00
Sebastian Raschka
b5f2aa3500 Update README.md 2024-10-29 20:20:48 -05:00
Sebastian Raschka
05b04f2a5a Memory efficient weight loading (#401)
* memory efficient weight loading

* remove unused code
2024-10-14 10:30:25 -05:00
Sebastian Raschka
8a448a4410 Llama 3 (#384)
* Implement Llama 3.2

* Add Llama 3.2 files

* exclude IMDB link because stanford website seems down
2024-10-05 07:52:15 -05:00
Sebastian Raschka
0467c8289b GPT to Llama (#368)
* GPT to Llama

* fix urls
2024-09-23 07:34:06 -05:00
Sebastian Raschka
76e9a9ec02 Add user interface to ch06 and ch07 (#366)
* Add user interface to ch06 and ch07

* pep8

* fix url
2024-09-21 20:33:00 -05:00
Sebastian Raschka
ea9b4e83a4 Add ChatGPT-like user interface (#360)
* Add ChatGPT-like user interface

* fixes
2024-09-17 08:26:44 -05:00
Sebastian Raschka
835ed29dbf reflection-tuning dataset generation (#349) 2024-09-10 21:42:12 -05:00
Daniel Kleine
2ee3df622e nbviewer links / typo (#346)
* fixed typo

* removed remaining nbviewer links

* Update mha-implementations.ipynb

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-09-07 07:27:28 +02:00
Sebastian Raschka
91db4e3a0f Revert nbviewer links 2024-09-05 08:09:33 +02:00
Sebastian Raschka
d391796ec2 use nbviewer links (#339) 2024-08-29 09:09:10 +02:00
Sebastian Raschka
26f94876f7 Update README.md 2024-08-24 07:22:18 -05:00
Sebastian Raschka
f1c3d451fe Update README.md 2024-08-08 07:50:45 -05:00
Sebastian Raschka
81e9cea3d3 Update README.md 2024-08-08 07:47:31 -05:00
Sebastian Raschka
98d24a1607 Update README.md 2024-08-06 08:02:01 -05:00
Sebastian Raschka
50332cf75b Update README.md 2024-08-05 17:47:06 -05:00
Sebastian Raschka
16e83434b5 Update README.md 2024-08-04 16:06:38 -05:00
Sebastian Raschka
52435804eb Direct Preference Optimization from scratch (#294) 2024-08-04 08:57:36 -05:00
Sebastian Raschka
ff7a6db212 Update README.md 2024-08-01 18:17:42 -05:00
Sebastian Raschka
9bf5d67d61 Update README.md 2024-07-28 09:28:11 -05:00