Sebastian Raschka
bc6f335526
Olmo 3 from scratch ( #914 )
...
* Olmo 3 from scratch
* update
* update
* update
2025-11-22 22:42:18 -06:00
Sebastian Raschka
c6b8332a59
Gated DeltaNet write-up ( #901 )
...
* Gated DeltaNet write-up
* Add copyright and source information to script
Added copyright notice and source information.
* Remove unused import of Path in plot_memory_estimates
* Fix url
2025-11-02 21:03:42 -06:00
Aviral Garg
27d52d6378
Fix MHAEinsum weight dimension bug when d_in != d_out ( #857 ) ( #893 )
...
* Fix MHAEinsum weight dimension bug when d_in != d_out (#857 )
Previously MHAEinsum initialized weight matrices with shape (d_out, d_in) and used inappropriate einsum notation, causing failures for non-square input-output dimensions. This commit corrects weight initialization to shape (d_in, d_out), updates einsum notation to 'bnd,do->bno', and adds three unit tests to verify parity across different d_in and d_out settings. All tests pass successfully.
* use pytest
* Update .gitignore
---------
Co-authored-by: rasbt <mail@sebastianraschka.com >
2025-10-31 21:45:31 -05:00
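The shape fix described in the commit above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses NumPy rather than the repository's own code, the sizes and variable names are hypothetical, and only the weight shape `(d_in, d_out)` and the subscripts `'bnd,do->bno'` come from the commit message.

```python
import numpy as np

# Hypothetical sizes, chosen so that d_in != d_out.
b, n, d_in, d_out = 2, 4, 6, 10

rng = np.random.default_rng(0)
x = rng.standard_normal((b, n, d_in))    # (batch, tokens, d_in)
W = rng.standard_normal((d_in, d_out))   # weight stored as (d_in, d_out)

# 'bnd,do->bno' contracts the shared d axis (d_in) and keeps b, n, o (d_out).
out = np.einsum("bnd,do->bno", x, W)

# The result matches an ordinary matrix product over the last axis.
assert out.shape == (b, n, d_out)
assert np.allclose(out, x @ W)

# A weight laid out as (d_out, d_in) cannot be contracted with these
# subscripts when d_in != d_out, because the d axes disagree in size.
W_transposed_layout = rng.standard_normal((d_out, d_in))
try:
    np.einsum("bnd,do->bno", x, W_transposed_layout)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

With square dimensions (`d_in == d_out`) both layouts run without a shape error, which is presumably why the commit's tests cover differing `d_in` and `d_out` settings.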
Sebastian Raschka
218221ab62
Mixture-of-Experts intro ( #888 )
2025-10-19 22:17:59 -05:00
Sebastian Raschka
6eb6adfa33
sliding window attention ( #879 )
2025-10-12 22:13:20 -05:00
Sebastian Raschka
9b9586688d
Multi-Head Latent Attention ( #876 )
...
* Multi-Head Latent Attention
* update
2025-10-11 20:08:30 -05:00
Sebastian Raschka
c814814d72
Grouped-Query Attention memory ( #874 )
...
* GQA memory
* remove redundant code
* update links
* update
2025-10-11 08:44:19 -05:00
Sebastian Raschka
fecfdd16ff
Add simpler BPE, and make previous BPE better ( #870 )
...
* Add simpler BPE, and make previous BPE better
* update
* Update README.md
2025-10-08 22:22:34 -05:00
Sebastian Raschka
e742d8af2c
Improve MoE implementation ( #841 )
2025-09-22 15:21:06 -05:00
rasbt
9ea2c57c5f
simplify
2025-09-01 22:15:47 -05:00
rasbt
643f800a94
remove local config files
2025-09-01 20:52:40 -05:00
Sebastian Raschka
9eee9296d9
Interactive qwen3 chat interface ( #801 )
...
* Interactive qwen3 chat interface
* update
* update
* update url
2025-09-01 20:50:25 -05:00
Sebastian Raschka
a6b883c9f9
Gemma 3 270M From Scratch ( #771 )
...
* Gemma 3 270M From Scratch
* fix path
* update readme
2025-08-17 08:23:05 -05:00
Sebastian Raschka
b14325e56d
Qwen3 and Llama3 equivalency tests with HF transformers ( #768 )
...
* Qwen3 and Llama3 equivalency tests with HF transformers
* update
2025-08-14 18:36:07 -05:00
Sebastian Raschka
190c66b3b0
Add Qwen3 1.7, 4B, 8B, and 32B support to from-scratch nb ( #709 )
2025-06-25 08:53:09 -05:00
Sebastian Raschka
e719bd86ad
Qwen3 From Scratch ( #678 )
...
* Qwen3 From Scratch
* rev other file
* upd
* upd
* upd
* url fixes
2025-06-19 18:44:38 -05:00
Sebastian Raschka
c4cde1c21b
Reduce Llama 3 RoPE memory requirements ( #658 )
...
* Llama3 from scratch improvements
* Fix Llama 3 expensive RoPE memory issue
* updates
* update package
* benchmark
* remove unused rescale_theta
2025-06-12 11:08:02 -05:00
Daniel Kleine
f01e163aad
updated .gitignore ( #581 )
2025-03-26 13:21:14 -05:00
Sebastian Raschka
f63f04d8d5
Fix BPE bonus materials ( #561 )
...
* Fix BPE bonus materials
* fix bpe implementation
* update
* Add 'Hello, world. Is this-- a test?' test case
* update link to test file
* update path handling
* update path handling
* fix pytest paths
2025-03-08 17:21:30 -06:00
rasbt
24f78865df
update badges
2025-02-17 12:00:46 -06:00
Matthew Feickert
a8b8eb4731
feat: Add pixi environment ( #534 )
...
* feat: Add pixi environment
* Add pixi manifest pixi.toml for Linux x86, macOS arm64, Windows 64.
* ci: Update CI workflow and unify to one
* Enable workflow dispatch.
* Add concurrency limits.
* Use pixi for CI workflow.
* Unify to a single workflow for all OS tested
* feat: Add pixi lock file
* Ensure tensorflow-cpu installed on Windows
* fix package check
* fix package check
* simplification plus uv and pip runners
* some fixes to pixi and pip
* create pixi.lock
* fix pixi.lock issue
* another attempt trying to fix get_packages
* another attempt trying to fix get_packages
* clean up python_environment_check.py
* updated runner and docs
* use bash
* proper env activation
* proper env activation
---------
Co-authored-by: rasbt <mail@sebastianraschka.com >
2025-02-17 11:33:53 -06:00
Sebastian Raschka
3e3dc3c5dc
Native uv docs ( #530 )
...
* Replace pip by more modern uv
* uv tests
* Native uv docs
* resolve merge conflicts
* resolve merge conflicts
2025-02-15 20:35:23 -06:00
Sebastian Raschka
25ea71e713
Alternative weight loading via .safetensors ( #507 )
2025-01-29 08:15:29 -06:00
Daniel Kleine
60acb94894
BPE: fixed typo ( #492 )
...
* fixed typo
* use rel path if exists
* mod gitignore and use existing vocab files
---------
Co-authored-by: rasbt <mail@sebastianraschka.com >
2025-01-20 20:49:53 -06:00
Daniel Kleine
81eed9afe2
updated RoPE statement ( #423 )
...
* updated RoPE statement
* updated .gitignore
* Update ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com >
2024-10-30 08:00:08 -05:00
Daniel Kleine
d38083c401
Updated Llama 2 to 3 paths ( #413 )
...
* llama 2 and 3 path fixes
* updated llama 3, 3.1 and 3.2 paths
* updated .gitignore
* Typo fix
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com >
2024-10-24 07:40:08 -05:00
Sebastian Raschka
8a448a4410
Llama 3 ( #384 )
...
* Implement Llama 3.2
* Add Llama 3.2 files
* exclude IMDB link because stanford website seems down
2024-10-05 07:52:15 -05:00
Sebastian Raschka
b993c2b25b
Improve rope settings for llama3 ( #380 )
2024-10-03 08:29:54 -05:00
rasbt
6bc3de165c
move access token to config.json
2024-09-23 08:56:16 -05:00
Sebastian Raschka
0467c8289b
GPT to Llama ( #368 )
...
* GPT to Llama
* fix urls
2024-09-23 07:34:06 -05:00
Sebastian Raschka
76e9a9ec02
Add user interface to ch06 and ch07 ( #366 )
...
* Add user interface to ch06 and ch07
* pep8
* fix url
2024-09-21 20:33:00 -05:00
Daniel Kleine
eefe4bf12b
Chainlit bonus material fixes ( #361 )
...
* fix cmd
* moved idx to device
* improved code with clone().detach()
* fixed path
* fix: added extra line for pep8
* updated .gitignore
* Update ch05/06_user_interface/app_orig.py
* Update ch05/06_user_interface/app_own.py
* Apply suggestions from code review
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com >
2024-09-18 08:08:50 -07:00
Sebastian Raschka
ea9b4e83a4
Add ChatGPT-like user interface ( #360 )
...
* Add ChatGPT-like user interface
* fixes
2024-09-17 08:26:44 -05:00
Eric Thomson
da5236ee72
Adds .vscode folder to .gitignore ( #314 )
...
* added .vscode folder to .gitignore
* Update .gitignore
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com >
2024-08-12 07:49:11 -05:00
Daniel Kleine
8318d1f002
minor DPO fixes ( #298 )
...
* fixed issues, updated .gitignore
* added closing paren
* fixed CEL spelling
* fixed more minor issues
* Update ch07/01_main-chapter-code/ch07.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com >
2024-08-05 08:40:46 -05:00
Daniel Kleine
3ac363d005
updated .gitignore for ch07/01 artefacts ( #242 )
...
* fixed markdown
* removed redundant imports
* updated .gitignore for ch07/01 artefacts
2024-06-22 18:12:01 -05:00
Sebastian Raschka
ec5baa1f33
Add CI tests for chapter 7 ( #239 )
2024-06-22 08:57:18 -05:00
Sebastian Raschka
b90c7ad2d6
Exercise solutions ( #237 )
2024-06-22 08:30:45 -05:00
Sebastian Raschka
6c0dc2362b
Add standalone finetuning and evaluation scripts for chapter 7 ( #234 )
...
* add finetuning and eval scripts
* update link
* update links
* fix link
2024-06-21 05:23:24 -05:00
Daniel Kleine
dcbdc1d2e5
fixes for code ( #206 )
...
* updated .gitignore
* removed unused GELU import
* fixed model_configs, fixed all tensors on same device
* removed unused tiktoken
* update
* update hparam search
* remove redundant tokenizer argument
---------
Co-authored-by: rasbt <mail@sebastianraschka.com >
2024-06-11 20:59:48 -05:00
Daniel Kleine
da9f64215a
ch07 fixes ( #204 )
...
* updated .gitignore for ch07
* fixed extract_response()
2024-06-10 17:31:13 -05:00
rasbt
42af52fef4
revert unnecessary changes
2024-05-27 07:37:06 -05:00
rasbt
dd7ba32b56
add comment
2024-05-27 07:18:07 -05:00
Daniel Kleine
e7914182c6
updated .gitignore
2024-05-19 16:07:20 +00:00
Daniel Kleine
fabdefe959
updated .gitignore with appendix artifacts
2024-05-15 06:30:24 +00:00
Daniel Kleine
88ee7793d4
updated .gitignore with 06/02 and /03 artifacts
2024-05-14 12:16:24 +00:00
rasbt
21172a6a7e
add chapter 6 unit test
2024-05-12 18:51:28 -05:00
rasbt
2e47a6e61c
update dataset naming
2024-05-12 09:22:42 -05:00
rasbt
16e276f8df
show downloads
2024-05-06 07:40:09 -05:00
rasbt
258dcad5ee
ch06 csv
2024-05-06 07:16:30 -05:00