LLMs-from-scratch

mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2026-04-10 12:33:42 +00:00

Author	SHA1	Message	Date
Sebastian Raschka	58f45ae5a7	Fix empty device issue (#904)	2025-11-05 20:04:44 -06:00
Sebastian Raschka	bcc73f731d	n_heads × d_head -> d_head × d_head in DeltaNet (#903) Clarified the explanation of the memory size calculation for `KV_cache_DeltaNet` and updated the quadratic term from `n_heads × d_head` to `d_head × d_head`.	2025-11-05 18:28:37 -06:00
Sebastian Raschka	488bef7e3f	Image resizing	2025-11-02 21:05:38 -06:00
Sebastian Raschka	c6b8332a59	Gated DeltaNet write-up (#901) * Gated DeltaNet write-up * Add copyright and source information to script Added copyright notice and source information. * Remove unused import of Path in plot_memory_estimates * Fix url	2025-11-02 21:03:42 -06:00
Sebastian Raschka	d6c3990c57	Training on MPS in PyTorch 2.9 (#900) * Training on MPS in PyTorch 2.9 * update	2025-11-01 16:55:09 -05:00
Aviral Garg	27d52d6378	Fix MHAEinsum weight dimension bug when d_in != d_out (#857) (#893) * Fix MHAEinsum weight dimension bug when d_in != d_out (#857) Previously MHAEinsum initialized weight matrices with shape (d_out, d_in) and used inappropriate einsum notation, causing failures for non-square input-output dimensions. This commit corrects weight initialization to shape (d_in, d_out), updates einsum notation to 'bnd,do->bno', and adds three unit tests to verify parity across different d_in and d_out settings. All tests pass successfully. * use pytest * Update .gitignore --------- Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-10-31 21:45:31 -05:00
Sebastian Raschka	b1db33b384	simplify uv command (#898)	2025-10-31 19:44:57 -05:00
Sebastian Raschka	760f4c9ecc	Add bonus dependencies to pyproject (#897) * Add bonus dependencies to pyproject * update	2025-10-28 20:36:21 -05:00
Sebastian Raschka	0adb5b8c65	Fix ffn link (#892) * Fix ffn link * Apply suggestion from @rasbt * Apply suggestion from @rasbt	2025-10-21 21:19:44 -05:00
Sebastian Raschka	7ca7c47e4a	Make quote style consistent (#891)	2025-10-21 19:42:33 -05:00
casinca	9276edbc37	- docs(moe): correct arXiv link for DeepSeekMoE (#890) - docs(moe): correct paper name for 2022	2025-10-20 19:29:06 -05:00
Sebastian Raschka	218221ab62	Mixture-of-Experts intro (#888)	2025-10-19 22:17:59 -05:00
Sebastian Raschka	27b6dfab9e	Make it easier to toggle between thinking and instruct variants (#887)	2025-10-16 20:37:31 -05:00
Sebastian Raschka	7fe4874dda	Update the compression rate comment in MLA (#883) * compression comment * update	2025-10-14 11:10:06 -05:00
Sebastian Raschka	b969b3ef7a	Use figure numbers in ch05-7 (#881)	2025-10-13 16:26:35 -05:00
Sebastian Raschka	bf039ff3dc	Add alternative attention structure (#880)	2025-10-13 14:31:13 -05:00
Sebastian Raschka	6eb6adfa33	sliding window attention (#879)	2025-10-12 22:13:20 -05:00
Sebastian Raschka	21f0617ea3	Add other appendices for completeness (#878) * Add other appendices for completeness * update * update * Update	2025-10-12 19:04:53 -05:00
rasbt	44eda5340a	rm plot	2025-10-12 08:55:03 -05:00
Sebastian Raschka	9b9586688d	Multi-Head Latent Attention (#876) * Multi-Head Latent Attention * update	2025-10-11 20:08:30 -05:00
Sebastian Raschka	bf27ad1485	Use GB instead of GiB consistently (#875)	2025-10-11 09:11:33 -05:00
Sebastian Raschka	c814814d72	Grouped-Query Attention memory (#874) * GQA memory * remove redundant code * update links * update	2025-10-11 08:44:19 -05:00
rasbt	b8e12e1dd1	Use inference_device	2025-10-09 10:59:17 -05:00
Sebastian Raschka	fecfdd16ff	Add simpler BPE, and make previous BPE better (#870) * Add simpler BPE, and make previous BPE better * update * Update README.md	2025-10-08 22:22:34 -05:00
Sebastian Raschka	1164cb3e8f	Qwen3 and evaluation bonus materials (#869)	2025-10-08 18:22:19 -05:00
Sebastian Raschka	7bd263144e	Switch from urllib to requests to improve reliability (#867) * Switch from urllib to requests to improve reliability * Keep ruff linter-specific * update * update * update	2025-10-07 15:22:59 -05:00
Sebastian Raschka	8552565bda	Add missing comma in imports in README (#865)	2025-10-06 16:03:04 -05:00
Sebastian Raschka	7084123d10	Note about output dimensions (#862)	2025-10-01 10:47:04 -05:00
Sebastian Raschka	4d9f9dcb6c	Update ollama address (#861)	2025-09-30 21:05:53 -05:00
casinca	00c240ff87	some typo fixes (#858) * fix(typo): correct scaling * fix(typo): correct comment for `instruct`	2025-09-30 11:18:02 -05:00
Sebastian Raschka	458f2d9b67	Test dependencies with Python 3.13 (#843) * Custom python 3.13 entry in pyproject.toml * amend * update * update * update * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * update	2025-09-27 08:38:07 -05:00
Sebastian Raschka	47867bc1cb	Update generate script (#847) * Custom python 3.13 entry in pyproject.toml * amend * Update generate script * update * Update pyproject.toml	2025-09-27 08:03:54 -05:00
Sebastian Raschka	9bc827ea7e	Numerically stable generate on mps (#849) * Numerically stable generate on mps * add file	2025-09-26 22:42:44 -05:00
Sebastian Raschka	f492c949d3	Requirements update (#851) * Requirements update * Code change to trigger workers * update	2025-09-26 22:19:57 -05:00
Sebastian Raschka	322000d833	Windows compile (#845) * Custom python 3.13 entry in pyproject.toml * amend * Note about compile on Windows * update	2025-09-26 12:01:19 -05:00
Sebastian Raschka	3b83705988	Update package dependencies (#842)	2025-09-22 18:32:39 -05:00
Sebastian Raschka	e742d8af2c	Improve MoE implementation (#841)	2025-09-22 15:21:06 -05:00
Sebastian Raschka	20041fb94b	Note about devcontainer root usage (#833)	2025-09-21 11:12:44 -05:00
Sebastian Raschka	2aa8e8130d	Note about RoPE usage (#839) * Note about devcontainer root usage * Add note about RoPE implementation	2025-09-20 16:25:58 +00:00
casinca	42c130623b	`Qwen3Tokenizer` fix for Qwen3 Base models and generation mismatch with HF (#828) * prevent `self.apply_chat_template` being applied for base Qwen models * - added no chat template comparison in `test_chat_wrap_and_equivalence` - removed duplicate comparison * Revert "- added no chat template comparison in `test_chat_wrap_and_equivalence`" This reverts commit `3a5ee8cfa1`. * Revert "prevent `self.apply_chat_template` being applied for base Qwen models" This reverts commit `df504397a8`. * copied `download_file` in `utils` from https://github.com/rasbt/reasoning-from-scratch/blob/main/reasoning_from_scratch/utils.py * added copy of test `def test_tokenizer_equivalence()` from `reasoning-from-scratch` in `test_qwen3.py` * removed duplicate code fragment in `test_chat_wrap_and_equivalence` * use apply_chat_template * add toggle for instruct model * Update tokenizer usage --------- Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-09-17 08:14:11 -05:00
Synix	bfc6389fab	fix code comment (#834)	2025-09-17 01:36:02 +00:00
Sebastian Raschka	b6cd0a312f	More efficient angles computation in RoPE (#830)	2025-09-16 03:23:33 +00:00
Sebastian Raschka	147dc49ab5	rename eval method (#832)	2025-09-16 02:47:20 +00:00
Sebastian Raschka	8add26cbe9	Improve weight tying handling (#826) * Improve weight tying handling * fix	2025-09-14 15:46:48 -05:00
rasbt	1412b139f2	main push to sync github ruleset	2025-09-14 11:59:52 -05:00
Sebastian Raschka	8f3e5b024d	Add LoRA scaling (#823)	2025-09-14 11:57:55 -05:00
Sebastian Raschka	fc101b710e	Added Apple Silicon GPU device update (#820) * Added Apple Silicon GPU device * Added Apple Silicon GPU device * delete: remove unused model.pth file from understanding-buffers * update * update --------- Co-authored-by: missflash <missflash@gmail.com>	2025-09-13 12:48:06 -05:00
Andreas Yin	8e170312fe	fix: correct role of the beta hyperparameter on the DPO loss (#818) Increasing beta leads to less divergence between the new model and the reference model.	2025-09-12 20:21:38 -05:00
Sebastian Raschka	32965e0edd	remove redundant next_cache (#817)	2025-09-11 15:16:08 -05:00
Sebastian Raschka	c7a4362ca4	Add defensive context trimming for multiturn (#815) * Add defensive context trimming for multiturn * add all mods	2025-09-09 20:19:00 -05:00


1043 Commits