LLMs-from-scratch

mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2026-04-10 12:33:42 +00:00

Author	SHA1	Message	Date
rasbt	4612d20fa8	User argpars utils to show default args on command line	2026-03-01 20:15:21 -06:00
Sebastian Raschka	2d600ccb5b	Use correct input in layernorm example (#960) * Update CI * Use correct example in layernorm section * update	2026-02-18 21:35:57 -06:00
Sebastian Raschka	be5e2a3331	Readability and code quality improvements (#959) * Consistent dataset naming * consistent section headers	2026-02-17 18:44:56 -06:00
Sebastian Raschka	57430d2a13	Gated DeltaNet updates (#926)	2025-12-18 20:28:53 -06:00
talentJay-ux	d7f178d28b	Sliding window KV Cache bug fix (#925) 1. Fix bug because of KV cache and GPT's ptr pointer doesn't get reset when window_size > context_length 2. Fix bug because of KV cache and GPT's ptr pointer doesn't get reset 3. Fix KV Cache import issue for gpt_with_kv_cache_optimized	2025-12-15 18:47:01 -06:00
Sebastian Raschka	a11965fbd9	Remove persistent flag from cache buffers (#916)	2025-11-24 20:10:02 -06:00
Sebastian Raschka	28a8408d4d	Update README wrt multi-query attention Clarified the implications of using multi-query attention on modeling performance and memory usage.	2025-11-17 16:39:32 -06:00
casinca	7d92267170	fix(GatedDeltaNet): Init param A from log of a uniform distrib (#906)	2025-11-09 14:22:52 -06:00
Sebastian Raschka	bcc73f731d	n_heads × d_head -> d_head × d_head in DeltaNet (#903) Clarified the explanation of the memory size calculation for `KV_cache_DeltaNet` and updated the quadratic term from `n_heads × d_head` to `d_head × d_head`.	2025-11-05 18:28:37 -06:00
Sebastian Raschka	488bef7e3f	Image resizing	2025-11-02 21:05:38 -06:00
Sebastian Raschka	c6b8332a59	Gated DeltaNet write-up (#901) * Gated DeltaNet write-up * Add copyright and source information to script Added copyright notice and source information. * Remove unused import of Path in plot_memory_estimates * Fix url	2025-11-02 21:03:42 -06:00
Sebastian Raschka	d6c3990c57	Training on MPS in PyTorch 2.9 (#900) * Training on MPS in PyTorch 2.9 * update	2025-11-01 16:55:09 -05:00
Sebastian Raschka	0adb5b8c65	Fix ffn link (#892) * Fix ffn link * Apply suggestion from @rasbt * Apply suggestion from @rasbt	2025-10-21 21:19:44 -05:00
Sebastian Raschka	7ca7c47e4a	Make quote style consistent (#891)	2025-10-21 19:42:33 -05:00
casinca	9276edbc37	- docs(moe): correct arXiv link for DeepSeekMoE (#890) - docs(moe): correct paper name for 2022	2025-10-20 19:29:06 -05:00
Sebastian Raschka	218221ab62	Mixture-of-Experts intro (#888)	2025-10-19 22:17:59 -05:00
Sebastian Raschka	7fe4874dda	Update the compression rate comment in MLA (#883) * compression comment * update	2025-10-14 11:10:06 -05:00
Sebastian Raschka	bf039ff3dc	Add alternative attention structure (#880)	2025-10-13 14:31:13 -05:00
Sebastian Raschka	6eb6adfa33	sliding window attention (#879)	2025-10-12 22:13:20 -05:00
rasbt	44eda5340a	rm plot	2025-10-12 08:55:03 -05:00
Sebastian Raschka	9b9586688d	Multi-Head Latent Attention (#876) * Multi-Head Latent Attention * update	2025-10-11 20:08:30 -05:00
Sebastian Raschka	bf27ad1485	Use GB instead of GiB consistently (#875)	2025-10-11 09:11:33 -05:00
Sebastian Raschka	c814814d72	Grouped-Query Attention memory (#874) * GQA memory * remove redundant code * update links * update	2025-10-11 08:44:19 -05:00
Sebastian Raschka	2f53bf5fe5	Link the other KV cache sections (#708)	2025-06-24 16:52:29 -05:00
Sebastian Raschka	81eda38d3b	Improve KV cache code for torch.compile (#705) * Improve KV cache code for torch.compile * cleanup * cleanup	2025-06-23 18:08:49 -05:00
Martin Ma	6522be94be	Fix bug in masking when kv cache is used. (#697) * Fix bug in masking when kv cache is used. * add tests * dd tests * upd * add kv cache test to gh workflow * explicit mask slicing * upd --------- Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-06-23 13:12:56 -05:00
Shamik	f5bc863752	Update README.md (#702) Typo in kv cache readme	2025-06-23 07:21:51 -05:00
Sebastian Raschka	fdc3e1b701	Add GPT-2 KV cache to pkg (#687)	2025-06-21 12:29:04 -05:00
Sebastian Raschka	ece59ba587	Optimize KV cache (#673) * Optimize KV cache * style * interpretable generate * interpretable generate * update readme	2025-06-16 16:00:50 -05:00
Sebastian Raschka	ba0370abd1	Optimized KV cache (#672) * Optimized KV cache * typo fix	2025-06-15 14:26:16 -05:00
Sebastian Raschka	2af686d70b	Add KV cache (#671)	2025-06-15 09:58:08 -05:00
Sebastian Raschka	c21bfe4a23	Add PyPI package (#576) * Add PyPI package * fixes * fixes	2025-03-23 19:28:49 -05:00
Sebastian Raschka	73f4342664	add ch04 code along video (#573)	2025-03-17 11:20:55 -05:00
Sebastian Raschka	a08d7aaa84	Uv workflow improvements (#531 ) * Uv workflow improvements * Uv workflow improvements * linter improvements * pytproject.toml fixes * pytproject.toml fixes * pytproject.toml fixes * pytproject.toml fixes * pytproject.toml fixes * pytproject.toml fixes * windows fixes * windows fixes * windows fixes * windows fixes * windows fixes * windows fixes * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix	2025-02-16 13:16:51 -06:00
Sebastian Raschka	68e2efe1c9	Mention small discrepancy due to Dropout non-reproducibility in PyTorch (#519) * Mention small discrepancy due to Dropout non-reproducibility in PyTorch * bump pytorch version	2025-02-06 14:59:52 -06:00
Sebastian Raschka	126adb7663	Include mathematical breakdown for exercise solution 4.1 (#483)	2025-01-14 19:23:00 -06:00
Sebastian Raschka	b6c4b2f9f1	Update bonus section formatting (#400)	2024-10-12 10:26:08 -05:00
rasbt	93d9dae95f	update card	2024-10-11 12:15:01 -05:00
rasbt	1f4fca9f8e	update reference numbers	2024-10-11 12:13:10 -05:00
Sebastian Raschka	6d0f59a49c	Add MFU formula as reference material (#395) * add MFU formula as reference material * Update previous_chapters.py	2024-10-10 19:42:53 -05:00
rasbt	dc1b1a05b0	note about random numbers	2024-09-22 12:02:03 -05:00
Sebastian Raschka	222f7b16f8	update gpt-2 paper url	2024-09-20 07:00:06 -07:00
rasbt	8ad50a3315	update gpt-2 paper link	2024-09-09 06:31:28 -05:00
rasbt	1e48c13e89	update gpt-2 paper link	2024-09-08 15:49:44 -05:00
Sebastian Raschka	08040f024c	Test code in pytorch 2.4 (#285) * test code in pytorch 2.4 * update	2024-07-24 21:53:41 -05:00
Thanh Tran	070a69fc8b	fix typos & inconsistent texts (#269) Co-authored-by: TRAN <you@example.com>	2024-07-17 07:34:51 -05:00
Jeroen Van Goey	48bd72c890	fix typos, add codespell pre-commit hook (#264) * fix typos, add codespell pre-commit hook * Update .pre-commit-config.yaml --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-07-16 07:07:04 -05:00
rasbt	6ffd628bb6	add missing "be" to figure	2024-07-15 08:06:05 -05:00
rasbt	921e91a05f	use correct chapter reference	2024-07-02 17:29:57 -05:00
rasbt	31806828d0	add links to summary sections	2024-06-29 07:33:26 -05:00


112 Commits