rasbt
7489500d90
Make change in a code file
2026-04-04 12:05:25 -05:00
Sebastian Raschka
052c2dea4f
Bpe whitespace fixes (#975)
2026-03-07 13:56:25 -06:00
Sebastian Raschka
be5e2a3331
Readability and code quality improvements (#959)
* Consistent dataset naming
* consistent section headers
2026-02-17 18:44:56 -06:00
Maxwell De Jong
e0dbec3331
Fix encoding of multiple preceding spaces in BPE tokenizer. (#945)
* Fix encoding of multiple preceding spaces in BPE tokenizer.
* Add test
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2026-01-10 10:27:23 -06:00
Sebastian Raschka
14c7afaa58
Fix GitHub CI timeout issue for link checker (#937)
* Fix GitHub CI timeout issue for link checker
* update problematic links
2026-01-02 14:34:31 -06:00
Sebastian Raschka
7ca7c47e4a
Make quote style consistent (#891)
2025-10-21 19:42:33 -05:00
Sebastian Raschka
fecfdd16ff
Add simpler BPE, and make previous BPE better (#870)
* Add simpler BPE, and make previous BPE better
* update
* Update README.md
2025-10-08 22:22:34 -05:00
Sebastian Raschka
7bd263144e
Switch from urllib to requests to improve reliability (#867)
* Switch from urllib to requests to improve reliability
* Keep ruff linter-specific
* update
* update
* update
2025-10-07 15:22:59 -05:00
rasbt
1412b139f2
main push to sync github ruleset
2025-09-14 11:59:52 -05:00
Sebastian Raschka
18c6b970ab
Add additional notes on debugging SSL issues (#810)
* Add SSL instructions
* update old pytorch tests
* update
* update
* update
* update
* update
* update
* update
* update
2025-09-06 11:46:50 -05:00
Matthew Hernandez
6f12edb0cc
Fix issue: 731 by resolving semantic error (#738)
* fix issue 731
* update test path
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-07-10 16:30:15 -05:00
Matthew Hernandez
83c76891fc
Fix issue 724: unused args (#726)
* Fix issue 724: unused args
* Update 02_opt_multi_gpu_ddp.py
2025-07-08 06:37:39 -05:00
casinca
564e986496
fix issue #664 - inverted token and pos emb layers (#665)
* fix inverted token and pos layers
* remove redundant code
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-06-22 12:15:01 -05:00
Shimpei Kojio
baaa6c9283
fixed video link (#646)
2025-06-13 08:16:18 -05:00
Sebastian Raschka
4ff743051e
BPE cosmetics (#629)
* Llama3 from scratch improvements
* Cosmetic BPE improvements
* restore
* Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb
* Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb
* endoftext whitespace
2025-04-18 18:57:09 -05:00
Sebastian Raschka
72efebd7f8
add special token handling to bpe from scratch code (#616)
2025-04-13 12:38:22 -05:00
Sebastian Raschka
6ea4dd3ae7
Clarify dataset length in chapter 2 (#589)
2025-03-30 16:01:37 -05:00
Sebastian Raschka
2f41429cf4
Cosmetic improvements to the BPE code (#562)
2025-03-09 10:49:40 -05:00
Sebastian Raschka
f63f04d8d5
Fix BPE bonus materials (#561)
* Fix BPE bonus materials
* fix bpe implementation
* update
* Add 'Hello, world. Is this-- a test?' test case
* update link to test file
* update path handling
* update path handling
* fix pytest paths
2025-03-08 17:21:30 -06:00
Sebastian Raschka
e9ad6cf86d
add link to supplementary ch02 video (#553)
2025-03-02 13:17:42 -06:00
Sebastian Raschka
e7740b3312
Use correct ch02 title (#551)
2025-02-28 10:16:21 -06:00
Sebastian Raschka
b1773897d3
Add BPE from scratch link (#550)
2025-02-28 09:57:41 -06:00
Kasen
7bd36dccb4
Improve BPE vocabulary saving and pair frequency handling (#539)
2025-02-19 09:51:04 -06:00
Kasen
b47884ced0
Fix incorrect indentation (#536)
2025-02-18 14:47:31 -06:00
Sebastian Raschka
2dc46bedc6
Fix typo in Ch02 comments (#516)
2025-02-04 20:16:07 -06:00
Sebastian Raschka
a22d612be6
Bonus material: extending tokenizers (#496)
* Bonus material: extending tokenizers
* small wording update
2025-01-22 09:26:54 -06:00
Daniel Kleine
dce46038da
add GPT2TokenizerFast to BPE comparison (#498)
* added HF BPE Fast
* update benchmarks
* add note about performance
* revert accidental changes
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-01-22 09:26:44 -06:00
Austin Welch
0f35e370ed
fix: preserve newline tokens in BPE encoder (#495)
* fix: preserve newline tokens in BPE encoder
* further fixes
* more fixes
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-01-21 12:47:15 -06:00
Daniel Kleine
60acb94894
BPE: fixed typo (#492)
* fixed typo
* use rel path if exists
* mod gitignore and use existing vocab files
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-01-20 20:49:53 -06:00
Sebastian Raschka
0d4967eda6
Implementing the BPE Tokenizer from Scratch (#487)
2025-01-17 12:22:00 -06:00
Henry Shi
b3150eebd8
Print out embeddings for more illustrative learning (#481)
* print out embeddings for illustrative learning
* suggestion print embedding contents
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-01-13 14:44:06 -06:00
Tao Qian
cec445f146
Minor readability improvement in dataloader.ipynb (#461)
* Minor readability improvement in dataloader.ipynb
- The tokenizer and encoded_text variables at the root level are unused.
- The default params for create_dataloader_v1 are confusing, especially for the default batch_size 4, which happens to be the same as the max_length.
* readability improvements
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-01-04 11:26:10 -06:00
Sebastian Raschka
1f61aeb7c4
Note about SSL certificates (#404)
2024-10-19 16:27:19 -05:00
Sebastian Raschka
b6c4b2f9f1
Update bonus section formatting (#400)
2024-10-12 10:26:08 -05:00
rasbt
b94546aa14
minor spelling fix
2024-09-08 15:35:36 -05:00
Gustavo Monti
34e16991bb
updating README from chapter 02 including 04_bonus section (#344)
* updating README from chapter 02 including 04_bonus section
* Update ch02/README.md
---------
Co-authored-by: Gustavo Monti Rocha <gustavo.rocha@intelliway.com.br>
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-09-05 08:09:46 +02:00
Sebastian Raschka
263eee8921
Test with PyTorch 2.0 and 2.4 (#290)
* Test with PyTorch 2.0 and 2.4
* Update basic-tests-old-pytorch.yml
* skip version cell
2024-07-27 15:09:02 -05:00
Sebastian Raschka
08040f024c
Test code in pytorch 2.4 (#285)
* test code in pytorch 2.4
* update
2024-07-24 21:53:41 -05:00
Sebastian Raschka
fa56c80402
Simplify embedding vs linear layer code (#278)
2024-07-21 12:21:10 -05:00
Thanh Tran
070a69fc8b
fix typos & inconsistent texts (#269)
Co-authored-by: TRAN <you@example.com>
2024-07-17 07:34:51 -05:00
rasbt
a33e89c12c
fixes bold font #267
2024-07-16 17:51:15 -05:00
Daniel Kleine
88186bf64a
minor: removed redundant imports (#260)
* removed duplicated imports
* removed empty cell
2024-07-05 15:33:19 -05:00
rasbt
b92dea8bc6
update decode method
2024-07-05 08:34:27 -05:00
Suman Debnath
2cdcf68598
fixing the regular expression used in the SimpleTokenizer (#259)
* fixing the regular expression used in the SimpleTokenizer class and a typo in the 2.7 Creating token embedding introduction section
* rerun
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-07-04 12:27:27 -05:00
rasbt
0988996eb8
update figures
2024-07-02 17:12:42 -05:00
rasbt
31806828d0
add links to summary sections
2024-06-29 07:33:26 -05:00
Sebastian Raschka
7e78b52a30
remove redundant code lines (#247)
2024-06-25 21:44:19 -05:00
rasbt
7095e84fab
update with latest versions
2024-06-25 21:09:27 -05:00
Daniel Kleine
81c843bdc0
minor fixes (#246)
* removed duplicated white spaces
* Update ch07/01_main-chapter-code/ch07.ipynb
* Update ch07/05_dataset-generation/llama3-ollama.ipynb
* removed duplicated white spaces
* fixed title again
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-06-25 17:30:30 -05:00
rasbt
283397aaf2
add main and optional sections
2024-06-19 17:48:25 -05:00