71 Commits

Author SHA1 Message Date
Sebastian Raschka
222803737d Fix data download if UCI is temporarily down (#592) 2025-03-31 16:25:53 -05:00
Sebastian Raschka
c21bfe4a23 Add PyPI package (#576)
* Add PyPI package

* fixes

* fixes
2025-03-23 19:28:49 -05:00
Sebastian Raschka
86b714a5e0 Specify UTF-8 encoding in the json load command explicitly (#557) 2025-03-05 11:46:21 -06:00
Sebastian Raschka
d1e99f6092 Fix timeout issue related to spam data backup url (#544)
* Add backup url for Spam Dataset

* import urllib

* fix url

* fix timeout issue
2025-02-20 09:26:23 -06:00
Sebastian Raschka
c39aa32ef5 Add backup url for Spam Dataset (#543)
* Add backup url for Spam Dataset

* import urllib

* fix url
2025-02-20 08:08:28 -06:00
Sebastian Raschka
a08d7aaa84 Uv workflow improvements (#531)
* Uv workflow improvements

* Uv workflow improvements

* linter improvements

* pytproject.toml fixes

* pytproject.toml fixes

* pytproject.toml fixes

* pytproject.toml fixes

* pytproject.toml fixes

* pytproject.toml fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix
2025-02-16 13:16:51 -06:00
Sebastian Raschka
a6cc574605 Upgrade to NumPy 2.0 (#520)
* Upgrade to NumPy 2.0

* bump pytorch

* bump pytorch

* bump pytorch

* bump pytorch

* bump pytorch

* update

* update packages
2025-02-09 06:21:58 -06:00
Sebastian Raschka
8cfa52bf1d More pythonic way to find the longest sequence (#512)
* More pythonic way to find the longest sequence

* pep8 fix
2025-02-01 10:22:47 -06:00
Sebastian Raschka
701090815e Add backup URL for gpt2 weights (#469)
* Add backup URL for gpt2 weights

* newline
2025-01-05 11:28:09 -06:00
Sebastian Raschka
f6281ab91b Add utility to prevent double execution of certain cells (#437) 2024-11-14 19:56:49 +09:00
rasbt
a20ce1b817 remove redundant code line 2024-10-13 15:58:11 -05:00
Sebastian Raschka
7ef5129e18 Fix truncation issue in classify_review function (#373) 2024-09-25 19:54:36 -05:00
Sebastian Raschka
52ee1c7cdb Add missing bullet point 2024-09-21 12:59:12 -05:00
Mingyuan Xu
f77c376b05 Run generate example in ch06 optionally on GPU (#352)
* model.to("cuda")

model.to("cuda")

* update device placement

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-09-13 08:01:52 -05:00
Sebastian Raschka
c443035d56 Note about MPS in ch06 and ch07 (#325) 2024-08-19 08:11:33 -05:00
TITC
38390b2a8d track tokens seen in chapter5, track examples seen in chapter6 (#319) 2024-08-13 07:09:05 -05:00
Sebastian Raschka
08040f024c Test code in pytorch 2.4 (#285)
* test code in pytorch 2.4

* update
2024-07-24 21:53:41 -05:00
Sebastian Raschka
8d02cb1cee Add download help message (#274) 2024-07-19 08:29:29 -05:00
Sebastian Raschka
4f0a107692 show how to use the finetuned model 2024-07-09 06:43:26 -07:00
Sebastian Raschka
f6bcdd37bd Fix links in summary sections (#254) 2024-06-29 07:51:31 -05:00
rasbt
31806828d0 add links to summary sections 2024-06-29 07:33:26 -05:00
Daniel Kleine
1e69c8e0b5 fixed minor issues (#252)
* fixed typo

* fixed var name in md text
2024-06-29 06:38:25 -05:00
Daniel Kleine
06921f3333 minor markdown fixes (#236) 2024-06-21 13:55:34 -05:00
Sebastian Raschka
6c0dc2362b Add standalone finetuning and evaluation scripts for chapter 7 (#234)
* add finetuning and eval scripts

* update link

* update links

* fix link
2024-06-21 05:23:24 -05:00
rasbt
283397aaf2 add main and optional sections 2024-06-19 17:48:25 -05:00
Daniel Kleine
bbb2a0c3d5 fixed num_workers (#229)
* fixed num_workers

* ch06 & ch07: added num_workers to create_dataloader_v1
2024-06-19 17:36:46 -05:00
Jinge Wang
10018e00ff Fixed some typos in ch06.ipynb (#219) 2024-06-18 05:54:01 -05:00
rasbt
3e0b0c66a8 fix spelling 2024-06-18 05:50:40 -05:00
rasbt
19c5784f82 replace figure 2024-06-18 05:46:36 -05:00
Daniel Kleine
dcbdc1d2e5 fixes for code (#206)
* updated .gitignore

* removed unused GELU import

* fixed model_configs, fixed all tensors on same device

* removed unused tiktoken

* update

* update hparam search

* remove redundant tokenizer argument

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-06-11 20:59:48 -05:00
rasbt
1b1fd21d64 fix typo in comment 2024-06-09 06:14:02 -05:00
Sebastian Raschka
72a073bbbf Remove leftover instances of self.tokenizer (#201)
* Remove leftover instances of self.tokenizer

* add endoftext token
2024-06-08 14:57:34 -05:00
rasbt
98d453b666 update formatting 2024-05-24 07:20:37 -05:00
rasbt
18e729643d add assertion about data set length 2024-05-23 06:50:43 -05:00
rasbt
86f6c2df43 Fix device setting 2024-05-22 17:51:51 -05:00
rasbt
a8a28017c0 remove duplicated text 2024-05-19 11:34:47 -05:00
rasbt
02e6f06a11 add test mode for dataset download 2024-05-18 17:38:19 -05:00
rasbt
c7c83904a0 tokens seen -> examples seen 2024-05-13 20:08:48 -05:00
rasbt
16d19751b0 spelling 2024-05-13 20:06:38 -05:00
rasbt
cd7ea15e8d add readme 2024-05-13 08:50:55 -05:00
rasbt
b28cc0cb8c pep8 fixes 2024-05-13 07:50:51 -05:00
rasbt
a740a62239 tests and exercises 2024-05-13 07:45:59 -05:00
rasbt
8bc15ab316 fix tests 2024-05-12 19:03:14 -05:00
rasbt
21172a6a7e add chapter 6 unit test 2024-05-12 18:51:28 -05:00
rasbt
281400feca add missing figure 2024-05-12 18:37:02 -05:00
rasbt
88176a82eb chapter 06 summary file 2024-05-12 18:27:50 -05:00
rasbt
2e47a6e61c update dataset naming 2024-05-12 09:22:42 -05:00
rasbt
55c3a91838 rename download_and_unzip to make it more specific 2024-05-12 08:36:24 -05:00
rasbt
4b4e1e1ad5 use spam / not spam labels 2024-05-11 13:42:18 -05:00
rasbt
02ad1bef3a reorder section 6.6 2024-05-11 08:27:07 -05:00