Commit Graph

32 Commits

Author SHA1 Message Date
Sebastian Raschka
7feb8cad86 Update README.md 2024-08-10 07:54:51 -05:00
Daniel Kleine
13dbc548f8 fixed bash command (#305)
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-08-09 21:29:04 -05:00
TITC
09a3a73f2d remove all non-English texts and notice (#304)
* remove all non-English texts and notice

1. almost 18 GB of text remains after `is_english` filtering.
2. remove notices using gutenberg's strip_headers
3. after re-running get_data.py, all data appears to be under the `gutenberg/data/.mirror` folder.

* some improvements

* update readme

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-08-09 17:09:14 -05:00
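The `is_english` filter referenced in #304 is not shown in this log. The following is a minimal sketch of one plausible heuristic. It assumes an ASCII-character-ratio test with a threshold of 0.9, and both the function body and the threshold are assumptions rather than the repository's code.

```python
def is_english(text: str, threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: treat text as English if most characters are ASCII."""
    if not text:
        # Empty text carries no evidence either way, so reject it.
        return False
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return ascii_chars / len(text) > threshold


# Usage: keep only documents that pass the filter.
docs = ["The quick brown fox.", "Быстрая коричневая лиса."]
english_docs = [d for d in docs if is_english(d)]
print(english_docs)  # ['The quick brown fox.']
```

An ASCII ratio is a coarse filter. It also accepts other languages written mostly in unaccented Latin script, so it removes non-Latin-script books but not, for example, most German or Dutch texts.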
Sebastian Raschka
cf0df54d7d Show epochs as integers on x-axis (#241)
* Show epochs as integers on x-axis

* Update ch07/01_main-chapter-code/previous_chapters.py

* remove extra s

* modify exercise plots

* update chapter 7 plot

* resave ch07 for better file diff
2024-06-23 07:41:25 -05:00
rasbt
85827e0a0b note about dropout 2024-06-19 17:37:48 -05:00
Daniel Kleine
bbb2a0c3d5 fixed num_workers (#229)
* fixed num_workers

* ch06 & ch07: added num_workers to create_dataloader_v1
2024-06-19 17:36:46 -05:00
Sebastian Raschka
97ed38116a Rename drop_resid to drop_shortcut (#136) 2024-04-28 14:31:27 -05:00
rasbt
72be9f4e8e update numbering 2024-04-22 07:00:20 -05:00
rasbt
868955f6a5 file header 2024-04-22 06:53:38 -05:00
Sebastian Raschka
c70ddff558 Return nan if val loader is empty (#124) 2024-04-20 08:02:30 -05:00
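Commit c70ddff558 makes the loss helper return NaN instead of failing when the validation loader is empty. The following framework-free sketch shows that guard. It assumes a hypothetical `calc_loss_loader` that takes any sized sequence of `(inputs, targets)` batches and a per-batch loss callable. The repository's actual function is not shown in this log.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Batch = Tuple[list, list]


def calc_loss_loader(
    data_loader: Sequence[Batch],
    calc_loss_batch: Callable[[list, list], float],
    num_batches: Optional[int] = None,
) -> float:
    """Hypothetical sketch: average per-batch loss over the first num_batches batches."""
    if len(data_loader) == 0:
        # No batches (e.g., a tiny validation split): report NaN instead of dividing by zero.
        return float("nan")
    if num_batches is None:
        num_batches = len(data_loader)
    else:
        # Never average over more batches than the loader holds.
        num_batches = min(num_batches, len(data_loader))
    total_loss = 0.0
    for i, (inputs, targets) in enumerate(data_loader):
        if i >= num_batches:
            break
        total_loss += calc_loss_batch(inputs, targets)
    return total_loss / num_batches


# Toy per-batch loss: mean absolute error between inputs and targets.
def mae(inputs, targets):
    return sum(abs(a - b) for a, b in zip(inputs, targets)) / len(inputs)


print(calc_loss_loader([([1, 2], [1, 4]), ([3], [0])], mae))  # 2.0
print(math.isnan(calc_loss_loader([], mae)))  # True
```

Returning NaN keeps a training loop's logging and plotting code running when a small dataset produces no validation batches, while still making the missing value visible.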
Sebastian Raschka
dd51d4ad83 Make datasets and loaders compatible with multiprocessing (#118) 2024-04-13 13:57:56 -05:00
Sebastian Raschka
e757091301 Organized setup instructions (#115)
* Organized setup instructions

* update tests

* link checker action

* raise error upon broken link

* fix links

* fix links

* delete duplicated paragraph
2024-04-10 22:09:46 -04:00
James Holcombe
05718c6b94 Use instance tokenizer (#116)
* Use instance tokenizer

* consistency updates

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-04-10 21:16:19 -04:00
Daniel Kleine
61b6e35ddf Added PDF display support to Docker image and VS Code and updated first step for gutenberg project (#111)
* added VS Code extensions recommendations

* Added PDF display support to Docker image and VS Code

* fixed steps to download the dataset
2024-04-08 20:37:55 -04:00
Daniel Kleine
44c0494406 Updated devcontainer, .gitignore and README for gutenberg project (#107)
* added ch05/03_bonus_pretraining_on_gutenberg model checkpoints and preprocessing output folders to .gitignore

* removed prettier extension, added github alerts markdown extension

* specified download instructions and fixed code markdown

* Update ch05/03_bonus_pretraining_on_gutenberg/README.md

* Update ch05/03_bonus_pretraining_on_gutenberg/README.md

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-04-05 06:53:01 -05:00
Sebastian Raschka
adc2964fc5 Fix Loss in Gutenberg bonus section (#109) 2024-04-04 20:54:09 -05:00
Sebastian Raschka
2de60d1bfb Rename variable to context_length to make it easier on readers (#106)
* rename to context length

* fix spacing
2024-04-04 07:27:41 -05:00
Sebastian Raschka
3829ccdb34 Remove redundant dropout in MLP module (#105) 2024-04-03 20:19:08 -05:00
rasbt
7d1eadd0be update notes 2024-04-02 18:27:13 -05:00
Sebastian Raschka
2fab89d47e Use max size properly 2024-04-02 13:29:23 -05:00
Sebastian Raschka
4a617b8343 Gutenberg for Windows users (#99) 2024-04-02 08:54:24 -05:00
rasbt
f30dd2dd2b improve instructions 2024-04-02 07:12:22 -05:00
rasbt
c10f5c9bf2 suggest galore 2024-03-27 19:58:32 -05:00
rasbt
88b2dd780a make batch loss calculation more efficient 2024-03-27 07:11:56 -05:00
rasbt
3cb5a52a1b simplify calc_loss_loader 2024-03-26 20:34:50 -05:00
rasbt
12fff1ddcb add endoftext token 2024-03-26 06:47:05 -05:00
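Commit 12fff1ddcb adds an end-of-text token, presumably as a separator between concatenated books in the Gutenberg pretraining text. The following is a minimal sketch of that idea. It assumes GPT-2's `<|endoftext|>` string as the separator, and it introduces a hypothetical `join_documents` helper with a hypothetical `max_chars` cap on chunk size.

```python
from typing import Iterable, List

EOT = "<|endoftext|>"  # GPT-2's end-of-text marker (assumed separator)


def join_documents(docs: Iterable[str], max_chars: int) -> List[str]:
    """Hypothetical sketch: concatenate documents with an end-of-text
    separator, starting a new chunk once max_chars would be exceeded."""
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for doc in docs:
        # Every document after the first in a chunk is preceded by one separator.
        extra = len(doc) + (len(EOT) if current else 0)
        if current and size + extra > max_chars:
            # Flush the current chunk and start a new one with this document.
            chunks.append(EOT.join(current))
            current, size = [], 0
            extra = len(doc)
        current.append(doc)
        size += extra
    if current:
        chunks.append(EOT.join(current))
    return chunks


print(join_documents(["aaa", "bbb", "cc"], max_chars=20))
# ['aaa<|endoftext|>bbb', 'cc']
```

The separator lets a tokenizer that recognizes `<|endoftext|>` mark document boundaries, so the model does not learn spurious continuations from one book into the next. A single document longer than `max_chars` still becomes its own, oversized chunk in this sketch.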
rasbt
de576296de simplify .view code 2024-03-25 08:09:31 -05:00
Sebastian Raschka
d4989e01c5 Update README.md 2024-03-25 06:39:43 -05:00
Sebastian Raschka
a2cd8436cb Ch05 supplementary code (#81) 2024-03-19 09:26:26 -05:00
Sebastian Raschka
9d6da22ebb Update pep8 (#78)
* simplify requirements file

* style

* apply linter
2024-03-18 08:16:17 -05:00
rasbt
ee8efcbcf6 fix plotting 2024-03-14 07:41:45 -05:00
rasbt
f2c8eeb6b8 pretraining on project gutenberg 2024-03-13 08:34:39 -05:00