Fix BPE bonus materials (#561)

* Fix BPE bonus materials

* fix bpe implementation

* update

* Add 'Hello, world. Is this-- a test?' test case

* update link to test file

* update path handling

* update path handling

* fix pytest paths
Sebastian Raschka
2025-03-08 17:21:30 -06:00
committed by GitHub
parent 96ca2fcb2f
commit f63f04d8d5
5 changed files with 307 additions and 87 deletions

8
.gitignore vendored

@@ -1,3 +1,4 @@
# Configs and keys
ch05/07_gpt_to_llama/config.json
ch07/02_dataset-utilities/config.json
@@ -63,6 +64,8 @@ ch07/01_main-chapter-code/Smalltestmodel-sft-standalone.pth
ch07/01_main-chapter-code/gpt2/
# Datasets
the-verdict.txt
appendix-E/01_main-chapter-code/sms_spam_collection.zip
appendix-E/01_main-chapter-code/sms_spam_collection
appendix-E/01_main-chapter-code/train.csv
@@ -70,6 +73,7 @@ appendix-E/01_main-chapter-code/test.csv
appendix-E/01_main-chapter-code/validation.csv
ch02/01_main-chapter-code/number-data.txt
ch02/05_bpe-from-scratch/the-verdict.txt
ch05/03_bonus_pretraining_on_gutenberg/gutenberg
ch05/03_bonus_pretraining_on_gutenberg/gutenberg_preprocessed
@@ -107,7 +111,9 @@ ch02/05_bpe-from-scratch/bpe_merges.txt
ch02/05_bpe-from-scratch/encoder.json
ch02/05_bpe-from-scratch/vocab.bpe
ch02/05_bpe-from-scratch/vocab.json
encoder.json
vocab.bpe
vocab.json
# Other
ch0?/0?_user_interface/.chainlit/