Implementing the BPE Tokenizer from Scratch (#487)

This commit is contained in:
Sebastian Raschka
2025-01-17 12:22:00 -06:00
committed by GitHub
parent 2fef2116a6
commit 0d4967eda6
4 changed files with 1463 additions and 86 deletions


@@ -1900,7 +1900,9 @@
"source": [
"See the [./dataloader.ipynb](./dataloader.ipynb) code notebook, which is a concise version of the data loader that we implemented in this chapter and will need for training the GPT model in upcoming chapters.\n",
"\n",
"See [./exercise-solutions.ipynb](./exercise-solutions.ipynb) for the exercise solutions."
"See [./exercise-solutions.ipynb](./exercise-solutions.ipynb) for the exercise solutions.\n",
"\n",
"See the [Byte Pair Encoding (BPE) Tokenizer From Scratch](../02_bonus_bytepair-encoder/compare-bpe-tiktoken.ipynb) notebook if you are interested in learning how the GPT-2 tokenizer can be implemented and trained from scratch."
]
}
],
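For context on the bonus notebook this commit links to: the heart of BPE training is repeatedly finding the most frequent adjacent token pair and merging it into a new vocabulary entry. The following is a minimal sketch of that merge step, not the notebook's actual implementation; the function names are illustrative.

```python
from collections import Counter

def most_frequent_pair(token_ids):
    # Count adjacent ID pairs in the token sequence
    pairs = Counter(zip(token_ids, token_ids[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(token_ids, pair, new_id):
    # Replace every occurrence of `pair` with the new token ID
    merged = []
    i = 0
    while i < len(token_ids):
        if i < len(token_ids) - 1 and (token_ids[i], token_ids[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(token_ids[i])
            i += 1
    return merged

# Start from raw bytes and apply a single merge;
# 256 is the first ID beyond the 0-255 byte range
ids = list(b"aaabdaaabac")
pair = most_frequent_pair(ids)    # -> (97, 97), i.e. "aa"
ids = merge_pair(ids, pair, 256)
```

Training repeats this loop until the vocabulary reaches a target size; encoding then replays the learned merges in order.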