
Chapter 2: Working with Text Data


Main Chapter Code


Bonus Materials

  • 02_bonus_bytepair-encoder contains optional code to benchmark different byte pair encoder implementations.

  • 03_bonus_embedding-vs-matmul contains optional code showing that an embedding layer is equivalent to a fully connected (linear) layer applied to one-hot encoded vectors.

  • 04_bonus_dataloader-intuition contains optional code that explains the data loader more intuitively, using simple numbers instead of text.
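The benchmark code in 02_bonus_bytepair-encoder is not reproduced here. As a minimal sketch of how such a comparison can be timed, the snippet below uses Python's `timeit` to measure the average time each encoder needs for the same input text. The two encoders are hypothetical placeholders (a whitespace split and a raw UTF-8 byte conversion), not BPE tokenizers. In practice you would pass real encode functions instead, for example the `encode` method of `tiktoken.get_encoding("gpt2")`.

```python
import timeit


def benchmark(encoders, text, number=100):
    """Return the average seconds per call for each named encode function."""
    results = {}
    for name, encode in encoders.items():
        # timeit runs encode(text) `number` times and returns the total time
        total = timeit.timeit(lambda: encode(text), number=number)
        results[name] = total / number
    return results


# Hypothetical stand-in encoders; replace with real BPE tokenizers to compare them
encoders = {
    "whitespace_split": str.split,
    "utf8_bytes": lambda t: list(t.encode("utf-8")),
}

text = "Hello, do you like tea? In the sunlit terraces of the palace. " * 50
results = benchmark(encoders, text)
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.2f} µs per call")
```

Averaging over many calls smooths out timer noise; for real tokenizers it is also worth timing one warm-up call separately, since some implementations load vocabulary files lazily.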
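A framework-free sketch of the equivalence described for 03_bonus_embedding-vs-matmul, assuming a toy 4-token vocabulary and 3-dimensional embeddings with arbitrary values: looking up row `token_id` of a weight matrix yields the same vector as multiplying that token's one-hot vector by the matrix. The helper names are hypothetical and not taken from the bonus notebook.

```python
def embedding_lookup(weight, token_id):
    """Return row `token_id` of the weight matrix (what an embedding layer does)."""
    return list(weight[token_id])


def one_hot(token_id, vocab_size):
    """Encode a token ID as a one-hot vector of length vocab_size."""
    return [1.0 if i == token_id else 0.0 for i in range(vocab_size)]


def vec_matmul(vec, weight):
    """Multiply a row vector (length vocab_size) by a (vocab_size x dim) matrix."""
    dim = len(weight[0])
    return [sum(vec[i] * weight[i][j] for i in range(len(vec))) for j in range(dim)]


# Toy weight matrix: vocab_size=4 rows, embedding dim=3 columns (arbitrary values)
weight = [
    [0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.2],
    [3.0, 3.1, 3.2],
]

# Both paths produce identical vectors for every token ID
for token_id in range(len(weight)):
    assert embedding_lookup(weight, token_id) == vec_matmul(one_hot(token_id, len(weight)), weight)

print(embedding_lookup(weight, 2))  # [2.0, 2.1, 2.2]
```

In PyTorch terms, the lookup corresponds to `torch.nn.Embedding` and the matrix product to `torch.nn.Linear` with `bias=False`. Because `nn.Linear` stores its weight with shape `(out_features, in_features)`, its weight must be the transpose of the embedding weight for the outputs to match. The lookup is preferred in practice because it skips the many multiplications by zero.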
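In the spirit of 04_bonus_dataloader-intuition, here is a minimal sketch that uses plain integers in place of text. It assumes a sliding-window scheme in which each target sequence is the input sequence shifted by one position; the function name and the `max_length` and `stride` parameters are illustrative assumptions, and the integers 0 to 9 stand in for token IDs.

```python
def sliding_window_pairs(token_ids, max_length, stride):
    """Split a token sequence into (input, target) windows; targets are shifted by one."""
    inputs, targets = [], []
    # Stop early enough that every target window still has max_length tokens
    for i in range(0, len(token_ids) - max_length, stride):
        inputs.append(token_ids[i:i + max_length])
        targets.append(token_ids[i + 1:i + max_length + 1])
    return inputs, targets


token_ids = list(range(10))  # stand-in "token IDs" 0..9
inputs, targets = sliding_window_pairs(token_ids, max_length=4, stride=4)
for x, y in zip(inputs, targets):
    print(x, "->", y)
# [0, 1, 2, 3] -> [1, 2, 3, 4]
# [4, 5, 6, 7] -> [5, 6, 7, 8]
```

With `stride` equal to `max_length`, consecutive windows do not overlap. A smaller stride produces overlapping windows and therefore more training pairs from the same sequence (for example, `stride=1` yields six pairs here).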