mirror of
https://github.com/rasbt/LLMs-from-scratch.git
synced 2026-04-10 12:33:42 +00:00
2.8 KiB
2.8 KiB
Additional Experiments
The table below adds experiments to answer additional questions about various design choices. The first row uses the same settings as the main chapter and is used as a reference. For example,
- comparing rows 1 and 2 answers the question: "What is the performance difference when we train the last or first token?";
- comparing rows 1 and 3 answers the question: "What is the performance difference when we train only the last layer instead of the last block?";
- and so forth.
| Model | Weights | Trainable token | Trainable layers | Context length | CPU/GPU | Training time | Training acc | Validation acc | Test acc | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gpt2-small (124M) | pretrained | last | last_block | longest train ex. (120) | V100 | 0.39 min | 96.63% | 97.99% | 94.33% |
| 2 | gpt2-small (124M) | pretrained | first | last_block | longest train ex. (120) | V100 | 0.37 min | 78.46% | 80.54% | 75.00% |
| 3 | gpt2-small (124M) | pretrained | last | last_layer | longest train ex. (120) | V100 | 0.33 min | 78.65% | 87.25% | 78.33% |
| 4 | gpt2-small (124M) | pretrained | last | all | longest train ex. (120) | V100 | 0.94 min | 99.62% | 96.64% | 96.33% |
| 5 | gpt2-medium (355M) | pretrained | last | last_block | longest train ex. (120) | V100 | 0.91 min | 87.50% | 51.01% | 56.67% |
| 6 | gpt2-large (774M) | pretrained | last | last_block | longest train ex. (120) | V100 | 1.91 min | 99.52% | 98.66% | 96.67% |
| 7 | gpt2-small (124M) | random | last | all | longest train ex. (120) | V100 | 0.93 min | 100% | 97.32% | 93.00% |
| 8 | gpt2-small (124M) | pretrained | last | last_block | context length (1024) | V100 | 3.24 min | 83.08% | 87.92% | 78.33% |
Usage:
- Row 1:
python additional-experiments.py - Row 2:
python additional-experiments.py --trainable_token first - Row 3:
python additional-experiments.py --trainable_layers last_layer - Row 4:
python additional-experiments.py --trainable_layers all - Row 5:
python additional-experiments.py --model_size gpt2-medium (355M) - Row 6:
python additional-experiments.py --model_size gpt2-large (774M) - Row 7:
python additional-experiments.py --weights random --trainable_layers all - Row 8:
python additional-experiments.py --context_length "model_context_length"