# Chapter 4: Implementing a GPT Model from Scratch to Generate Text
 
## Main Chapter Code
- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.
 
## Bonus Materials
- [02_performance-analysis](02_performance-analysis) contains optional code for analyzing the performance of the GPT model(s) implemented in the main chapter
- [03_kv-cache](03_kv-cache) implements a KV cache to speed up text generation during inference (a minimal sketch of the idea follows after this list)
- [ch05/07_gpt_to_llama](../ch05/07_gpt_to_llama) contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2 and loading pretrained weights from Meta AI (it might be interesting to look at alternative architectures after completing chapter 4, but you can also save that for after reading chapter 5)
- [04_gqa](04_gqa) contains an introduction to Grouped-Query Attention (GQA), which is used by most modern LLMs (Llama 4, gpt-oss, Qwen3, Gemma 3, and many more) as an alternative to regular Multi-Head Attention (MHA); see the sketch after this list
- [05_mla](05_mla) contains an introduction to Multi-Head Latent Attention (MLA), which is used by DeepSeek V3 as an alternative to regular Multi-Head Attention (MHA); see the sketch after this list
- [06_swa](06_swa) contains an introduction to Sliding Window Attention (SWA), which is used by Gemma 3 and others; see the sketch after this list
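
The snippet below is a minimal sketch of the KV cache idea for a toy single-head attention step: the keys and values of past tokens are stored and reused, so each generation step only computes projections for the newest token. All weights and dimensions are made-up toy values, not the code in [03_kv-cache](03_kv-cache).

```python
import torch

torch.manual_seed(123)

d = 8                                 # toy embedding / head dimension
W_q = torch.randn(d, d)
W_k = torch.randn(d, d)
W_v = torch.randn(d, d)

k_cache, v_cache = [], []             # grows by one entry per generated token

def attend_with_cache(x_new):
    """Attend the newest token to all cached keys/values."""
    q = x_new @ W_q                   # query only for the new token
    k_cache.append(x_new @ W_k)       # cache keys/values instead of recomputing
    v_cache.append(x_new @ W_v)
    K = torch.stack(k_cache)          # (seq_len, d)
    V = torch.stack(v_cache)
    weights = torch.softmax(q @ K.T / d**0.5, dim=-1)
    return weights @ V                # context vector for the new token

# Simulate generating 4 tokens: each step costs O(seq_len) attention work
# rather than re-running attention over the full sequence from scratch
for step in range(4):
    x_new = torch.randn(d)            # stand-in for the newest token embedding
    ctx = attend_with_cache(x_new)
    print(step, ctx.shape)
```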
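
Next, a minimal sketch of the GQA idea under toy shapes: several query heads share each key/value head, which shrinks the K/V projections (and hence the KV cache) without reducing the number of query heads. The shapes and the `repeat_interleave` approach here are illustrative assumptions, not necessarily how [04_gqa](04_gqa) implements it.

```python
import torch

b, seq, d = 2, 6, 32                  # batch, sequence length, model dim
n_q_heads, n_kv_heads = 8, 2          # 4 query heads per key/value head
head_dim = d // n_q_heads
group = n_q_heads // n_kv_heads

x = torch.randn(b, seq, d)
W_q = torch.nn.Linear(d, n_q_heads * head_dim, bias=False)
W_k = torch.nn.Linear(d, n_kv_heads * head_dim, bias=False)  # fewer K/V params
W_v = torch.nn.Linear(d, n_kv_heads * head_dim, bias=False)

q = W_q(x).view(b, seq, n_q_heads, head_dim).transpose(1, 2)
k = W_k(x).view(b, seq, n_kv_heads, head_dim).transpose(1, 2)
v = W_v(x).view(b, seq, n_kv_heads, head_dim).transpose(1, 2)

# Repeat each K/V head so it serves its whole group of query heads
k = k.repeat_interleave(group, dim=1)   # (b, n_q_heads, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = q @ k.transpose(-2, -1) / head_dim**0.5
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)                        # (b, n_q_heads, seq, head_dim)
```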
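
For MLA, the core idea can be sketched as follows: instead of caching full keys and values, the model caches a small shared latent and up-projects it to keys and values on the fly. This toy version omits details of DeepSeek V3's actual design (for example, the decoupled RoPE path), and all names and dimensions below are illustrative.

```python
import torch

b, seq, d, d_latent = 2, 6, 32, 8       # latent is much smaller than d

x = torch.randn(b, seq, d)
W_down = torch.nn.Linear(d, d_latent, bias=False)  # compress to latent
W_uk = torch.nn.Linear(d_latent, d, bias=False)    # up-project to keys
W_uv = torch.nn.Linear(d_latent, d, bias=False)    # up-project to values
W_q = torch.nn.Linear(d, d, bias=False)

latent = W_down(x)                       # (b, seq, d_latent) -- this is cached
q, k, v = W_q(x), W_uk(latent), W_uv(latent)

scores = q @ k.transpose(-2, -1) / d**0.5
out = torch.softmax(scores, dim=-1) @ v
print(latent.shape, out.shape)           # small cached latent, full-size output
```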
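
And finally, a minimal sketch of a sliding-window attention mask: each token attends only to itself and a fixed number of preceding tokens, which bounds both the attention cost and the KV cache size. The window size is a toy value, and [06_swa](06_swa) may construct the mask differently.

```python
import torch

seq_len, window = 8, 4
i = torch.arange(seq_len).unsqueeze(1)   # query positions (column vector)
j = torch.arange(seq_len).unsqueeze(0)   # key positions (row vector)
mask = (j <= i) & (j > i - window)       # causal AND within the window
print(mask.int())                        # banded lower-triangular mask
```
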
In the video below, I provide a code-along session that covers some of the chapter contents as supplementary material.
<br>
<br>
[![Link to the video](https://img.youtube.com/vi/YSAkgEarBGE/0.jpg)](https://www.youtube.com/watch?v=YSAkgEarBGE)