# Chapter 4: Implementing a GPT Model from Scratch to Generate Text
 
## Main Chapter Code
- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.
 
## Bonus Materials
- [02_performance-analysis](02_performance-analysis) contains optional code for analyzing the performance of the GPT model(s) implemented in the main chapter
- [03_kv-cache](03_kv-cache) implements a KV cache to speed up text generation during inference (a minimal sketch of the idea follows after this list)
- [ch05/07_gpt_to_llama](../ch05/07_gpt_to_llama) contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2 and loading pretrained weights from Meta AI (it might be interesting to look at alternative architectures after completing chapter 4, but you can also save that for after reading chapter 5)
- [04_gqa](04_gqa) contains an introduction to Grouped-Query Attention (GQA), which is used by most modern LLMs (Llama 4, gpt-oss, Qwen3, Gemma 3, and many more) as an alternative to regular Multi-Head Attention (MHA); see the sketch after this list
- [05_mla](05_mla) contains an introduction to Multi-Head Latent Attention (MLA), which is used by DeepSeek V3 as an alternative to regular Multi-Head Attention (MHA); see the sketch after this list
- [06_swa](06_swa) contains an introduction to Sliding Window Attention (SWA), which is used by Gemma 3 and others; see the sketch after this list
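
The snippet below is a minimal sketch of the KV cache idea for a toy single-head attention step: the keys and values of past tokens are stored and reused, so each generation step only computes projections for the newest token. All weights and dimensions are made-up toy values, not the code in [03_kv-cache](03_kv-cache).

```python
import torch

torch.manual_seed(123)

d = 8                                 # toy embedding / head dimension
W_q = torch.randn(d, d)
W_k = torch.randn(d, d)
W_v = torch.randn(d, d)

k_cache, v_cache = [], []             # grows by one entry per generated token

def attend_with_cache(x_new):
    """Attend the newest token to all cached keys/values."""
    q = x_new @ W_q                   # query only for the new token
    k_cache.append(x_new @ W_k)       # cache keys/values instead of recomputing
    v_cache.append(x_new @ W_v)
    K = torch.stack(k_cache)          # (seq_len, d)
    V = torch.stack(v_cache)
    weights = torch.softmax(q @ K.T / d**0.5, dim=-1)
    return weights @ V                # context vector for the new token

# Simulate generating 4 tokens: each step costs O(seq_len) attention work
# rather than re-running attention over the full sequence from scratch
for step in range(4):
    x_new = torch.randn(d)            # stand-in for the newest token embedding
    ctx = attend_with_cache(x_new)
    print(step, ctx.shape)
```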
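
Next, a minimal sketch of the GQA idea under toy shapes: several query heads share each key/value head, which shrinks the K/V projections (and hence the KV cache) without reducing the number of query heads. The shapes and the `repeat_interleave` approach here are illustrative assumptions, not necessarily how [04_gqa](04_gqa) implements it.

```python
import torch

b, seq, d = 2, 6, 32                  # batch, sequence length, model dim
n_q_heads, n_kv_heads = 8, 2          # 4 query heads per key/value head
head_dim = d // n_q_heads
group = n_q_heads // n_kv_heads

x = torch.randn(b, seq, d)
W_q = torch.nn.Linear(d, n_q_heads * head_dim, bias=False)
W_k = torch.nn.Linear(d, n_kv_heads * head_dim, bias=False)  # fewer K/V params
W_v = torch.nn.Linear(d, n_kv_heads * head_dim, bias=False)

q = W_q(x).view(b, seq, n_q_heads, head_dim).transpose(1, 2)
k = W_k(x).view(b, seq, n_kv_heads, head_dim).transpose(1, 2)
v = W_v(x).view(b, seq, n_kv_heads, head_dim).transpose(1, 2)

# Repeat each K/V head so it serves its whole group of query heads
k = k.repeat_interleave(group, dim=1)   # (b, n_q_heads, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = q @ k.transpose(-2, -1) / head_dim**0.5
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)                        # (b, n_q_heads, seq, head_dim)
```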
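
For MLA, the core idea can be sketched as follows: instead of caching full keys and values, the model caches a small shared latent and up-projects it to keys and values on the fly. This toy version omits details of DeepSeek V3's actual design (for example, the decoupled RoPE path), and all names and dimensions below are illustrative.

```python
import torch

b, seq, d, d_latent = 2, 6, 32, 8       # latent is much smaller than d

x = torch.randn(b, seq, d)
W_down = torch.nn.Linear(d, d_latent, bias=False)  # compress to latent
W_uk = torch.nn.Linear(d_latent, d, bias=False)    # up-project to keys
W_uv = torch.nn.Linear(d_latent, d, bias=False)    # up-project to values
W_q = torch.nn.Linear(d, d, bias=False)

latent = W_down(x)                       # (b, seq, d_latent) -- this is cached
q, k, v = W_q(x), W_uk(latent), W_uv(latent)

scores = q @ k.transpose(-2, -1) / d**0.5
out = torch.softmax(scores, dim=-1) @ v
print(latent.shape, out.shape)           # small cached latent, full-size output
```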
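
And finally, a minimal sketch of a sliding-window attention mask: each token attends only to itself and a fixed number of preceding tokens, which bounds both the attention cost and the KV cache size. The window size is a toy value, and [06_swa](06_swa) may construct the mask differently.

```python
import torch

seq_len, window = 8, 4
i = torch.arange(seq_len).unsqueeze(1)   # query positions (column vector)
j = torch.arange(seq_len).unsqueeze(0)   # key positions (row vector)
mask = (j <= i) & (j > i - window)       # causal AND within the window
print(mask.int())                        # banded lower-triangular mask
```
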
In the video below, I provide a code-along session that covers some of the chapter contents as supplementary material.
<br>
<br>
[![Link to the video](https://img.youtube.com/vi/YSAkgEarBGE/0.jpg)](https://www.youtube.com/watch?v=YSAkgEarBGE)