Readability and code quality improvements (#959)

* Consistent dataset naming

* consistent section headers
Sebastian Raschka
2026-02-17 19:44:56 -05:00
committed by GitHub
parent 7b1f740f74
commit be5e2a3331
48 changed files with 419 additions and 297 deletions


@@ -117,7 +117,7 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## 1) CausalAttention MHA wrapper class from chapter 3"
"## 1. CausalAttention MHA wrapper class from chapter 3"
]
},
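For context, the section renamed above wraps the single-head `CausalAttention` class from chapter 3 in a module that runs several heads side by side and concatenates their outputs. A condensed sketch of that pattern (not the notebook's exact code; names follow the book's conventions):

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    # Single-head causal self-attention (condensed sketch)
    def __init__(self, d_in, d_out, context_length, dropout, qkv_bias=False):
        super().__init__()
        self.d_out = d_out
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.dropout = nn.Dropout(dropout)
        self.register_buffer(
            "mask",
            torch.triu(torch.ones(context_length, context_length), diagonal=1).bool(),
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        q, k, v = self.W_query(x), self.W_key(x), self.W_value(x)
        scores = q @ k.transpose(1, 2) / self.d_out**0.5
        scores = scores.masked_fill(self.mask[:num_tokens, :num_tokens], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))
        return weights @ v

class MultiHeadAttentionWrapper(nn.Module):
    # Runs independent single-head modules and concatenates their outputs
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        self.heads = nn.ModuleList(
            CausalAttention(d_in, d_out, context_length, dropout, qkv_bias)
            for _ in range(num_heads)
        )

    def forward(self, x):
        return torch.cat([head(x) for head in self.heads], dim=-1)
```

Note that each head here does its own full projection, which is what the later, more efficient variants in this notebook avoid.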
{
@@ -208,7 +208,7 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## 2) The multi-head attention class from chapter 3"
"## 2. The multi-head attention class from chapter 3"
]
},
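The chapter-3 class renamed above fuses the heads into single projection matrices per Q, K, and V and splits them via reshaping. A condensed sketch along those lines (not the notebook's exact code):

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    # Condensed sketch of a chapter-3-style multi-head attention module
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        self.register_buffer(
            "mask",
            torch.triu(torch.ones(context_length, context_length), diagonal=1).bool(),
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project, then split into heads: (b, num_heads, num_tokens, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(2, 3) / self.head_dim**0.5
        scores = scores.masked_fill(self.mask[:num_tokens, :num_tokens], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))
        # Merge heads back: (b, num_tokens, d_out)
        context = (weights @ v).transpose(1, 2).reshape(b, num_tokens, -1)
        return self.out_proj(context)
```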
{
@@ -311,7 +311,7 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## 3) An alternative multi-head attention with combined weights"
"## 3. An alternative multi-head attention with combined weights"
]
},
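The "combined weights" variant renamed above replaces the three separate Q/K/V projections with one fused linear layer. A minimal sketch of the idea (the class name `MHACombinedQKV` is illustrative, not necessarily the notebook's):

```python
import torch
import torch.nn as nn

class MHACombinedQKV(nn.Module):
    # One fused projection produces Q, K, and V in a single matmul
    def __init__(self, d_in, d_out, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.qkv = nn.Linear(d_in, 3 * d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # (b, t, 3*d_out) -> (b, t, 3, heads, head_dim) -> (3, b, heads, t, head_dim)
        qkv = self.qkv(x).view(b, num_tokens, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5
        causal = torch.triu(
            torch.ones(num_tokens, num_tokens, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        weights = torch.softmax(scores.masked_fill(causal, float("-inf")), dim=-1)
        context = (weights @ v).transpose(1, 2).reshape(b, num_tokens, -1)
        return self.out_proj(context)
```

The fused projection trades one larger matmul for three smaller ones, which can be friendlier to the hardware.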
{
@@ -435,7 +435,7 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## 4) Multi-head attention with Einsum\n",
"## 4. Multi-head attention with Einsum\n",
"\n",
"- Implementing multi-head attention using Einstein summation via [`torch.einsum`](https://pytorch.org/docs/stable/generated/torch.einsum.html)"
]
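The einsum formulation referenced above expresses the two batched matmuls of attention as Einstein summations. A self-contained sketch of the core computation (function name is illustrative):

```python
import torch

def einsum_causal_attention(q, k, v):
    """Scaled causal dot-product attention written with torch.einsum.

    q, k, v have shape (batch, num_heads, num_tokens, head_dim).
    Subscripts: b=batch, h=heads, q/k index tokens, d=head_dim.
    """
    head_dim, num_tokens = q.shape[-1], q.shape[2]
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) / head_dim**0.5
    causal = torch.triu(
        torch.ones(num_tokens, num_tokens, dtype=torch.bool, device=q.device),
        diagonal=1,
    )
    weights = torch.softmax(scores.masked_fill(causal, float("-inf")), dim=-1)
    return torch.einsum("bhqk,bhkd->bhqd", weights, v)
```

The einsum strings make the contraction dimensions explicit, at the cost of relying on the backend to pick an efficient contraction order.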
@@ -567,7 +567,7 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## 5) Multi-head attention with PyTorch's scaled dot product attention and FlashAttention"
"## 5. Multi-head attention with PyTorch's scaled dot product attention and FlashAttention"
]
},
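The core call behind the section renamed above is `torch.nn.functional.scaled_dot_product_attention`. A minimal usage sketch with toy tensors:

```python
import torch
import torch.nn.functional as F

# Toy tensors in (batch, num_heads, num_tokens, head_dim) layout
q, k, v = (torch.randn(2, 4, 8, 16) for _ in range(3))

# With is_causal=True and no explicit mask, PyTorch is free to dispatch to a
# fused backend such as FlashAttention on supported GPUs and dtypes
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Which backend actually runs depends on the device, dtype, and tensor shapes; on CPU it falls back to the math implementation.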
{
@@ -676,7 +676,7 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## 6) PyTorch's scaled dot product attention without FlashAttention\n",
"## 6. PyTorch's scaled dot product attention without FlashAttention\n",
"\n",
"- This is similar to above, except that we disable FlashAttention by passing an explicit causal mask"
]
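As the bullet above says, passing an explicit causal mask instead of `is_causal=True` steers PyTorch away from the FlashAttention kernel. A minimal sketch of that call:

```python
import torch
import torch.nn.functional as F

b, h, t, d = 2, 4, 8, 16
q, k, v = (torch.randn(b, h, t, d) for _ in range(3))

# For F.scaled_dot_product_attention, a boolean attn_mask uses True to mean
# "this position may be attended to"; supplying it (rather than is_causal=True)
# generally rules out the FlashAttention backend
causal_mask = torch.tril(torch.ones(t, t, dtype=torch.bool))
out = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)
```

Numerically, the result matches the `is_causal=True` variant; only the backend selection differs.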
@@ -785,7 +785,7 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## 7) Using PyTorch's torch.nn.MultiheadAttention"
"## 7. Using PyTorch's torch.nn.MultiheadAttention"
]
},
{
@@ -883,7 +883,7 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## 8) Using PyTorch's torch.nn.MultiheadAttention with `scaled_dot_product_attention`"
"## 8. Using PyTorch's torch.nn.MultiheadAttention with `scaled_dot_product_attention`"
]
},
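For the section renamed above, the relevant detail is that `torch.nn.MultiheadAttention` can internally dispatch to `scaled_dot_product_attention` when its fast-path conditions are met. A hedged sketch of one such configuration:

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
mha.eval()  # the fused fast path is only considered in inference mode
x = torch.randn(2, 8, 64)

# Note the flipped mask semantics: for nn.MultiheadAttention, True in a
# boolean attn_mask means "NOT allowed to attend" (the opposite of the
# functional scaled_dot_product_attention convention)
causal_mask = torch.triu(torch.ones(8, 8, dtype=torch.bool), diagonal=1)

with torch.no_grad():
    # need_weights=False skips materializing the attention-weight matrix,
    # one of the conditions for dispatching to scaled_dot_product_attention
    out, attn_weights = mha(x, x, x, attn_mask=causal_mask, need_weights=False)
```

With `need_weights=False` the second return value is `None`; whether the fused path is actually taken also depends on dtype, device, and other conditions documented by PyTorch.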
{
@@ -948,7 +948,7 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## 9) Using PyTorch's FlexAttention\n",
"## 9. Using PyTorch's FlexAttention\n",
"\n",
"- See [FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention](https://pytorch.org/blog/flexattention/) to learn more about FlexAttention\n",
"- FlexAttention caveat: It currently doesn't support dropout\n",
@@ -1108,7 +1108,18 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## Quick speed comparison (M3 Macbook Air CPU)"
"## 10. Quick speed comparisons"
]
},
{
"cell_type": "markdown",
"id": "992e28f4-a6b9-4dd3-9705-30d0b9f4b5f0",
"metadata": {},
"source": [
"<br>\n",
"&nbsp;\n",
"\n",
"### 10.1 Speed comparisons on M3 MacBook Air CPU"
]
},
{
@@ -1361,7 +1372,7 @@
"<br>\n",
"&nbsp;\n",
"\n",
"## Quick speed comparison (Nvidia A100 GPU)"
"### 10.2 Speed comparison on Nvidia A100 GPU"
]
},
{
@@ -1643,7 +1654,18 @@
"&nbsp;\n",
"\n",
"\n",
"# Visualizations"
"## 11. Visualizations"
]
},
{
"cell_type": "markdown",
"id": "e6baf5ce-45ac-4e26-9523-5c32b82dc784",
"metadata": {},
"source": [
"<br>\n",
"&nbsp;\n",
"\n",
"### 11.1 Visualization utility functions"
]
},
{
@@ -1752,7 +1774,8 @@
"id": "4df834dc"
},
"source": [
"## Speed comparison (Nvidia A100 GPU) with warmup (forward pass only)"
"&nbsp;\n",
"### 11.2 Speed comparison (Nvidia A100 GPU) with warmup (forward pass only)"
]
},
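The warmup-based benchmark added above guards against two common GPU-timing pitfalls: cold-start overhead (kernel compilation, caches) and CUDA's asynchronous kernel launches. A hypothetical `time_forward` helper illustrating the pattern (not the notebook's exact benchmarking code):

```python
import time
import torch


def time_forward(model, x, num_warmup=5, num_runs=20):
    """Median forward-pass time in seconds, after warmup iterations.

    torch.cuda.synchronize() is required around each measurement because
    CUDA kernels launch asynchronously; without it we would only time the
    kernel *launch*, not its execution.
    """
    on_cuda = x.is_cuda
    for _ in range(num_warmup):  # warmup: let kernels and caches settle
        model(x)
    if on_cuda:
        torch.cuda.synchronize()
    times = []
    for _ in range(num_runs):
        start = time.perf_counter()
        model(x)
        if on_cuda:
            torch.cuda.synchronize()
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]  # median is robust to outliers
```

For forward-and-backward timings, the same structure applies with a `loss.backward()` call inside the timed region.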
{
@@ -1834,7 +1857,7 @@
"&nbsp;\n",
"\n",
"\n",
"## Speed comparison (Nvidia A100 GPU) with warmup (forward and backward pass)"
"### 11.3 Speed comparison (Nvidia A100 GPU) with warmup (forward and backward pass)"
]
},
{
@@ -1920,7 +1943,7 @@
"&nbsp;\n",
"\n",
"\n",
"## Speed comparison (Nvidia A100 GPU) with warmup and compilation (forward and backward pass)"
"### 11.4 Speed comparison (Nvidia A100 GPU) with warmup and compilation (forward and backward pass)"
]
},
{