Mirror of https://github.com/rasbt/LLMs-from-scratch.git, synced 2026-04-10 12:33:42 +00:00

Readability and code quality improvements (#959)

* Consistent dataset naming
* Consistent section headers

This commit is contained in:
committed by GitHub
parent 7b1f740f74
commit be5e2a3331
@@ -117,7 +117,7 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## 1) CausalAttention MHA wrapper class from chapter 3"
+"## 1. CausalAttention MHA wrapper class from chapter 3"
 ]
},
{
@@ -208,7 +208,7 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## 2) The multi-head attention class from chapter 3"
+"## 2. The multi-head attention class from chapter 3"
 ]
},
{
@@ -311,7 +311,7 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## 3) An alternative multi-head attention with combined weights"
+"## 3. An alternative multi-head attention with combined weights"
 ]
},
{
@@ -435,7 +435,7 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## 4) Multi-head attention with Einsum\n",
+"## 4. Multi-head attention with Einsum\n",
 "\n",
 "- Implementing multi-head attention using Einstein summation via [`torch.einsum`](https://pytorch.org/docs/stable/generated/torch.einsum.html)"
 ]
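The hunk above renames the header for the einsum-based implementation. As a rough sketch of the idea behind that section (shapes and variable names here are illustrative, not the notebook's code), the attention scores and context vectors can be computed with `torch.einsum` by contracting over the head dimension:

```python
import torch

# Illustrative shapes: batch b, heads h, tokens n, head dimension d
b, h, n, d = 2, 4, 6, 8
q = torch.randn(b, h, n, d)
k = torch.randn(b, h, n, d)
v = torch.randn(b, h, n, d)

# Scaled attention scores via Einstein summation (contract over d)
scores = torch.einsum("bhnd,bhmd->bhnm", q, k) / d**0.5

# Causal mask: each token may attend only to itself and earlier tokens
mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))

weights = torch.softmax(scores, dim=-1)

# Context vectors: weighted sum of values, again via einsum
context = torch.einsum("bhnm,bhmd->bhnd", weights, v)
```

The two `einsum` calls are equivalent to `q @ k.transpose(-2, -1)` and `weights @ v`; the einsum spelling just makes the contracted index explicit.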
@@ -567,7 +567,7 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## 5) Multi-head attention with PyTorch's scaled dot product attention and FlashAttention"
+"## 5. Multi-head attention with PyTorch's scaled dot product attention and FlashAttention"
 ]
},
{
@@ -676,7 +676,7 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## 6) PyTorch's scaled dot product attention without FlashAttention\n",
+"## 6. PyTorch's scaled dot product attention without FlashAttention\n",
 "\n",
 "- This is similar to above, except that we disable FlashAttention by passing an explicit causal mask"
 ]
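The bullet in this hunk describes disabling FlashAttention by passing an explicit causal mask. A minimal sketch of that pattern (the tensor shapes are illustrative, not the notebook's code): supplying `attn_mask` instead of `is_causal=True` generally steers `scaled_dot_product_attention` away from the FlashAttention kernel, while producing the same result.

```python
import torch
import torch.nn.functional as F

b, h, n, d = 2, 4, 6, 8
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))

# Explicit boolean mask: True marks positions that MAY be attended to,
# so the lower triangle (including the diagonal) implements causality
causal_bool = torch.tril(torch.ones(n, n, dtype=torch.bool))
out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_bool)

# Same computation using the built-in causal flag (FlashAttention eligible)
out_causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Both calls return identical outputs up to numerical tolerance; only the kernel dispatch differs.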
@@ -785,7 +785,7 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## 7) Using PyTorch's torch.nn.MultiheadAttention"
+"## 7. Using PyTorch's torch.nn.MultiheadAttention"
 ]
},
{
@@ -883,7 +883,7 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## 8) Using PyTorch's torch.nn.MultiheadAttention with `scaled_dot_product_attention`"
+"## 8. Using PyTorch's torch.nn.MultiheadAttention with `scaled_dot_product_attention`"
 ]
},
{
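The two hunks above rename the `torch.nn.MultiheadAttention` sections. As a brief sketch of that built-in module in a causal setting (dimensions chosen for illustration, not taken from the notebook); note that for this module a `True` entry in `attn_mask` means the position is *blocked*, the opposite of `scaled_dot_product_attention`'s boolean convention:

```python
import torch
import torch.nn as nn

embed_dim, num_heads, n = 16, 4, 6
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(2, n, embed_dim)

# True marks positions that must NOT be attended to (upper triangle = future)
causal_mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)

# Self-attention: query, key, and value are all the same sequence
out, attn_weights = mha(x, x, x, attn_mask=causal_mask, need_weights=True)
```

With `need_weights=True` and the default `average_attn_weights=True`, `attn_weights` has shape `(batch, n, n)`, averaged over heads.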
@@ -948,7 +948,7 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## 9) Using PyTorch's FlexAttention\n",
+"## 9. Using PyTorch's FlexAttention\n",
 "\n",
 "- See [FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention](https://pytorch.org/blog/flexattention/) to learn more about FlexAttention\n",
 "- FlexAttention caveat: It currently doesn't support dropout\n",
@@ -1108,7 +1108,18 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## Quick speed comparison (M3 Macbook Air CPU)"
+"## 10. Quick speed comparisons"
+]
+},
+{
+"cell_type": "markdown",
+"id": "992e28f4-a6b9-4dd3-9705-30d0b9f4b5f0",
+"metadata": {},
+"source": [
+"<br>\n",
+"&nbsp;\n",
+"\n",
+"### 10.1 Speed comparisons on M3 Macbook Air CPU"
 ]
},
{
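The hunk above splits the speed-comparison material into numbered subsections. A timing harness along these lines (the helper name, warmup policy, and repeat count are illustrative, not the notebook's code) is the usual shape of such a CPU comparison:

```python
import time
import torch

def benchmark(fn, *args, num_repeats=10):
    """Average wall-clock seconds per call, CPU timing only.
    On a GPU you would also call torch.cuda.synchronize() around the loop."""
    fn(*args)  # one warmup call so one-time setup cost is excluded
    start = time.perf_counter()
    for _ in range(num_repeats):
        fn(*args)
    return (time.perf_counter() - start) / num_repeats

x = torch.randn(8, 64, 64)
t = benchmark(torch.matmul, x, x)
```

Each of the attention variants from sections 1 through 9 would be passed to such a helper with the same input batch so the timings are comparable.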
@@ -1361,7 +1372,7 @@
 "<br>\n",
 "&nbsp;\n",
 "\n",
-"## Quick speed comparison (Nvidia A100 GPU)"
+"### 10.2 Quick speed comparison on Nvidia A100 GPU"
 ]
},
{
@@ -1643,7 +1654,18 @@
 "&nbsp;\n",
 "\n",
 "\n",
-"# Visualizations"
+"## 11. Visualizations"
+]
+},
+{
+"cell_type": "markdown",
+"id": "e6baf5ce-45ac-4e26-9523-5c32b82dc784",
+"metadata": {},
+"source": [
+"<br>\n",
+"&nbsp;\n",
+"\n",
+"### 11.1 Visualization utility functions"
 ]
},
{
@@ -1752,7 +1774,8 @@
 "id": "4df834dc"
 },
 "source": [
-"## Speed comparison (Nvidia A100 GPU) with warmup (forward pass only)"
+"&nbsp;\n",
+"### 11.2 Speed comparison (Nvidia A100 GPU) with warmup (forward pass only)"
 ]
},
{
@@ -1834,7 +1857,7 @@
 "&nbsp;\n",
 "\n",
 "\n",
-"## Speed comparison (Nvidia A100 GPU) with warmup (forward and backward pass)"
+"### 11.3 Speed comparison (Nvidia A100 GPU) with warmup (forward and backward pass)"
 ]
},
{
@@ -1920,7 +1943,7 @@
 "&nbsp;\n",
 "\n",
 "\n",
-"## Speed comparison (Nvidia A100 GPU) with warmup and compilation (forward and backward pass)"
+"### 11.4 Speed comparison (Nvidia A100 GPU) with warmup and compilation (forward and backward pass)"
 ]
},
{