From bcc73f731d09cec9c091b4ed563eed68fbdeecf0 Mon Sep 17 00:00:00 2001
From: Sebastian Raschka
Date: Wed, 5 Nov 2025 18:28:37 -0600
Subject: [PATCH] =?UTF-8?q?n=5Fheads=20=C3=97=20d=5Fhead=20->=20d=5Fhead?=
 =?UTF-8?q?=20=C3=97=20d=5Fhead=20in=20DeltaNet=20(#903)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Clarified the explanation of the memory size calculation for `KV_cache_DeltaNet` and updated the quadratic term from `n_heads × d_head` to `d_head × d_head`.
---
 ch04/08_deltanet/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ch04/08_deltanet/README.md b/ch04/08_deltanet/README.md
index ca533e0..ca50fe5 100644
--- a/ch04/08_deltanet/README.md
+++ b/ch04/08_deltanet/README.md
@@ -331,7 +331,7 @@ For the simplified DeltaNet version implemented above, we have:
 KV_cache_DeltaNet = batch_size × n_heads × d_head × d_head × bytes
 ```
 
-Note that the `KV_cache_DeltaNet` memory size doesn't have a context length (`n_tokens`) dependency. Also, we have only the memory state S that we store instead of separate keys and values, hence `2 × bytes` becomes just `bytes`. However, note that we now have a quadratic `n_heads × d_head` in here. This comes from the state :
+Note that the `KV_cache_DeltaNet` memory size doesn't have a context length (`n_tokens`) dependency. Also, we have only the memory state S that we store instead of separate keys and values, hence `2 × bytes` becomes just `bytes`. However, note that we now have a quadratic `d_head × d_head` in here. This comes from the state :
 
 ```
 S = x.new_zeros(b, self.num_heads, self.head_dim, self.head_dim)