Sebastian Raschka ae8eebf0d7 Use full HF url
2026-03-03 16:38:05 -06:00

Qwen3.5 0.8B From Scratch

This folder contains a from-scratch implementation of Qwen/Qwen3.5-0.8B.

Qwen3.5 is based on the Qwen3-Next architecture, which I described in more detail in section 2, "(Linear) Attention Hybrids," of my Beyond Standard LLMs article.

Note that Qwen3.5 alternates linear_attention and full_attention layers.
The notebooks keep the full model flow readable while reusing the linear-attention building blocks from qwen3_5_transformers.py, which contains the linear-attention code from Hugging Face, released under the Apache License 2.0.
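To make the alternating layout concrete, here is a minimal sketch of how such a hybrid layer schedule could be expressed. The function name, the `full_attn_every` parameter, and the assumed ratio of linear- to full-attention layers are all illustrative assumptions, not the actual configuration used by Qwen3.5 or this repo's code.

```python
def build_layer_schedule(num_layers: int, full_attn_every: int = 4) -> list[str]:
    """Sketch of an alternating hybrid-attention schedule.

    Hypothetical assumption: every `full_attn_every`-th layer uses full
    attention, and all other layers use linear attention. The real model's
    ratio and ordering may differ.
    """
    return [
        "full_attention" if (i + 1) % full_attn_every == 0 else "linear_attention"
        for i in range(num_layers)
    ]


# Example: an 8-layer stack with a full-attention layer every 4th position.
schedule = build_layer_schedule(8)
```

A schedule like this could then drive layer construction, picking the linear- or full-attention block class per entry when building the model.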

