Projection Splitting for Muon Optimizer
Muon and MoE are becoming standard in frontier LLMs, but neither is plug-and-play at smaller scales. Part I covers getting Muon to work at 280M parameters and why splitting fused projections is the key.
Kirill Luka • March 26, 2026