An independent researcher analyzed the internal dynamics of Transformers by tracking the geometric trajectories of hidden states across layers during inference. The findings suggest a universal "dynamic grammar" governing how hidden states move and stabilize, one that reportedly holds across architectures ranging from GPT-2 to Llama-3.2.
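To make "tracking the geometric trajectory of a hidden state" concrete, here is a minimal sketch in plain Python. It is not the researcher's method: the function name `trajectory_metrics` and the two metrics (how far the state moves between consecutive layers, and how sharply it turns) are assumptions chosen for illustration, and the input is a toy list of vectors rather than real model activations.

```python
import math


def trajectory_metrics(states):
    """Describe the layer-by-layer path of one token's hidden state.

    `states` is a list of vectors (lists of floats), one per layer.
    Hypothetical helper: the metrics below are illustrative assumptions,
    not the quantities used in the original analysis.
    """
    # Displacement of the hidden state from each layer to the next.
    steps = [
        [b - a for a, b in zip(prev, cur)]
        for prev, cur in zip(states, states[1:])
    ]
    # Step length: how far the state moved at each layer transition.
    step_norms = [math.sqrt(sum(x * x for x in s)) for s in steps]

    # Cosine between consecutive displacements: 1.0 means the state keeps
    # moving in the same direction, lower values mean it is turning.
    turn_cos = []
    for s, t, ns, nt in zip(steps, steps[1:], step_norms, step_norms[1:]):
        if ns == 0.0 or nt == 0.0:
            turn_cos.append(float("nan"))  # direction undefined when the state did not move
        else:
            turn_cos.append(sum(x * y for x, y in zip(s, t)) / (ns * nt))
    return step_norms, turn_cos


# Toy trajectory: two equal steps in the same direction, then no movement.
toy_states = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [6.0, 8.0]]
step_norms, turn_cos = trajectory_metrics(toy_states)
print(step_norms)
print(turn_cos)
```

In this toy reading, a step length that shrinks to zero in the final transition would be one possible picture of a state "stabilizing". With a real model, the per-layer vectors would have to come from whatever mechanism exposes intermediate hidden states; that part is left out here.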
reddit/r/machinelearningnews