VibeThinker-3B: Scaling Reasoning Capabilities via the Spectrum-to-Signal Post-Training Pipeline

VibeThinker-3B is a 3B-parameter dense reasoning model based on Qwen2.5-Coder-3B. It uses a novel "Spectrum-to-Signal" post-training pipeline to reach competitive performance on verifiable mathematics and coding benchmarks, rivaling models with significantly more parameters.

Architectural Foundation and Development

VibeThinker-3B is an open-source model released under the MIT license, designed to bring advanced reasoning capabilities to a small-parameter dense architecture. The model is built upon the Qwen2.5-Coder-3B base, leveraging the Spectrum-to-Signal post-training pipeline to optimize its reasoning trajectories.

Benchmark Performance: Mathematics and Code

The model performs strongly in verifiable domains, particularly mathematics and competitive programming, where its reported scores are competitive with those of significantly larger systems.

Mathematical Reasoning

VibeThinker-3B posts strong baseline scores on several rigorous benchmarks, and each score improves further when CLR (Correctness-Led Reasoning) test-time scaling is applied:

  • AIME26: 94.3% (increases to 97.1% with CLR)
  • HMMT25: 89.3% (increases to 95.4% with CLR)
  • BruMO25: 93.8% (increases to 99.2% with CLR)
  • IMO-AnswerBench: 76.4% (increases to 80.6% with CLR)
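To make the size of the CLR improvement easier to compare across benchmarks, the following minimal sketch computes the absolute gain in percentage points and the share of baseline errors that CLR removes. The scores are copied from the list above; the helper name `clr_gain` and the "error reduction" framing are illustrative choices made here, not metrics reported by the source.

```python
# Scores copied from the list above: (baseline %, with CLR %).
scores = {
    "AIME26": (94.3, 97.1),
    "HMMT25": (89.3, 95.4),
    "BruMO25": (93.8, 99.2),
    "IMO-AnswerBench": (76.4, 80.6),
}


def clr_gain(baseline: float, with_clr: float) -> tuple[float, float]:
    """Hypothetical helper: return (absolute gain in points,
    fraction of baseline errors removed by CLR)."""
    abs_gain = with_clr - baseline
    # Baseline error rate is 100 - baseline; the gain is expressed
    # as a fraction of that remaining headroom.
    error_reduction = abs_gain / (100.0 - baseline)
    return round(abs_gain, 1), round(error_reduction, 3)


gains = {name: clr_gain(b, c) for name, (b, c) in scores.items()}

for name, (points, reduction) in gains.items():
    print(f"{name}: +{points} points, {reduction:.1%} of baseline errors removed")
```

Viewed this way, the largest relative improvement is on BruMO25, where little headroom remained, while IMO-AnswerBench shows the smallest relative change despite a mid-sized absolute gain.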

Coding and Instruction Following

In addition to its mathematical capabilities, the model performs well on coding benchmarks and general instruction following:

  • LiveCodeBench v6: 80.2 Pass@1
  • OJBench: 38.6%
  • IFEval: 93.4% (instruction-following accuracy remains high even after the reasoning-focused reinforcement learning (RL) phase)
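The source does not say how the Pass@1 figure was computed or how many samples were drawn per problem. As background, the sketch below shows the unbiased pass@k estimator commonly used in code-generation evaluation, which for k = 1 reduces to the fraction of correct samples. The function names and the example counts are hypothetical and are not taken from the VibeThinker-3B evaluation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed all tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem counts (n samples, c correct), for illustration only.
example_results = [(10, 10), (10, 7), (10, 0), (10, 3)]
example_score = benchmark_pass_at_k(example_results, k=1)
print(f"pass@1 = {example_score:.3f}")
```

With k = 1 the estimator is simply c / n per problem, so averaging over several samples per problem mainly reduces variance compared with scoring a single generation.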

Technical Analysis

The Spectrum-to-Signal pipeline is credited with closing the gap between standard language modeling and complex reasoning. The 93.4% IFEval score suggests that the RL phase used for reasoning did not cause the "catastrophic forgetting," or degradation of general instruction-following ability, often seen in specialized reasoning models.

Note: Detailed technical specifications regarding the internal mechanics of the "Spectrum-to-Signal" pipeline and the specific implementation of the CLR test-time scaling were not provided in the source material.
