VibeThinker-3B: Scaling Reasoning Capabilities via the Spectrum-to-Signal Post-Training Pipeline
VibeThinker-3B is a 3B-parameter dense reasoning model built on Qwen2.5-Coder-3B. Its "Spectrum-to-Signal" post-training pipeline yields results on verifiable mathematics and coding benchmarks that rival those of significantly larger models.
Architectural Foundation and Development
VibeThinker-3B is an open-source model released under the MIT license, designed to bring advanced reasoning capabilities to a small-parameter dense architecture. The model is built upon the Qwen2.5-Coder-3B base, leveraging the Spectrum-to-Signal post-training pipeline to optimize its reasoning trajectories.
Benchmark Performance: Mathematics and Code
The model performs strongly in verifiable domains, particularly mathematics and competitive programming, where it competes with significantly larger systems.
Mathematical Reasoning
VibeThinker-3B posts strong baseline scores on several rigorous benchmarks, and the scores improve further when CLR (Correctness-Led Reasoning) test-time scaling is applied:
- AIME26: 94.3% (increases to 97.1% with CLR)
- HMMT25: 89.3% (increases to 95.4% with CLR)
- BruMO25: 93.8% (increases to 99.2% with CLR)
- IMO-AnswerBench: 76.4% (increases to 80.6% with CLR)
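The size of the CLR improvement is easier to compare across benchmarks when it is expressed both as an absolute gain in percentage points and as the share of baseline errors that it removes. The following is a minimal sketch of that arithmetic using only the scores listed above; the helper name `clr_gain` and the "relative error reduction" framing are illustrative assumptions, not metrics reported by the source.

```python
# Scores copied from the list above: (baseline %, with CLR %).
scores = {
    "AIME26": (94.3, 97.1),
    "HMMT25": (89.3, 95.4),
    "BruMO25": (93.8, 99.2),
    "IMO-AnswerBench": (76.4, 80.6),
}


def clr_gain(baseline, with_clr):
    """Return (absolute gain in points, relative error reduction in %).

    Hypothetical helper for illustration; not part of any released tooling.
    """
    gain = with_clr - baseline
    # Share of the baseline error rate (100 - baseline) that disappears.
    error_reduction = 100.0 * gain / (100.0 - baseline)
    return round(gain, 1), round(error_reduction, 1)


for name, (base, clr) in scores.items():
    gain, reduction = clr_gain(base, clr)
    print(f"{name}: +{gain} points, {reduction}% of baseline errors removed")
```

Read this way, the gains differ considerably between benchmarks: under these numbers, BruMO25 loses most of its remaining errors with CLR, while IMO-AnswerBench loses fewer than a fifth.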
Coding and Instruction Following
In addition to its mathematical capabilities, the model performs well on coding tasks and general instruction following:
- LiveCodeBench v6: 80.2 Pass@1
- OJBench: 38.6%
- IFEval: 93.4% (notably, instruction-following accuracy remains high even after the reasoning-focused reinforcement learning phase)
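For readers unfamiliar with the Pass@1 metric quoted for LiveCodeBench v6, the sketch below shows the widely used unbiased pass@k estimator, which for k = 1 reduces to the fraction of sampled solutions that pass all tests. The source does not state how many samples per problem were drawn or exactly how the score was computed, so the function name `pass_at_k` and the sample counts in the usage lines are assumptions for illustration only.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: number of sampled solutions, c: number that pass all tests,
    k: sample budget being evaluated. Computes 1 - C(n-c, k) / C(n, k).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results, k=1):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results: (samples drawn, samples correct).
example_results = [(8, 8), (8, 4), (8, 0), (8, 6)]
print(f"pass@1 = {benchmark_score(example_results, k=1):.3f}")
```

With k = 1 each problem contributes c / n, so the benchmark score is the mean per-problem success rate, conventionally reported on a 0 to 100 scale.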
Technical Analysis
The Spectrum-to-Signal pipeline allows the model to bridge the gap between standard language modeling and complex reasoning. The 93.4% IFEval score suggests that the reinforcement learning process used for reasoning did not cause the "catastrophic forgetting" of general instruction-following capabilities that is often seen in specialized reasoning models.
Note: Detailed technical specifications regarding the internal mechanics of the "Spectrum-to-Signal" pipeline and the specific implementation of the CLR test-time scaling were not provided in the source material.