VibeThinker-3B: Scaling Reasoning Capabilities via the Spectrum-to-Signal Post-Training Pipeline

VibeThinker-3B is a 3B-parameter dense reasoning model built on Qwen2.5-Coder-3B. It uses a novel "Spectrum-to-Signal" post-training pipeline to reach competitive results on verifiable mathematics and coding benchmarks, rivaling models with significantly more parameters.

Architectural Foundation and Development

VibeThinker-3B is an open-source model released under the MIT license, designed to bring advanced reasoning capabilities to a small-parameter dense architecture. The model is built upon the Qwen2.5-Coder-3B base, leveraging the Spectrum-to-Signal post-training pipeline to optimize its reasoning trajectories.

Benchmark Performance: Mathematics and Code

The model performs strongly in verifiable domains, particularly mathematics and competitive programming, where its reported scores are competitive with those of much larger systems.

Mathematical Reasoning

VibeThinker-3B posts strong baseline scores on several rigorous benchmarks, and the scores improve further when CLR (Correctness-Led Reasoning) test-time scaling is applied:

AIME26: 94.3% (increases to 97.1% with CLR)
HMMT25: 89.3% (increases to 95.4% with CLR)
BruMO25: 93.8% (increases to 99.2% with CLR)
IMO-AnswerBench: 76.4% (increases to 80.6% with CLR)
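
To put the CLR gains above on a common footing, the short sketch below computes, for each benchmark, the absolute gain in percentage points and the share of baseline errors that CLR removes. The scores are copied from the list; the function names `clr_gain` and `summarize` are illustrative choices, and "relative error reduction" is one possible way to read the numbers, not a metric reported by the source.

```python
# Scores copied from the list above: (baseline %, with CLR %).
SCORES = {
    "AIME26": (94.3, 97.1),
    "HMMT25": (89.3, 95.4),
    "BruMO25": (93.8, 99.2),
    "IMO-AnswerBench": (76.4, 80.6),
}


def clr_gain(baseline, with_clr):
    """Return (absolute gain in points, relative error reduction in %)."""
    gain = with_clr - baseline
    error_before = 100.0 - baseline  # share of problems missed at baseline
    reduction = 100.0 * gain / error_before
    return round(gain, 1), round(reduction, 1)


def summarize(scores=SCORES):
    """Map each benchmark name to its (gain, reduction) pair."""
    return {name: clr_gain(b, c) for name, (b, c) in scores.items()}


if __name__ == "__main__":
    for name, (gain, reduction) in summarize().items():
        print(f"{name}: +{gain} points, {reduction}% of baseline errors removed")
```

On these figures, BruMO25 shows the largest relative error reduction (about 87%) and IMO-AnswerBench the smallest (about 18%), even though the absolute gains differ by only a few points.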

Coding and Instruction Following

In addition to its mathematical capabilities, the model scores well on coding benchmarks and general instruction following:

LiveCodeBench v6: 80.2% Pass@1
OJBench: 38.6%
IFEval: 93.4% (notably, instruction-following accuracy remains high even after the reinforcement learning (RL) phase for reasoning)
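
The source does not say how the Pass@1 figure was computed. A common convention in code-generation evaluation, assumed here, is the unbiased pass@k estimator: sample n solutions per problem, count the c that pass all tests, and estimate the chance that at least one of k randomly chosen samples is correct. For k = 1 this reduces to c / n. The function names below are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no correct solution.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results):
    """Average pass@1, in percent, over (n, c) pairs, one per problem."""
    return 100.0 * sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

For example, `benchmark_pass_at_1([(4, 4), (4, 2), (4, 0)])` averages per-problem rates of 1.0, 0.5, and 0.0 across three hypothetical problems.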

Technical Analysis

The Spectrum-to-Signal pipeline is presented as the step that bridges the gap between standard language modeling and complex reasoning. The 93.4% IFEval score suggests that the RL process used for reasoning did not cause the "catastrophic forgetting", or degradation of general instruction-following capability, that is often seen in specialized reasoning models.

Note: Detailed technical specifications regarding the internal mechanics of the "Spectrum-to-Signal" pipeline and the specific implementation of the CLR test-time scaling were not provided in the source material.
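
Because the CLR mechanics are unspecified, the sketch below is only a generic illustration of what test-time scaling can look like, and it is not CLR. It shows self-consistency majority voting, a widely used technique in which several answers are sampled for the same problem and the most frequent one is kept. The function name `majority_vote` and the string-based answer format are assumptions made for the example.

```python
from collections import Counter


def majority_vote(answers):
    """Return (most frequent non-empty answer, its vote share).

    Generic self-consistency voting; NOT the CLR method, whose
    details the source does not describe.
    """
    cleaned = [a.strip() for a in answers if a and a.strip()]
    if not cleaned:
        return None, 0.0
    answer, votes = Counter(cleaned).most_common(1)[0]
    return answer, votes / len(cleaned)
```

The vote share gives a rough confidence signal: a final answer backed by most samples is more likely to be correct than one that wins by a narrow margin.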


Techyon
