NVIDIA Introduces Nemotron-TwoTower-30B: A Hybrid Diffusion-Based Language Model

NVIDIA has unveiled Nemotron-TwoTower-30B-A3B-Base-BF16, a language model with a novel architecture that pairs a diffusion denoiser tower with a frozen autoregressive backbone to significantly accelerate token generation.

Architectural Innovation: The Two-Tower Approach

Standard autoregressive models generate tokens strictly one at a time. Nemotron-TwoTower-30B-A3B-Base-BF16 departs from this with a dual-tower architecture built on the Nemotron 3 Nano 30B-A3B backbone, pairing autoregressive context encoding with parallel block decoding to raise inference throughput.

The system consists of two primary components:

Autoregressive Context Tower: A frozen component that provides the necessary contextual grounding for the generation process.
Diffusion Denoiser Tower: A specialized tower that iteratively fills blocks of tokens in parallel, rather than sequentially.
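The interplay between the two towers can be illustrated with a toy decoding loop. This is a minimal sketch under stated assumptions, not NVIDIA's implementation: `context_tower` and `denoiser` are hypothetical stand-ins (not neural networks), and the block size, step count, and "commit the most confident predictions first" rule are illustrative choices common in masked-diffusion decoding, not confirmed details of this model. The frozen context tower only reads tokens that are already committed, while the denoiser proposes every masked position of the current block in one parallel call.

```python
import random
from typing import List, Optional, Tuple

MASK = None  # placeholder for a position that has not been decoded yet


def context_tower(prefix: List[int]) -> int:
    """Hypothetical frozen AR context tower: summarizes committed tokens.

    It is never updated; here it is just a deterministic hash of the prefix.
    """
    return hash(tuple(prefix)) & 0xFFFFFFFF


def denoiser(context: int, block: List[Optional[int]],
             vocab_size: int) -> List[Tuple[int, int, float]]:
    """Hypothetical diffusion denoiser tower.

    Proposes (position, token, confidence) for EVERY masked position at once,
    standing in for a single parallel forward pass over the block.
    """
    filled = sum(1 for t in block if t is not MASK)
    rng = random.Random(context * 31 + filled)
    return [(pos, rng.randrange(vocab_size), rng.random())
            for pos, tok in enumerate(block) if tok is MASK]


def generate(prefix: List[int], num_blocks: int, block_size: int,
             steps: int, vocab_size: int = 100) -> Tuple[List[int], int]:
    """Block-wise masked-diffusion decoding; returns (tokens, denoiser_calls)."""
    out = list(prefix)
    calls = 0
    per_step = -(-block_size // steps)  # ceil: tokens committed per denoising step
    for _ in range(num_blocks):
        ctx = context_tower(out)  # context tower reads committed tokens only
        block: List[Optional[int]] = [MASK] * block_size
        while any(t is MASK for t in block):
            proposals = denoiser(ctx, block, vocab_size)
            calls += 1
            # Commit the most confident proposals; the rest stay masked
            # and are re-predicted in the next denoising step.
            proposals.sort(key=lambda p: p[2], reverse=True)
            for pos, tok, _conf in proposals[:per_step]:
                block[pos] = tok
        out.extend(t for t in block if t is not MASK)  # block is fully filled here
    return out, calls


tokens, calls = generate([1, 2, 3], num_blocks=4, block_size=8, steps=4)
print(len(tokens), calls)  # → 35 16
```

In this toy run, 32 new tokens are produced with 16 denoiser calls, where a strictly one-token-per-pass decoder would need 32 passes. How many tokens are committed per step is the knob that trades speed against quality. The real model's block size and step schedule are not given in the source.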

Performance and Efficiency Gains

According to NVIDIA, this mask-diffusion setup lets the model fill several tokens per denoising step instead of one token per forward pass, which sharply reduces the time needed to produce output. In NVIDIA's benchmarks, the model runs 2.42× faster in wall-clock time than standard autoregressive decoding.
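In concrete terms, a speedup factor s leaves 1/s of the original wall-clock time. The sketch below applies NVIDIA's reported 2.42× figure. It assumes the speedup applies uniformly to the whole generation. The 10-second baseline is an arbitrary illustrative value, not a measurement.

```python
def latency_after_speedup(baseline_seconds: float, speedup: float) -> float:
    """Wall-clock time remaining after applying a speedup factor."""
    return baseline_seconds / speedup


def time_saved_fraction(speedup: float) -> float:
    """Fraction of wall-clock time removed by a speedup factor."""
    return 1.0 - 1.0 / speedup


SPEEDUP = 2.42  # figure reported by NVIDIA

print(f"{time_saved_fraction(SPEEDUP):.1%}")           # → 58.7%
print(round(latency_after_speedup(10.0, SPEEDUP), 2))  # → 4.13
```

So a 2.42× speedup removes roughly 59% of generation time: a response that took 10 seconds autoregressively would take about 4.1 seconds.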

Crucially, the speedup costs little in quality. NVIDIA reports that the model retains 98.7% of the aggregate benchmark quality of its autoregressive baseline, suggesting that the diffusion-based approach is a viable alternative for high-throughput LLM deployments.

Note: The provided source material contains a truncated description; specific details regarding the exact training methodology and the full set of benchmarks are not available.

Techyon

NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, an unusual diffusion-based language model built from the Nemotron 3 Nano 30B-A3B backbone.