Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps

u/Alternative-Cat-1347 2026-05-22 · 22:11 UTC

...and on 8GB of VRAM I can even push the context to 320K, 400K, 512K, and yes, 1M. It does start to slow down noticeably beyond 150K, though, so I'd only do this when I really need the larger context. This is using the APEX-I-Quality or Q4\_K\_XL quants; both are better than Q4\_K\_M (IQ4\_NL\_XL for beyond
