Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps


...and on 8GB VRAM I can even push the context to 320K, 400K, 512K, and yes, 1M. But it does start to slow down noticeably beyond 150K, so I'd only do this if I ever really want the larger context. This is using APEX-I-Quality or Q4_K_XL quants; both are better than Q4_K_M (IQ4_NL_XL for beyond
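
For a rough sense of why context length is the thing that eats an 8GB card, here is a back-of-the-envelope sketch of KV-cache size. It assumes a standard attention layout where every cached layer stores one key and one value vector per KV head per token. The layer count, KV-head count and head dimension are made-up placeholders, not the real Qwen3.6-35B-A3B config, and the function names are hypothetical, so treat the numbers as illustrative only.

```python
def kv_cache_bytes(n_kv_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2.0):
    """Approximate KV-cache size in bytes for a standard attention stack.

    n_kv_layers counts only the layers that actually keep a KV cache.
    The leading factor 2 covers storing both keys and values.
    """
    return 2 * n_kv_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 1024**3


if __name__ == "__main__":
    # Hypothetical architecture values, NOT the real model config.
    layers, kv_heads, head_dim = 10, 2, 256

    for ctx in (262_144, 524_288, 1_048_576):
        # 2.0 bytes per element corresponds to a 16-bit cache;
        # 1.0 is a rough stand-in for an 8-bit cache (ignores block overhead).
        f16 = to_gib(kv_cache_bytes(layers, kv_heads, head_dim, ctx, 2.0))
        q8 = to_gib(kv_cache_bytes(layers, kv_heads, head_dim, ctx, 1.0))
        print(f"{ctx:>9} tokens: ~{f16:.2f} GiB at 16-bit, ~{q8:.2f} GiB at 8-bit")
```

The cache grows linearly with context length, so doubling the context doubles the memory it needs. Under these assumptions, that is why very long contexts on 8GB only fit with a small per-token cache footprint, a quantized cache, or part of the model held in system RAM.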
