The author benchmarked Qwen 3.6 27B on vLLM in BF16, FP8, and NVFP4 variants using llama-benchy. NVFP4 delivers the fastest inference but suffers from looping problems in copilot mode and produces less detailed agent responses. BF16 avoids these issues, while FP8 offers a balanced trade-off between speed and output quality, making it the preferred choice according to the results.
