Qwen3.5-9B on RTX 5060 8GB VRAM: The llama.cpp settings + quants that finally made reliable local agents work


After spending the last couple of weeks testing different Qwen3.5-9B GGUF variants on my RTX 5060 8GB setup, I finally landed on a configuration that gives me usable speed and reliable agent behavior for browser automation tasks.

My Hardware

RTX 5060 8GB
Ryzen 5 3600 + 32GB RAM
Running mostly through LM Studio with llama.cpp backend (also tested pure llama-server)

What Worked Best

The variant that gave me the best balance was Qwen3.5-9B-Agency-Architect (GGUF). I also tested
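
As a rough sanity check on why the choice of quant matters with 8GB of VRAM, here is a minimal back-of-the-envelope sketch. It is not taken from the original post. It assumes a nominal 9 billion parameters, uses the bits-per-weight implied by llama.cpp's Q4_0 and Q8_0 block layouts, and treats the 1.5 GiB allowance for KV cache and compute buffers as a placeholder guess. Real GGUF files mix tensor types, so actual sizes will differ somewhat. The names `weight_gib` and `fits` are hypothetical helpers.

```python
# Back-of-the-envelope estimate of quantized weight size vs. available VRAM.
# All figures are illustrative assumptions, not measurements from the post.
GIB = 1024 ** 3

# Bits per weight implied by llama.cpp block layouts:
#   Q4_0 packs 32 weights into 18 bytes, Q8_0 packs 32 weights into 34 bytes.
BITS_PER_WEIGHT = {
    "Q4_0": 18 * 8 / 32,  # 4.5 bits per weight
    "Q8_0": 34 * 8 / 32,  # 8.5 bits per weight
    "F16": 16.0,
}


def weight_gib(n_params: float, quant: str) -> float:
    """Approximate size of the model weights in GiB for a given quant."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / GIB


def fits(n_params: float, quant: str,
         vram_gib: float = 8.0, overhead_gib: float = 1.5) -> bool:
    """True if weights plus an assumed overhead fit entirely in VRAM.

    overhead_gib is a placeholder for KV cache and compute buffers;
    it grows with context length, so tune it for your own setup.
    """
    return weight_gib(n_params, quant) + overhead_gib <= vram_gib


if __name__ == "__main__":
    n_params = 9e9  # nominal "9B"; the exact count is an assumption
    for quant in BITS_PER_WEIGHT:
        size = weight_gib(n_params, quant)
        print(f"{quant}: {size:.2f} GiB weights, fits in 8 GiB: {fits(n_params, quant)}")
```

Under these assumptions a 4-bit quant leaves headroom on an 8GB card, while an 8-bit quant would need part of the model offloaded to system RAM.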
