Qwen-27B-IQ4_KS for ik_llama.cpp, especially for NVIDIA with 16GB VRAM

Article automatically generated from technical news.

Hi everyone, I'm presenting a new quantization of the Qwen-27B model, created specifically with 16GB VRAM NVIDIA GPUs in mind. I used quants that, unfortunately, are not yet available in the main upstream `llama.cpp`. I'm talking about the KS and KSS quants developed by ikawrakow. After many trials

Fonte originale