We all repeat that Q4/Q6 is fine... Has anyone else watched a small model's strict JSON output collapse at Q6 while fp16 was perfect?
Article automatically generated from technical news.
I was running strict JSON output on a small model, around 1.5B parameters, when I hit something odd. fp16 was fine. Q8_0 was fine too. But the moment I dropped to Q6_K, the level everyone calls "nearly lossless", the JSON completely fell apart: enum values missing their quotes, broken braces, and free text showing up where enum values should have been. Nothing changed except the quantization level. The model was clearly still "smart" in some sense, still capable of reasoning, but it couldn't hold
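The post doesn't show the evaluation setup, so here is a minimal sketch for separating the failure modes it describes when comparing quantization levels. Unquoted enums and broken braces fail JSON parsing, while free text in an enum field parses but fails an enum check. The schema (`ALLOWED`, a single hypothetical `status` field) and the sample strings are made up for illustration. They are not the author's schema or real model outputs.

```python
import json
from collections import Counter

# Hypothetical schema (assumption, not from the post): field -> allowed enum values
ALLOWED = {"status": {"open", "closed", "pending"}}


def check_output(text: str) -> str:
    """Classify one model output as 'ok', 'invalid_json', 'missing_field', or 'bad_enum'."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        # Unquoted enum values and broken/missing braces both land here
        return "invalid_json"
    if not isinstance(obj, dict):
        return "invalid_json"
    for field, allowed in ALLOWED.items():
        if field not in obj:
            return "missing_field"
        value = obj[field]
        # Free text (or a non-string) where an enum value should be
        if not isinstance(value, str) or value not in allowed:
            return "bad_enum"
    return "ok"


def summarize(outputs_by_quant):
    """Map each quantization label to a Counter of outcome categories."""
    return {
        quant: Counter(check_output(text) for text in outputs)
        for quant, outputs in outputs_by_quant.items()
    }


if __name__ == "__main__":
    # Illustrative strings mimicking the described failure modes, not real results
    samples = {
        "fp16": ['{"status": "open"}'],
        "Q6_K": ['{"status": open}', '{"status": "it is probably open"}'],
    }
    print(summarize(samples))
```

Keeping "does it parse" separate from "is the value in the enum" matters here. A plain parse-success rate would hide the free-text failures, and an enum check alone would never see the broken braces.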
Original source