GLM-5.2 753B (IQ1_S) fully local across 2×M5 Max over one TB5 cable — ~16 tok/s, llama.cpp RPC [video]
Model: GLM-5.2, 753B params, in Unsloth's dynamic IQ1_S build. It's labeled ~1.6 bits, but the dynamic mix keeps some layers at higher precision, so it lands around 2.1 bits effective: 202GB on disk. Lossy, obviously, but it stays coherent and follows instructions.

Setup: Two M5 Max machines (128GB unified memory each, 256GB combined), connected by a single Thunderbolt 5 cable and running llama.cpp's RPC backend. One box serves the endpoint and holds half the weights; the other holds the rest.
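The "2.1 bits effective" figure can be sanity-checked from the numbers quoted above. The following is a rough sketch, not the author's calculation: it assumes "202GB" means decimal gigabytes, that the whole file is weights (metadata ignored), and that the weights are split evenly between the two machines as described. The function name is made up for illustration.

```python
def effective_bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Average bits per parameter implied by a file size in decimal GB."""
    total_bits = file_size_gb * 1e9 * 8
    return total_bits / (params_billion * 1e9)


PARAMS_B = 753      # parameter count in billions, from the post
FILE_GB = 202       # size on disk, from the post (assumed decimal GB)
RAM_PER_BOX_GB = 128
BOXES = 2

bpw = effective_bits_per_weight(FILE_GB, PARAMS_B)

# Size the file would have if every weight really used the nominal 1.6 bits.
nominal_gb = PARAMS_B * 1e9 * 1.6 / 8 / 1e9

# Memory left on each machine after an even split of the weights
# (assumption: this has to cover the OS, KV cache, and compute buffers).
weights_per_box_gb = FILE_GB / BOXES
headroom_per_box_gb = RAM_PER_BOX_GB - weights_per_box_gb

print(f"effective bits/weight: {bpw:.2f}")
print(f"size at a flat 1.6 bits: {nominal_gb:.1f} GB")
print(f"weights per box: {weights_per_box_gb:.0f} GB, headroom: {headroom_per_box_gb:.0f} GB")
```

Under those assumptions the file works out to a little over 2.1 bits per weight, about 50GB more than a flat 1.6-bit encoding would need, which leaves roughly 27GB per machine for everything other than weights.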