31 tk/s in 3050 6gb vram ,qwen 3.6 28b A3B REAP unsloth
Article automatically generated from technical news.
Update on my local MoE setup after some tuning. Hardware: Lenovo LOQ RTX 3050 Laptop GPU (6GB VRAM) i5 HX 13th Gen 24GB DDR5 dual channel Samsung NVMe SSD Ubuntu + CUDA Model: Qwen3.6-28B-REAP20-A3B-Q4_K_M.gguf Backend: llama.cpp (latest build from source) OpenAI-compatible llama-server Pi Agent Graphify for context compression Permission-gate enabled Current launch: export GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-server \ -m /models/Qwen3.6-28B
Fonte originale