31 tk/s in 3050 6gb vram ,qwen 3.6 28b A3B REAP unsloth

Article automatically generated from technical news.

Update on my local MoE setup after some tuning. Hardware: Lenovo LOQ RTX 3050 Laptop GPU (6GB VRAM) i5 HX 13th Gen 24GB DDR5 dual channel Samsung NVMe SSD Ubuntu + CUDA Model: Qwen3.6-28B-REAP20-A3B-Q4_K_M.gguf Backend: llama.cpp (latest build from source) OpenAI-compatible llama-server Pi Agent Graphify for context compression Permission-gate enabled Current launch: export GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-server \ -m /models/Qwen3.6-28B

Fonte originale

31 tk/s in 3050 6gb vram ,qwen 3.6 28b A3B REAP unsloth

31 tk/s in 3050 6gb vram ,qwen 3.6 28b A3B REAP unsloth

Related Articles

Built a DIY Local 2x DGX Spark cluster cooler with automatic temperature controlled fan.

Evaluation & Monitoring Frameworks for Retrieval Systems

jamwithai /production-agentic-rag-course

nesquena /hermes-webui

DeepSWE benchmarks indicate that DeepSeek v4 Pro only passes 8% of tasks