Collecting Strix Halo / Ryzen AI MAX+ 395 local LLM results: llama.cpp Vulkan/RADV, Ollama, ROCm/HIP

I’m collecting reproducible local LLM results for AMD Strix Halo / Ryzen AI MAX+ 395 systems.

My current fastest direct llama.cpp row:

- Hardware: Beelink GTR9 Pro
- APU: Ryzen AI MAX+ 395 / Radeon 8060S
- Memory: 128GB LPDDR5X unified memory
- OS: Ubuntu 24.04
- Backend: llama.cpp Vulkan/RADV
- Model: Qwen3-Coder 30B-A3B
- Quant: Q4_K_S
- Result: 98.51 t/s tg128, direct llama-bench

Other rows I’m tracking:

- Qwen3-Coder 30B-A3B UD-Q4_K_XL: 96.76 t/s tg128
- Qwen3.6 35
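For anyone who wants to submit comparable rows, here is a minimal Python sketch of one way to record and rank them. The `BenchRow` type and the helper names are hypothetical and not part of llama.cpp. The `llama-bench` invocation assumes the binary is on your PATH and that `tg128` means a 128-token generation test (`-n 128`) with the prompt test skipped (`-p 0`). The model path is a placeholder. Only the two complete rows from the post are included, and the backend is left unset where the post does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BenchRow:
    """One benchmark row (hypothetical record layout, not a llama.cpp type)."""
    model: str
    quant: str
    tg128: float                   # tokens/second over 128 generated tokens
    backend: Optional[str] = None  # None where the post does not state it


# Only the rows that appear in full in the post.
ROWS = [
    BenchRow("Qwen3-Coder 30B-A3B", "Q4_K_S", 98.51, "llama.cpp Vulkan/RADV"),
    BenchRow("Qwen3-Coder 30B-A3B", "UD-Q4_K_XL", 96.76),
]


def fastest(rows: List[BenchRow]) -> BenchRow:
    """Return the row with the highest tg128 throughput."""
    return max(rows, key=lambda r: r.tg128)


def bench_command(model_path: str, n_gen: int = 128) -> List[str]:
    """Build an assumed llama-bench command line for a generation-only run.

    -p 0 skips the prompt-processing test, -n sets the number of generated
    tokens, and -o json requests machine-readable output.
    """
    return ["llama-bench", "-m", model_path,
            "-p", "0", "-n", str(n_gen), "-o", "json"]


def format_rows(rows: List[BenchRow]) -> str:
    """Render rows as plain text, fastest first."""
    ordered = sorted(rows, key=lambda r: r.tg128, reverse=True)
    return "\n".join(
        f"{r.model} {r.quant}: {r.tg128:.2f} t/s tg128 "
        f"({r.backend or 'backend not stated'})"
        for r in ordered
    )


if __name__ == "__main__":
    print(format_rows(ROWS))
    # Placeholder path: substitute your own GGUF file.
    print(" ".join(bench_command("model.gguf")))
```

Keeping the backend optional avoids guessing at settings a submitter did not report, which matters when comparing Vulkan/RADV rows against Ollama or ROCm/HIP rows.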
