How to Rank Local LLMs by Cost per Correct Answer (Measured GPU Energy, 8 Ollama Models)

Arsen Apostolov 2026-06-23 · 18:11 UTC


Article automatically generated from technical news.

TL;DR: I priced 8 local Ollama models by € per 1,000 correct answers — metered GPU energy ÷ correct answers, on one RTX 3090. gemma4:26b won at 96.9% accuracy for €0.013/1k-correct. The most expensive model (qwen3:8b-fp16) cost €0.239/1k and scored worse (66.7%). Reasoning tokens and full precision both cost a lot and bought nothing here. Every cost comes from real metered kWh via the open-s
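The metric in the TL;DR can be sketched in a few lines of Python. This is a minimal illustration, not the author's tooling: the `Run` record, the function names, and the `eur_per_kwh` electricity price parameter are all assumptions introduced here, since the summary does not state the tariff used or how the measurements were stored. The idea is to convert metered energy to euros, divide by the number of correct answers, and scale to 1,000.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run for one model (hypothetical record layout)."""
    model: str
    energy_kwh: float  # metered GPU energy for the whole run
    correct: int       # number of correct answers
    total: int         # number of questions asked


def eur_per_1k_correct(run: Run, eur_per_kwh: float) -> float:
    """Energy cost in EUR per 1,000 correct answers.

    eur_per_kwh is an assumed electricity price supplied by the caller.
    A run with no correct answers gets infinite cost so it ranks last.
    """
    if run.correct <= 0:
        return float("inf")
    return run.energy_kwh * eur_per_kwh / run.correct * 1000


def rank(runs: list[Run], eur_per_kwh: float) -> list[Run]:
    """Sort runs from cheapest to most expensive per correct answer."""
    return sorted(runs, key=lambda r: eur_per_1k_correct(r, eur_per_kwh))


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measured results.
    runs = [
        Run("model-a", energy_kwh=0.05, correct=100, total=120),
        Run("model-b", energy_kwh=0.02, correct=80, total=120),
    ]
    for r in rank(runs, eur_per_kwh=0.30):
        cost = eur_per_1k_correct(r, 0.30)
        print(f"{r.model}: {r.correct / r.total:.1%} accurate, EUR {cost:.3f} per 1k correct")
```

Dividing by correct answers rather than total answers is what lets a slower but more accurate model beat a fast, inaccurate one: wrong answers still consume energy but add nothing to the denominator.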

Original source: How to Rank Local LLMs by Cost per Correct Answer (Measured GPU Energy, 8 Ollama Models)
