mistral.rs v0.8.2: up to 2.8x faster CUDA inference than llama.cpp on GB10, B200, and H100


Hey all! I’ve been working on CUDA performance in mistral.rs, and v0.8.2 is focused on CUDA throughput. The result: on Gemma 4 (dense and MoE), mistral.rs is faster than llama.cpp at every point in my release sweep on GB10, H100, and B200. See some results below on GB10 and B200:

https://preview.redd.it/jmdsjkrbfo4h1.png?width=3312&format=png&auto=webp&s=8a69286b73a8fad4edc671cb9ca8ad3f3cd74d1c

The full report includes all steps to reproduce these results.
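For anyone wondering how "faster at every point" and "up to 2.8x" relate to a sweep, here is a minimal sketch of the arithmetic. The throughput numbers and the `(prompt_length, batch_size)` sweep keys are made-up placeholders, not values from the report, and `speedups` is a hypothetical helper, not part of mistral.rs or llama.cpp.

```python
# Placeholder throughput figures in tokens/s. These are NOT measurements
# from the mistral.rs report; they only illustrate the calculation.
# Keys are hypothetical sweep points: (prompt_length, batch_size).
baseline = {(512, 1): 100.0, (512, 8): 400.0, (4096, 1): 80.0}
candidate = {(512, 1): 150.0, (512, 8): 1000.0, (4096, 1): 96.0}


def speedups(candidate, baseline):
    """Return the candidate/baseline throughput ratio per shared sweep point."""
    return {k: candidate[k] / baseline[k] for k in baseline if k in candidate}


ratios = speedups(candidate, baseline)

# "Faster at every point" means every ratio exceeds 1.0.
faster_everywhere = all(r > 1.0 for r in ratios.values())

# An "up to N x" headline figure is the largest ratio in the sweep.
peak = max(ratios.values())

print(f"faster at every point: {faster_everywhere}, peak speedup: {peak:.1f}x")
```

Note that an "up to" figure describes the best sweep point only; the per-point ratios show how the advantage varies across prompt lengths and batch sizes.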
