BeeLlama v0.2.0: major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.
BeeLlama v0.2.0 Unveiled: Massive DFlash Optimization Drives 4.4x+ Acceleration on Local LLMs

BeeLlama v0.2.0 introduces a major DFlash update, significantly boosting inference efficiency for large language models (LLMs)…
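
As a rough sanity check on the reported figures, dividing each peak throughput by its speedup factor gives the baseline rate it implies. This is a minimal sketch: it assumes each "up to" throughput and its speedup come from the same run, which the announcement does not confirm, and the names `reported` and `implied_baseline` are illustrative only.

```python
# Reported peak figures from the announcement: (tokens/s, speedup factor).
# Assumption: each pair was measured in the same benchmark run.
reported = {
    "Qwen 3.6 27B": (164.0, 4.40),
    "Gemma 4 31B": (177.8, 4.93),
}


def implied_baseline(tps: float, speedup: float) -> float:
    """Return the baseline tokens/s implied by an accelerated rate and its speedup."""
    return tps / speedup


baselines = {
    name: implied_baseline(tps, speedup)
    for name, (tps, speedup) in reported.items()
}

for name, base in baselines.items():
    print(f"{name}: ~{base:.1f} tps implied baseline")
```

Under that assumption, both models would sit in the mid-to-high 30s of tokens per second without DFlash on the same card.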