llama: avoid copying logits during prompt decode in MTP by am17an · Pull Request #23198 · ggml-org/llama.cpp
Optimizing LLM Inference: Avoiding Logit Copying During Prompt Decoding in llama.cpp

A recent update to llama.cpp addresses a performance bottleneck by eliminating redundant logit copying during the prompt decoding phase…