Implementing a Reasoning Toggle for llama.cpp Web Chat via Tampermonkey
A community-developed userscript now enables a "Think" toggle button for the llama.cpp web interface, allowing users to show or hide reasoning chains for models like Qwen 2.5/3.6, mirroring the functionality found in LM Studio.
Enhancing the llama.cpp Web UI Experience
Users of the llama-server web chat interface have noted that it lacks a native control for toggling the visibility of the internal reasoning (the "thought" process) that reasoning models produce. A community userscript addresses this by adding a dedicated toggle button that manages the display of these reasoning blocks without modifying the backend source code.
Client-Side Injection via Tampermonkey
Rather than requiring users to manually patch llama.cpp and recompile it after each (often daily) update, this functionality is implemented as a JavaScript snippet designed for Tampermonkey. The browser extension injects the script into the web page's DOM at runtime.
Key Technical Advantages:
- Non-Invasive Integration: No need to modify the underlying C++ source code or rebuild the binary.
- Persistence: The toggle remains active across sessions via the browser extension, regardless of server updates.
- UI Parity: Brings the llama-server experience closer to feature-rich local LLM runners like LM Studio.
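For readers unfamiliar with userscripts, the runtime injection and persistence described above might look roughly like the skeleton below. This is a minimal sketch under stated assumptions, not the community script itself: the `@match` URLs assume llama-server is reachable at its default local address on port 8080, and the storage key, the button id `think-toggle`, and the helper names are all hypothetical.

```javascript
// ==UserScript==
// @name         llama.cpp Think toggle (sketch)
// @match        http://localhost:8080/*
// @match        http://127.0.0.1:8080/*
// @grant        none
// @run-at       document-idle
// ==/UserScript==

// Hypothetical storage key; not taken from the original script.
const STORAGE_KEY = "thinkToggle.showReasoning";

// Read the saved preference from a localStorage-like object.
// Defaults to showing reasoning when nothing has been saved yet.
function loadPreference(storage) {
  return storage.getItem(STORAGE_KEY) !== "false";
}

// Save the preference so it survives page reloads and server rebuilds.
function savePreference(storage, show) {
  storage.setItem(STORAGE_KEY, String(show));
}

// Create the button, attach it to the page, and report every state change.
function injectButton(doc, storage, onChange) {
  let show = loadPreference(storage);
  const button = doc.createElement("button");
  button.id = "think-toggle"; // hypothetical id
  button.style.cssText = "position:fixed;bottom:1rem;right:1rem;z-index:9999";
  const render = () => {
    button.textContent = show ? "Think: on" : "Think: off";
  };
  button.addEventListener("click", () => {
    show = !show;
    savePreference(storage, show);
    render();
    onChange(show);
  });
  render();
  doc.body.appendChild(button);
  onChange(show); // apply the saved state once on load
  return button;
}

// Only touch the DOM when running inside a browser page.
if (typeof document !== "undefined") {
  injectButton(document, window.localStorage, (show) => {
    console.log("reasoning visible:", show);
  });
}
```

Because the preference lives in the browser's `localStorage` for that origin and the script lives in the extension, neither is affected when the server binary is rebuilt or replaced.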
Implementation Details
The script targets the web chat interface to provide a user-friendly switch that controls the visibility of reasoning tokens. This is particularly useful for models such as Qwen 3.6, where the distinction between the internal chain-of-thought and the final output is critical for user experience and readability.
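One way such a visibility switch could work is sketched below. Since the original post did not include its code, everything here is an illustration under assumptions: the CSS selector `.reasoning-block` and the style element id are hypothetical placeholders (the real markup must be found by inspecting the page), and the text-splitting helper assumes the model wraps its reasoning in `<think>...</think>` tags, as Qwen3-family models commonly do.

```javascript
// Hypothetical selector for reasoning blocks; inspect the page to find the real one.
const REASONING_SELECTOR = ".reasoning-block";
// Hypothetical id for the injected <style> element.
const STYLE_ID = "think-toggle-style";

// Build the CSS rule that hides reasoning blocks, or an empty string to show them.
function buildReasoningCss(selector, show) {
  return show ? "" : `${selector} { display: none !important; }`;
}

// Create or update a single <style> element so repeated toggles do not pile up.
function applyReasoningVisibility(doc, show) {
  let style = doc.getElementById(STYLE_ID);
  if (!style) {
    style = doc.createElement("style");
    style.id = STYLE_ID;
    doc.head.appendChild(style);
  }
  style.textContent = buildReasoningCss(REASONING_SELECTOR, show);
}

// For raw message text: separate <think>...</think> spans from the final answer.
function splitReasoning(text) {
  const reasoning = [];
  const answer = text.replace(/<think>([\s\S]*?)<\/think>/g, (_, inner) => {
    reasoning.push(inner.trim());
    return "";
  });
  return { reasoning: reasoning.join("\n"), answer: answer.trim() };
}

// Example use inside a browser page:
// applyReasoningVisibility(document, false); // hide reasoning blocks
```

Toggling a stylesheet rule rather than setting styles on individual elements means that messages streamed in after the toggle is pressed are hidden as well, without the script having to watch the DOM for new nodes.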
Note: The provided source material is a brief announcement; specific code implementation details and the full script contents were not included in the original post.