Implementing a Reasoning Toggle for llama.cpp Web Chat via Tampermonkey

A community-developed userscript now enables a "Think" toggle button for the llama.cpp web interface, allowing users to show or hide reasoning chains for models like Qwen 2.5/3.6, mirroring the functionality found in LM Studio.

Enhancing the llama.cpp Web UI Experience

Users of the llama-server web chat interface have noted that it lacks a native control for showing or hiding the internal reasoning (the "thought" process) produced by reasoning models. A community userscript addresses this by adding a dedicated toggle button that manages the display of these reasoning blocks without any changes to the backend source code.

Client-Side Injection via Tampermonkey

Rather than requiring users to patch and recompile llama.cpp every time they pull a new build, the feature is implemented as a JavaScript userscript for Tampermonkey. The browser extension injects the script into the web page's DOM at runtime.

Key Technical Advantages:

  • Non-Invasive Integration: No need to modify the underlying C++ source code or rebuild the binary.
  • Persistence: Because the script lives in the browser extension, the toggle keeps working across sessions and is unaffected by server updates.
  • UI Parity: Brings the llama-server experience closer to feature-rich local LLM runners like LM Studio.
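
As an illustration of the injection approach described above, the following is a minimal sketch of what such a userscript could look like. It is not the original script. The `@match` address (llama-server's default local port), the storage key, the button labels, and the helper names `loadPreference`, `togglePreference`, and `buttonLabel` are all assumptions made for this example.

```javascript
// ==UserScript==
// @name         llama.cpp Think toggle (illustrative sketch)
// @match        http://localhost:8080/*
// @grant        none
// ==/UserScript==

// Hypothetical storage key; not taken from the original script.
const STORAGE_KEY = "thinkToggle.showReasoning";

// Read the saved preference; default to showing reasoning when nothing is stored.
function loadPreference(storage) {
  return storage.getItem(STORAGE_KEY) !== "false";
}

// Flip the preference, persist it, and return the new value.
function togglePreference(storage) {
  const next = !loadPreference(storage);
  storage.setItem(STORAGE_KEY, String(next));
  return next;
}

// Label text for the button in each state (wording is an assumption).
function buttonLabel(showReasoning) {
  return showReasoning ? "Think: shown" : "Think: hidden";
}

// DOM wiring runs only in a browser, so the helpers above stay testable elsewhere.
if (typeof document !== "undefined") {
  const button = document.createElement("button");
  button.textContent = buttonLabel(loadPreference(localStorage));
  button.style.cssText = "position:fixed;bottom:1rem;right:1rem;z-index:9999";
  button.addEventListener("click", () => {
    button.textContent = buttonLabel(togglePreference(localStorage));
  });
  document.body.appendChild(button);
}
```

Storing the preference in `localStorage` is one way to get the persistence mentioned in the list: the value belongs to the browser profile, so restarting or rebuilding the server does not reset it.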

Implementation Details

The script targets the web chat interface to provide a user-friendly switch that controls the visibility of reasoning tokens. This is particularly useful for models such as Qwen 3.6, where the distinction between the internal chain-of-thought and the final output is critical for user experience and readability.
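
Since the original post does not include the script, the visibility switch can only be sketched under assumptions. One plausible approach is to inject a stylesheet rule that hides reasoning blocks. The selector `.reasoning-block`, the style element id, and the function names below are hypothetical; the real web UI's markup would need to be inspected to find the correct selector.

```javascript
// Hypothetical selector for reasoning blocks; the real web UI's markup may differ.
const REASONING_SELECTOR = ".reasoning-block";
// Hypothetical id for the injected <style> element.
const STYLE_ID = "think-toggle-style";

// Build the CSS that hides reasoning blocks, or an empty string to show them.
function buildReasoningCss(showReasoning, selector = REASONING_SELECTOR) {
  return showReasoning ? "" : `${selector} { display: none !important; }`;
}

// Create the <style> element on first use, then update its rule in place.
function applyReasoningVisibility(doc, showReasoning) {
  let style = doc.getElementById(STYLE_ID);
  if (!style) {
    style = doc.createElement("style");
    style.id = STYLE_ID;
    doc.head.appendChild(style);
  }
  style.textContent = buildReasoningCss(showReasoning);
  return style;
}

// In a browser, hide reasoning once as a demonstration.
if (typeof document !== "undefined") {
  applyReasoningVisibility(document, false);
}
```

A stylesheet rule, rather than hiding individual elements, also applies to messages that are added to the page later, which matters in a chat interface that streams new responses.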

Note: The provided source material is a brief announcement; specific code implementation details and the full script contents were not included in the original post.
