Implementing a Reasoning Toggle for llama.cpp Web Chat via Tampermonkey

A community-developed userscript adds a "Think" toggle button to the llama.cpp web interface, letting users show or hide the reasoning chains produced by models such as Qwen 3.6, similar to the toggle found in LM Studio.

Enhancing the llama.cpp Web UI Experience

Users of the llama-server web chat interface have noted that it lacks a native control for showing or hiding the internal reasoning (the "thought" process) produced by reasoning models. The proposed userscript addresses this by adding a dedicated toggle button that manages the display of these reasoning blocks, with no changes to the backend source code.

Client-Side Injection via Tampermonkey

Rather than requiring users to patch and recompile llama.cpp after each update, the feature is implemented as a JavaScript snippet for Tampermonkey. The browser extension injects the script into the web page's DOM at runtime.
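Since the original post did not include the script, the following is only a minimal sketch of what a Tampermonkey userscript of this kind might look like. It assumes the server is reachable on localhost at port 8080; the `@match` patterns, the button id, and the `injectButton` helper are illustrative names, not taken from the original script.

```javascript
// ==UserScript==
// @name         llama.cpp Think toggle (illustrative skeleton)
// @namespace    example.invalid
// @version      0.1
// @description  Sketch: inject a button into the llama.cpp web chat
// @match        http://localhost:8080/*
// @match        http://127.0.0.1:8080/*
// @run-at       document-idle
// @grant        none
// ==/UserScript==

// Hypothetical id, used to avoid injecting the button twice.
const BUTTON_ID = 'think-toggle-button';

// Builds the button and appends it to the page body.
// Returns the existing button if one was already injected.
function injectButton(doc) {
  const existing = doc.getElementById(BUTTON_ID);
  if (existing) return existing;

  const button = doc.createElement('button');
  button.id = BUTTON_ID;
  button.textContent = 'Think';
  // Fixed positioning keeps the button visible regardless of page layout.
  button.style.cssText = 'position:fixed;bottom:12px;right:12px;z-index:9999;';
  doc.body.appendChild(button);
  return button;
}

// Only run automatically inside a browser; the guard lets the function be
// exercised elsewhere with a stand-in document object.
if (typeof document !== 'undefined') {
  injectButton(document);
}
```

Because the script runs in the page after it loads, nothing on the server side has to change; adjusting the `@match` lines is enough to point it at a different host or port.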

Key Technical Advantages:

Non-Invasive Integration: No need to modify the underlying C++ source code or rebuild the binary.
Persistence: Because the script lives in the browser extension, the toggle stays available across sessions and survives server updates.
UI Parity: Brings the llama-server experience closer to feature-rich local LLM runners like LM Studio.

Implementation Details

The script targets the web chat interface to provide a user-friendly switch that controls the visibility of reasoning tokens. This is particularly useful for models such as Qwen 3.6, where the distinction between the internal chain-of-thought and the final output is critical for user experience and readability.
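One way such a visibility switch could work is sketched below. This is an assumption-laden illustration, not the original script: the `.reasoning-block` selector is a placeholder for whatever element the web UI actually uses for reasoning output (to be found with the browser's inspector), and the class name, storage key, and use of `localStorage` to remember the state are likewise hypothetical choices.

```javascript
// Hypothetical selector: the real class or element used by the llama.cpp web UI
// for reasoning blocks must be looked up in the browser's inspector.
const REASONING_SELECTOR = '.reasoning-block';
const STORAGE_KEY = 'thinkToggle.hidden'; // hypothetical localStorage key
const HIDDEN_CLASS = 'think-hidden';      // hypothetical class set on <html>

// CSS rule that hides reasoning blocks only while the root carries HIDDEN_CLASS.
function buildCss(selector) {
  return `.${HIDDEN_CLASS} ${selector} { display: none !important; }`;
}

// Flips the stored state and returns the new value (true means hidden).
function toggleHidden(storage) {
  const next = storage.getItem(STORAGE_KEY) !== 'true';
  storage.setItem(STORAGE_KEY, String(next));
  return next;
}

// Applies the state to the page by toggling a class on the root element.
function applyState(doc, hidden) {
  doc.documentElement.classList.toggle(HIDDEN_CLASS, hidden);
}

// Browser-only wiring: add the stylesheet, restore the saved state, add a button.
if (typeof document !== 'undefined' && typeof localStorage !== 'undefined') {
  const style = document.createElement('style');
  style.textContent = buildCss(REASONING_SELECTOR);
  document.head.appendChild(style);
  applyState(document, localStorage.getItem(STORAGE_KEY) === 'true');

  const button = document.createElement('button');
  button.textContent = 'Think';
  button.style.cssText = 'position:fixed;bottom:12px;right:12px;z-index:9999;';
  button.addEventListener('click', () => {
    applyState(document, toggleHidden(localStorage));
  });
  document.body.appendChild(button);
}
```

Hiding through a stylesheet rule rather than editing each message means reasoning blocks that stream in later are covered automatically, without the script having to watch the DOM for new nodes.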

Note: The provided source material is a brief announcement; specific code implementation details and the full script contents were not included in the original post.

Original Source

Techyon - AI News Aggregator

<Think> toggle button for llama.cp web chat for QWEN3.6
