LlamaStash 0.0.2: Zero‑Overhead Terminal Launcher for Llama.cpp
LlamaStash 0.0.2 introduces a lightweight terminal UI and CLI to manage llama.cpp workloads, offering an OpenAI‑compatible proxy while preserving raw llama‑server performance across CUDA, ROCm‑HIP, Metal, Vulkan, and CPU backends.
Background and Motivation
LocalLLM developer deepu105 released LlamaStash after repeatedly scripting a wrapper around llama-server on an AMD Strix Halo GPU. Existing solutions such as Ollama and LM Studio abstract too much, incurring performance overhead and limiting flexibility. LlamaStash aims to fill the gap between raw llama-server and high‑level GUIs.
Key Features of LlamaStash 0.0.2
Unified Hardware Detection
On first launch, llamastash init runs a wizard that automatically detects the available compute APIs: CUDA, ROCm‑HIP, Metal, Vulkan, or CPU. It then configures the appropriate llama‑server binary and runtime flags.
Terminal User Interface (TUI)
The TUI provides real‑time status, model loading progress, and a simple command palette. It eliminates the need to remember command‑line options while maintaining full control over the underlying server.
Command‑Line Interface (CLI)
For scripting and automation, the CLI exposes the same functionality as the TUI. Commands such as llamastash start, llamastash stop, and llamastash status are available, enabling integration into CI/CD pipelines and custom workflows.
OpenAI‑Compatible Proxy
LlamaStash includes a lightweight proxy that implements the OpenAI API surface. Clients can send standard /v1/chat/completions requests to a local endpoint, which forwards them to the underlying llama‑server without adding measurable latency.
Cross‑Platform Support
Compiled binaries are provided for Linux, macOS, and Windows, ensuring that developers on any major desktop OS can deploy LlamaStash without manual builds.
Installation and Usage
Download the latest 0.0.2 release from the GitHub repository (link not provided in the source). Run:
./llamastash init
./llamastash start
Optionally, specify a custom model path: llamastash init --model /path/to/model.gguf. The proxy listens on localhost:8080 by default.
Limitations and Future Work
The release notes do not detail bug reports, performance benchmarks, or extensive configuration options. Users may need to refer to the repository README or issue tracker for deeper customization.
Original Source