LlamaStash 0.0.2: Zero‑Overhead Terminal Launcher for Llama.cpp

LlamaStash 0.0.2 introduces a lightweight terminal UI and CLI to manage llama.cpp workloads, offering an OpenAI‑compatible proxy while preserving raw llama‑server performance across CUDA, ROCm‑HIP, Metal, Vulkan, and CPU backends.

Background and Motivation

LocalLLM developer deepu105 released LlamaStash after repeatedly scripting a wrapper around llama-server on an AMD Strix Halo GPU. Existing solutions such as Ollama and LM Studio abstract too much, incurring performance overhead and limiting flexibility. LlamaStash aims to fill the gap between raw llama-server and high‑level GUIs.

Key Features of LlamaStash 0.0.2

Unified Hardware Detection

On first launch, llamastash init runs a wizard that automatically detects the available compute APIs: CUDA, ROCm‑HIP, Metal, Vulkan, or CPU. It then configures the appropriate llama‑server binary and runtime flags.

Terminal User Interface (TUI)

The TUI provides real‑time status, model loading progress, and a simple command palette. It eliminates the need to remember command‑line options while maintaining full control over the underlying server.

Command‑Line Interface (CLI)

For scripting and automation, the CLI exposes the same functionality as the TUI. Commands such as llamastash start, llamastash stop, and llamastash status are available, enabling integration into CI/CD pipelines and custom workflows.

OpenAI‑Compatible Proxy

LlamaStash includes a lightweight proxy that implements the OpenAI API surface. Clients can send standard /v1/chat/completions requests to a local endpoint, which forwards them to the underlying llama‑server without adding measurable latency.

Cross‑Platform Support

Compiled binaries are provided for Linux, macOS, and Windows, ensuring that developers on any major desktop OS can deploy LlamaStash without manual builds.

Installation and Usage

Download the latest 0.0.2 release from the GitHub repository (link not provided in the source). Run:

./llamastash init
./llamastash start

Optionally, specify a custom model path: llamastash init --model /path/to/model.gguf. The proxy listens on localhost:8080 by default.

Limitations and Future Work

The release notes do not detail bug reports, performance benchmarks, or extensive configuration options. Users may need to refer to the repository README or issue tracker for deeper customization.

llama.cpp OpenAI API CUDA ROCm Metal Vulkan CLI TUI Local LLM
Original Source