We need to produce an HTML summary:

...

then Read original. We must summarize the news: the user patched llama.cpp to run DeepSeek V4 Flash with the full 1M-token context on an RTX 5090, addressing VRAM issues by building on upstream PR #24231 and wiring up a CUDA kernel. We must be concise: 2-4 sentences. Use precise technical language. No invented info. We'll output:

A patch to llama.cpp enables running DeepSeek V4 Flash locally with the full 1M‑token context on an RTX 5090, overcoming VRAM demand that previously reached ~256 GB at 1M context. The fix integrates an upstream DSA lightning indexer PR (#24231) and adds a CUDA kernel for the indexer, allowing the model to fit within the GPU’s memory.

But we need to be accurate: the description says they found absurd VRAM usage (~256 GB at 1M). They patched llama.cpp to support the DSA lightning indexer with a CUDA path. So we can say they reduced the VRAM requirement enough to fit on an RTX 5090. We must not invent numbers that are not given. The description says "absurd amounts of VRAM at higher context lengths