llama.cpp Integrates Architecture Support for Cohere2-MoE: Enabling North-Mini-Code 1.0

llama.cpp has added support for the Cohere2-MoE architecture, which makes it possible to run Cohere's North-Mini-Code 1.0 locally. North-Mini-Code 1.0 is a Mixture-of-Experts model optimized for software engineering and agentic workflows.

Architectural Expansion in llama.cpp

A recent change to the llama.cpp repository (Pull Request #24260) adds architecture support for Cohere2-MoE. With it, users can run Cohere's latest research release locally from GGUF files, using llama.cpp's inference engine for quantized execution on consumer hardware.

Introducing North-Mini-Code 1.0

The primary beneficiary of this update is the North-Mini-Code 1.0 model. Developed by Cohere and Cohere Labs, this model is an open-weights research release designed specifically for high-performance technical tasks.

Technical Specifications

Parameter Count: 30B total parameters, with 3B active parameters per token (A3B), utilizing a Mixture-of-Experts (MoE) architecture.
Primary Optimizations: The model is fine-tuned for code generation, agentic software engineering, and terminal-based tasks.
Licensing: Distributed under the Apache 2.0 license, permitting broad research and commercial application.
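
The parameter counts above translate into a rough memory estimate with simple arithmetic. The sketch below is illustrative only: the 30B and 3B figures come from the specifications, while the bits-per-weight values are assumed round numbers, not published quantization metrics for this model. It counts weight storage alone and ignores the KV cache, activations, and file-format overhead.

```python
# Rough weight-storage arithmetic for a 30B-total / 3B-active MoE model.
# The bits-per-weight values are ASSUMED round numbers for illustration;
# they are not measured figures for any released North-Mini-Code file.

TOTAL_PARAMS = 30e9   # total parameters, from the specifications
ACTIVE_PARAMS = 3e9   # parameters active per token, from the specifications

ASSUMED_BITS_PER_WEIGHT = {"16-bit": 16.0, "8-bit": 8.0, "4-bit": 4.0}


def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the parameters that take part in computing one token."""
    return active_params / total_params


if __name__ == "__main__":
    for label, bits in ASSUMED_BITS_PER_WEIGHT.items():
        size = weight_size_gb(TOTAL_PARAMS, bits)
        print(f"{label}: about {size:.0f} GB of weights")
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active per token: {share:.0%} of all parameters")
```

All experts must still be stored, so the memory footprint follows the 30B total, while the per-token compute follows the much smaller 3B active count.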

Deployment and Implementation

To use the new support, users should update to a llama.cpp revision that includes the change and rebuild their binaries. The model is available both as original weights and in GGUF format:

Original Weights: Available via CohereLabs on Hugging Face.
Quantized GGUF: Optimized versions provided by Unsloth for reduced memory footprints.
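
As a rough illustration of the rebuild-and-run workflow, the sketch below assembles the commands for a standard CMake build of llama.cpp and a single `llama-cli` invocation. It only builds and prints the command lines; nothing is executed. The checkout directory, the `build/bin` output location, the model path, and the prompt are hypothetical placeholders, and the helper function names are invented for this example.

```python
# Sketch: assemble (but do not run) the commands to rebuild llama.cpp and
# start a generation with a local GGUF file. All paths are hypothetical
# placeholders; substitute your own checkout and downloaded model file.
import shlex
from pathlib import Path


def build_commands() -> list[list[str]]:
    """CMake configure and build steps, run from the llama.cpp checkout."""
    return [
        ["cmake", "-B", "build"],
        ["cmake", "--build", "build", "--config", "Release"],
    ]


def run_command(repo_dir: str, model_path: str, prompt: str,
                n_predict: int = 128) -> list[str]:
    """llama-cli invocation: -m model file, -p prompt, -n tokens to generate."""
    # Assumes the binary lands in build/bin, which can vary by platform.
    cli = Path(repo_dir) / "build" / "bin" / "llama-cli"
    return [str(cli), "-m", model_path, "-p", prompt, "-n", str(n_predict)]


if __name__ == "__main__":
    repo = "llama.cpp"              # hypothetical checkout directory
    model = "path/to/model.gguf"    # hypothetical GGUF file location
    for cmd in build_commands():
        print(shlex.join(cmd))
    print(shlex.join(run_command(repo, model, "Write a binary search in C.")))
```

To actually execute a step, each command list could be passed to `subprocess.run(cmd, cwd=repo, check=True)` once the paths point at real files.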

Note: This article is based on a community announcement; detailed benchmark performance and specific quantization metrics for the North-Mini-Code 1.0 model were not provided in the source material.

Original Source

Add arch support for cohere2-MoE by michaelw9999 · Pull Request #24260 · ggml-org/llama.cpp
