Meta Llama 4: Open-Weight Multimodal MoE Changes the Cost Equation

Meta has introduced Llama 4, a multimodal model family built on a Mixture-of-Experts (MoE) architecture. It ships in two open-weight variants, Scout and Maverick, which sit at different points in the trade-off between inference cost and model capability.

Architectural Shift: Multimodal Mixture-of-Experts

With Llama 4, Meta continues its commitment to open-weight models, introducing a multimodal model family built on a Mixture-of-Experts (MoE) architecture. In an MoE model, a router sends each token to a small subset of expert sub-networks, so only a fraction of the total parameters is active for any given token. Per-token compute therefore tracks the active parameter count rather than the total, which lowers inference cost while keeping strong performance across diverse tasks.
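The routing idea can be illustrated with a minimal, generic top-k MoE sketch in plain Python. This is not Llama 4's actual router: the toy experts, the random router weights, the dimensions, and the names `moe_forward` and `make_expert` are all assumptions made for illustration. It only shows that a token touches `top_k` experts out of the full set.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, router_weights, experts, top_k=1):
    """Route one token vector through its top_k highest-scoring experts.

    Generic top-k gating sketch (an assumption, not Meta's implementation).
    Returns the gated output vector and the indices of the experts used.
    """
    # Router: one logit per expert, the dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k experts; the remaining experts are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalise the gate weights over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](token)
        out = [o + gate * v for o, v in zip(out, expert_out)]
    return out, chosen


def make_expert(scale):
    """Hypothetical toy expert: multiplies its input by a constant."""
    return lambda x: [scale * v for v in x]


random.seed(0)
dim, num_experts = 4, 8  # illustrative sizes, not Llama 4 values
experts = [make_expert(float(i + 1)) for i in range(num_experts)]
router_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [0.5, -1.0, 0.25, 2.0]

output, used = moe_forward(token, router_weights, experts, top_k=2)
print(f"experts evaluated: {len(used)} of {num_experts}")
```

In this toy setup only two of the eight experts run for the token, which is the source of the compute savings. All eight still have to be resident in memory, so an MoE model reduces per-token compute more than it reduces memory footprint.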

Model Variants: Scout and Maverick

To address the varying needs of developers and enterprises, Meta has shipped two distinct variants designed to occupy different positions on the cost-capability curve:

Llama 4 Scout

Llama 4 Scout is built for efficiency and high-volume workloads. It offers a smaller memory footprint and faster inference, which suits it to the "long tail" of requests where throughput and latency matter more than maximum model capacity.

Llama 4 Maverick

Llama 4 Maverick is the higher-capability counterpart to Scout, designed for tasks that require deeper reasoning and more complex multimodal processing.

Note: Parameter counts, training data details, and performance benchmarks for the Maverick variant are not covered here.
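One way to act on the split between the two variants is to send routine traffic to Scout and escalate only demanding requests to Maverick. The sketch below is a hypothetical dispatcher: the `Request` fields, the `pick_variant` function, the tier labels, and the length threshold are illustrative assumptions, not part of any Meta API or recommended policy.

```python
from dataclasses import dataclass

# Hypothetical tier labels; real deployments would map these to actual model endpoints.
SCOUT = "scout"
MAVERICK = "maverick"


@dataclass
class Request:
    prompt: str
    has_image: bool = False
    needs_deep_reasoning: bool = False  # assumed to be set by the caller or an upstream classifier


def pick_variant(req: Request, long_prompt_chars: int = 4000) -> str:
    """Choose a variant for one request using simple, assumed heuristics."""
    # Requests flagged as reasoning-heavy go to the higher-capability variant.
    if req.needs_deep_reasoning:
        return MAVERICK
    # Long multimodal prompts are treated as complex (the threshold is arbitrary).
    if req.has_image and len(req.prompt) > long_prompt_chars:
        return MAVERICK
    # Everything else, the high-volume long tail, goes to the cheaper variant.
    return SCOUT


requests = [
    Request("Summarize this support ticket."),
    Request("Prove this scheduling bound step by step.", needs_deep_reasoning=True),
    Request("Describe the attached chart.", has_image=True),
]
choices = [pick_variant(r) for r in requests]
print(choices)
```

The heuristics here are deliberately crude. In practice the escalation rule would be tuned against measured quality and cost for each variant.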
