JetBrains Launches Mellum2: A High-Performance 12B Mixture-of-Experts Model
JetBrains has expanded its AI portfolio with the release of Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model, now available to the community on the Hugging Face Hub.
Overview of Mellum2
JetBrains, a maker of professional IDEs and developer tools, has introduced Mellum2, a large language model built on the Mixture-of-Experts (MoE) architecture. With 12 billion total parameters, the model aims to balance computational efficiency against performance, making capable AI models more accessible to the broader developer and research community.
Architectural Significance: Mixture-of-Experts (MoE)
The MoE architecture allows Mellum2 to activate only a subset of its parameters for each token during inference. A router selects a few "expert" sub-networks per token, so per-token compute is typically lower than in a dense model with the same total parameter count, while the full set of experts preserves the capacity to handle complex tasks. Note that all experts must still be held in memory, so the savings apply to compute rather than to model size.
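To make the routing idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a toy illustration of the general MoE technique, not Mellum2's implementation: the class name `ToyMoELayer`, the layer width, the number of experts (8), and the choice of `top_k=2` are all hypothetical, since the announcement does not state the model's expert count or active parameters per token.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class ToyMoELayer:
    """Hypothetical toy MoE layer: a linear router plus linear experts."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one score row per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a dim x dim linear map (real experts are MLPs).
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, token):
        # Score every expert, but run only the top_k highest-scoring ones.
        scores = matvec(self.router, token)
        chosen = sorted(range(len(scores)),
                        key=lambda i: scores[i], reverse=True)[:self.top_k]
        # Renormalize the gate weights over the chosen experts only.
        weights = softmax([scores[i] for i in chosen])
        out = [0.0] * len(token)
        for w, i in zip(weights, chosen):
            expert_out = matvec(self.experts[i], token)
            out = [o + w * e for o, e in zip(out, expert_out)]
        return out, chosen


layer = ToyMoELayer(dim=4, num_experts=8, top_k=2)
output, used = layer.forward([0.5, -1.0, 0.25, 2.0])
# Fraction of expert parameters touched for this token (assumed sizes).
active_fraction = layer.top_k / len(layer.experts)
print("experts used:", used, "active fraction:", active_fraction)
```

In this toy setup only 2 of 8 experts run per token, so a quarter of the expert weights participate in each forward pass. Production MoE models add further details that are omitted here, such as load-balancing losses and batched expert dispatch.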
Availability and Integration
To foster open innovation and transparency, JetBrains has published Mellum2 on the Hugging Face Hub. AI engineers and researchers can integrate the model into their own workflows, fine-tune it, or deploy it within specialized development environments.
Note: The announcement gives few technical details; it does not specify the training dataset, benchmark scores, or the number of active parameters per token.