MS-Swift: A Unified Framework for Scaling PEFT and Full-Parameter Tuning Across 900+ LLMs and MLLMs
MS-Swift provides a comprehensive toolkit for the Continuous Pre-Training (CPT), Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO) of over 600 Large Language Models and 300 Multimodal Large Language Models.
Overview of MS-Swift
The ms-swift repository by ModelScope introduces a high-performance framework designed to streamline the adaptation of state-of-the-art foundation models. By supporting both Parameter-Efficient Fine-Tuning (PEFT) and full-parameter updates, the framework enables developers and researchers to optimize models across various training paradigms, ranging from initial domain adaptation to complex alignment phases.
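To make the difference between PEFT and full-parameter updates concrete, the sketch below compares trainable weight counts for a single linear layer. It uses LoRA as one well-known example of a PEFT method. The text above does not name a specific method, and the layer shape and rank are assumed values rather than ms-swift defaults.

```python
# Illustrative arithmetic only: LoRA stands in for "a PEFT method".
# The layer shape and rank are assumed values, not ms-swift defaults.


def full_trainable_params(d_out: int, d_in: int) -> int:
    """Weights updated when the whole matrix W (d_out x d_in) is trained."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights updated when W is frozen and only a low-rank update B @ A is trained.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d_out, d_in, rank = 4096, 4096, 8  # hypothetical projection layer and rank
    full = full_trainable_params(d_out, d_in)
    lora = lora_trainable_params(d_out, d_in, rank)
    print(f"full-parameter: {full:,} trainable weights")
    print(f"LoRA (rank={rank}): {lora:,} trainable weights")
    print(f"fraction trained: {lora / full:.4%}")
```

Under these assumptions the low-rank update trains 65,536 weights against 16,777,216 for the full matrix, which is why PEFT is usually the cheaper option when memory is limited.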
Extensive Model Compatibility
The framework supports a wide range of architectures, including recent releases from major model families. This lets users experiment across model families without writing extensive boilerplate code.
Supported Large Language Models (LLMs)
MS-Swift supports over 600 LLMs, including cutting-edge releases such as:
- Qwen series: Including Qwen3.6
- DeepSeek series: Including DeepSeek-V4
- GLM series: Including GLM-5.1
- InternLM series: Including InternLM3
- Llama series: Including Llama4
Supported Multimodal Large Language Models (MLLMs)
The framework extends its capabilities to over 300 MLLMs, facilitating the training of vision-language and omni-modal systems, including:
- Qwen-VL/Omni: Qwen3-VL and Qwen3-Omni
- InternVL: InternVL3.5
- Other Advanced Architectures: Ovis2.5, GLM4.5v, Gemma4, Llava, and Phi4
Advanced Training Methodologies
MS-Swift integrates several critical optimization and alignment techniques to enhance model performance and safety:
- CPT (Continuous Pre-Training): For domain-specific knowledge injection.
- SFT (Supervised Fine-Tuning): For instruction following and task-specific adaptation.
- DPO (Direct Preference Optimization): For aligning model outputs with human preferences.
- GRPO (Group Relative Policy Optimization): A reinforcement learning method that scores each sampled response relative to the other responses generated for the same prompt, rather than relying on a separate value model.
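The two alignment objectives in the list above can be sketched in a few lines of plain Python. This is a minimal illustration of the published DPO loss and the group-relative advantage used by GRPO, not ms-swift's implementation. The function names, the `beta` value, and the use of population standard deviation are assumptions made for the example.

```python
import math
from statistics import fmean, pstdev


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probabilities.

    beta=0.1 is an assumed illustrative value.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) equals softplus(-margin); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for responses sampled from the same prompt.

    Each reward is centered on the group mean and scaled by the group
    standard deviation (population std here, which is an assumption).
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Policy prefers the chosen answer more than the reference does: lower loss.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
    # Four sampled responses to one prompt, scored by a hypothetical reward function.
    print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

In the GRPO helper, responses that score above the group mean receive positive advantages and those below receive negative ones, so no learned value model is needed to produce a baseline.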
The framework's methodology and contributions are documented in work associated with AAAI 2025.