DanceOPD: Advancing Unified Image Generation via On-Policy Generative Field Distillation

Researchers introduce DanceOPD, a novel framework designed to unify text-to-image (T2I) generation, local editing, and global editing within a single model by mitigating the inherent conflicts between these diverse generative capabilities.

The Challenge of Unified Generative Capabilities

Modern image generation systems aim to integrate several functions, specifically text-to-image (T2I) synthesis, local image editing, and global image editing, into a single architecture. This unification is difficult because the capabilities are often misaligned. In practice, optimizing a model for editing tasks frequently degrades T2I performance. Global and local editing also conflict with each other: improvements in one often reduce the effectiveness of the other.

Introducing DanceOPD

To address these conflicts, the authors propose DanceOPD, a framework based on On-Policy Generative Field Distillation. The core objective of DanceOPD is to effectively compose these disparate capabilities, ensuring that the model can transition between synthesis and editing tasks without sacrificing quality or introducing performance trade-offs.

Key Technical Objectives

  • Capability Alignment: Harmonizing the latent spaces and generation processes for T2I and editing.
  • Conflict Mitigation: Reducing the negative interference between global and local editing mechanisms.
  • Generative Field Distillation: Utilizing an on-policy distillation approach to refine the model's generative fields, ensuring stability across diverse operational modes.
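
To make the term "on-policy" concrete, here is a toy sketch of on-policy distillation in general. It is not the DanceOPD formulation, which the source does not provide. It assumes that a "generative field" can be read as a velocity field integrated from noise to a sample, and everything in it (the 1D linear teacher, the `Student` class, the `rollout` and `distill` helpers, the learning rate) is hypothetical and chosen only for illustration. The point it shows is that the student is supervised at states reached by its own rollouts, rather than at states drawn from a fixed dataset.

```python
import random


def teacher_velocity(x, t):
    # Hypothetical frozen teacher field: a simple linear map (illustration only).
    return -1.5 * x + 0.5


class Student:
    """Toy student field v(x, t) = a * x + b with two learnable scalars."""

    def __init__(self):
        self.a = 0.0
        self.b = 0.0

    def velocity(self, x, t):
        return self.a * x + self.b


def rollout(student, x0, steps=8):
    """Euler-integrate the student's own field and return the visited (x, t) states."""
    states = []
    x = x0
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        states.append((x, t))
        x = x + dt * student.velocity(x, t)
    return states


def distill(student, iters=3000, lr=0.02, seed=0):
    """Regress the student field onto the teacher field at student-visited states."""
    rng = random.Random(seed)
    for _ in range(iters):
        x0 = rng.gauss(0.0, 1.0)
        # On-policy: the training states come from the current student's rollout.
        states = rollout(student, x0)
        grad_a = 0.0
        grad_b = 0.0
        for x, t in states:
            # Visited states are treated as constants (no gradient through the rollout).
            err = student.velocity(x, t) - teacher_velocity(x, t)
            grad_a += 2.0 * err * x
            grad_b += 2.0 * err
        n = len(states)
        student.a -= lr * grad_a / n
        student.b -= lr * grad_b / n
    return student


if __name__ == "__main__":
    s = distill(Student())
    print(s.a, s.b)
```

In this toy setting the student can represent the teacher exactly, so training should drive `a` and `b` toward the teacher's coefficients (-1.5 and 0.5). An off-policy variant would differ only in where `states` come from, for example a fixed set of teacher trajectories instead of the student's own.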

Note: The available source text is limited. It does not include specific architectural details, experimental results, or the precise mathematical formulation of "On-Policy Generative Field Distillation".
