The paper investigates how representation alignment accelerates diffusion transformer training and enhances generation quality. It compares two self‑alignment approaches—SRA and the newer Self‑Flow—that eliminate the need for external pretrained encoders by embedding alignment directly within the diffusion model. The authors note that Self‑Flow’s performance gains, attributed to dual‑time scheduling and cross‑token interactions, are still not fully understood and warrant further study.
huggingface/daily-papers