Diffusion Preference Alignment via Attenuated Kullback–Leibler Regularization

Zhang, Xinjian, and Xiang, Wei (2025) Diffusion Preference Alignment via Attenuated Kullback–Leibler Regularization. Electronics, 14 (15). 2939.

PDF (Published Version)
Available under License Creative Commons Attribution.

View at Publisher Website: https://doi.org/10.3390/electronics14152...


Abstract

Direct preference optimization (DPO) has been successfully applied to align large language models (LLMs) with human preferences. In recent years, DPO has also been used to improve the generation quality of text-to-image diffusion models. However, existing techniques often rely on a single type of reward model. They are also prone to overfitting to inaccurate reward signals. As a result, model quality cannot be continuously improved. To address these limitations, we propose xDPO. This method introduces a novel regularization approach that implicitly defines reward functions for both preferred and non-preferred samples. This design greatly enhances the flexibility of reward modeling. The experimental results show that, after fine-tuning Stable Diffusion v1.5, xDPO achieves significant improvements in human preference evaluations compared to previous DPO methods. It also improves training efficiency by approximately 1.5 times. Meanwhile, xDPO maintains image–text alignment performance that is comparable to the original model.
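
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not the xDPO method, whose regularization is not specified in this record. The function name `dpo_loss`, its argument names, and the default `beta` value are illustrative assumptions. In diffusion settings the exact log-probabilities are generally intractable and are usually replaced by surrogate terms; that substitution is not shown here.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, non-preferred) pair.

    logp_w / logp_l: log-probabilities of the preferred and non-preferred
    samples under the model being fine-tuned (hypothetical inputs).
    ref_logp_w / ref_logp_l: the same quantities under the frozen
    reference model. beta scales the implicit KL regularization.
    """
    # Implicit reward margin: how much more the fine-tuned model favors
    # the preferred sample, relative to the reference model.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), evaluated in a
    # numerically stable way for both signs of the margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the fine-tuned model and the reference model agree, the margin is zero and the loss equals log 2; the loss falls below that value as the fine-tuned model shifts probability toward the preferred sample.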

Item ID: 87849
Item Type: Article (Research - C1)
ISSN: 2079-9292
Keywords: direct preference optimization (DPO), machine learning, preference alignment, text-to-image diffusion model
Copyright Information: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Date Deposited: 02 Mar 2026 05:26
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461104 Neural networks @ 100%
SEO Codes: 22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence @ 100%