AplusN: Progressively Integrating Attention and Normalization in Wavelet Domain for Pose Transfer

Yu, Wei, Wang, Rui, Yang, Weizhi, Hu, Wenjian, and Xiang, Wei (2025) AplusN: Progressively Integrating Attention and Normalization in Wavelet Domain for Pose Transfer. IEEE Transactions on Multimedia, 27. pp. 4467-4479.

PDF (Published Version)
Restricted to Repository staff only

DOI: 10.1109/TMM.2025.3535296


Abstract

Pose-guided person image generation aims to synthesize images of a person in different poses, a task complicated by occlusions and by the need to transfer textures accurately. Previous methods have relied on attention mechanisms, flow fields, normalization techniques, and diffusion models. Among these, flow fields and attention are the two most widely used: flow fields are good at preserving detailed textures, while attention is better at generating plausible semantic structures. Previous networks, however, have typically used only one of the two and so failed to exploit the strengths of both. The two are also complementary in the frequency domain: flow fields preserve the high-frequency information that carries detailed texture, while attention generates the semantic structure carried by low-frequency information. Few networks have exploited this complementarity to improve generation quality. Building on these observations, this paper introduces the AplusN network, which addresses the image generation problem by processing information progressively from low to high frequencies. For low-frequency information, a conditional large-kernel convolutional attention mechanism (CLA) captures global information about the human body. High-frequency information is refined by a spatial-channel normalization module (SCN) to enhance the detailed textures of the body. Additionally, we propose a wavelet loss function that aligns the frequency-domain information of the generated images with that of the target images. Both qualitative and quantitative experiments demonstrate that our method outperforms state-of-the-art (SOTA) methods, producing better-defined overall body contours and local details, and higher-quality images.
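The low/high-frequency split and the wavelet loss described above can be illustrated with a minimal sketch. It assumes a single-level orthonormal Haar transform on channel-first arrays and an L1 distance over the four subbands with optional per-subband weights. The abstract does not specify the wavelet, the number of decomposition levels, or the loss weighting used in AplusN, and the names `haar_dwt2` and `wavelet_l1_loss` are hypothetical. In this sketch the LL subband holds the low-frequency content (the part the abstract assigns to CLA), and the three detail subbands hold the high-frequency content (the part SCN refines).

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform of an (H, W) array.

    Returns (LL, LH, HL, HH). LL is the low-frequency approximation; the other
    three are high-frequency detail subbands (naming conventions vary).
    """
    img = np.asarray(img, dtype=np.float64)
    if img.ndim != 2 or img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # scaled local average (low frequency)
    lh = (a + b - c - d) / 2.0  # top-minus-bottom difference
    hl = (a - b + c - d) / 2.0  # left-minus-right difference
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def wavelet_l1_loss(generated, target, weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical wavelet loss: weighted L1 distance per Haar subband.

    Assumption (not from the paper): inputs are (C, H, W) or (H, W) arrays,
    the weights apply to (LL, LH, HL, HH), and the result is averaged over
    channels.
    """
    gen = np.asarray(generated, dtype=np.float64)
    tgt = np.asarray(target, dtype=np.float64)
    if gen.shape != tgt.shape:
        raise ValueError("generated and target must have the same shape")
    if gen.ndim == 2:  # treat a single-channel image as (1, H, W)
        gen, tgt = gen[None], tgt[None]
    total = 0.0
    for g_ch, t_ch in zip(gen, tgt):
        # Compare each subband of the generated image with the same subband
        # of the target, so low- and high-frequency errors are both penalized.
        for w, g_band, t_band in zip(weights, haar_dwt2(g_ch), haar_dwt2(t_ch)):
            total += w * np.mean(np.abs(g_band - t_band))
    return total / gen.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((3, 64, 64))
    print(wavelet_l1_loss(target, target))  # 0.0
```

A constant image puts all of its energy in LL and leaves the detail subbands at zero, which is a quick way to see the low/high split. Separate weights would let such a loss emphasize texture (detail subbands) or structure (LL); whether the paper weights subbands this way is not stated in the abstract.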


Item ID:	88462
Item Type:	Article (Research - C1)
ISSN:	1941-0077
Keywords:	Conditional generative adversarial network, conditional large-kernel convolutional attention mechanism, pose-guided person image generation, spatial-channel normalization module, wavelet loss function
Copyright Information:	© 2025 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission.
Date Deposited:	21 Apr 2026 07:04
FoR Codes:	46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460303 Computational imaging @ 100%
SEO Codes:	22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence @ 100%
