GVT2RPM: An empirical study for general video transformer adaptation to remote physiological measurement
Wang, Hao, Ahn, Euijoon, Joseph, Andrew, Pathan, Faraz, Negishi, Kazuaki, and Kim, Jinman (2026) GVT2RPM: An empirical study for general video transformer adaptation to remote physiological measurement. Biomedical Signal Processing and Control, 113 (Part B). 108886.
Abstract
Remote physiological measurement (RPM) is an essential tool for healthcare monitoring as it enables the measurement of physiological signs, e.g., heart rate, in a remote setting, via physical wearables. Recent advancements in facial video-based RPMs have leveraged video analysis to detect photoplethysmographic (PPG) changes by learning pixel variations across frames. Transformer architectures, known for their success in natural video understanding, have also been applied to facial video-based RPM. However, existing transformer-based RPM methods often rely on RPM-specific modules, such as temporal difference convolutions and handcrafted feature maps, to capture subtle physiological signals and enhance temporal feature extraction. While these customized modules can improve performance, they lack robustness across datasets and cannot be generalized to different transformer architectures due to their high degree of customization. In this study, we demonstrate that general video transformers (GVTs) can achieve state-of-the-art performance for RPM without the need for RPM-specific modules. This approach simplifies the design process and facilitates the rapid deployment of various GVT architectures for RPM tasks. We conducted an empirical investigation into how training designs, including data preprocessing and network configurations, influence the performance of GVTs in facial video-based RPM. Furthermore, we propose practical guidelines to adapt GVTs to RPM (GVT2RPM) without the need for RPM-specific modules. Our experiments, conducted on five datasets using both intra-dataset (training and testing on the same dataset) and cross-dataset (training and testing on different datasets) settings, demonstrate that the proposed GVT2RPM guidelines outperform existing RPM-specific counterparts in most cases. In intra-dataset experiments, GVT2RPM reduced mean absolute error by 5.0% (UBFC-rPPG), 35.6% (MMPD-simple), and 38.2% (MMPD). In cross-dataset experiments, it achieved reductions of 4.3% (UBFC-Phys), 13.2% (MMPD-simple), 9.5% (MMPD), and 13.4% (RLAP). The results demonstrate that our guidelines can be applied across various GVT architectures and are robust to diverse datasets, making them a promising solution for advancing RPM methodologies.
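The percentage figures in the abstract are relative reductions in mean absolute error (MAE). As a minimal sketch of how such a figure is typically computed, assuming heart-rate estimates in beats per minute, the following Python example may help. The function names and all numeric values are hypothetical illustrations and are not taken from the paper.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def relative_reduction_percent(baseline_mae, new_mae):
    """Percentage by which new_mae is lower than baseline_mae."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Made-up heart rates (bpm) for illustration only; not results from the paper.
reference_hr = [60.0, 72.0, 80.0, 90.0]   # ground-truth readings
baseline_hr = [64.0, 68.0, 84.0, 86.0]    # hypothetical RPM-specific model
adapted_hr = [62.0, 70.0, 83.0, 87.0]     # hypothetical adapted GVT

baseline_mae = mean_absolute_error(baseline_hr, reference_hr)
adapted_mae = mean_absolute_error(adapted_hr, reference_hr)
reduction = relative_reduction_percent(baseline_mae, adapted_mae)
print(f"baseline MAE={baseline_mae}, adapted MAE={adapted_mae}, reduction={reduction}%")
```

In an intra-dataset setting the predictions and reference readings would come from a held-out split of the same dataset used for training, whereas in a cross-dataset setting they would come from a different dataset.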
| Item ID: | 89382 |
|---|---|
| Item Type: | Article (Research - C1) |
| ISSN: | 1746-8108 |
| Copyright Information: | © 2025 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
| Date Deposited: | 04 Nov 2025 01:03 |
| FoR Codes: | 46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460304 Computer vision @ 30%; 46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460308 Pattern recognition @ 30%; 46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460309 Video processing @ 40% |
| SEO Codes: | 20 HEALTH > 2002 Evaluation of health and support services > 200206 Health system performance (incl. effectiveness of programs) @ 50%; 22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence @ 50% |
