Self-Supervised Representation Learning Framework for Remote Physiological Measurement Using Spatiotemporal Augmentation Loss

Wang, Hao, Ahn, Euijoon, and Kim, Jinman (2022) Self-Supervised Representation Learning Framework for Remote Physiological Measurement Using Spatiotemporal Augmentation Loss. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (36) From: AAAI-22: 36th AAAI Conference on Artificial Intelligence, 22 February - 1 March 2022, Virtual.


Recent advances in supervised deep learning methods are enabling remote measurement of photoplethysmography-based physiological signals from facial videos. The performance of these supervised methods, however, depends on the availability of large labelled datasets. Contrastive learning, a self-supervised method, has recently achieved state-of-the-art performance in learning representative data features by maximising mutual information between different augmented views. However, existing data augmentation techniques for contrastive learning are not designed to learn physiological signals from videos and often fail in the presence of complicated noise and subtle, periodic colour/shape variations between video frames. To address these problems, we present a novel self-supervised spatiotemporal learning framework for remote physiological signal representation learning in settings where labelled training data are lacking. Firstly, we propose a landmark-based spatial augmentation that splits the face into several informative parts based on Shafer’s dichromatic reflection model to characterise subtle skin colour fluctuations. We also formulate a sparsity-based temporal augmentation that exploits the Nyquist–Shannon sampling theorem to effectively capture periodic temporal changes by modelling physiological signal features. Furthermore, we introduce a constrained spatiotemporal loss that generates pseudo-labels for augmented video clips; it is used to regulate the training process and handle complicated noise. We evaluated our framework on three public datasets and demonstrated superior performance over other self-supervised methods and competitive accuracy compared with state-of-the-art supervised methods. Code is available at
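
The role of the Nyquist–Shannon sampling theorem in the temporal augmentation can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 30 fps frame rate, and the 4 Hz (240 beats per minute) upper bound on pulse frequency are all assumptions chosen for illustration. The sketch only shows the general constraint that a sparsely sampled clip must keep an effective frame rate of at least twice the highest signal frequency.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def max_temporal_stride(frame_rate_hz: float, max_signal_hz: float) -> int:
    """Largest integer frame stride whose effective sampling rate
    (frame_rate_hz / stride) is still >= 2 * max_signal_hz."""
    if frame_rate_hz <= 0 or max_signal_hz <= 0:
        raise ValueError("frame rate and signal frequency must be positive")
    stride = math.floor(frame_rate_hz / (2.0 * max_signal_hz))
    if stride < 1:
        raise ValueError("frame rate is below the Nyquist rate for this signal")
    return stride


def allowed_strides(frame_rate_hz: float, max_signal_hz: float) -> List[int]:
    """All strides from 1 up to the Nyquist-limited maximum."""
    return list(range(1, max_temporal_stride(frame_rate_hz, max_signal_hz) + 1))


def subsample(frames: Sequence[T], stride: int) -> List[T]:
    """Keep every `stride`-th frame, starting from the first one."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return list(frames[::stride])


if __name__ == "__main__":
    # Hypothetical values: 30 fps video, pulse assumed to stay below 4 Hz.
    strides = allowed_strides(frame_rate_hz=30.0, max_signal_hz=4.0)
    clip = list(range(10))  # stand-in for ten video frames
    for s in strides:
        print(s, subsample(clip, s))
```

Under these assumed values, strides larger than the computed maximum would alias the pulse signal, which is why a sparsity-based augmentation needs some bound of this kind.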

Item ID: 75744
Item Type: Conference Item (Research - E1)
Copyright Information: © 2022, Association for the Advancement of Artificial Intelligence. All rights reserved.
Date Deposited: 15 Sep 2022 01:36
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460304 Computer vision @ 50%
46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460299 Artificial intelligence not elsewhere classified @ 10%
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461103 Deep learning @ 40%
SEO Codes: 28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280115 Expanding knowledge in the information and computing sciences @ 80%
20 HEALTH > 2002 Evaluation of health and support services > 200208 Telehealth @ 20%
Downloads: Total: 4