Synthetic data for reef modelling

Crocker, Rose, Robson, Barbara, Ani, Chinenye, Anthony, Ken, and Iwanaga, Takuya (2024) Synthetic data for reef modelling. Ecological Informatics, 82. 102698.

PDF (Published Version)
Available under License Creative Commons Attribution.
View at Publisher Website: https://doi.org/10.1016/j.ecoinf.2024.10...
 

Abstract

Synthetic data mimics the statistical properties of real-world datasets while removing references to sensitive or confidential information in the original dataset (Quintana, 2020). Synthetic data is also useful for general model testing and development, with many methods available for generating data from machine learning models (Raghunathan, 2021). Although not widely used in ecological and environmental modelling, synthetic data can support and accelerate model testing and analyses where rightsholders are sensitive to disclosure of data for study areas, or where data collection is expensive.

In reef modelling, synthetic data can support model analyses that can be published without referring to specific sites, reefs, or study areas. This is desirable in the context of decision support for restoration of the Great Barrier Reef. The Reef has many stakeholders, and releasing early modelling results for intervention scenarios in specific areas would be premature until management or intervention strategy options have been discussed with stakeholders and/or rightsholders. Synthetic data offers a path to publishing model and method demonstrations, and so sharing knowledge with the reef decision-support community, without prematurely suggesting policy recommendations for reefs that are sensitive for rightsholders or stakeholders.

We showcase a synthetic data pipeline developed for the reef decision-support system ADRIA (Adaptive Dynamic Reef Intervention Algorithms), using methods from the Python package Synthetic Data Vault (Patki et al., 2016) and others. The synthetic data models are developed to emulate the statistics of case-study reefs so that decision-support tool demonstrations, testing, and method validation can be published without revealing sensitive reef site information. The pipeline includes models for tabular data (benthic/compositional reef data), spatial-temporal data (wave and heat stress), and spatial network data (coral larval connectivity). Conditional sampling methods, which connect spatial relationships across datasets, are used to develop synthetic reef data packages that mimic the statistical properties of the original dataset. The utility of the synthetic data is demonstrated on a sample reef data package, and the methods used for anonymizing the data are detailed. The results are discussed in the context of formalizing synthetic data for reef modelling. All synthetic data code is available in the open-AIMS/ADRIA-synthetic-data repository on GitHub (v0.1.0), DOI: https://doi.org/10.5281/zenodo.10158323.
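
To illustrate how a tabular synthetic data model can preserve statistical relationships and support conditional sampling, the following is a minimal sketch of a two-column Gaussian copula written with only the Python standard library. It is not the authors' pipeline and does not use the Synthetic Data Vault API; the abstract does not state which models were used, and a copula is chosen here only because copula-based models are among those that package provides. The column names (depth, coral cover), the toy numbers, and every function name are hypothetical and invented for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the latent copula space


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def normal_scores(values):
    """Map each value to a standard-normal score via its rank (empirical CDF)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def empirical_quantile(sorted_values, u):
    """Return the observed value at probability u.

    This reuses observed values; a real anonymizing pipeline would fit
    smooth marginal distributions instead.
    """
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def fit_copula(x, y):
    """Fit the sketch model: sorted marginals plus a latent correlation."""
    rho = pearson(normal_scores(x), normal_scores(y))
    return {"x": sorted(x), "y": sorted(y), "rho": rho}


def sample(model, n, rng, x_given=None):
    """Draw n synthetic rows; if x_given is set, draw y conditional on that x."""
    rho = model["rho"]
    noise_sd = max(0.0, 1.0 - rho * rho) ** 0.5
    rows = []
    for _ in range(n):
        if x_given is None:
            z1 = rng.gauss(0, 1)
            x = empirical_quantile(model["x"], STD_NORMAL.cdf(z1))
        else:
            # Locate the conditioning value within the fitted marginal.
            below = sum(v <= x_given for v in model["x"])
            u = min(max((below - 0.5) / len(model["x"]), 1e-6), 1 - 1e-6)
            z1 = STD_NORMAL.inv_cdf(u)
            x = x_given
        # Conditional normal: z2 | z1 ~ N(rho * z1, 1 - rho^2)
        z2 = rho * z1 + noise_sd * rng.gauss(0, 1)
        rows.append((x, empirical_quantile(model["y"], STD_NORMAL.cdf(z2))))
    return rows


# Toy "original" data (invented, not reef observations):
# coral cover (%) declining with depth (m), plus noise.
rng = random.Random(42)
depth = [rng.uniform(2, 20) for _ in range(200)]
cover = [max(0.0, 60 - 2.5 * d + rng.gauss(0, 5)) for d in depth]

model = fit_copula(depth, cover)
synthetic = sample(model, 500, rng)                 # unconditional rows
deep_sites = sample(model, 100, rng, x_given=18.0)  # cover conditional on depth
```

The conditional branch shows the idea behind linking datasets through shared attributes: fixing one column (here, depth) and drawing the others from their conditional distribution keeps synthetic rows consistent with the relationships in the source data.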

Item ID: 84036
Item Type: Article (Research - C1)
ISSN: 1878-0512
Copyright Information: © 2024 Published by Elsevier B.V. This is an open access article under the CC BY license http://creativecommons.org/licenses/by/4.0/
Funders: Reef Restoration and Adaptation Program (RRAP), Australian Institute of Marine Science
Date Deposited: 12 Nov 2024 22:53
FoR Codes: 41 ENVIRONMENTAL SCIENCES > 4102 Ecological applications > 410299 Ecological applications not elsewhere classified @ 50%
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461199 Machine learning not elsewhere classified @ 50%
SEO Codes: 18 ENVIRONMENTAL MANAGEMENT > 1805 Marine systems and management > 180599 Marine systems and management not elsewhere classified @ 100%
