Prediction and validation of protein–protein interactors from genome-wide DNA-binding data using a knowledge-based machine-learning approach

Waardenberg, Ashley J., Homan, Bernou, Mohamed, Stephanie, Harvey, Richard P., and Bouveret, Romaric (2016) Prediction and validation of protein–protein interactors from genome-wide DNA-binding data using a knowledge-based machine-learning approach. Open Biology, 6. 160183.

PDF (Published Version)
Available under License Creative Commons Attribution.

The ability to accurately predict the DNA targets and interacting cofactors of transcriptional regulators from genome-wide data can significantly advance our understanding of gene regulatory networks. NKX2-5 is a homeodomain transcription factor that sits high in the cardiac gene regulatory network and is essential for normal heart development. We previously identified genomic targets for NKX2-5 in mouse HL-1 atrial cardiomyocytes using DNA-adenine methyltransferase identification (DamID). Here, we apply machine learning algorithms and propose a knowledge-based feature selection method for predicting NKX2-5 protein–protein interactions based on motif grammar in genome-wide DNA-binding data. We assessed model performance using leave-one-out cross-validation and a completely independent DamID experiment performed with replicates. In addition to identifying previously described NKX2-5-interacting proteins, including GATA, HAND and TBX family members, a number of novel interactors were identified, with direct protein–protein interactions between NKX2-5 and retinoid X receptor (RXR), paired-related homeobox (PRRX) and Ikaros zinc fingers (IKZF) validated using the yeast two-hybrid assay. We also found that the interaction of RXRα with NKX2-5 mutations found in congenital heart disease (Q187H, R189G and R190H) was altered. These findings highlight an intuitive approach to accessing protein–protein interaction information of transcription factors in DNA-binding experiments.

Item ID: 55656
Item Type: Article (Research - C1)
ISSN: 2046-2441
Keywords: machine learning, protein–protein interactions, transcription factors, gene regulatory networks
Copyright Information: Copyright © 2016 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License, which permits unrestricted use, provided the original author and source are credited.
Funders: National Health and Medical Research Council (NHMRC), Australian Research Council (ARC), University of New South Wales (UNSW)
Projects and Grants: NHMRC 573703, NHMRC 1061539, NHMRC 573705, ARC DP0988507
Date Deposited: 28 Sep 2018 00:29
FoR Codes: 31 BIOLOGICAL SCIENCES > 3102 Bioinformatics and computational biology > 310202 Biological network analysis @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970106 Expanding Knowledge in the Biological Sciences @ 100%