Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies

Aschard, Hugues, Vilhjálmsson, Bjarni J., Greliche, Nicolas, Morange, Pierre-Emmanuel, Trégouët, David-Alexandre, and Kraft, Peter (2014) Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. American Journal of Human Genetics, 94 (5). pp. 662-676.

Many human traits are highly correlated. This correlation can be leveraged to improve the power of genetic association tests to identify markers associated with one or more of the traits. Principal component analysis (PCA) is a useful tool that has been widely used for the multivariate analysis of correlated variables. PCA is usually applied as a dimension reduction method: the few top principal components (PCs) explaining most of the total trait variance are tested for association with a predictor of interest, and the remaining components are not analyzed. In this study, we review the theoretical basis of PCA and describe the behavior of PCA when testing for association between a SNP and correlated traits. We then use simulation to compare the power of various PCA-based strategies when analyzing up to 100 correlated traits. We show that, contrary to widespread practice, testing only the top PCs often has low power, whereas combining signal across all PCs can have greater power. This power gain is primarily due to increased power to detect genetic variants with opposite effects on positively correlated traits and variants that are exclusively associated with a single trait. Relative to other methods, the combined-PC approach has close to optimal power in all scenarios considered while offering more flexibility and more robustness to potential confounders. Finally, we apply the proposed PCA strategy to a genome-wide association study of five correlated coagulation traits, in which we identify two candidate SNPs that were not found by the standard approach.
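The idea of combining signal across all PCs can be illustrated with a minimal sketch. This is not the authors' implementation: the function name pc_association_stats, the per-PC statistic (n times the squared correlation between the PC and the genotype), and the simple sum across PCs are assumptions chosen for illustration, and the simulated effect sizes are arbitrary. Because sample PCs are mutually uncorrelated, the summed statistic is approximately chi-square distributed with as many degrees of freedom as there are PCs when the SNP has no effect. The sketch also assumes a polymorphic SNP and traits that are not perfectly collinear.

```python
import numpy as np


def pc_association_stats(traits, snp):
    """Per-PC and combined association statistics for a single SNP.

    traits: (n_samples, n_traits) array of phenotypes.
    snp:    (n_samples,) array of genotype dosages (e.g. 0, 1, 2).
    Returns (per_pc_stats, combined_stat, variance_explained).
    Illustrative only; the statistic used here is an assumption.
    """
    traits = np.asarray(traits, dtype=float)
    snp = np.asarray(snp, dtype=float)
    n = traits.shape[0]

    # Standardize traits so that PCA acts on the correlation matrix.
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)

    # Eigendecomposition of the trait correlation matrix,
    # reordered so the PC with the largest variance comes first.
    corr = (z.T @ z) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # PC scores (columns are centered because z is centered).
    pcs = z @ eigvecs

    # Squared correlation between each PC and the centered genotype.
    g = snp - snp.mean()
    r = (pcs.T @ g) / (np.linalg.norm(pcs, axis=0) * np.linalg.norm(g))

    # n * r^2 is roughly chi-square (1 df) per PC under the null;
    # the sum over all PCs is roughly chi-square with n_traits df.
    per_pc = n * r ** 2
    return per_pc, per_pc.sum(), eigvals / eigvals.sum()


# Simulated example: a SNP with opposite effects on two
# positively correlated traits (all values are arbitrary).
rng = np.random.default_rng(0)
n = 2000
g = rng.binomial(2, 0.3, size=n).astype(float)
shared = rng.normal(size=n)
t1 = shared + 0.5 * rng.normal(size=n) + 0.15 * g
t2 = shared + 0.5 * rng.normal(size=n) - 0.15 * g

per_pc, combined, var_expl = pc_association_stats(np.column_stack([t1, t2]), g)
print("variance explained per PC:", var_expl)
print("per-PC statistics:", per_pc)
print("combined statistic:", combined)
```

In this simulated setting the top PC captures the shared variation of the two traits, while the opposite-direction SNP effect falls mostly on the last PC, so a test restricted to the top PC would largely miss it. This matches the scenario the abstract singles out as a source of power gain.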

Item ID: 41567
Item Type: Article (Research - C1)
ISSN: 1537-6605
Funders: Program Hospitalier de la Recherche Clinique, Région Ile-de-France, Pierre and Marie Curie University, ICAN Institute for Cardiometabolism and Nutrition
Projects and Grants: R03HG006720, R21CA165920, MARTHA, ANR-10-IAHU-05
Date Deposited: 12 Feb 2016 05:45
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010402 Biostatistics @ 50%
06 BIOLOGICAL SCIENCES > 0604 Genetics > 060412 Quantitative Genetics (incl Disease and Trait Mapping Genetics) @ 50%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 50%
97 EXPANDING KNOWLEDGE > 970106 Expanding Knowledge in the Biological Sciences @ 50%