Sparse Canonical Correlation Analyses of Multimodal Omics Data

Kejun He; Xiaoning Qian; Jianhua Huang; Sharon M. Donovan; Robert S. Chapkin; Ivan Ivanov*

doi:10.11145/cb.v3i1.675

Affiliation: Ivan Ivanov*, Texas A&M University

Abstract

There has been an increasing number of applications of sparse Canonical Correlation Analysis (sCCA) to genomic data during the past several years. Most of the research in this area has focused on the relationships between gene expression levels and phenotype variations. However, as multimodal omics data become available, there is a need to integrate these data modalities into a framework that allows for simultaneous data analyses, thereby providing novel insight for various fields in the life sciences. The pioneering work of Schwartz et al. (2012) used classical Canonical Correlation Analysis (CCA) to provide an integrative approach to the analysis of host gene expression and microbiota composition data from neonates with different feeding types. Although promising, the proposed approach has serious deficiencies. First, the statistical interpretation is problematic: the two-stage analysis involved makes the results sensitive to variations in the data, and the original interpretation of CCA is lost. Second, the associated computational cost is very high, O(n³), where n is the number of variables involved in the analysis. Thus, we developed a methodology based on sCCA to overcome these problems. The performance of our approach is compared to that of Schwartz et al. (2012) and to sparse Principal Component Analysis (sPCA) on a large synthetic data set, followed by an application to a multimodal omics data set (gene expression, microbiota composition, and metabolites) from neonates with two different feeding types.
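To make the idea of sparse CCA concrete, the following is a minimal Python sketch of one common generic formulation: alternating soft-thresholded power iterations on the cross-covariance matrix. It is not the methodology of this paper, which the abstract does not specify. The function names, the `frac_u` and `frac_v` sparsity parameters, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np

def soft_threshold(a, lam):
    """Shrink each entry of a toward zero by lam (the L1 proximal step)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def unit(a):
    """Scale a to unit Euclidean norm; an all-zero vector is returned as-is."""
    n = np.linalg.norm(a)
    return a / n if n > 0 else a

def sparse_cca(X, Y, frac_u=0.5, frac_v=0.5, n_iter=100, tol=1e-8):
    """First pair of sparse canonical vectors (illustrative sketch only).

    frac_u and frac_v are hypothetical tuning parameters: at each step the
    threshold is that fraction of the largest absolute entry, so at least
    one coefficient always survives.
    """
    # Standardize columns (assumes no column is constant).
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    K = X.T @ Y / X.shape[0]  # sample cross-correlation matrix, p x q
    # Initialize v with the leading right singular vector of K.
    v = np.linalg.svd(K, full_matrices=False)[2][0]
    u = np.zeros(K.shape[0])
    for _ in range(n_iter):
        a = K @ v
        u_new = unit(soft_threshold(a, frac_u * np.abs(a).max()))
        b = K.T @ u_new
        v_new = unit(soft_threshold(b, frac_v * np.abs(b).max()))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr

# Synthetic example: one shared latent factor drives the first three
# columns of X and the first two columns of Y; all else is noise.
rng = np.random.default_rng(0)
n, p, q = 200, 30, 20
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :3] += 2.0 * z[:, None]
Y[:, :2] += 2.0 * z[:, None]

u, v, corr = sparse_cca(X, Y)
print("nonzero loadings in u:", np.flatnonzero(u))
print("nonzero loadings in v:", np.flatnonzero(v))
print("canonical correlation:", round(float(corr), 3))
```

Because the loadings are sparse, each canonical variate depends on only a few variables from each modality, which keeps the result interpretable when the number of variables far exceeds the number of samples. In practice the sparsity levels would be chosen by cross-validation or a permutation scheme rather than fixed in advance.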

Published

2016-03-31

Issue

Vol. 3 No. 1 (2016)

Section

Conference Contributions

License

The journal Biomath Communications is an open access journal. All published articles are immediately available online and the respective DOI link is activated. All articles can be accessed for free, and no reader registration of any sort is required. No fees are charged to authors for article submission or processing. Online publications are funded through volunteer work, donations, and grants.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License 4.0 that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
