TY - JOUR
T1 - A deep contrastive multi-modal encoder for multi-omics data integration and analysis
AU - Ma, Yinghua
AU - Khan, Ahmad
AU - Heng, Yang
AU - Khan, Fiaz Gul
AU - Ali, Farman
AU - Al-Otaibi, Yasser D.
AU - Bashir, Ali Kashif
N1 - Publisher Copyright:
© 2025 Elsevier Inc.
PY - 2025/5
Y1 - 2025/5
N2 - Cancer is a highly complex and fatal disease that affects various human organs. Early and accurate cancer analysis is crucial for timely treatment, prognosis, and understanding of the disease's development. Recent research utilizes deep learning-based models to combine multi-omics data for tasks such as cancer classification, clustering, and survival prediction. However, these models often overlook interactions between different types of data, which leads to suboptimal performance. In this paper, we present a Contrastive Multi-Modal Encoder (CMME) that integrates and maps multi-omics data into a lower-dimensional latent space, enabling the model to better understand relationships between different data types. The challenging distribution and organization of the data into anchors, positive samples, and negative samples encourage the model to learn synergies among different modalities, pay attention to both strong and weak modalities, and avoid biased learning. The performance of the proposed model is evaluated on downstream tasks such as clustering, classification, and survival prediction. The CMME achieved an accuracy of 98.16% and an F1 score of 98.09% in classifying breast cancer subtypes. For clustering tasks across ten cancer types based on TCGA data, the adjusted Rand index reached 0.966. Additionally, survival analysis results highlighted significant differences in survival rates between different cancer subtypes. The comprehensive qualitative and quantitative results demonstrate that the proposed method outperforms existing methods.
KW - Cancer analysis
KW - Cancer classification
KW - Clustering
KW - Contrastive learning
KW - Deep learning
KW - Dimensionality reduction
KW - Multi-omics data
KW - Survival analysis
UR - https://www.scopus.com/pages/publications/85214312778
U2 - 10.1016/j.ins.2024.121864
DO - 10.1016/j.ins.2024.121864
M3 - Article
AN - SCOPUS:85214312778
SN - 0020-0255
VL - 700
JO - Information Sciences
JF - Information Sciences
M1 - 121864
ER -