TY - JOUR
T1 - Computational prediction of phosphorylation sites of SARS-CoV-2 infection using feature fusion and optimization strategies
AU - Sabir, Mumdooh J.
AU - Kamli, Majid Rasool
AU - Atef, Ahmed
AU - Alhibshi, Alawiah M.
AU - Edris, Sherif
AU - Hajarah, Nahid H.
AU - Bahieldin, Ahmed
AU - Manavalan, Balachandran
AU - Sabir, Jamal S.M.
N1 - Publisher Copyright: © 2024 Elsevier Inc.
PY - 2024/9
Y1 - 2024/9
N2 - SARS-CoV-2's global spread has instigated a critical health and economic emergency, impacting countless individuals. Understanding the virus's phosphorylation sites is vital for unraveling the molecular intricacies of the infection and the subsequent changes in host cellular processes. Several computational methods have been proposed to identify phosphorylation sites, typically focusing on either serine/threonine (S/T) or tyrosine (Y) phosphorylation sites. Unfortunately, current predictive tools perform best on these specific residues and may not extend their efficacy to other residues, emphasizing the urgent need for enhanced methodologies. In this study, we developed a novel predictor that integrates phosphorylation site information for all three residues (S, T, and Y). We extracted ten different feature descriptors, primarily derived from composition, evolutionary, and position-specific information, and assessed their discriminative power using five classifiers. Our results indicated that Light Gradient Boosting (LGB) showed superior performance and that five descriptors displayed excellent discriminative capabilities. Subsequently, we identified that the top two integrated features have high discriminative capability and trained them with LGB to develop the final prediction model, LGB-IPs. The proposed approach shows excellent performance in 10-fold cross-validation, with accuracy (ACC), Matthews correlation coefficient (MCC), and area under the curve (AUC) values of 0.831, 0.662, and 0.907, respectively. Notably, this performance is replicated in the independent evaluation. Consequently, our approach may provide valuable insights into the phosphorylation mechanisms in SARS-CoV-2 infection for biomedical researchers.
AB - SARS-CoV-2's global spread has instigated a critical health and economic emergency, impacting countless individuals. Understanding the virus's phosphorylation sites is vital for unraveling the molecular intricacies of the infection and the subsequent changes in host cellular processes. Several computational methods have been proposed to identify phosphorylation sites, typically focusing on either serine/threonine (S/T) or tyrosine (Y) phosphorylation sites. Unfortunately, current predictive tools perform best on these specific residues and may not extend their efficacy to other residues, emphasizing the urgent need for enhanced methodologies. In this study, we developed a novel predictor that integrates phosphorylation site information for all three residues (S, T, and Y). We extracted ten different feature descriptors, primarily derived from composition, evolutionary, and position-specific information, and assessed their discriminative power using five classifiers. Our results indicated that Light Gradient Boosting (LGB) showed superior performance and that five descriptors displayed excellent discriminative capabilities. Subsequently, we identified that the top two integrated features have high discriminative capability and trained them with LGB to develop the final prediction model, LGB-IPs. The proposed approach shows excellent performance in 10-fold cross-validation, with accuracy (ACC), Matthews correlation coefficient (MCC), and area under the curve (AUC) values of 0.831, 0.662, and 0.907, respectively. Notably, this performance is replicated in the independent evaluation. Consequently, our approach may provide valuable insights into the phosphorylation mechanisms in SARS-CoV-2 infection for biomedical researchers.
KW - Bioinformatics
KW - Light gradient boosting
KW - Phosphorylation sites
KW - SARS-CoV-2
KW - Sequence analysis
UR - https://www.scopus.com/pages/publications/85194917353
U2 - 10.1016/j.ymeth.2024.04.021
DO - 10.1016/j.ymeth.2024.04.021
M3 - Article
C2 - 38768932
AN - SCOPUS:85194917353
SN - 1046-2023
VL - 229
SP - 1
EP - 8
JO - Methods
JF - Methods
ER -