TY - JOUR
T1 - Identification of SH2 domain-containing proteins and motifs prediction by a deep learning method
AU - Wu, Duanzhi
AU - Fang, Xin
AU - Luan, Kai
AU - Xu, Qijin
AU - Lin, Shiqi
AU - Sun, Shiying
AU - Yang, Jiaying
AU - Dong, Bingying
AU - Manavalan, Balachandran
AU - Liao, Zhijun
N1 - Publisher Copyright: © 2023
PY - 2023/8
Y1 - 2023/8
N2 - The Src Homology 2 (SH2) domain plays an important role in signal transduction in organisms. It mediates protein-protein interactions through the binding between phosphotyrosine and motifs in the SH2 domain. In this study, we designed a deep learning method to distinguish SH2 domain-containing proteins from non-SH2 domain-containing proteins. First, we collected SH2 and non-SH2 domain-containing protein sequences from multiple species. After data preprocessing, we built six deep learning models with DeepBIO and compared their performance. Second, we selected the model with the strongest overall performance, trained and tested it again separately, and analyzed the results visually. A 288-dimensional (288D) feature was found to distinguish the two types of proteins effectively. Finally, motif analysis identified the specific motif YKIR and revealed its function in signal transduction. In summary, we successfully distinguished SH2 domain-containing proteins from non-SH2 domain-containing proteins using a deep learning method and obtained the best-performing 288D features. In addition, we found a new motif, YKIR, in the SH2 domain and analyzed its function, which helps to further understand signaling mechanisms within the organism.
AB - The Src Homology 2 (SH2) domain plays an important role in signal transduction in organisms. It mediates protein-protein interactions through the binding between phosphotyrosine and motifs in the SH2 domain. In this study, we designed a deep learning method to distinguish SH2 domain-containing proteins from non-SH2 domain-containing proteins. First, we collected SH2 and non-SH2 domain-containing protein sequences from multiple species. After data preprocessing, we built six deep learning models with DeepBIO and compared their performance. Second, we selected the model with the strongest overall performance, trained and tested it again separately, and analyzed the results visually. A 288-dimensional (288D) feature was found to distinguish the two types of proteins effectively. Finally, motif analysis identified the specific motif YKIR and revealed its function in signal transduction. In summary, we successfully distinguished SH2 domain-containing proteins from non-SH2 domain-containing proteins using a deep learning method and obtained the best-performing 288D features. In addition, we found a new motif, YKIR, in the SH2 domain and analyzed its function, which helps to further understand signaling mechanisms within the organism.
KW - Binary classification
KW - Deep learning
KW - Motif analysis
KW - SH2 domain
UR - https://www.scopus.com/pages/publications/85161041810
U2 - 10.1016/j.compbiomed.2023.107065
DO - 10.1016/j.compbiomed.2023.107065
M3 - Article
C2 - 37267826
AN - SCOPUS:85161041810
SN - 0010-4825
VL - 162
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 107065
ER -