TY - JOUR
T1 - FRTpred
T2 - A novel approach for accurate prediction of protein folding rate and type
AU - Manavalan, Balachandran
AU - Lee, Jooyoung
N1 - Publisher Copyright: © 2022 The Authors
PY - 2022/10
Y1 - 2022/10
AB - Protein folding rate is an important property for understanding the protein folding process and is useful in protein design. Predicting it from either sequence or structural information is a challenging task in bioinformatics. Although several computational methods have been developed, only one sequence-based method is publicly available, and it shows limited accuracy when evaluated on a standardized independent dataset. This study proposes a novel approach, called FRTpred, that simultaneously predicts the logarithmic protein folding rate constant, ln(kf), and the folding type from a given sequence. First, 30 baseline models (regression models for ln(kf) and classification models for folding type) were constructed by combining 10 representative feature extraction methods with three commonly used machine-learning algorithms. Subsequently, the predicted values of the 30 baseline models were combined and used as input to a random forest algorithm to construct the final prediction model. Cross-validation analysis showed that FRTpred achieved mean absolute deviations of 1.491, 2.016, and 1.954 for the non-two-state, two-state, and combined models, respectively, when predicting ln(kf). Moreover, FRTpred predicts the folding type with an accuracy of 0.843. Performance comparisons against existing methods on independent tests showed that FRTpred is more precise for both ln(kf) and folding type prediction. Thus, FRTpred is a powerful tool that may accelerate the characterization of foldomics protein data and inspire the development of next-generation predictors. The proposed model is available as a freely accessible web server at http://thegleelab.org/FRTpred.
KW - Bioinformatics
KW - Folding type
KW - Machine learning
KW - Probabilistic features
KW - Protein folding rate
KW - Sequence analysis
UR - https://www.scopus.com/pages/publications/85138459537
U2 - 10.1016/j.compbiomed.2022.105911
DO - 10.1016/j.compbiomed.2022.105911
M3 - Article
C2 - 36096036
AN - SCOPUS:85138459537
SN - 0010-4825
VL - 149
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 105911
ER -