TY - JOUR
T1 - RDR100: A Robust Computational Method for Identification of Krüppel-like Factors
AU - Malik, Adeel
AU - Kamli, Majid Rasool
AU - Sabir, Jamal S.M.
AU - Phan, Le Thi
AU - Kim, Chang Bae
AU - Manavalan, Balachandran
N1 - Publisher Copyright: © 2024, Bentham Science Publishers. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Background: Krüppel-like factors (KLFs) are a family of transcription factors containing zinc fingers that regulate various cellular processes. KLF proteins are associated with human diseases, such as cancer, cardiovascular diseases, and metabolic disorders. The KLF family consists of 18 members with diverse expression profiles across numerous tissues. Accurate identification and annotation of KLF proteins are crucial, given their involvement in important biological functions. Although experimental approaches can identify KLF proteins precisely, large-scale identification is complicated, slow, and expensive. Methods: In this study, we developed RDR100, a novel random forest (RF)-based framework for predicting KLF proteins based on their primary sequences. First, we identified the optimal encodings for ten different features using a recursive feature elimination approach, and then trained the corresponding models using five distinct machine learning (ML) classifiers. Results: The performance of all models was assessed using independent datasets, and RDR100 was selected as the final model based on its consistent performance in cross-validation and independent evaluation. Conclusion: Our results demonstrate that RDR100 is a robust predictor of KLF proteins. The RDR100 web server is available at https://procarb.org/RDR100/.
KW - dipeptide composition
KW - Krüppel-like factors
KW - machine learning
KW - random forest
KW - recursive feature elimination
KW - two-step feature selection
UR - https://www.scopus.com/pages/publications/85197223467
U2 - 10.2174/1574893618666230905102407
DO - 10.2174/1574893618666230905102407
M3 - Article
AN - SCOPUS:85197223467
SN - 1574-8936
VL - 19
SP - 584
EP - 599
JO - Current Bioinformatics
JF - Current Bioinformatics
IS - 6
ER -