TY - JOUR
T1 - Extremely-randomized-tree-based prediction of N6-methyladenosine sites in Saccharomyces cerevisiae
AU - Govindaraj, Rajiv Gandhi
AU - Subramaniyam, Sathiyamoorthy
AU - Manavalan, Balachandran
N1 - Publisher Copyright: © 2020 Bentham Science Publishers.
PY - 2020
Y1 - 2020
AB - N6-methyladenosine (m6A) is one of the most common post-transcriptional modifications in RNA and has been linked to several biological processes. Accurate prediction of m6A sites from RNA sequences is a challenging task in computational biology. Several computational methods based on machine-learning algorithms have been proposed to accelerate in silico screening of m6A sites, thereby drastically reducing the experimental time and labor costs involved. In this study, we proposed a novel computational predictor, termed ERT-m6Apred, for the accurate prediction of m6A sites. To identify the feature encodings with the most discriminative capability, we applied a two-step feature selection technique to seven different feature encodings and identified the corresponding optimal feature set for each. Subsequently, a performance comparison of extremely randomized tree models built on these optimal feature sets revealed that the pseudo k-tuple composition encoding, which includes 14 physicochemical properties, significantly outperformed the other encodings. Moreover, ERT-m6Apred achieved an accuracy of 78.84% in cross-validation, which is better than that of recently reported predictors. In summary, ERT-m6Apred predicts Saccharomyces cerevisiae m6A sites with higher accuracy, thus facilitating biological hypothesis generation and experimental validation.
KW - RNA sequences
KW - Cross-validation
KW - Extremely randomized tree
KW - Feature optimization
KW - N6-methyladenosine sites
UR - https://www.scopus.com/pages/publications/85081165499
U2 - 10.2174/1389202921666200219125625
DO - 10.2174/1389202921666200219125625
M3 - Article
AN - SCOPUS:85081165499
SN - 1389-2029
VL - 21
SP - 26
EP - 33
JO - Current Genomics
JF - Current Genomics
IS - 1
ER -