TY - JOUR
T1 - BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides
AU - Charoenkwan, Phasit
AU - Nantasenamat, Chanin
AU - Hasan, Md Mehedi
AU - Manavalan, Balachandran
AU - Shoombuatong, Watshara
N1 - Publisher Copyright: © The Author(s) 2021. Published by Oxford University Press. All rights reserved.
PY - 2021/9/1
Y1 - 2021/9/1
N2 - Motivation: The identification of bitter peptides through experimental approaches is an expensive and time-consuming endeavor. Due to the huge number of newly available peptide sequences in the post-genomic era, the development of automated computational models for the identification of novel bitter peptides is highly desirable. Results: In this work, we present BERT4Bitter, a bidirectional encoder representations from transformers (BERT)-based model for predicting bitter peptides directly from their amino acid sequence without using any structural information. To the best of our knowledge, this is the first time a BERT-based model has been employed to identify bitter peptides. Compared to widely used machine learning models, BERT4Bitter achieved the best performance, with accuracies of 0.861 and 0.922 on cross-validation and independent tests, respectively. Furthermore, extensive empirical benchmarking experiments on the independent dataset demonstrated that BERT4Bitter clearly outperformed the existing method, with improvements of 8.0% in accuracy and 16.0% in Matthews correlation coefficient, highlighting the effectiveness and robustness of BERT4Bitter. We believe that the BERT4Bitter method proposed herein will be a useful tool for rapidly screening and identifying novel bitter peptides for drug development and nutritional research.
UR - https://www.scopus.com/pages/publications/85102066790
U2 - 10.1093/bioinformatics/btab133
DO - 10.1093/bioinformatics/btab133
M3 - Article
C2 - 33638635
AN - SCOPUS:85102066790
SN - 1367-4803
VL - 37
SP - 2556
EP - 2562
JO - Bioinformatics
JF - Bioinformatics
IS - 17
ER -