TY - JOUR
T1 - M3S-ALG: Improved and robust prediction of allergenicity of chemical compounds by using a novel multi-step stacking strategy
AU - Charoenkwan, Phasit
AU - Schaduangrat, Nalini
AU - Phan, Le Thi
AU - Manavalan, Balachandran
AU - Shoombuatong, Watshara
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2025/1
Y1 - 2025/1
AB - Many chemicals cannot be brought to market because of their high allergenicity. It is therefore crucial to assess the allergenic potential of chemicals before introducing them into clinical therapeutics. However, assessing the allergenicity of chemical compounds experimentally is time-consuming and costly. To tackle this challenge, we propose M3S-ALG, a novel multi-step stacking strategy (M3S) for rapid and accurate identification of the allergenicity of chemical compounds using only their SMILES notation. The proposed M3S method involves three steps. First, ten different balanced datasets were constructed using an under-sampling approach. Second, for each balanced dataset, 144 base-classifiers were trained and optimized to generate prediction scores for allergenic chemical compounds, which were treated as new probabilistic features. Third, we selected the important probabilistic features and used them to construct the final stacked model (M3S-ALG). Experimental results show that M3S-ALG outperforms conventional ensemble strategies and its constituent base-classifiers on both the training and independent test datasets, indicating the effectiveness and robustness of the proposed strategy in identifying the allergenicity of chemical compounds. In addition, M3S-ALG exhibited excellent prediction performance compared with existing methods on the independent test dataset, achieving a balanced accuracy of 0.877, an MCC of 0.712, and an AUC of 0.931. Finally, we developed a user-friendly online web server at https://pmlabqsar.pythonanywhere.com/M3SALG. This new approach is anticipated to help the drug discovery and development community identify chemical compounds with no allergenic properties at large scale.
KW - Allergy
KW - Chemical allergens
KW - Cheminformatics
KW - Feature selection
KW - Machine learning
KW - Stacking strategy
UR - https://www.scopus.com/pages/publications/85201084839
U2 - 10.1016/j.future.2024.07.033
DO - 10.1016/j.future.2024.07.033
M3 - Article
AN - SCOPUS:85201084839
SN - 0167-739X
VL - 162
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
M1 - 107455
ER -