Improving chemical reaction yield prediction using pre-trained graph neural networks

  • Jongmin Han
  • Youngchun Kwon
  • Youn Suk Choi
  • Seokho Kang

Research output: Contribution to journal › Article › peer-review

Abstract

Graph neural networks (GNNs) have proven effective in predicting chemical reaction yields. However, their performance tends to deteriorate when the training dataset is insufficient in quantity or diversity. A promising way to alleviate this issue is to pre-train a GNN on a large-scale molecular database. In this study, we investigate the effectiveness of GNN pre-training in chemical reaction yield prediction and present a novel GNN pre-training method for improving performance. Given a molecular database consisting of a large number of molecules, we calculate molecular descriptors for each molecule and reduce the dimensionality of these descriptors by applying principal component analysis. We define a pre-text task by assigning a vector of principal component scores as the pseudo-label to each molecule in the database. A GNN is then pre-trained to perform the pre-text task of predicting the pseudo-label for the input molecule. For chemical reaction yield prediction, a prediction model is initialized using the pre-trained GNN and then fine-tuned with the training dataset containing chemical reactions and their yields. We demonstrate the effectiveness of the proposed method through experimental evaluation on benchmark datasets.
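
The pseudo-label construction described in the abstract (descriptors, then principal component analysis, then one score vector per molecule) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `pca_pseudo_labels`, the column standardization step, the choice of 8 components, and the synthetic descriptor matrix are all assumptions made for illustration; the abstract does not specify which descriptors are used or how many components are retained.

```python
import numpy as np


def pca_pseudo_labels(descriptors: np.ndarray, n_components: int) -> np.ndarray:
    """Return principal component scores to use as pseudo-labels.

    descriptors: (n_molecules, n_descriptors) matrix of molecular descriptors.
    The function name and signature are hypothetical, chosen for this sketch.
    """
    # Standardize each descriptor column. This is an assumption: the abstract
    # does not say how descriptors are scaled before PCA.
    mean = descriptors.mean(axis=0)
    std = descriptors.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant descriptors
    z = (descriptors - mean) / std

    # PCA via SVD of the standardized matrix: the rows of vt are the
    # principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)

    # Project onto the leading axes to obtain one score vector per molecule.
    return z @ vt[:n_components].T


# Synthetic stand-in for a descriptor table. Real descriptors would be computed
# with a cheminformatics toolkit, which is outside the scope of this sketch.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(1000, 50))

# Each row is the regression target a GNN would be pre-trained to predict.
pseudo_labels = pca_pseudo_labels(descriptors, n_components=8)
print(pseudo_labels.shape)
```

In this setup each row of `pseudo_labels` would serve as the multi-output regression target for the corresponding molecule during pre-training, after which the GNN weights initialize the yield prediction model for fine-tuning.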

Original language: English
Article number: 25
Journal: Journal of Cheminformatics
Volume: 16
Issue number: 1
State: Published - 1 Mar 2024

Keywords

  • Chemical reaction yield prediction
  • Deep learning
  • Graph neural network
  • Pre-training
