Synthetic data generation method improves risk prediction model for early tumor recurrence after surgery in patients with pancreatic cancer

Hye Jeong Jeong, Jeong Moo Lee, Hyeong Seok Kim, Hochang Chae, So Jeong Yoon, Sang Hyun Shin, In Woong Han, Jin Seok Heo, Ji Hye Min, Seung Hyup Hyun, Hongbeom Kim

Research output: Contribution to journal › Article › peer-review

Abstract

Pancreatic cancer is aggressive and has high recurrence rates, so accurate prediction models are needed for effective treatment planning, particularly when deciding between neoadjuvant chemotherapy and upfront surgery. This study explores the use of variational autoencoder (VAE)-generated synthetic data to predict early tumor recurrence (within six months) in pancreatic cancer patients who underwent upfront surgery. Preoperative data from 158 patients between January 2021 and December 2022 were analyzed, and machine learning models, including Logistic Regression, Random Forest (RF), Gradient Boosting Machine (GBM), and Deep Neural Networks (DNN), were trained on both original and synthetic datasets. The VAE-generated dataset (n = 94) closely matched the original data (p > 0.05) and enhanced model performance, improving accuracy (GBM: 0.81 to 0.87; RF: 0.84 to 0.87) and sensitivity (GBM: 0.73 to 0.91; RF: 0.82 to 0.91). PET/CT-derived metabolic parameters were the strongest predictors, accounting for 54.7% of the model's predictive power, with maximum standardized uptake value (SUVmax) showing the highest importance (0.182, 95% CI: 0.165–0.199). This study demonstrates that synthetic data can significantly enhance predictive models for pancreatic cancer recurrence, especially in data-limited scenarios, offering a promising strategy for oncology prediction models.
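The abstract reports accuracy and sensitivity for each model. As a reading aid, the minimal sketch below shows how these two metrics are conventionally computed from binary predictions. It is not the authors' code: the function name `accuracy_and_sensitivity` and the label vectors are hypothetical, chosen only for illustration, and they do not reproduce any result from the study.

```python
def accuracy_and_sensitivity(y_true, y_pred):
    """Return (accuracy, sensitivity) for binary labels, where 1 = early recurrence."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # recurrence correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-recurrence correctly cleared
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # recurrence missed
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity is undefined when there are no positive cases.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Hypothetical labels for ten patients (illustrative only, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

Sensitivity matters most in this setting because a missed early recurrence (a false negative) is the costly error, which is why the abstract reports it alongside overall accuracy.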

Original language: English
Article number: 31885
Journal: Scientific Reports
Volume: 15
Issue number: 1
State: Published - Dec 2025

Keywords

  • Early recurrence prediction
  • Machine learning
  • Pancreatic cancer
  • Synthetic data
  • Variational autoencoder
