
Synthetic data generation method improves risk prediction model for early tumor recurrence after surgery in patients with pancreatic cancer

  • Eulji University
  • Seoul National University

Research output: Contribution to journal › Article › peer-review

Abstract

Pancreatic cancer is aggressive with high recurrence rates, necessitating accurate prediction models for effective treatment planning, particularly for the choice between neoadjuvant chemotherapy and upfront surgery. This study explores the use of variational autoencoder (VAE)-generated synthetic data to predict early tumor recurrence (within six months) in pancreatic cancer patients who underwent upfront surgery. Preoperative data from 158 patients between January 2021 and December 2022 were analyzed, and machine learning models, including Logistic Regression, Random Forest (RF), Gradient Boosting Machine (GBM), and Deep Neural Networks (DNN), were trained on both original and synthetic datasets. The VAE-generated dataset (n = 94) closely matched the original data (p > 0.05) and enhanced model performance, improving accuracy (GBM: 0.81 to 0.87; RF: 0.84 to 0.87) and sensitivity (GBM: 0.73 to 0.91; RF: 0.82 to 0.91). PET/CT-derived metabolic parameters were the strongest predictors, accounting for 54.7% of the model's predictive power, with maximum standardized uptake value (SUVmax) showing the highest importance (0.182, 95% CI: 0.165–0.199). This study demonstrates that synthetic data can significantly enhance predictive models for pancreatic cancer recurrence, especially in data-limited scenarios, offering a promising strategy for oncology prediction models.
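For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how accuracy and sensitivity are conventionally computed from binary predictions (1 = early recurrence, 0 = no early recurrence). It is not the authors' evaluation code; the function name `confusion`, the `Confusion` class, and the ten example labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # recurrence correctly predicted
    fn: int  # recurrence missed
    tn: int  # non-recurrence correctly predicted
    fp: int  # recurrence predicted but not observed

    @property
    def accuracy(self) -> float:
        # Share of all patients classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def sensitivity(self) -> float:
        # Share of true recurrences that the model detected.
        return self.tp / (self.tp + self.fn)


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Tally the four confusion-matrix cells from paired 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return Confusion(tp=tp, fn=fn, tn=tn, fp=fp)


# Illustrative labels only; these are NOT data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
m = confusion(y_true, y_pred)
print(f"accuracy={m.accuracy:.2f} sensitivity={m.sensitivity:.2f}")
```

Sensitivity depends only on the patients who actually recurred, which is why it can change more than accuracy when a model is retrained on augmented data.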

Original language: English
Article number: 31885
Journal: Scientific Reports
Volume: 15
Issue number: 1
State: Published - Dec 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Early recurrence prediction
  • Machine learning
  • Pancreatic cancer
  • Synthetic data
  • Variational autoencoder
