TY - JOUR
T1 - Improving thyroid disorder diagnosis via innovative stacking ensemble learning model
AU - Hassan, Ayesha
AU - Ramzan, Shabana
AU - Raza, Ali
AU - Munwar Iqbal, Muhammad
AU - Smerat, Aseel
AU - Latif Fitriyani, Norma
AU - Syafrudin, Muhammad
AU - Won Lee, Seung
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - Objective: Hypothyroidism, hyperthyroidism, thyroid nodules, and other thyroid disorders affect millions of people worldwide, and if left untreated they may lead to serious health issues. Accurate and timely diagnosis is crucial for proper management and medication. This study uses a dataset from the UCI Machine Learning Repository to put forward a comprehensive machine-learning technique for diagnosing thyroid disorders. Methods: The proposed methodology involves exploratory data analysis and data preparation, which includes handling missing values, encoding categorical values, and selecting features. The synthetic minority over-sampling technique (SMOTE) is used to overcome the problem of class imbalance. Five machine learning (ML) algorithms (logistic regression, support vector machine, decision tree, random forest, and gradient boosting) are employed to develop predictive models. Further, an innovative stacking ensemble method is proposed, built from four applied models: their outputs are aggregated, and logistic regression serves as the meta-learner. Results: A 10-fold cross-validation technique is used to ensure robust model evaluation and reduce the risk of overfitting; each fold serves once as the test set while the model is trained on the remaining folds. The ensemble model attained an accuracy of 99.86%, outperforming the individual models. Conclusion: These results reveal the capability of ML, especially ensemble approaches, to enhance accurate and timely diagnosis of thyroid disorders.
AB - Objective: Hypothyroidism, hyperthyroidism, thyroid nodules, and other thyroid disorders affect millions of people worldwide, and if left untreated they may lead to serious health issues. Accurate and timely diagnosis is crucial for proper management and medication. This study uses a dataset from the UCI Machine Learning Repository to put forward a comprehensive machine-learning technique for diagnosing thyroid disorders. Methods: The proposed methodology involves exploratory data analysis and data preparation, which includes handling missing values, encoding categorical values, and selecting features. The synthetic minority over-sampling technique (SMOTE) is used to overcome the problem of class imbalance. Five machine learning (ML) algorithms (logistic regression, support vector machine, decision tree, random forest, and gradient boosting) are employed to develop predictive models. Further, an innovative stacking ensemble method is proposed, built from four applied models: their outputs are aggregated, and logistic regression serves as the meta-learner. Results: A 10-fold cross-validation technique is used to ensure robust model evaluation and reduce the risk of overfitting; each fold serves once as the test set while the model is trained on the remaining folds. The ensemble model attained an accuracy of 99.86%, outperforming the individual models. Conclusion: These results reveal the capability of ML, especially ensemble approaches, to enhance accurate and timely diagnosis of thyroid disorders.
KW - cross-validation
KW - ensemble method
KW - machine learning
KW - predictive modeling
KW - synthetic minority over-sampling technique
KW - thyroid disorders
UR - https://www.scopus.com/pages/publications/105007624700
U2 - 10.1177/20552076251341430
DO - 10.1177/20552076251341430
M3 - Article
AN - SCOPUS:105007624700
SN - 2055-2076
VL - 11
JO - Digital Health
JF - Digital Health
M1 - 20552076251341430
ER -