TY - GEN
T1 - From RAG to QA-RAG
T2 - 40th Annual ACM Symposium on Applied Computing, SAC 2025
AU - Kim, Jaewoong
AU - Hur, Minseok
AU - Min, Moohong
N1 - Publisher Copyright: Copyright © 2025 held by the owner/author(s).
PY - 2025/5/14
Y1 - 2025/5/14
AB - Regulatory compliance in the pharmaceutical industry involves navigating complex and voluminous guidelines, often requiring significant amounts of human resources. Recent advancements in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) methods provide promising enhancements to data processing and knowledge management, potentially easing these burdens. However, despite these advancements, conventional Retrieval-Augmented Generation (RAG) methods fall short in this domain due to inherent structural problems. To address these challenges, we introduce the Question and Answer Retrieval Augmented Generation (QA-RAG) framework. This framework enhances the conventional RAG framework. It integrates a dual-track retrieval mechanism tailored to the specific and dynamic nature of pharmaceutical regulations. It utilizes not only the original query but also the answers generated by a fine-tuned LLM, thus providing a more robust foundation for document retrieval. Our experiments demonstrate that QA-RAG outperforms conventional methods in various evaluation metrics including precision, recall, and F1-score. These results underscore QA-RAG's capability to enhance both the accuracy and efficiency of regulatory compliance processes in the pharmaceutical industry. This paper details the structure and efficacy of QA-RAG, emphasizing its potential to revolutionize the regulatory compliance process in the pharmaceutical industry and beyond.
KW - fine-tuning large language models (LLMs)
KW - information retrieval effectiveness
KW - pharmaceutical regulatory compliance
KW - retrieval-augmented generation (RAG)
UR - https://www.scopus.com/pages/publications/105006421891
U2 - 10.1145/3672608.3707749
DO - 10.1145/3672608.3707749
M3 - Conference contribution
AN - SCOPUS:105006421891
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 1293
EP - 1295
BT - 40th Annual ACM Symposium on Applied Computing, SAC 2025
PB - Association for Computing Machinery
Y2 - 31 March 2025 through 4 April 2025
ER -