TY - JOUR
T1 - EILEEN
T2 - A Multi-Modal Framework for Extracting Alcohol Consumption Patterns from Bilingual Clinical Notes
AU - Kim, Han Kyul
AU - Park, Yujin
AU - Kim, Yoon Ji
AU - Yi, Seungag
AU - Park, Yeju
AU - So, Sujin
AU - Lee, Hyeon Ji
AU - Bae, Ye Seul
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - In this work, we introduce EILEEN (Efficient Inference for Language-based Extraction of EHR Notes), a novel multi-modal natural language processing (NLP) framework designed to extract various alcohol consumption patterns from unstructured clinical notes, particularly in bilingual and non-English contexts. Recent advances in NLP have significantly improved information extraction capability across various domains. However, identifying patterns of alcohol consumption in medical documents remains underexplored, with existing approaches heavily relying on traditional NLP methods such as bag-of-words models that require extensive text preprocessing. These methods are often limited to English-language clinical settings, where robust medical ontologies and NLP toolkits are available to support preprocessing tasks. Therefore, this limitation hinders their use in multilingual healthcare settings and in environments lacking robust NLP toolkits to facilitate preprocessing. Motivated by the need for a more generalizable and accurate approach, this paper investigates the impact of large language models (LLMs) in advancing alcohol consumption pattern extraction from clinical notes. By reducing the need for manual preprocessing and improving adaptability to multilingual clinical notes, this work aims to enable broader, more practical applications of NLP models in extracting alcohol consumption patterns from clinical notes. By fine-tuning multilingual language models along with additional data sources, EILEEN effectively analyzes unstructured electronic health records (EHR) without relying on traditional concept normalization or extensive text preprocessing resources. Furthermore, the multi-modal component of EILEEN enables it to integrate and leverage diverse types of alcohol-related information, such as various types and amounts of alcohol consumed by a patient, thereby improving its pattern extraction accuracy. Our experiments, conducted in two different medical institutions in Korea, demonstrate that EILEEN significantly outperforms existing NLP methods in accurately identifying clinically relevant alcohol consumption patterns. By providing accurate, detailed, and clinically useful alcohol consumption patterns from unstructured clinical notes, EILEEN empowers healthcare practitioners with actionable insights essential for informed clinical decision-making.
AB - In this work, we introduce EILEEN (Efficient Inference for Language-based Extraction of EHR Notes), a novel multi-modal natural language processing (NLP) framework designed to extract various alcohol consumption patterns from unstructured clinical notes, particularly in bilingual and non-English contexts. Recent advances in NLP have significantly improved information extraction capability across various domains. However, identifying patterns of alcohol consumption in medical documents remains underexplored, with existing approaches heavily relying on traditional NLP methods such as bag-of-words models that require extensive text preprocessing. These methods are often limited to English-language clinical settings, where robust medical ontologies and NLP toolkits are available to support preprocessing tasks. Therefore, this limitation hinders their use in multilingual healthcare settings and in environments lacking robust NLP toolkits to facilitate preprocessing. Motivated by the need for a more generalizable and accurate approach, this paper investigates the impact of large language models (LLMs) in advancing alcohol consumption pattern extraction from clinical notes. By reducing the need for manual preprocessing and improving adaptability to multilingual clinical notes, this work aims to enable broader, more practical applications of NLP models in extracting alcohol consumption patterns from clinical notes. By fine-tuning multilingual language models along with additional data sources, EILEEN effectively analyzes unstructured electronic health records (EHR) without relying on traditional concept normalization or extensive text preprocessing resources. Furthermore, the multi-modal component of EILEEN enables it to integrate and leverage diverse types of alcohol-related information, such as various types and amounts of alcohol consumed by a patient, thereby improving its pattern extraction accuracy. Our experiments, conducted in two different medical institutions in Korea, demonstrate that EILEEN significantly outperforms existing NLP methods in accurately identifying clinically relevant alcohol consumption patterns. By providing accurate, detailed, and clinically useful alcohol consumption patterns from unstructured clinical notes, EILEEN empowers healthcare practitioners with actionable insights essential for informed clinical decision-making.
KW - alcohol information extraction
KW - Clinical informatics
KW - multilingual transformers
KW - multimodal learning
KW - natural language processing
UR - https://www.scopus.com/pages/publications/85217581897
U2 - 10.1109/ACCESS.2025.3538803
DO - 10.1109/ACCESS.2025.3538803
M3 - Article
AN - SCOPUS:85217581897
SN - 2169-3536
VL - 13
SP - 25741
EP - 25751
JO - IEEE Access
JF - IEEE Access
ER -