TY - JOUR
T1 - EHR phenotyping via jointly embedding medical concepts and words into a unified vector space
AU - Bai, Tian
AU - Chanda, Ashis Kumar
AU - Egleston, Brian L.
AU - Vucetic, Slobodan
N1 - Publisher Copyright:
© 2018 The Author(s).
PY - 2018/12/12
Y1 - 2018/12/12
N2 - Background: There has been an increasing interest in learning low-dimensional vector representations of medical concepts from Electronic Health Records (EHRs). Vector representations of medical concepts facilitate exploratory analysis and predictive modeling of EHR data to gain insights about the patterns of care and health outcomes. EHRs contain structured data such as diagnostic codes and laboratory tests, as well as unstructured free text data in form of clinical notes, which provide more detail about condition and treatment of patients. Methods: In this work, we propose a method that jointly learns vector representations of medical concepts and words. This is achieved by a novel learning scheme based on the word2vec model. Our model learns those relationships by integrating clinical notes and sets of accompanying medical codes and by defining joint contexts for each observed word and medical code. Results: In our experiments, we learned joint representations using MIMIC-III data. Using the learned representations of words and medical codes, we evaluated phenotypes for 6 diseases discovered by our and baseline method. The experimental results show that for each of the 6 diseases our method finds highly relevant words. We also show that our representations can be very useful when predicting the reason for the next visit. Conclusions: The jointly learned representations of medical concepts and words capture not only similarity between codes or words themselves, but also similarity between codes and words. They can be used to extract phenotypes of different diseases. The representations learned by the joint model are also useful for construction of patient features.
AB - Background: There has been an increasing interest in learning low-dimensional vector representations of medical concepts from Electronic Health Records (EHRs). Vector representations of medical concepts facilitate exploratory analysis and predictive modeling of EHR data to gain insights about the patterns of care and health outcomes. EHRs contain structured data such as diagnostic codes and laboratory tests, as well as unstructured free text data in form of clinical notes, which provide more detail about condition and treatment of patients. Methods: In this work, we propose a method that jointly learns vector representations of medical concepts and words. This is achieved by a novel learning scheme based on the word2vec model. Our model learns those relationships by integrating clinical notes and sets of accompanying medical codes and by defining joint contexts for each observed word and medical code. Results: In our experiments, we learned joint representations using MIMIC-III data. Using the learned representations of words and medical codes, we evaluated phenotypes for 6 diseases discovered by our and baseline method. The experimental results show that for each of the 6 diseases our method finds highly relevant words. We also show that our representations can be very useful when predicting the reason for the next visit. Conclusions: The jointly learned representations of medical concepts and words capture not only similarity between codes or words themselves, but also similarity between codes and words. They can be used to extract phenotypes of different diseases. The representations learned by the joint model are also useful for construction of patient features.
KW - Distributed representation
KW - Electronic health records
KW - Healthcare
KW - Natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85058311197&partnerID=8YFLogxK
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=purepublist2023&SrcAuth=WosAPI&KeyUT=WOS:000452837700001&DestLinkType=FullRecord&DestApp=WOS
U2 - 10.1186/s12911-018-0672-0
DO - 10.1186/s12911-018-0672-0
M3 - Article
C2 - 30537974
SN - 1472-6947
VL - 18
SP - 123
JO - BMC Medical Informatics and Decision Making
JF - BMC Medical Informatics and Decision Making
IS - Suppl 4
M1 - 123
ER -