TY - JOUR
T1 - EHR2Vec: Representation Learning of Medical Concepts From Temporal Patterns of Clinical Notes Based on Self-Attention Mechanism
T2 - Frontiers in Genetics
AU - Wang, Li
AU - Wang, Qinghua
AU - Bai, Heming
AU - Liu, Cong
AU - Liu, Wei
AU - Zhang, Yuanpeng
AU - Jiang, Lei
AU - Xu, Huji
AU - Wang, Kai
AU - Zhou, Yunyun
N1 - Copyright © 2020 Wang, Wang, Bai, Liu, Liu, Zhang, Jiang, Xu, Wang and Zhou.
PY - 2020
Y1 - 2020
N2 - Efficiently learning representations of clinical concepts (e.g., symptoms, lab tests) from unstructured clinical notes in electronic health record (EHR) data remains a significant challenge, since each patient may have multiple visits at different times and each visit may contain different sequential concepts. Learning distributed representations from the temporal patterns of clinical notes is therefore an essential step for downstream applications on EHR data. However, existing methods for EHR representation learning cannot adequately capture either the contextual information within a visit or the temporal information across multiple visits. In this study, we developed a new vector embedding method called EHR2Vec that learns semantically meaningful representations of clinical concepts. EHR2Vec incorporates a self-attention structure and accurately identifies relevant clinical concept entities while accounting for time-sequence information across multiple visits. Using EHR data from systemic lupus erythematosus (SLE) patients as a case study, we showed that EHR2Vec outperforms other well-known methods, including Word2Vec and Med2Vec, in identifying interpretable representations, according to clinical experts' evaluations.
AB - Efficiently learning representations of clinical concepts (e.g., symptoms, lab tests) from unstructured clinical notes in electronic health record (EHR) data remains a significant challenge, since each patient may have multiple visits at different times and each visit may contain different sequential concepts. Learning distributed representations from the temporal patterns of clinical notes is therefore an essential step for downstream applications on EHR data. However, existing methods for EHR representation learning cannot adequately capture either the contextual information within a visit or the temporal information across multiple visits. In this study, we developed a new vector embedding method called EHR2Vec that learns semantically meaningful representations of clinical concepts. EHR2Vec incorporates a self-attention structure and accurately identifies relevant clinical concept entities while accounting for time-sequence information across multiple visits. Using EHR data from systemic lupus erythematosus (SLE) patients as a case study, we showed that EHR2Vec outperforms other well-known methods, including Word2Vec and Med2Vec, in identifying interpretable representations, according to clinical experts' evaluations.
UR - http://www.scopus.com/inward/record.url?scp=85088277028&partnerID=8YFLogxK
U2 - 10.3389/fgene.2020.00630
DO - 10.3389/fgene.2020.00630
M3 - Article
C2 - 32714371
SN - 1664-8021
VL - 11
SP - 630
JO - Frontiers in Genetics
JF - Frontiers in Genetics
M1 - 630
ER -