EHR phenotyping via jointly embedding medical concepts and words into a unified vector space

Tian Bai, Ashis Kumar Chanda, Brian L. Egleston, Slobodan Vucetic

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

Background: There has been increasing interest in learning low-dimensional vector representations of medical concepts from Electronic Health Records (EHRs). Such representations facilitate exploratory analysis and predictive modeling of EHR data, yielding insights into patterns of care and health outcomes. EHRs contain structured data, such as diagnostic codes and laboratory tests, as well as unstructured free text in the form of clinical notes, which provide more detail about the condition and treatment of patients.

Methods: In this work, we propose a method that jointly learns vector representations of medical concepts and words. This is achieved by a novel learning scheme based on the word2vec model. Our model learns the relationships between words and medical codes by integrating clinical notes with the sets of medical codes that accompany them and by defining joint contexts for each observed word and medical code.

Results: In our experiments, we learned joint representations using MIMIC-III data. Using the learned representations of words and medical codes, we evaluated phenotypes for 6 diseases discovered by our method and by a baseline method. The experimental results show that for each of the 6 diseases our method finds highly relevant words. We also show that our representations can be useful for predicting the reason for the next visit.

Conclusions: The jointly learned representations of medical concepts and words capture not only similarity among codes or among words, but also similarity between codes and words. They can be used to extract phenotypes of different diseases. The representations learned by the joint model are also useful for constructing patient features.
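
The abstract does not spell out how the joint contexts are defined, so the following Python sketch is only one plausible reading and not the authors' implementation. It assumes that, for a single visit, a word's context is its neighbouring words within a window plus every code attached to the visit, and that a code's context is every other code plus every word of the note. The function name `joint_context_pairs`, the `window` parameter, and the example tokens and code labels are all hypothetical.

```python
from itertools import permutations


def joint_context_pairs(note_tokens, codes, window=2):
    """Build (target, context) skip-gram pairs for one visit.

    Assumed scheme (not taken from the paper):
      * word -> word: neighbours within `window` positions in the note
      * code -> code: every other code of the visit (codes are an unordered set)
      * word <-> code: every word is paired with every code, in both directions
    """
    pairs = []

    # Word-word pairs from a sliding window over the note.
    for i, target in enumerate(note_tokens):
        lo = max(0, i - window)
        hi = min(len(note_tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, note_tokens[j]))

    # Code-code pairs: all ordered pairs of distinct codes.
    code_list = sorted(codes)
    pairs.extend(permutations(code_list, 2))

    # Word-code pairs in both directions, so both vocabularies share one space.
    for word in note_tokens:
        for code in code_list:
            pairs.append((word, code))
            pairs.append((code, word))

    return pairs


# Hypothetical visit: three note tokens and two made-up code labels.
visit_tokens = ["chest", "pain", "radiating"]
visit_codes = {"ICD9_410", "ICD9_428"}
pairs = joint_context_pairs(visit_tokens, visit_codes, window=1)
```

Pairs of this kind could then be fed to any skip-gram style trainer; because words and codes appear as targets and contexts of one another, their vectors end up in the same space and can be compared directly.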

Original language: English
Article number: 123
Pages (from-to): 123
Journal: BMC Medical Informatics and Decision Making
Volume: 18
Issue number: Suppl 4
State: Published - Dec 12 2018

Keywords

  • Distributed representation
  • Electronic health records
  • Healthcare
  • Natural language processing
