Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT

  • Jingye Yang
  • Cong Liu
  • Wendy Deng
  • Da Wu
  • Chunhua Weng
  • Yunyun Zhou
  • Kai Wang
  • Raymond G. Perelman Center for Cellular and Molecular Therapeutics
  • Columbia University

Research output: Contribution to journal › Article › peer-review

43 Scopus citations

Abstract

To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models, PhenoBCBERT and PhenoGPT, for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes because of the limitations of traditional heuristic or rule-based approaches. Our models leverage large language models to automate the detection of phenotype terms, including those not in the current HPO. We compared these models with PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also showed strong performance in case studies on biomedical literature. We evaluated the strengths and weaknesses of the BERT- and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses of human diseases.
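The limitation of rule-based recognition mentioned above can be illustrated with a minimal sketch, assuming a plain exact-phrase dictionary matcher. This is not PhenoTagger, PhenoBCBERT, PhenoGPT, or any code from the paper. The vocabulary `TOY_HPO` is a hypothetical three-term subset of HPO labels, and the function `dictionary_match` and the sample note are likewise made up for illustration. The matcher finds a term that appears verbatim, but it misses an inflected form ("seizures") and a paraphrase ("developmental milestones are delayed"). These are the kinds of gaps the language-model approach is meant to close.

```python
import re

# Hypothetical toy vocabulary: a tiny hand-picked subset of HPO labels and IDs.
# It stands in for a real lexicon only for illustration.
TOY_HPO = {
    "seizure": "HP:0001250",
    "global developmental delay": "HP:0001263",
    "microcephaly": "HP:0000252",
}


def dictionary_match(note: str, vocab: dict[str, str]) -> list[tuple[str, str]]:
    """Return (label, HPO ID) pairs whose label occurs verbatim in the note.

    Matching is case-insensitive and bounded by word boundaries, with no
    stemming, synonym expansion, or paraphrase handling.
    """
    text = note.lower()
    hits = []
    for label, hpo_id in vocab.items():
        # \b on both sides means "seizure" will NOT match "seizures".
        if re.search(rf"\b{re.escape(label)}\b", text):
            hits.append((label, hpo_id))
    return hits


note = ("Patient has microcephaly and frequent seizures; "
        "developmental milestones are delayed.")
print(dictionary_match(note, TOY_HPO))
```

Only "microcephaly" is recovered from the sample note. Adding more synonyms and rules to the dictionary narrows such gaps one case at a time, which is why the abstract frames learned language models as the alternative.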

Original language: English
Article number: 100887
Pages (from-to): 100887
Journal: Patterns
Volume: 5
Issue number: 1
State: Published, Jan 12 2024

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • BERT
  • DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems
  • GPT
  • Human Phenotype Ontology
  • clinical notes
  • electronic health records
  • named entity recognition
  • transformer
