TY - JOUR
T1 - Natural language processing (NLP) tools in extracting biomedical concepts from research articles
T2 - a case study on autism spectrum disorder
AU - Peng, Jacqueline
AU - Zhao, Mengge
AU - Havrilla, James
AU - Liu, Cong
AU - Weng, Chunhua
AU - Guthrie, Whitney
AU - Schultz, Robert
AU - Wang, Kai
AU - Zhou, Yunyun
N1 - Publisher Copyright: © 2020, The Author(s).
PY - 2020/11/30
Y1 - 2020/11/30
N2 - BACKGROUND: Natural language processing (NLP) tools can facilitate the extraction of biomedical concepts from unstructured free texts, such as research articles or clinical notes. The NLP software tools CLAMP, cTAKES, and MetaMap are among the most widely used tools to extract biomedical concept entities. However, their performance in extracting disease-specific terminology from literature has not been compared extensively, especially for complex neuropsychiatric disorders with a diverse set of phenotypic and clinical manifestations. METHODS: We comparatively evaluated these NLP tools using autism spectrum disorder (ASD) as a case study. We collected 827 ASD-related terms based on previous literature as the benchmark list for performance evaluation. Then, we applied CLAMP, cTAKES, and MetaMap on 544 full-text articles and 20,408 abstracts from PubMed to extract ASD-related terms. We evaluated the predictive performance using precision, recall, and F1 score. RESULTS: We found that CLAMP has the best performance in terms of F1 score followed by cTAKES and then MetaMap. Our results show that CLAMP has much higher precision than cTAKES and MetaMap, while cTAKES and MetaMap have higher recall than CLAMP. CONCLUSION: The analysis protocols used in this study can be applied to other neuropsychiatric or neurodevelopmental disorders that lack well-defined terminology sets to describe their phenotypic presentations.
KW - Autism Spectrum Disorder
KW - Benchmarking
KW - Humans
KW - Natural Language Processing
KW - PubMed
KW - Software
UR - http://www.scopus.com/inward/record.url?scp=85098332856&partnerID=8YFLogxK
U2 - 10.1186/s12911-020-01352-2
DO - 10.1186/s12911-020-01352-2
M3 - Article
C2 - 33380331
SN - 1472-6947
VL - 20
SP - 322
JO - BMC Medical Informatics and Decision Making
JF - BMC Medical Informatics and Decision Making
IS - Suppl 11
M1 - 322
ER -