TY - JOUR
T1 - Discovering body site and severity modifiers in clinical texts
AU - Dligach, Dmitriy
AU - Bethard, Steven
AU - Becker, Lee
AU - Miller, Timothy
AU - Savova, Guergana K.
PY - 2014
Y1 - 2014
N2 - Objective: To research computational methods for discovering body site and severity modifiers in clinical texts. Methods: We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct crossdomain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. Results: The performance of our method for discovering body site modifiers achieves F1 of 0.740-0.908 and our method for discovering severity modifiers achieves F1 of 0.905-0.929. Discussion: Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures. Conclusions: We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES).
AB - Objective: To research computational methods for discovering body site and severity modifiers in clinical texts. Methods: We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct crossdomain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. Results: The performance of our method for discovering body site modifiers achieves F1 of 0.740-0.908 and our method for discovering severity modifiers achieves F1 of 0.905-0.929. Discussion: Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures. Conclusions: We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES).
UR - http://www.scopus.com/inward/record.url?scp=84901834501&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901834501&partnerID=8YFLogxK
U2 - 10.1136/amiajnl-2013-001766
DO - 10.1136/amiajnl-2013-001766
M3 - Article
C2 - 24091648
SN - 1067-5027
VL - 21
SP - 448
EP - 454
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 3
ER -