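Study on Named Entity Recognition of Chinese Medical Records Based on BERT-BiLSTM-CRF with Chinese Radicals

Affiliation: School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China

CLC number: R-058

Fund projects: Key Special Project on Modernization of Traditional Chinese Medicine, National Key Research and Development Program of China (Grant No. 2017YFC1703306); Science and Technology Innovation 2030 Major Project "New Generation Artificial Intelligence" (Grant No. 2018AAA0102102); Key Project of the Hunan Provincial Administration of Traditional Chinese Medicine (Grant No. A2023048).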




    Abstract:

    Purpose/Significance To study methods for extracting medical terms from traditional Chinese medicine (TCM) medical records, so that the records can be structured automatically and provide structured data for knowledge discovery.

    Method/Process The paper proposes a deep learning named entity recognition (NER) model that combines BERT with a bidirectional long short-term memory network (BiLSTM), a conditional random field (CRF), and Chinese character radical features. The model embeds Chinese character radicals into the BERT vectors, extracts entity features with the BiLSTM, and uses the CRF for sequence prediction. A total of 400 manually annotated medical records, comprising more than 50,000 characters, are divided into a training set and a test set at a ratio of 3:1, and the model is used to identify four types of named entities in TCM medical records: body parts, medicines, symptoms, and diseases.

    Result/Conclusion The model achieves an F1 score of 84.81% on the test set, higher than that of the compared models without radical embedding. This indicates that the model identifies named entities in TCM medical records more effectively and better supports structuring the records.
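The abstract states that radicals are embedded into the BERT vectors but does not say how. The following is a minimal sketch, assuming the radical embedding is simply concatenated to each character's BERT vector before the BiLSTM. The radical dictionary `RADICALS`, the helpers `build_radical_embeddings` and `fuse_radical_features`, and all dimensions are hypothetical, and the "BERT" vectors are stand-ins supplied by the caller rather than real model output.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical character-to-radical dictionary; the paper's own radical
# resource is not described in the abstract.
RADICALS: Dict[str, str] = {
    "肝": "月", "脾": "月", "胃": "月",  # flesh radical, common in body parts
    "痛": "疒", "疼": "疒", "病": "疒",  # sickness radical, common in symptoms/diseases
    "芍": "艹", "药": "艹", "苓": "艹",  # grass radical, common in herbal medicines
}
UNKNOWN = "<unk>"  # shared radical slot for characters missing from the dictionary


def build_radical_embeddings(radicals: Dict[str, str], dim: int,
                             seed: int = 0) -> Dict[str, List[float]]:
    """Create one small random vector per radical (these would be trained in a real model)."""
    rng = random.Random(seed)
    vocab = sorted(set(radicals.values())) + [UNKNOWN]
    return {r: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for r in vocab}


def fuse_radical_features(text: str,
                          bert_vectors: Sequence[Sequence[float]],
                          radicals: Dict[str, str],
                          radical_emb: Dict[str, List[float]]) -> List[List[float]]:
    """Concatenate each character's BERT vector with the embedding of its radical."""
    if len(text) != len(bert_vectors):
        raise ValueError("need exactly one BERT vector per character")
    fused = []
    for ch, vec in zip(text, bert_vectors):
        radical = radicals.get(ch, UNKNOWN)
        fused.append(list(vec) + radical_emb[radical])
    return fused


if __name__ == "__main__":
    emb = build_radical_embeddings(RADICALS, dim=3)
    # Stand-in 4-dimensional "BERT" vectors for the two characters of 胃痛 (stomach pain).
    fake_bert = [[0.0] * 4, [1.0] * 4]
    rows = fuse_radical_features("胃痛", fake_bert, RADICALS, emb)
    print([len(r) for r in rows])  # [7, 7]
```

Because 肝, 脾 and 胃 share the 月 slot, the downstream BiLSTM receives a common signal for characters that often belong to body-part entities. Element-wise addition or a learned projection would be equally plausible fusion choices; the abstract does not settle which one the authors used.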
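In the pipeline described in the abstract, the BiLSTM produces a score for every tag at every character, and the CRF chooses the highest-scoring tag sequence jointly rather than character by character. The following is a minimal sketch of that decoding step (the Viterbi algorithm), assuming a BIO tagging scheme. The tag names such as `B-SYM`/`I-SYM`, the hand-set transition constraints in the hypothetical helper `bio_constraints`, and the emission scores are illustrative assumptions; a trained CRF learns its transition scores from the annotated records.

```python
from typing import List, Sequence, Tuple

NEG = -1e4  # large penalty for transitions that violate the BIO scheme


def bio_constraints(tags: Sequence[str]) -> Tuple[List[float], List[List[float]]]:
    """Start scores and transition matrix that forbid I-X unless preceded by B-X or I-X."""
    start = [NEG if t.startswith("I-") else 0.0 for t in tags]
    trans = [[0.0] * len(tags) for _ in tags]
    for i, prev in enumerate(tags):
        for j, cur in enumerate(tags):
            if cur.startswith("I-") and prev not in ("B-" + cur[2:], cur):
                trans[i][j] = NEG
    return start, trans


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float]) -> Tuple[List[int], float]:
    """Return the best tag-index path and its total score for one sentence."""
    if not emissions:
        return [], 0.0
    num_tags = len(start)
    score = [start[k] + emissions[0][k] for k in range(num_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(num_tags):
            # Score of reaching tag j at position t from each previous tag i.
            candidates = [score[i] + transitions[i][j] for i in range(num_tags)]
            best_i = max(range(num_tags), key=candidates.__getitem__)
            new_score.append(candidates[best_i] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Follow the back-pointers from the best final tag.
    best_last = max(range(num_tags), key=score.__getitem__)
    path = [best_last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[best_last]


if __name__ == "__main__":
    tags = ["O", "B-SYM", "I-SYM"]
    start, trans = bio_constraints(tags)
    # Taking the argmax per character would give the illegal sequence O, I-SYM.
    emissions = [[1.0, 0.9, 0.0],
                 [0.0, 0.0, 1.0]]
    path, best = viterbi_decode(emissions, trans, start)
    print([tags[k] for k in path])  # ['B-SYM', 'I-SYM']
```

The example shows why a CRF layer helps: the joint decode trades a slightly lower score on the first character for a valid entity span. Replacing the hand-set constraints with learned transition scores changes the numbers, not the decoding procedure.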
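The reported 84.81% is an F1 score on the test set. The following is a minimal sketch of how such a score is commonly computed, assuming strict entity-level matching over BIO tags, in which a predicted entity counts as correct only if its type and both boundaries match the annotation. The abstract does not state which matching rule the authors used, and `extract_entities` and `entity_prf` are hypothetical helpers.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_entities(tags: Sequence[str]) -> Set[Span]:
    """Collect entity spans from a BIO sequence; a stray I-X starts a new entity."""
    entities: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes the last entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                entities.add((etype, start, i))
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return entities


def entity_prf(gold_seqs: Sequence[Sequence[str]],
               pred_seqs: Sequence[Sequence[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all sentences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)   # exact type and boundary matches
        fp += len(p - g)   # predicted entities absent from the annotation
        fn += len(g - p)   # annotated entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold: List[List[str]] = [["B-SYM", "I-SYM", "O", "B-MED", "I-MED"]]
    pred: List[List[str]] = [["B-SYM", "I-SYM", "O", "B-MED", "O"]]
    # The truncated medicine span counts as both a false positive and a false negative.
    print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Under this strict rule, a boundary error is penalized twice (once in precision, once in recall). A per-character tag accuracy would therefore look noticeably higher than an entity-level F1 on the same predictions.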

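Cite this article

LIU Bin, XIAO Xiaoxia, ZOU Beiji, et al. Study on named entity recognition of Chinese medical records based on BERT-BiLSTM-CRF with Chinese radicals[J]. Journal of Medical Informatics, 2023, 44(6): 48-53.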

History
  • Last revised: 2022-12-19
  • Published online: 2023-07-18
