Study on Chinese Electronic Medical Record Entity Mapping Method by Fusing Similarity Algorithms and Pre-trained Models

Affiliation: Institute of Medical Information & Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China

Chinese Library Classification number: R-058

Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project, "Construction of a Chinese Medical Terminology System" (Grant No. 2020AAA0104901).

Abstract:

Using a self-annotated standard dataset of Chinese electronic medical records (EMRs), this study fuses similarity algorithms with pre-trained models, applying them to the candidate entity generation and entity disambiguation stages of entity mapping, respectively, and compares the performance of different similarity algorithms and pre-trained models. It proposes a method that uses the similarity between aliases to improve the mapping of drug entities, and combines the Jaccard similarity algorithm with the BERT pre-trained model to map entities efficiently across massive volumes of Chinese EMRs.
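The candidate entity generation stage described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes character-level Jaccard similarity, and it assumes that alias-based matching means scoring a mention against a standard term and all of its aliases and keeping the best score. The function names (`jaccard`, `generate_candidates`) and the toy terminology table are hypothetical. The BERT-based disambiguation stage, which would re-rank these candidates, is not shown.

```python
from typing import Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Character-level Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def generate_candidates(
    mention: str,
    terminology: Dict[str, List[str]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank standard terms by their best Jaccard score against the mention.

    Each standard term is scored through its own name and every alias,
    so a mention that matches only an alias still retrieves the term.
    """
    scored = []
    for standard_name, aliases in terminology.items():
        best = max(jaccard(mention, name) for name in [standard_name, *aliases])
        scored.append((standard_name, best))
    # Highest similarity first; ties keep the terminology's original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy terminology: standard drug name -> list of aliases.
TERMINOLOGY = {
    "阿司匹林": ["乙酰水杨酸"],
    "对乙酰氨基酚": ["扑热息痛"],
    "二甲双胍": [],
}

# The mention shares no characters with the standard name "对乙酰氨基酚",
# but it overlaps strongly with the alias "扑热息痛".
candidates = generate_candidates("扑热息痛片", TERMINOLOGY, top_k=2)
print(candidates)
```

In this toy case the mention is retrieved only because of the alias, which is the kind of drug-name mismatch the alias-based improvement targets. In a full pipeline, the top-k candidates would then be passed to a pre-trained model for disambiguation.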

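Cite this article

FENG Fengxiang, REN Huiling, LI Xiaoying, et al. Study on Chinese Electronic Medical Record Entity Mapping Method by Fusing Similarity Algorithms and Pre-trained Models[J]. Journal of Medical Informatics, 2023, 44(5): 45-50.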

History
  • Last modified: 2023-02-20
  • Published online: 2023-06-13
