Study on Data Quality Analysis and Governance Countermeasures of Real-World Chinese Electronic Medical Records Oriented to Knowledge Extraction

Affiliations:

(1. Shenzhen Health Development Research and Data Management Center, Shenzhen 518000, China; 2. School of Public Health, Jilin University, Changchun 130021, China)

About the authors:

盖彦蓉, senior engineer, has published 6 papers. Corresponding author: 张云秋, professor and doctoral supervisor.

    Abstract:

    Purpose/Significance To analyze the underlying quality bottlenecks that arise when knowledge extraction is applied to real-world Chinese electronic medical records (EMR), and to propose countermeasures from the perspectives of data governance and management processes. Method/Process Annotation rules covering the main entity and relation types in clinical diagnosis and treatment were formulated, the BERT+Bi-LSTM+CRF model was selected, and experiments were conducted on real-world EMR data to analyze the key issues in EMR data governance. Result/Conclusion The selected model's entity and relation recognition performance on real-world EMR is markedly lower than its performance on public datasets. Data-related causes include non-standard phrasing, data sparsity, and terminology differences among departments. Governance-related causes include an imbalance between privacy protection and data utilization, a lack of full-process management, and a lack of quality checks before data enter the repository. Targeted recommendations are proposed.

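Cite this article

盖彦蓉, 张云秋, 张慧, et al. Study on Data Quality Analysis and Governance Countermeasures of Real-World Chinese Electronic Medical Records Oriented to Knowledge Extraction [J]. 医学信息学杂志 (Journal of Medical Informatics), 2025, 46(12): 47-53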

History
  • Last revised: 2025-11-19
  • Published online: 2026-01-08
