The Auxiliary Diagnosis Model of Chronic Diseases Based on Electronic Medical Records and Analysis of Feature Marginal Effects

Affiliation:

(1. Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China; 2. Information Center of the Ministry of Human Resources and Social Security, Beijing 100716, China; 3. School of Marxism, School of Humanities and Social Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China)

Author biography:

WANG Yingshuai (王颖帅), PhD, has published more than 20 papers. Corresponding author: HU Hongpu (胡红濮).

CLC number:

R-058

Fund Project:

Key Research Project on Medical and Health System Reform of the China Health Economics Association; CAMS Innovation Fund for Medical Sciences (Project No. 2022-I2M-1-019); Special Fund for Basic Scientific Research of Central Public Welfare Research Institutes (Project No. 2024-ZHCH630-01).

    Abstract:

    Purpose/Significance To predict the probability of six common chronic diseases from electronic medical record (EMR) features, including chief complaints, vital signs, ancillary examinations, and history of present illness, and thereby provide a reference for auxiliary diagnosis by primary care physicians. Method/Process Using publicly released EMR data, data cleaning and natural language processing (NLP) are applied to build fine-grained feature engineering. Models are built with six machine learning algorithms: logistic regression, naive Bayes, deep neural network, decision tree, support vector machine, and light gradient boosting machine (LightGBM). Model performance is evaluated with metrics including precision, recall, and AUC, and SHAP values are used to enhance model interpretability. Result/Conclusion LightGBM performs relatively well and can effectively predict chronic disease risk. Feature analysis shows that chief complaints, specialty examinations, and history of present illness have large marginal effects on model predictions, providing a scientific basis for personalized prevention.
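The evaluation metrics and the idea behind SHAP-style attributions named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the toy logistic model, its weights, the three feature positions, and the all-zero baseline are hypothetical, and the attribution uses the exact brute-force Shapley formula (on which SHAP values are based) rather than the SHAP library or LightGBM.

```python
from itertools import combinations
from math import exp, factorial


def precision_recall(y_true, y_score, threshold=0.5):
    """Precision and recall after thresholding predicted probabilities."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for t, s in pairs if s >= threshold and t == 1)
    fp = sum(1 for t, s in pairs if s >= threshold and t == 0)
    fn = sum(1 for t, s in pairs if s < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, y_score):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict(x, weights, bias):
    """Hypothetical logistic model: disease probability for feature vector x."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + exp(-z))


def shapley_values(x, baseline, value_fn):
    """Exact Shapley values of value_fn at x; absent features use the baseline."""
    n = len(x)

    def v(subset):
        masked = [x[i] if i in subset else baseline[i] for i in range(n)]
        return value_fn(masked)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (v(set(s) | {i}) - v(set(s)))
        phi.append(total)
    return phi


# Hypothetical weights for three encoded features (illustration only).
WEIGHTS, BIAS = [1.2, 0.8, -0.5], -1.0
patient, reference = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
contributions = shapley_values(
    patient, reference, lambda f: predict(f, WEIGHTS, BIAS)
)
print(contributions)
```

Each entry of `contributions` is the average marginal effect of one feature on the predicted probability, and the entries sum to the difference between the prediction for `patient` and the prediction for `reference`. The brute-force loop is exponential in the number of features, so it only suits very small examples.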

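Cite this article

WANG Yingshuai, WANG Zhifei, WAN Yanli, et al. The Auxiliary Diagnosis Model of Chronic Diseases Based on Electronic Medical Records and Analysis of Feature Marginal Effects [J]. Journal of Medical Informatics, 2025, 46(10): 17-24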

History
  • Last revised: 2025-09-03
  • Published online: 2025-11-12
