Construction and Application of a Chinese Medical Scale Corpus with Entity and Relation Annotations

doi:10.3969/j.issn.1673-6036.2025.11.004

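Affiliation: (Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China)
Author biography: Chen Zhenli, master's student, has published 3 papers.
CLC number: R-058
Funding: National Social Science Fund of China (Grant No. 21BTQ069).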

Abstract:

Purpose/Significance: To construct a corpus of the core content elements of Chinese medical scales, providing a data foundation for extracting scale-related knowledge entities and relations. Method/Process: An annotation schema is designed that covers five entity types (including scale names, measurement concepts, measurement items, and item codes) and four semantic relation types, and a unified annotation guideline is formulated. Two annotators working independently of each other ("back-to-back" annotation) manually annotate semantically rich paragraphs from 1,491 articles in Chinese core journals. Corpus quality is evaluated through inter-annotator agreement and downstream extraction experiments, resulting in the CMedScale corpus. Result/Conclusion: After corpus examples are introduced, the Micro-F1 score of every model tested increases by 2.95 to 13.89 percentage points for entity recognition and by 16.93 to 33.33 percentage points for relation extraction. The CMedScale corpus provides high-quality data support for knowledge extraction from Chinese medical scales and for related downstream research.
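The reported gains are differences in Micro-F1, expressed in percentage points. As a minimal sketch of how such a figure can be computed, the Python below pools true positives, false positives, and false negatives over all entity mentions before computing F1, assuming exact-match scoring on span boundaries and labels. The span representation, the label names (`ScaleName`, `Concept`, `Item`, `Code`), and the toy data are hypothetical illustrations, not the CMedScale format or results; the abstract does not specify the matching criterion the authors used.

```python
from typing import Set, Tuple

# Hypothetical span representation: (doc_id, start, end, label).
Span = Tuple[str, int, int, str]


def micro_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Micro-averaged F1 with exact matching: counts are pooled
    over all documents and labels before computing the score."""
    tp = len(gold & pred)   # predicted spans that match a gold span exactly
    fp = len(pred - gold)   # predicted spans with no gold match
    fn = len(gold - pred)   # gold spans the model missed
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Toy data, invented for illustration only.
gold = {
    ("d1", 0, 6, "ScaleName"),
    ("d1", 10, 14, "Concept"),
    ("d1", 20, 28, "Item"),
    ("d1", 30, 32, "Code"),
}
# Predictions without corpus examples: 2 correct, 1 spurious, 2 missed.
baseline = {
    ("d1", 0, 6, "ScaleName"),
    ("d1", 10, 14, "Concept"),
    ("d1", 15, 18, "Item"),
}
# Predictions with corpus examples in the prompt: 3 correct, 1 spurious, 1 missed.
with_examples = {
    ("d1", 0, 6, "ScaleName"),
    ("d1", 10, 14, "Concept"),
    ("d1", 20, 28, "Item"),
    ("d1", 29, 32, "Code"),
}

f1_base = micro_f1(gold, baseline)
f1_new = micro_f1(gold, with_examples)
# Gain in percentage points, i.e. the absolute difference of the two scores times 100.
gain_pp = (f1_new - f1_base) * 100
print(f"baseline={f1_base:.4f} with_examples={f1_new:.4f} gain={gain_pp:.2f} pp")
```

Applying the same function to the span sets of two annotators gives a pairwise agreement F1, which is one common way to quantify inter-annotator agreement on span annotation. The abstract does not state which agreement measure was used for CMedScale.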

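Cite this article

Chen Zhenli, Sun Haixia, Hao Jie, et al. Construction and Application of a Chinese Medical Scale Corpus with Entity and Relation Annotations[J]. Journal of Medical Informatics (医学信息学杂志), 2025, 46(11): 20-27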

History

Last revised: 2025-10-09
Published online: 2025-12-15
