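Text Similarity Calculation Based on a Fusion Matrix for Clustering Retrieval Results
  Revision date: 2023-12-29
Cite this article: Zhao Yueyang, Cui Lei. Text similarity calculation based on a fusion matrix for clustering retrieval results [J]. Journal of Medical Informatics, 2024, 45(3): 58-64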
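Authors and affiliations
Zhao Yueyang, Library of Shengjing Hospital of China Medical University, Shenyang 110004
Cui Lei, School of Medical Health Management, China Medical University, Shenyang 110122
Funding: Liaoning Provincial Social Science Planning Fund project (project No. L20BTQ003).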
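Chinese abstract: Purpose/Significance To remedy shortcomings in the semantic representation of medical texts and to cluster retrieval results from the PubMed database. Method/Process A fusion matrix is constructed using the Jaccard coefficient and TF-IDF, combining similarity relations between phrases, between documents, and between phrases and document content. Clustering algorithms are trained to group the set of PubMed retrieval results, and category labels are then generated to describe the meaning of the documents in each cluster. Result/Conclusion Clustering based on the fusion matrix performs well, and the high-frequency words extracted to describe the categories distinguish their meanings well; the approach is effective for clustering retrieval-result texts.
Chinese keywords: document retrieval; text clustering; fusion matrix; text similarity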
 
A Fusion Matrix-based Study on Text Clustering of Document Retrieval Results
Abstract: Purpose/Significance To address deficiencies in the semantic representation of medical texts and to cluster retrieval results from the PubMed database. Method/Process The paper proposes a method for constructing a fusion matrix using the Jaccard coefficient and TF-IDF. Similarity relations between phrases, between documents, and between phrases and document contents are combined into the fusion matrix, and several clustering algorithms are trained to group a collection of documents retrieved from the PubMed database. Category labels are then generated to describe the meaning of each cluster of documents. Result/Conclusion Experimental results show that fusion matrix-based clustering groups the document sets well, and the high-frequency words extracted for the category descriptions distinguish the meanings of the categories well, so the fusion matrix design is effective for clustering retrieved academic texts.
Keywords: document retrieval; text clustering; fusion matrix; text similarity
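
The abstract names the ingredients of the fusion matrix (Jaccard coefficient, TF-IDF, and phrase-phrase, document-document, and phrase-document relations) but not its exact layout. The Python sketch below shows one plausible arrangement and is not the authors' implementation. It assumes a symmetric block matrix in which phrase-phrase similarity is the Jaccard coefficient of the document sets where two phrases occur, document-document similarity is the Jaccard coefficient of two documents' phrase sets, and the phrase-document block holds TF-IDF weights. The function names `jaccard`, `tfidf`, and `fusion_matrix`, and the input format (each document as a list of phrases), are hypothetical.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf(docs, vocab):
    """Return weights[i][j]: TF-IDF of phrase vocab[i] in document docs[j].

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    n = len(docs)
    counts = [Counter(d) for d in docs]
    df = {p: sum(1 for c in counts if p in c) for p in vocab}
    return [
        [(c[p] / len(d)) * math.log(n / df[p]) if d else 0.0
         for c, d in zip(counts, docs)]
        for p in vocab
    ]


def fusion_matrix(docs):
    """Build a (phrases + documents) x (phrases + documents) block matrix.

    Assumed layout (not taken from the paper):
        [ phrase-phrase Jaccard | phrase-document TF-IDF ]
        [ TF-IDF transposed     | document-document Jaccard ]
    """
    vocab = sorted({p for d in docs for p in d})
    doc_sets = [set(d) for d in docs]
    # For each phrase, the indices of the documents that contain it.
    occurrences = [{j for j, s in enumerate(doc_sets) if p in s} for p in vocab]

    pp = [[jaccard(a, b) for b in occurrences] for a in occurrences]
    dd = [[jaccard(a, b) for b in doc_sets] for a in doc_sets]
    pd = tfidf(docs, vocab)

    top = [pp[i] + pd[i] for i in range(len(vocab))]
    bottom = [[pd[i][j] for i in range(len(vocab))] + dd[j]
              for j in range(len(docs))]
    return vocab, top + bottom


# Toy input: three documents, each already reduced to a list of phrases.
example_docs = [
    ["gene therapy", "clinical trial"],
    ["gene therapy", "mouse model"],
    ["clinical trial", "placebo"],
]
example_vocab, example_matrix = fusion_matrix(example_docs)
```

Under these assumptions, the last `len(docs)` rows of the matrix are document vectors that could be passed to a clustering algorithm, and the phrase columns with the largest weights within a cluster are candidates for its category label.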