Abstract: Purpose/Significance: To analyze the underlying quality bottlenecks in knowledge extraction from real-world Chinese electronic medical records (EMRs), and to propose countermeasures from the perspectives of data governance and management processes. Method/Process: Annotation rules covering the main entity and relation types in clinical diagnosis and treatment are formulated. A BERT+Bi-LSTM+CRF model is selected, and experiments are conducted on real-world EMR data to identify the key issues in EMR data governance. Result/Conclusion: The selected model's entity and relation recognition performance on real-world EMRs is significantly lower than on public datasets. Data-related causes include irregular expression, data sparsity, and terminology differences among departments. Governance-related causes include the imbalance between privacy protection and data utilization, the lack of end-to-end process management, and insufficient pre-storage quality inspection. Targeted suggestions are put forward.