Abstract: [Purpose/Significance] To fully mine the textual information in traditional Chinese medicine (TCM) medical records, raise the level of TCM informatization, and improve the accuracy of downstream tasks on these records, such as symptom term extraction and relation extraction. [Method/Process] A large volume of TCM medical case data is collected through optical character recognition (OCR) and web crawling, then preprocessed to construct a pre-training dataset for the TCM medical case domain. Using the BERT pre-training method, multiple rounds of training on this dataset yield TcmYiAnBERT, the first pre-trained model dedicated to the TCM domain, which is released as open source. [Result/Conclusion] Experiments show that, on the TCM named entity recognition (NER) task, the domain-specific pre-trained model TcmYiAnBERT achieves a recognition accuracy 2.8 percentage points higher than that of other pre-trained models.