Abstract: Purpose/Significance To construct a multidimensional data mining framework that improves the accuracy of type 2 diabetes mellitus (T2DM) risk prediction and the efficiency of clinical decision-making. Method/Process Using the Pima dataset, univariate, bivariate, and multivariate analyses are conducted to screen for core risk factors. Five machine learning models are then trained: logistic regression, random forest, support vector machine, extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM). Hyperparameters are tuned by grid search with cross-validation. Result/Conclusion The key risk factors identified, including blood glucose level, body mass index, and age, are consistent with conclusions from traditional evidence-based medicine. Random forest achieves the best overall performance, with a prediction accuracy of 0.8701. Data mining and feature selection reduce the cost of data collection, shorten the cycle of risk factor identification, and reveal nonlinear interactions among variables, providing an efficient tool for the general screening of high-risk groups in the community.
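For readers unfamiliar with the tuning step, the following is a minimal sketch of grid search with k-fold cross-validation, written with the Python standard library only. It is an illustration under stated assumptions, not the study's implementation: the data are synthetic (not the Pima dataset), the two features and their distributions are hypothetical, and a small k-nearest-neighbour classifier stands in for the five models named in the abstract. All function names (`make_toy_data`, `knn_predict`, `cv_accuracy`, `grid_search`) are introduced here for illustration.

```python
import random
from itertools import product
from statistics import mean


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for a tabular risk dataset (NOT the Pima data)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        label = rng.random() < 0.35
        # Two hypothetical features whose means shift with the label.
        glucose = rng.gauss(140 if label else 110, 20)
        bmi = rng.gauss(35 if label else 30, 6)
        rows.append((glucose, bmi))
        labels.append(int(label))
    return rows, labels


def knn_predict(train_x, train_y, x, k, weighted):
    """Classify x by a vote of its k nearest training points."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)) ** 0.5, y)
        for row, y in zip(train_x, train_y)
    )[:k]
    votes = {0: 0.0, 1: 0.0}
    for dist, y in nearest:
        # Optionally weight each vote by inverse distance.
        votes[y] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return 1 if votes[1] > votes[0] else 0


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_accuracy(x, y, params, n_folds=5):
    """Mean held-out accuracy of one hyperparameter setting across the folds."""
    scores = []
    for held_out in k_fold_indices(len(x), n_folds):
        held = set(held_out)
        train_x = [x[i] for i in range(len(x)) if i not in held]
        train_y = [y[i] for i in range(len(x)) if i not in held]
        correct = sum(
            knn_predict(train_x, train_y, x[i], **params) == y[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return mean(scores)


def grid_search(x, y, grid, n_folds=5):
    """Score every combination in the grid; return the best and all results."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_accuracy(x, y, params, n_folds), params))
    return max(results, key=lambda r: r[0]), results


X, y = make_toy_data()
(best_score, best_params), all_results = grid_search(
    X, y, {"k": [1, 5, 15], "weighted": [False, True]}
)
print(best_params, round(best_score, 3))
```

Every candidate setting is scored on the same folds, so the comparison between settings is not affected by how the data happen to be split. In practice, a library routine such as scikit-learn's `GridSearchCV` performs the same search over real estimators.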