Abstract
BACKGROUND: High-risk human papillomavirus (hrHPV) full genotyping facilitates risk stratification and improves efficiency in cervical cancer screening, and it has been widely verified and adopted in various screening settings. We aimed to develop a cervical cancer predictive model that can guide referrals for colposcopy using hrHPV full genotyping data in a setting where the screening rate is low.
METHODS: We developed, compared and validated four machine learning models (eXtreme gradient boosting [XGBoost], support vector machine [SVM], random forest [RF], and naive Bayes [NB]) for cervical cancer prediction, using data from a national cervical cancer screening project conducted in 267 healthcare centers in China. Cervical intraepithelial neoplasia grade 2 or worse (CIN2+) and CIN3+ were the primary and secondary outcomes, respectively. Discrimination was evaluated in various screening settings across China using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, the area under the precision-recall curve (AUPRC), and accuracy. Calibration and clinical utility were assessed with the Brier score, calibration curves and decision curve analysis (DCA).
FINDINGS: 1,112,846 women were recruited, of whom 599,043 were included in the analysis based on hrHPV full genotyping. Of these, 254,434 (age [years, median, IQR]: 48, 42-54), 297,479 (49, 43-55), 38,500 (37, 32-44), 1950 (38, 33-46), 1590 (53, 47-58), 779 (38, 31-49) and 4311 (40, 33-50) were in the development, temporal validation and external validation 1-5 datasets, respectively. The final simplified clinical risk prediction model includes hrHPV, number of HPV genotypes, cervical cytology, HPV16, HPV18, age, HPV52, HPV39 and gynecological examination. The final optimal XGBoost model for predicting CIN2+ showed good discrimination (AUROC, maximum 0.989 [0.987-0.992]; minimum 0.781 [0.74-0.819]) and calibration (Brier score, maximum 0.118 [0.099-0.137]) in the five external validation sets. DCA showed that when the clinical decision threshold probability for the optimal XGBoost model was less than 0.80, the model for predicting CIN2+ provided a superior standardized net benefit. The optimal XGBoost model obtained similar results in predicting CIN3+.
INTERPRETATION: We developed a cervical cancer screening risk prediction model that employs hrHPV full genotyping and simple test results to achieve risk prediction and stratified management for colposcopy referrals. This predictive tool is particularly suitable for settings with low screening rates.
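The calibration and clinical-utility metrics named in the abstract (Brier score and the standardized net benefit used in DCA) can be illustrated with a minimal Python sketch. It uses the standard textbook definitions only; the function names and the toy data are hypothetical and are not taken from the study, whose implementation the abstract does not describe.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of referring everyone with predicted risk >= threshold.

    Standard decision-curve definition:
        NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def standardized_net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                             threshold: float) -> float:
    """Net benefit divided by outcome prevalence (maximum possible value is 1)."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, y_prob, threshold) / prevalence


# Toy, made-up data (NOT from the study): 1 = CIN2+ present, 0 = absent.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_prob = [0.9, 0.2, 0.1, 0.7, 0.4, 0.05, 0.3, 0.6]

if __name__ == "__main__":
    print("Brier score:", brier_score(y_true, y_prob))
    for pt in (0.25, 0.5):
        print("threshold", pt, "sNB:", standardized_net_benefit(y_true, y_prob, pt))
```

A decision curve plots the standardized net benefit across a range of threshold probabilities and compares the model against the "refer all" and "refer none" strategies. The sketch evaluates only the model strategy at two arbitrary thresholds.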
Citation
@article{RN1119,
author = {Dong, B. and Lu, Z. and Yang, T. and Wang, J. and Zhang, Y. and Tuo, X. and Wang, J. and Lin, S. and Cai, H. and Cheng, H. and Cao, X. and Huang, X. and Zheng, Z. and Miao, C. and Wang, Y. and Xue, H. and Xu, S. and Liu, X. and Zou, H. and Sun, P.},
title = {Development, validation, and clinical application of a machine learning model for risk stratification and management of cervical cancer screening based on full-genotyping hrHPV test (SMART-HPV): a modelling study},
journal = {Lancet Reg Health West Pac},
volume = {55},
pages = {101480},
ISSN = {2666-6065},
DOI = {10.1016/j.lanwpc.2025.101480},
url = {https://www.ncbi.nlm.nih.gov/pubmed/39926367},
year = {2025},
type = {Journal Article}
}