Introduction: Accurate prediction of post-stroke mortality guides clinical management and resource allocation. Machine learning (ML) models may improve prognostication using routine clinical data available at admission. Objective: To compare the performance of different ML algorithms, evaluate model discrimination and calibration, and determine the importance of key variables in predicting six-month mortality after ischemic stroke. Methods: We analyzed 12,936 patients from the International Stroke Trial (IST). Nineteen baseline variables (demographics, level of consciousness, stroke subtype, neurological deficits, atrial fibrillation, and early treatments) were included. Seven supervised algorithms (logistic regression, SGD, SVM, MLP, random forest, gradient boosting, and XGBoost) were trained with and without resampling (SMOTE, ADASYN, Tomek Links, ENN). Nested cross-validation ensured unbiased evaluation. Results: Logistic regression and SGD showed the best performance (AUC-ROC ≈0.76–0.77), with a negative predictive value >92%. Complex models showed no consistent advantage. SHAP analysis confirmed age, level of consciousness, and stroke subtype as the main predictors. Conclusion: Parsimonious linear models provide robust and interpretable prediction of post-stroke mortality, with potential for clinical decision support.
Introduction: Accurate prediction of post-stroke mortality is essential for guiding treatment and resource allocation. Machine learning approaches may improve prognostic accuracy using routine admission data. Objective: To compare the performance of different ML algorithms, evaluate model discrimination and calibration, and assess the prognostic importance of key baseline variables in predicting six-month mortality after ischemic stroke. Methods: We analyzed 12,936 patients from the International Stroke Trial using nineteen baseline clinical variables. Seven supervised ML algorithms—logistic regression, SGD, SVM, MLP, random forest, gradient boosting, and XGBoost—were trained with and without resampling techniques (SMOTE, ADASYN, Tomek Links, ENN, hybrids). Nested cross-validation ensured unbiased evaluation using AUC-ROC, AUC-PR, calibration, and clinical metrics. Results: Logistic regression and SGD achieved the best performance (AUC-ROC ≈0.76-0.77) with strong calibration and negative predictive value >92%. Complex models showed no systematic advantage. Resampling strategies did not improve performance. SHAP analysis confirmed age, consciousness, and stroke subtype as dominant predictors. Conclusion: Parsimonious linear models provide robust and interpretable prediction of six-month mortality after ischemic stroke, supporting clinical decision-making. External validation and recalibration in contemporary cohorts remain essential.
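As a rough illustration of the evaluation scheme described in the abstract, the sketch below nests a hyperparameter search for a logistic regression inside an outer cross-validation loop and compares it with a SMOTE-resampled variant, scoring both by AUC-ROC with scikit-learn and imbalanced-learn. The synthetic data, regularization grid, and fold counts are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' exact pipeline): nested cross-validation of a
# logistic regression, with an optional SMOTE variant, scored by AUC-ROC.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline

# Synthetic stand-in for the 19 baseline variables and an imbalanced mortality outcome.
X, y = make_classification(n_samples=2000, n_features=19,
                           weights=[0.78, 0.22], random_state=42)

# Inner loop tunes the regularization strength; outer loop estimates generalization.
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}  # illustrative grid

pipelines = {
    "logreg": Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]),
    # SMOTE sits inside the pipeline so resampling is applied only to training folds.
    "logreg_smote": ImbPipeline([
        ("scale", StandardScaler()),
        ("smote", SMOTE(random_state=0)),
        ("clf", LogisticRegression(max_iter=1000)),
    ]),
}

for name, pipe in pipelines.items():
    search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=inner_cv)
    scores = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)
    print(f"{name}: nested-CV AUC-ROC = {scores.mean():.3f} ± {scores.std():.3f}")
```

Keeping the resampler inside the cross-validated pipeline is the design choice that prevents synthetic minority samples from leaking into validation folds; the same pattern extends to the other algorithms and resampling strategies listed in the abstract.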
1 Faculty of Medicine, Universidade Federal do Triângulo Mineiro – UFTM, Uberaba, MG, Brazil.
2 Center for Mathematics, Computing and Cognition – CMCC, Universidade Federal do ABC – UFABC, Santo André, SP, Brazil.
3 Discipline of Neurosurgery, Hospital das Clínicas, Universidade Federal do Triângulo Mineiro – UFTM, Uberaba, MG, Brazil.
4 Neurosurgery Division, Universidade Federal de Sergipe – UFS, Aracaju, SE, Brazil.
Received Sep 25, 2025
Accepted Sep 29, 2025