Abstract
The digital transformation of the Public Administration is imperative so that the necessary mechanisms can be developed to reduce the number of years of life lost and to increase the quality of life of the population. The main objective of this project is to leverage existing information in public administration databases and other sources to support decision-makers in choosing the best response to emerging diseases, better adapting public health intervention programs, and improving the capacity of health systems in the future.
The data proposed for this work will provide important information on morbidity and on the sociodemographic and socioeconomic context of the entire Portuguese population. They are available mainly in the following databases: 1) Hospital Morbidity Database (BDMH); 2) Information System of Death Certificates (SICO); 3) National Health Service Information and Monitoring System (SIM@SNS) and Monitoring System of Regional Health Administrations (SIARS), which includes records of follow-up of children in Primary Health Care. In addition, several other important sources of data will be consulted.
INSTITUTIONS
Main Contractor: Instituto de Engenharia Mecânica (IDMEC)
Participating Institutions: Faculdade de Medicina da Universidade de Lisboa (FM/ULisboa); Faculdade de Ciências Médicas (FCM/UNL); Direcção-Geral da Saúde (DGS)
Main Research Unit: Laboratório Associado de Energia, Transportes e Aeronáutica (LAETA)
TEAM
Principal Investigator: Susana Margarida da Silva Vieira
Researchers: Maria Cristina de Brito Eusébio Bárbara Prista Caetano; Fernando Miguel Teixeira Xavier; Joaquim Paul Laurens Viegas
Co-Principal Investigator: João Miguel Costa Sousa
PROJECT SUMMARY
As the first relatively extensive study on the determinants of infant and youth mortality in Portugal making use of Machine Learning techniques, this work contributes to the development of this research area in multiple ways. First, the exploratory data analysis and data preprocessing steps were valuable in their own right, as they led to important insights, even though their main purpose was to prepare the data for the subsequent modelling steps. Additionally, by using sophisticated feature selection methods, important determinants with a highly nonlinear relation to infant and youth mortality could be discerned in a way that classical statistical analysis would not allow. Furthermore, the prediction model is capable of making new predictions online, which in turn can be used to simulate scenarios and analyze their potential impact on infant and youth mortality. This will allow decision makers (Public Health professionals, for instance) to study the effects of individual variables, and of combinations of variables, on infant and youth mortality.
To the best of our knowledge, no tool of this kind (incorporating a Machine Learning model) had been conceived prior to this project.
The clustering analysis performed for this work is also a valuable contribution: although clustering is a fairly common unsupervised learning technique, this was the first time (to the best of the authors’ knowledge) that an approach of this sort was applied to mortality data concerning Portugal. Particularly noteworthy in this regard is the clustering approach whereby municipalities are grouped according not only to the mortality variable being considered, but also to a given “mortality determinant”. Finally, the project’s success in attaining its goals paves the way for future applications of Machine Learning algorithms in infant and youth mortality studies, particularly those concerning Portugal.
METHODOLOGY
This section gives a brief overview of the methodology used in this work, to frame the outputs of the project. Figure 1 illustrates the general workflow followed by the authors.
The available data comprised 178 databases sourced from various authorities, including the Directorate-General of Health, Statistics Portugal, the World Health Organization, PORDATA, the Portuguese Environment Agency, OpenRouteService, the Childhood Obesity Surveillance Initiative, and the Health Behaviour in School-Aged Children study. Two of these databases, belonging to the Directorate-General of Health, are restricted and not accessible to the general public.
This study systematically categorizes the databases into three primary types: External data, Mortality data, and Auxiliary data. External data represent factors indirectly related to mortality. They cover six areas (Economics, Healthcare, Society, Demographics, Education, and Environment) captured over a six-year period (2014-2019). These variables serve as the potential determinants whose impact on mortality the study aims to investigate.
Regarding the external dataset, the predominant challenges stemmed from a substantial amount of missing values and disparities in the sampling frequency of the data, including a mix of monthly and annual measures. To address the issue of missing values, a hybrid approach involving both feature elimination and data imputation was employed. Features with more than 75% of their values missing were discarded. For the remaining features with missing values, a data imputation approach based on K-Nearest Neighbours (with K=5) was adopted. Using this hybrid approach, 72 features were rejected.
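The two steps described above (discarding features with more than 75% missing values, then K-Nearest Neighbours imputation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names are hypothetical, missing values are encoded as None, and the distance is a simplified Euclidean distance computed only over the coordinates observed in both rows. In practice features would normally be scaled first, and a library implementation such as scikit-learn's KNNImputer would be a natural choice.

```python
import math


def drop_sparse_features(rows, max_missing=0.75):
    """Drop columns whose share of missing values (None) exceeds max_missing.

    Returns the reduced table and the indices of the columns that were kept.
    """
    n_rows = len(rows)
    n_cols = len(rows[0])
    keep = [
        j for j in range(n_cols)
        if sum(r[j] is None for r in rows) / n_rows <= max_missing
    ]
    return [[r[j] for j in keep] for r in rows], keep


def _distance(a, b):
    """Euclidean distance over coordinates observed in both rows.

    The squared sum is rescaled by (total coords / shared coords) so that rows
    sharing few coordinates are not favoured; returns inf if nothing is shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=5):
    """Fill each missing value with the mean of that feature over the k
    nearest rows (by _distance) in which the feature is observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            neighbours = [v for d, v in candidates[:k] if d < math.inf]
            if neighbours:  # leave the gap if no usable neighbour exists
                filled[i][j] = sum(neighbours) / len(neighbours)
    return filled


# Toy table (invented values): the third column is 80% missing and is dropped.
demo = [
    [1.0, 2.0, None, 10.0],
    [2.0, None, None, 20.0],
    [3.0, 6.0, None, None],
    [4.0, 8.0, None, 40.0],
    [5.0, 10.0, 1.0, 50.0],
]
reduced, kept = drop_sparse_features(demo, max_missing=0.75)
# k=2 only because the toy table has five rows; the text above reports K=5.
completed = knn_impute(reduced, k=2)
```

Elimination runs before imputation so that nearly empty features neither receive imputed values nor influence the neighbour search for the remaining ones.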
Special Issue Edition
- Published – S. M. Vieira and João M. Sousa. Special issue “Computational Intelligence in Health Care”. Mathematical Biosciences and Engineering, AIMS Press. 2023. https://www.aimspress.com/mbe/article/6355/special-articles
Journal Papers
- Submitted - Beatriz P. Lourenço, Aldo Arévalo, Miguel Santos Loureiro, Susana M. Vieira. Transformer and RNN-based Approach to Mortality Rate Forecasting with Interpretable Predictions: A study on the Portuguese Population. Submitted to Expert Systems With Applications, February 2024
- Revised and Re-Submitted - Filipe André Gonzalez, Tomás Lamas, Pedro Costa, Susana M Vieira. Is Artificial Intelligence prepared for the ICU 24h-shifts? Anaesthesia Critical Care & Pain Medicine, Elsevier, re-submitted May 2024.
- Re-Submitted - Filipe Santos, Ricardo Magalhães, Rodrigo Ventura, Cristina Barbara, Miguel Xavier, Maria Isabel Alves, Matilde Valente Rosa, Cátia Salgado, João M. Sousa, Susana M. Vieira. Systematic Review on European Youth Mortality and its Socioeconomic Determinants and Risk Factors. Public Health, December 2023.
- Re-Submitted - Rodrigo S.B. Ventura, Filipe M.P. Santos, Susana M. Vieira, José Valente de Oliveira, João M.C. Sousa. Novel ALMMo-0 Classifiers for Imbalanced Datasets. Expert Systems With Applications, December 2023.
- Revised and Re-Submitted - Susana M. Vieira, Pedro Rodrigues, Aldo Arévalo, Catia Salgado, João M. C. Sousa. “A data fusion approach for predicting in-hospital mortality of acute kidney injury patients in the intensive care unit.” Algorithms, MDPI, April 2024.
- Revised and Re-Submitted - Bernardo Firme, Aldo Arevalo, Susana Vieira and Joao MC Sousa. Individualized Clinical Dashboards for Decision Making in Mortality Prediction of Critically-ill Patients Under Insulin Therapy, Mathematical Biosciences and Engineering, AIMS, March 2024.
- Submitted - Duarte Rolim, Susana Vieira. “Multi/Many-Objective Optimization in Feature Selection”. Algorithms, MDPI. Submitted April 2024.
- Submitted - Susana M. Vieira, Ricardo Maia, Cátia Salgado. “Performance measures for imbalanced datasets: which is best for classification problems?”. Submitted to Applied Sciences, MDPI. April 2024
- Published – S Mantena, AR Arévalo, JH Maley, SM da Silva Vieira, R Mateo-Collado. Predicting hypoglycemia in critically ill patients using machine learning and electronic health records. Journal of Clinical Monitoring and Computing 2022, 36 (5), 1297-1303. https://doi.org/10.1007/s10877-021-00760-7
- Published – Sousa, J.M.C.; Luís, R.; Santos, R.M.; Mendonça, L.; Vieira, S.M. Fuzzy Multi-Item Newsvendor Problem: An Application to Inventory Management. Mathematics 2024, 12, 1652. https://doi.org/10.3390/math12111652.
Communications in international scientific meetings
- PAPER: Accepted to be presented at WCCI 2024 - Beatriz P. Lourenco, Miguel Santos Loureiro, Filipe Santos, Rodrigo Ventura, Ricardo Magalhaes, Matilde Valente Rosa, Vera Dantas, Cristina Barbara, Joao M. C. Sousa, Susana M. S. Vieira. Identifying the Determinants of Infant and Youth Mortality in Portugal: a Machine Learning approach. IJCNN 2024, July 2024.
- PAPER: Accepted to be presented at WCCI 2024 – Andre Seabra, Rodrigo Ventura, Rui Almeida, Susana M. S. Vieira, Joao M. C. Sousa. Applications Of Autonomous Learning Multi Model System To Multiclass Imbalanced Datasets. FUZZ-IEEE 2024, July 2024.
- PAPER: Published - Ventura, R.B.; Santos, F.M.; Magalhães, R.M.; Salgado, C.M.; Dantas, V.; Rosa, M.V.; Sousa, J.M.C.; Vieira, S.M. Forecasting Neonatal Mortality in Portugal. Eng. Proc. 2023, 39, 89. https://doi.org/10.3390/engproc2023039089
- PAPER: Published – F. Santos, R. Ventura, J. M. C. Sousa and S. M. Vieira, “First-Order Autonomous Learning Multi-Model Systems for Multiclass Classification tasks,” 2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Padua, Italy, 2022, pp. 1-6, doi: 10.1109/FUZZ-IEEE55066.2022.9882593
- PAPER: Published – F. Santos, J. M. C. Sousa and S. M. Vieira, “A new approach to ALMMo-0 Classifiers: A trade-off between accuracy and complexity,” 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Luxembourg, Luxembourg, 2021, pp. 1-6, doi: 10.1109/FUZZ45933.2021.9494579.
- INVITED SPEAKER: at the “13th International Conference on Fuzzy Computation Theory and Applications (FCTA 2021)”, with the talk “Fuzzy Systems in Health Care” (https://www.youtube.com/watch?v=GbQRBZfXVzs)
- INVITED SPEAKER: at the Eindhoven University of Technology Workshop on “Automated Decision Making- AI Systems and individual rights” on the 16th July 2021 with the talk “Transparency and Reproducibility in Artificial Intelligence in Health Care Research”
- INVITED SPEAKER: at the ESICM LIVES 40 GE sponsored Clinical Symposium on “Artificial Intelligence: New Opportunities for a Better Healthcare” on May 13, 2022 (Programme (y-congress.com))
Communications in national scientific meetings
- INVITED SPEAKER: at the “XXIII Congresso Nacional de Medicina Intensiva”, organized by the Sociedade Portuguesa de Cuidados Intensivos, where the project PI was an invited speaker on the 22nd January 2021 (https://www.spci.pt/xxiii-congresso-medicina-intensiva)
- INVITED SPEAKER: at the “Congresso APIH 2023”, organized by the Sociedade Portuguesa de Infecção Hospitalar (APIH), where the project PI was an invited speaker on “Inteligência Artificial com potencial em prevenção e controlo de infeções e de resistência aos antimicrobianos”, 24th October 2023
- ROUND TABLE: on “Artificial Intelligence” at “Encontro Renal 2023”, organized by Sociedade Portuguesa de Nefrologia (SPN), 16th November 2023.
PhD Thesis
- Defended – Aldo Arévalo, Supervisor João Miguel da Costa Sousa, Co-Supervisor Susana Vieira, Co-Supervisor Stan Finkelstein, PhD degree in Bioengineering, ULisbon, IST, thesis: Data Based Modeling for Supporting Clinical Decision Making, concluded on 16th December 2020.
- Defended – André Silva, Supervisor João Miguel da Costa Sousa, Supervisor Susana Vieira, Co-supervisor Stan Finkelstein, PhD degree in Bioengineering, ULisbon, IST, thesis: Machine Learning Applications for Renal Critical Care, concluded on 1st February 2023.
Master Thesis
- Defended – Beatriz Lourenço, Supervisor Prof. Susana Vieira, Co-supervised by Aldo Arévalo, Master degree in Data Science, U Lisbon, IST, thesis: Mortality Rate Prediction using Deep Learning, concluded in June 2023.
- Defended – Inês Garção, Supervisor Prof. Susana Vieira, Co-supervisor Doctor Cátia Salgado, thesis: Cosinor for circadian variation identification in heart rate among critically ill patients. Integrated Master degree in Mechanical Engineering, U Lisbon, IST, concluded in November 2022.
- Defended – Rodrigo Saragoça Boal Ventura, Supervisor Prof. Susana Vieira, thesis: Applications of Autonomous Learning Multi Model Systems to Binary Classification on Imbalanced Datasets. Integrated Master degree in Mechanical Engineering, U Lisbon, IST, concluded in December 2021.
- Defended – Filipe Pereira Santos, Supervisor Prof. João Sousa, thesis: Advancements in Autonomous Learning Multi-Model Systems. Integrated Master degree in Mechanical Engineering, U Lisbon, IST, concluded in January 2021.
Workshop
AI4Life Dashboard Hands-on, 7th of November 2023, at DGS.