Machine Learning Algorithms to Predict Anemia in Children Under the Age of Five Years in Afghanistan: A Case of Kunduz Province
Abstract
Anemia is an increasingly common problem, especially in developing and underdeveloped countries. The ability to predict anemia is therefore a valuable preventive measure, and anemia is also a good indicator of a child's future health risks. This study implements predictive anemia models for Afghanistan using data obtained from hospitals in Kunduz province. Its main objective is to identify the most suitable machine learning technique (i.e., classifier) among five popular ones: K-Nearest Neighbor (K-NN), Naïve Bayes, Multi-Layer Perceptron (MLP), Random Forest, and Support Vector Machine (SVM). Before the predictive models are built, the data are preprocessed through data cleansing and feature selection. The well-known Correlation-based Feature Selection (CFS) algorithm is employed to select the top fifteen attributes. The classification target comprises two categories, Anemic and Non-Anemic, and the dataset is prepared carefully to ensure well-balanced samples in each category. The study finds that Random Forest is the best classifier, with an accuracy of 86.4% and an Area Under the Curve (AUC) of 88.2%. The findings directly benefit health and prevention policy making in Afghanistan.
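To make the feature selection step more concrete, the following is a minimal sketch of the CFS merit heuristic, which favors attribute subsets that correlate strongly with the class but weakly with each other. It is not the study's implementation. It assumes absolute Pearson correlation as the correlation measure (CFS implementations for discrete attributes commonly use symmetrical uncertainty instead) and a greedy forward search capped at fifteen attributes. The function names, the toy columns `feature_a`, `feature_b`, `feature_c`, and the toy labels are all hypothetical and do not come from the Kunduz dataset.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(subset, columns, target):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean feature-class correlation and r_ff the mean
    feature-feature correlation (absolute Pearson is an assumption here).
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def select_features(columns, target, max_features=15):
    """Greedy forward search: keep adding the attribute that raises the merit."""
    selected, best_merit = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in columns if f not in selected]
        if not candidates:
            break
        merit, feature = max(
            (cfs_merit(selected + [f], columns, target), f) for f in candidates
        )
        if merit <= best_merit:
            break  # no remaining attribute improves the subset
        selected.append(feature)
        best_merit = merit
    return selected


# Hypothetical toy data: 1 = Anemic, 0 = Non-Anemic.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "feature_a": [1, 2, 1, 2, 5, 6, 5, 6],  # strongly related to the class
    "feature_b": [1, 3, 1, 3, 4, 6, 4, 6],  # related, but redundant with feature_a
    "feature_c": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the class
}

selected = select_features(columns, target, max_features=15)
print(selected)
```

On this toy data the search keeps only `feature_a`: adding the redundant `feature_b` or the uninformative `feature_c` lowers the merit, which is the behavior that motivates using CFS before training the classifiers.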