Steinert, Steffen; Ruf, Verena; Dzsotjan, David; Großmann, Nicolas; Schmidt, Albrecht; Kuhn, Jochen; Küchemann, Stefan; Veerappampalayam Easwaramoorthy, Sathishkumar (2024): A refined approach for evaluating small datasets via binary classification using machine learning. PLOS ONE, 19 (5): e0301276. ISSN 1932-6203
The publication is available under the Creative Commons Attribution (CC BY) licence.
Abstract
Classical statistical analysis of data can be complemented or replaced with data analysis based on machine learning. However, in certain disciplines, such as education research, studies are frequently limited to small datasets, which raises several questions regarding biases and coincidentally positive results. In this study, we present a refined approach for evaluating the performance of a binary classification based on machine learning for small datasets. The approach includes a non-parametric permutation test as a method to quantify the probability of the results generalising to new data. Furthermore, we found that a repeated nested cross-validation is almost free of biases and yields reliable results that are only slightly dependent on chance. Considering the advantages of several evaluation metrics, we suggest a combination of more than one metric to train and evaluate machine learning classifiers. In the specific case that both classes are equally important, the Matthews correlation coefficient exhibits the lowest bias and chance for coincidentally good results. The results indicate that it is essential to avoid several biases when analysing small datasets using machine learning.
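The two ideas named in the abstract, the Matthews correlation coefficient (MCC) and a label-permutation test, can be illustrated with a minimal sketch. This is not the authors' pipeline: the one-dimensional nearest-class-mean classifier, the use of plain leave-one-out instead of repeated nested cross-validation, and all function names (`mcc`, `loo_predictions`, `permutation_test`) are assumptions chosen to keep the example short and dependency-free.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def loo_predictions(x, y):
    """Leave-one-out predictions from a toy 1-D nearest-class-mean classifier.

    Hypothetical stand-in for a real model; assumes each class keeps at
    least one training sample after a point is held out.
    """
    preds = []
    for i in range(len(x)):
        train = [(xj, yj) for j, (xj, yj) in enumerate(zip(x, y)) if j != i]
        means = {}
        for label in (0, 1):
            vals = [xj for xj, yj in train if yj == label]
            if vals:
                means[label] = sum(vals) / len(vals)
        # Predict the class whose training mean is closest to the held-out point.
        preds.append(min(means, key=lambda c: abs(x[i] - means[c])))
    return preds


def permutation_test(x, y, n_permutations=500, seed=0):
    """Return (observed MCC, p-value) under random label permutation."""
    rng = random.Random(seed)
    observed = mcc(y, loo_predictions(x, y))
    hits = 0
    for _ in range(n_permutations):
        y_perm = y[:]
        rng.shuffle(y_perm)
        # Retrain and re-evaluate on shuffled labels to sample the null distribution.
        if mcc(y_perm, loo_predictions(x, y_perm)) >= observed:
            hits += 1
    # The +1 correction counts the observed labelling as one permutation.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy data (invented for illustration): ten samples, two balanced classes.
x = [0.1, 0.4, 0.5, 0.9, 1.1, 2.0, 2.2, 2.6, 2.9, 3.1]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
observed, p_value = permutation_test(x, y)
print(f"MCC = {observed:.2f}, permutation p-value = {p_value:.4f}")
```

A small p-value indicates that a score as high as the observed one rarely arises when the labels carry no information, which is how such a test helps to flag coincidentally good results on small datasets. With real data, the toy classifier and the leave-one-out loop would be replaced by the actual model and validation scheme.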
Document type: | Article (LMU) |
---|---|
Organisational unit (faculties): | 17 Physics |
DFG subject classification of scientific disciplines: | Natural sciences |
Publication date: | 05 Aug 2024 11:45 |
Last modified: | 05 Aug 2024 11:45 |
URI: | https://oa-fund.ub.uni-muenchen.de/id/eprint/1369 |
DFG: | Funded by the Deutsche Forschungsgemeinschaft (DFG) - 491502892 |