A refined approach for evaluating small datasets via binary classification using machine learning

Steinert, Steffen; Ruf, Verena; Dzsotjan, David; Großmann, Nicolas; Schmidt, Albrecht; Kuhn, Jochen; Küchemann, Stefan; Veerappampalayam Easwaramoorthy, Sathishkumar (2024): A refined approach for evaluating small datasets via binary classification using machine learning. PLOS ONE, 19 (5): e0301276. ISSN 1932-6203

Creative Commons Attribution (CC BY)
Published version
240709_journal.pone.0301276.pdf

DOI: 10.1371/journal.pone.0301276

Abstract

Classical statistical analysis of data can be complemented or replaced with data analysis based on machine learning. However, in certain disciplines, such as education research, studies are frequently limited to small datasets, which raises several questions regarding biases and coincidentally positive results. In this study, we present a refined approach for evaluating the performance of a binary classification based on machine learning for small datasets. The approach includes a non-parametric permutation test as a method to quantify the probability of the results generalising to new data. Furthermore, we found that a repeated nested cross-validation is almost free of biases and yields reliable results that are only slightly dependent on chance. Considering the advantages of several evaluation metrics, we suggest a combination of more than one metric to train and evaluate machine learning classifiers. In the specific case that both classes are equally important, the Matthews correlation coefficient exhibits the lowest bias and chance for coincidentally good results. The results indicate that it is essential to avoid several biases when analysing small datasets using machine learning.
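
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch in plain Python of the Matthews correlation coefficient and a label-permutation test built on it. It is not the authors' implementation: the function names, the toy labels, the number of permutations and the seed are all assumptions chosen for illustration. The sketch also simplifies by permuting the true labels against a fixed set of predictions, whereas a permutation test of a full machine-learning pipeline would typically retrain the classifier on each permuted label set. The repeated nested cross-validation is not shown.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the
    # confusion matrix is empty and the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Share of label shufflings whose MCC is at least the observed MCC.

    Simplified variant (assumption): predictions stay fixed and only
    the true labels are shuffled; no classifier is retrained.
    """
    rng = random.Random(seed)
    observed = mcc(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if mcc(shuffled, y_pred) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: ten samples, five per class, two misclassified.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

score = mcc(y_true, y_pred)
p_value = permutation_p_value(y_true, y_pred)
print(f"MCC = {score:.2f}, permutation p-value = {p_value:.3f}")
```

With only ten samples, an apparently solid score can still arise by chance fairly often under shuffled labels, which is the kind of coincidentally positive result the abstract warns about for small datasets.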

Document type:	Article (LMU)
Organisational unit (faculties):	17 Physics
DFG subject classification of scientific areas:	Natural sciences
Publication date:	05 Aug 2024 11:45
Last modified:	05 Aug 2024 11:45
URI:	https://oa-fund.ub.uni-muenchen.de/id/eprint/1369
DFG:	Funded by the Deutsche Forschungsgemeinschaft (DFG) - 491502892
