
Kieser, Fabian; Wulff, Peter; Kuhn, Jochen; Küchemann, Stefan (2023): Educational data augmentation in physics education research using ChatGPT. Physical Review Physics Education Research, 19 (2). ISSN 2469-9896

Published Article: PhysRevPhysEducRes.19.020150.pdf

The publication is available under a Creative Commons Attribution license.



Generative AI technologies such as large language models show novel potential to enhance educational research. For example, generative large language models were shown to be capable of solving quantitative reasoning tasks in physics and concept tests such as the Force Concept Inventory (FCI). Given the importance of such concept inventories for physics education research, and the challenges in developing them, such as field testing with representative populations, this study examines to what extent a generative large language model can be used to generate a synthetic dataset for the FCI that exhibits content-related variability in responses. We use the recently introduced ChatGPT based on the GPT-4 generative large language model and investigate to what extent ChatGPT can solve the FCI accurately (RQ1) and can be prompted to solve the FCI as if it were a student belonging to a different cohort (RQ2). Furthermore, we study to what extent ChatGPT can be prompted to solve the FCI as if it were a student holding a different force- and mechanics-related preconception (RQ3). In alignment with other research, we found that ChatGPT could accurately solve the FCI. We furthermore found that prompting ChatGPT to respond to the inventory as if it belonged to a different cohort yielded no variance in responses; however, responding as if it held a certain preconception introduced substantial variance in responses, which approximated real human responses on the FCI in some regards.
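The persona-prompting idea described for RQ2 and RQ3 can be sketched as prompt construction: each FCI item is framed with an optional student persona (a cohort description or a force-related preconception) before being sent to the model. The following is a minimal illustrative sketch; the item text, answer options, and preconception wording below are hypothetical placeholders, not the actual FCI items or the authors' exact prompts.

```python
def build_persona_prompt(item, options, persona=None):
    """Assemble a multiple-choice prompt, optionally framed with a student persona.

    item:    the question stem (string)
    options: list of answer-option strings, labeled A, B, C, ...
    persona: optional description of the simulated student (cohort or preconception)
    """
    if persona:
        header = f"Answer the following physics question as if you were {persona}.\n"
    else:
        header = "Answer the following physics question.\n"
    labeled = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return header + item + "\n" + labeled + "\nRespond with a single letter."


# Hypothetical RQ3-style prompt: simulate a common mechanics preconception.
prompt = build_persona_prompt(
    "A ball is thrown straight up. What forces act on it at its highest point?",
    ["Gravity only", "Gravity and an upward force from the throw", "No forces"],
    persona="a student who believes motion always requires a force in its direction",
)
print(prompt)
```

Collecting the model's single-letter answers over many such prompts, item by item, would yield the kind of synthetic response dataset the study analyzes.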
