Evaluating LLM-Generated Q&A Test: A Student-Centered Study

  • Anna Wróblewska
  • Bartosz Grabek
  • Jakub Świstak
  • Daniel Dan

Research output: Contribution to conference › Other › Research › peer-review

Abstract

This research presents an automatic pipeline for generating reliable question-answer (Q&A) tests using AI chatbots. We automatically generated a GPT-4o–based Q&A test for a Natural Language Processing course and evaluated its psychometric and perceived-quality metrics with students and experts. A mixed-format IRT analysis showed that the generated items exhibit strong discrimination and appropriate difficulty, while student and expert star ratings reflect high overall quality. A uniform DIF check identified two items for review. These findings demonstrate that LLM-generated assessments can match human-authored tests in psychometric performance and user satisfaction, illustrating a scalable approach to AI-assisted assessment development.
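As background (not taken from the paper itself), the item discrimination and difficulty mentioned in the abstract are typically the parameters of a two-parameter logistic (2PL) IRT model. A minimal sketch, with hypothetical parameter values:

```python
import math

def irt_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic (2PL) IRT model: probability that a student
    with ability `theta` answers correctly an item with discrimination `a`
    and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: strong discrimination (a=1.8), moderate difficulty (b=0.0).
# Probability of a correct answer rises steeply as ability passes the difficulty.
for theta in (-1.0, 0.0, 1.0):
    print(f"theta={theta:+.1f}  P(correct)={irt_2pl(theta, 1.8, 0.0):.2f}")
```

A "strongly discriminating" item is one with a large `a`, i.e. a steep probability curve that separates students just below from students just above the item's difficulty.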
Original language: English
Pages: 277–289
Publication status: Published – 15 Jul 2025
Event: AIED – 26th International Conference on Artificial Intelligence in Education, University of Palermo, Palermo, Italy
Duration: 22 Jul 2025 – 26 Jul 2025
Conference number: 26
https://aied2025.itd.cnr.it

Conference

Conference: AIED 26th International Conference on Artificial Intelligence in Education
Abbreviated title: AIED
Country/Territory: Italy
City: Palermo
Period: 22/07/2025 – 26/07/2025
Internet address: https://aied2025.itd.cnr.it

