Abstract
This research presents an automated pipeline for generating reliable question-answer (Q&A) tests using AI chatbots. We automatically generated a GPT-4o–based Q&A test for a Natural Language Processing course and evaluated its psychometric and perceived-quality metrics with students and experts. A mixed-format IRT analysis showed that the generated items exhibit strong discrimination and appropriate difficulty, while student and expert star ratings reflect high overall quality. A uniform DIF check identified two items for review. These findings demonstrate that LLM-generated assessments can match human-authored tests in psychometric performance and user satisfaction, illustrating a scalable approach to AI-assisted assessment development.
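The abstract reports item difficulty and discrimination from a mixed-format IRT analysis. As a minimal illustration of what such item-level statistics capture, the sketch below computes the classical (non-IRT) counterparts from a 0/1 scored response matrix: the proportion-correct difficulty and the corrected point-biserial discrimination. This is a stand-in for the paper's actual IRT model; the function name and synthetic data are hypothetical.

```python
import numpy as np

def item_statistics(responses):
    """Classical item statistics for a 0/1 scored response matrix.

    responses: (n_students, n_items) array of 0/1 scores.
    Returns (difficulty, discrimination):
      difficulty     - proportion correct per item (higher = easier)
      discrimination - corrected point-biserial: correlation of each
                       item with the total score of the *other* items
    """
    responses = np.asarray(responses, dtype=float)
    difficulty = responses.mean(axis=0)
    total = responses.sum(axis=1)
    discrimination = np.empty(responses.shape[1])
    for j in range(responses.shape[1]):
        # Rest-score (total minus this item) avoids inflating the
        # correlation by including the item in its own criterion.
        rest = total - responses[:, j]
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination

# Tiny synthetic example: 6 students, 3 items (hypothetical data)
R = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
])
diff, disc = item_statistics(R)
print(diff)  # item 1 is the easiest (highest proportion correct)
```

In an IRT setting these quantities correspond (roughly) to the item difficulty and discrimination parameters estimated jointly with student ability, which is what allows the comparison with human-authored tests reported above.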
| Original language | English |
|---|---|
| Pages (from-to) | 277-289 |
| Number of pages | 13 |
| Publication status | Published - 15 Jul 2025 |
| Event | AIED 2025 - 26th International Conference on Artificial Intelligence in Education, University of Palermo, Palermo, Italy<br>Duration: 22 Jul 2025 → 26 Jul 2025<br>Conference number: 26<br>https://aied2025.itd.cnr.it |
Conference
| Conference | AIED 2025 - 26th International Conference on Artificial Intelligence in Education |
|---|---|
| Abbreviated title | AIED |
| Country/Territory | Italy |
| City | Palermo |
| Period | 22/07/2025 → 26/07/2025 |
| Internet address | https://aied2025.itd.cnr.it |