Publication Date
In 2025: 5
Since 2024: 10
Since 2021 (last 5 years): 29
Since 2016 (last 10 years): 46
Since 2006 (last 20 years): 59
Author
Bercovitz, Elizabeth: 2
Brandt, Rusty: 2
Papageorgiou, Spiros: 2
Zimmerman, Linda: 2
Adeleke, A. A.: 1
Ahmed Al-Badri: 1
Ahmed Yaqinuddin: 1
Aiken, Lewis R.: 1
Alammary, Ali: 1
Alderson, J. Charles: 1
Alexander Kah: 1
Publication Type
Reports - Research: 75
Journal Articles: 58
Speeches/Meeting Papers: 9
Tests/Questionnaires: 8
Numerical/Quantitative Data: 3
Information Analyses: 1
Audience
Researchers: 7
Policymakers: 1
Practitioners: 1
Lei Guo; Wenjie Zhou; Xiao Li – Journal of Educational and Behavioral Statistics, 2024
The testlet design is very popular in educational and psychological assessments. This article proposes a new cognitive diagnosis model, the multiple-choice cognitive diagnostic testlet (MC-CDT) model, for tests using testlets consisting of MC items. The MC-CDT model uses examinees' original responses to MC items instead of dichotomously scored…
Descriptors: Multiple Choice Tests, Diagnostic Tests, Accuracy, Computer Software
Claude, ChatGPT, Copilot, and Gemini Performance versus Students in Different Topics of Neuroscience
Volodymyr Mavrych; Ahmed Yaqinuddin; Olena Bolgova – Advances in Physiology Education, 2025
Despite extensive studies on large language models and their capability to respond to questions from various licensing exams, there has been limited focus on employing chatbots for specific subjects within the medical curriculum, specifically medical neuroscience. This research compared the performances of Claude 3.5 Sonnet (Anthropic), GPT-3.5 and…
Descriptors: Artificial Intelligence, Computer Software, Neurosciences, Medical Education
Hongfei Ye; Jian Xu; Danqing Huang; Meng Xie; Jinming Guo; Junrui Yang; Haiwei Bao; Mingzhi Zhang; Ce Zheng – Discover Education, 2025
This study evaluates the performance of large language models (LLMs) on the Chinese Postgraduate Medical Entrance Examination (CPGMEE), as well as the hallucinations produced by LLMs, and investigates their implications for medical education. We curated 10 trials of mock CPGMEE to evaluate the performances of 4 LLMs (GPT-4.0, ChatGPT, QWen 2.1 and Ernie 4.0).…
Descriptors: College Entrance Examinations, Foreign Countries, Computational Linguistics, Graduate Medical Education
An Analysis of Differential Bundle Functioning in Multidimensional Tests Using the SIBTEST Procedure
Özdogan, Didem; Kelecioglu, Hülya – International Journal of Assessment Tools in Education, 2022
This study aims to analyze differential bundle functioning in multidimensional tests, with the specific purpose of detecting this effect by varying the location of the DIF item in the test, the correlation between the dimensions, the sample size, and the ratio of reference to focal group size. The first 10 items of the test that is…
Descriptors: Correlation, Sample Size, Test Items, Item Analysis
Melanie Ann Weber; Mia Anzilotti; Reece Gormley; Christina Huber; Alyssa McGarvey; Grace McKee; Claire Ogden; Hannah Seinfeld; Julia Wank; Arnold Olszewski – Perspectives of the ASHA Special Interest Groups, 2024
Purpose: Technology, including educational applications (apps), is commonly used in schools by teachers and speech-language pathologists. Nonetheless, very little research has examined the efficacy of these apps for student learning or how to choose appropriate apps for instruction. Several previous rubrics to evaluate the instructional quality of…
Descriptors: Computer Software, Handheld Devices, Educational Technology, Technology Uses in Education
Kunal Sareen – Innovations in Education and Teaching International, 2024
This study examines the proficiency of ChatGPT, an AI language model, in answering questions on the Situational Judgement Test (SJT), a widely used assessment tool for evaluating the fundamental competencies of medical graduates in the UK. A total of 252 SJT questions from the "Oxford Assess and Progress: Situational Judgement" Test…
Descriptors: Ethics, Decision Making, Artificial Intelligence, Computer Software
Roger Young; Emily Courtney; Alexander Kah; Mariah Wilkerson; Yi-Hsin Chen – Teaching of Psychology, 2025
Background: Multiple-choice item (MCI) assessments are burdensome for instructors to develop. Artificial intelligence (AI, e.g., ChatGPT) can streamline the process without sacrificing quality. The quality of AI-generated MCIs is comparable to that of MCIs written by human experts. However, whether the quality of AI-generated MCIs is equally good across various domain-…
Descriptors: Item Response Theory, Multiple Choice Tests, Psychology, Textbooks
Marli Crabtree; Kenneth L. Thompson; Ellen M. Robertson – HAPS Educator, 2024
Research has suggested that changing one's answer on multiple-choice examinations is more likely to lead to positive academic outcomes. This study aimed to further understand the relationship between changing answer selections and item attributes, student performance, and time within a population of 158 first-year medical students enrolled in a…
Descriptors: Anatomy, Science Tests, Medical Students, Medical Education
Mimi Ismail; Ahmed Al-Badri; Said Al-Senaidi – Journal of Education and e-Learning Research, 2025
This study aimed to reveal the differences in individuals' abilities, their standard errors, and the psychometric properties of the test according to the two modes of test administration (electronic and paper). The descriptive approach was used to achieve the study's objectives. The study sample consisted of 74 male and female students at the…
Descriptors: Achievement Tests, Computer Assisted Testing, Psychometrics, Item Response Theory
Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023
Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…
Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests
Nagy, Gabriel; Ulitzsch, Esther – Educational and Psychological Measurement, 2022
Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response times have been suggested, and item response theory (IRT) models for response engagement have been proposed. We outline that response time-based…
Descriptors: Item Response Theory, Hierarchical Linear Modeling, Predictor Variables, Classification
Salem, Alexandra C.; Gale, Robert; Casilio, Marianne; Fleegle, Mikala; Fergadiotis, Gerasimos; Bedrick, Steven – Journal of Speech, Language, and Hearing Research, 2023
Purpose: ParAlg (Paraphasia Algorithms) is a software tool that automatically categorizes a person with aphasia's naming error (paraphasia) in relation to its intended target on a picture-naming test. These classifications (based on lexicality as well as semantic, phonological, and morphological similarity to the target) are important for…
Descriptors: Semantics, Computer Software, Aphasia, Classification
Emery-Wetherell, Meaghan; Wang, Ruoyao – Assessment & Evaluation in Higher Education, 2023
Over four semesters of a large introductory statistics course, the authors found students were engaging in contract cheating on Chegg.com during multiple-choice examinations. In this paper we describe our methodology for identifying, addressing, and eventually eliminating cheating. We successfully identified 23 out of 25 students using a combination…
Descriptors: Computer Assisted Testing, Multiple Choice Tests, Cheating, Identification
Saatcioglu, Fatima Munevver; Atar, Hakan Yavuz – International Journal of Assessment Tools in Education, 2022
This study aims to examine the effects of mixture item response theory (IRT) models on item parameter estimation and classification accuracy under different conditions. The manipulated variables of the simulation study are the mixture IRT model (Rasch, 2PL, 3PL); sample size (600, 1000); the number of items (10, 30); the number of latent…
Descriptors: Accuracy, Classification, Item Response Theory, Programming Languages
R. Freed; D. H. McKinnon; M. T. Fitzgerald; S. Salimpour – Physical Review Physics Education Research, 2023
This paper presents the results of a confirmatory factor analysis on two self-efficacy scales designed to probe the self-efficacy of college-level introductory astronomy (Astro-101) students (n = 15,181) from 22 institutions across the United States of America and Canada. The students undertook a course based on similar curriculum materials, which…
Descriptors: Self Efficacy, Science Instruction, Astronomy, Factor Analysis