Publication Date
In 2025 | 6
Since 2024 | 14
Since 2021 (last 5 years) | 35
Since 2016 (last 10 years) | 61
Since 2006 (last 20 years) | 86
Descriptor
Computer Software | 107
Test Items | 107
Item Analysis | 41
Foreign Countries | 37
Difficulty Level | 30
Item Response Theory | 30
Comparative Analysis | 26
Computer Assisted Testing | 26
Accuracy | 22
Test Construction | 22
Models | 21
Author
Wang, Wen-Chung | 5
Jin, Kuan-Yu | 4
Deane, Paul | 2
Futagi, Yoko | 2
Higgins, Derrick | 2
Luo, Yong | 2
Ackerman, Terry A. | 1
Adeleke, A. A. | 1
Ahmed Al-Badri | 1
Ahmed, Tamim | 1
Akbari, Alireza | 1
Publication Type
Reports - Research | 107
Journal Articles | 82
Speeches/Meeting Papers | 17
Tests/Questionnaires | 4
Numerical/Quantitative Data | 1
Audience
Researchers | 6
Location
Japan | 4
Germany | 3
Indonesia | 3
South Africa | 3
Canada | 2
China | 2
Nigeria | 2
Saudi Arabia | 2
Turkey | 2
Africa | 1
Ghana | 1
Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025
Educational tests often have a cluster of items linked by a common stimulus (a "testlet"). In such a design, the dependencies induced among items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…
Descriptors: Models, Test Items, Educational Assessment, Scores
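As background, a testlet effect is commonly modeled as a person-specific random effect shared by the items in a testlet, and a directional effect can be written as a lagged-response term. The following is one plausible formalization (a sketch; not necessarily the parameterization used in the article):

\[
P(X_{pi} = 1 \mid \theta_p) = \frac{\exp\big(\theta_p - b_i + \gamma_{p,d(i)} + \sum_{j<i} \delta_{ij} x_{pj}\big)}{1 + \exp\big(\theta_p - b_i + \gamma_{p,d(i)} + \sum_{j<i} \delta_{ij} x_{pj}\big)}
\]

Here \(\gamma_{p,d(i)}\) is person p's random effect for the testlet d(i) containing item i, and \(\delta_{ij} > 0\) (or \(< 0\)) encodes a positive (or negative) directional influence of an earlier response \(x_{pj}\) within the same testlet.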
Lei Guo; Wenjie Zhou; Xiao Li – Journal of Educational and Behavioral Statistics, 2024
The testlet design is very popular in educational and psychological assessments. This article proposes a new cognitive diagnosis model, the multiple-choice cognitive diagnostic testlet (MC-CDT) model, for tests using testlets consisting of MC items. The MC-CDT model uses examinees' original responses to MC items instead of dichotomously scored…
Descriptors: Multiple Choice Tests, Diagnostic Tests, Accuracy, Computer Software
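For context, a minimal sketch of a DINA-type cognitive diagnosis response function in Python; the MC-CDT model itself is richer (it models the individual MC options and testlet dependence), so this is background only, with invented parameter values.

import numpy as np

def dina_prob(alpha, q_row, guess, slip):
    # P(correct) under DINA: eta = 1 iff the examinee masters every
    # attribute the item requires (given by the Q-matrix row q_row).
    eta = np.all(alpha >= q_row, axis=-1).astype(float)
    return guess ** (1 - eta) * (1 - slip) ** eta

# Examinee masters attributes 1 and 2; the item requires exactly those.
alpha = np.array([1, 1, 0])
q_row = np.array([1, 1, 0])
print(dina_prob(alpha, q_row, guess=0.2, slip=0.1))  # -> 0.9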
Ö. Emre C. Alagöz; Thorsten Meiser – Educational and Psychological Measurement, 2024
To improve the validity of self-report measures, researchers should control for response style (RS) effects, which can be achieved with IRTree models. A traditional IRTree model considers a response as a combination of distinct decision-making processes, where the substantive trait affects the decision on response direction, while decisions about…
Descriptors: Item Response Theory, Validity, Self Evaluation (Individuals), Decision Making
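To make the IRTree idea concrete, here is one common decomposition of a 5-point response into binary pseudo-decisions (a sketch; tree structures vary across applications, and this need not match the authors' model):

def irtree_pseudoitems(response):
    # Map a 1-5 Likert response to (midpoint?, agree-direction?, extreme?).
    # None marks a node that is never reached for that response.
    if response == 3:
        return (1, None, None)  # midpoint chosen; remaining nodes skipped
    direction = 1 if response > 3 else 0      # agree vs. disagree side
    extreme = 1 if response in (1, 5) else 0  # endpoint vs. moderate category
    return (0, direction, extreme)

for r in range(1, 6):
    print(r, irtree_pseudoitems(r))

Each pseudo-decision then gets its own IRT model, so the substantive trait can drive the direction node while response-style traits drive the midpoint and extremity nodes.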
Hongfei Ye; Jian Xu; Danqing Huang; Meng Xie; Jinming Guo; Junrui Yang; Haiwei Bao; Mingzhi Zhang; Ce Zheng – Discover Education, 2025
This study evaluates the performance of large language models (LLMs) on the Chinese Postgraduate Medical Entrance Examination (CPGMEE), as well as the hallucinations LLMs produce, and investigates their implications for medical education. We curated 10 trials of mock CPGMEE to evaluate the performance of 4 LLMs (GPT-4.0, ChatGPT, QWen 2.1 and Ernie 4.0).…
Descriptors: College Entrance Examinations, Foreign Countries, Computational Linguistics, Graduate Medical Education
Ally, Said – International Journal of Education and Development using Information and Communication Technology, 2022
Moodle has become the heart of teaching and learning services in education. The software is viewed as a trusted, modern platform for transforming teaching and learning from conventional face-to-face delivery to fully online classes. However, its use for online examinations remains very limited despite its state-of-the-art Quiz Module with…
Descriptors: Integrated Learning Systems, Computer Assisted Testing, Information Security, Evaluation Methods
Musa Adekunle Ayanwale; Mdutshekelwa Ndlovu – Journal of Pedagogical Research, 2024
The COVID-19 pandemic has had a significant impact on high-stakes testing, including the national benchmark tests in South Africa. Current linear testing formats have been criticized for their limitations, leading to a shift towards Computerized Adaptive Testing (CAT). Assessments with CAT are more precise and take less time. Evaluation of CAT…
Descriptors: Adaptive Testing, Benchmarking, National Competency Tests, Computer Assisted Testing
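CAT's precision gain comes from selecting each next item adaptively. A minimal sketch of maximum-information selection under a 2PL model, with invented item parameters (illustrative background, not the study's evaluation procedure):

import numpy as np

def fisher_info_2pl(theta, a, b):
    # Fisher information of 2PL items at ability estimate theta.
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

def select_next_item(theta_hat, a, b, administered):
    # Choose the unadministered item with maximum information at theta_hat.
    info = fisher_info_2pl(theta_hat, a, b)
    info[list(administered)] = -np.inf
    return int(np.argmax(info))

a = np.array([1.2, 0.8, 1.5, 1.0, 2.0])   # discriminations
b = np.array([-1.0, 0.0, 0.4, 1.2, 0.3])  # difficulties
print(select_next_item(0.3, a, b, administered={2}))  # -> 4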
An Analysis of Differential Bundle Functioning in Multidimensional Tests Using the SIBTEST Procedure
Özdogan, Didem; Kelecioglu, Hülya – International Journal of Assessment Tools in Education, 2022
This study analyzes differential bundle functioning in multidimensional tests, with the specific purpose of detecting this effect while varying the location of the DIF item in the test, the correlation between the dimensions, the sample size, and the ratio of reference- to focal-group size. The first 10 items of the test that is…
Descriptors: Correlation, Sample Size, Test Items, Item Analysis
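For background, SIBTEST quantifies DIF/DBF as a weighted difference in studied-bundle performance between groups after matching on a regression-corrected valid-subtest score, roughly (a sketch of the standard formulation, not the article's notation):

\[
\hat{\beta} = \sum_{k=0}^{K} \hat{p}_k \left( \bar{Y}^{*}_{Rk} - \bar{Y}^{*}_{Fk} \right)
\]

where k indexes matched score levels, \(\hat{p}_k\) is the proportion of focal-group examinees at level k, and \(\bar{Y}^{*}_{Rk}\), \(\bar{Y}^{*}_{Fk}\) are adjusted mean bundle scores for the reference and focal groups; a \(\hat{\beta}\) significantly different from 0 flags differential functioning.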
Owen Henkel; Hannah Horne-Robinson; Maria Dyshel; Greg Thompson; Ralph Abboud; Nabil Al Nahin Ch; Baptiste Moreau-Pernet; Kirk Vanacore – Journal of Learning Analytics, 2025
This paper introduces AMMORE, a new dataset of 53,000 math open-response question-answer pairs from Rori, a mathematics learning platform used by middle and high school students in several African countries. Using this dataset, we conducted two experiments to evaluate the use of large language models (LLMs) for grading particularly challenging…
Descriptors: Learning Analytics, Learning Management Systems, Mathematics Instruction, Middle School Students
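As an illustration of LLM-based grading of open responses, a minimal sketch; `call_llm` is a hypothetical stand-in for whatever LLM API is used, and the prompt is invented rather than taken from the paper:

def grade_open_response(question, student_answer, call_llm):
    # Ask an LLM for a binary correctness judgment on one math answer.
    prompt = (
        "You are grading a middle-school math answer.\n"
        f"Question: {question}\n"
        f"Student answer: {student_answer}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )
    return call_llm(prompt).strip().upper().startswith("CORRECT")

# Stubbed demo so the sketch runs without any API access:
print(grade_open_response("2 + 3 = ?", "5", lambda p: "CORRECT"))  # True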
Kunal Sareen – Innovations in Education and Teaching International, 2024
This study examines the proficiency of ChatGPT, an AI language model, in answering questions on the Situational Judgement Test (SJT), a widely used assessment tool for evaluating the fundamental competencies of medical graduates in the UK. A total of 252 SJT questions from the "Oxford Assess and Progress: Situational Judgement" Test…
Descriptors: Ethics, Decision Making, Artificial Intelligence, Computer Software
Kyung-Mi O. – Language Testing in Asia, 2024
This study examines the efficacy of artificial intelligence (AI) in creating parallel test items compared to human-made ones. Two test forms were developed: one consisting of 20 existing human-made items and another with 20 new items generated with ChatGPT assistance. Expert reviews confirmed the content parallelism of the two test forms.…
Descriptors: Comparative Analysis, Artificial Intelligence, Computer Software, Test Items
Roger Young; Emily Courtney; Alexander Kah; Mariah Wilkerson; Yi-Hsin Chen – Teaching of Psychology, 2025
Background: Multiple-choice item (MCI) assessments are burdensome for instructors to develop. Artificial intelligence (AI, e.g., ChatGPT) can streamline the process without sacrificing quality, and the quality of AI-generated MCIs is comparable to that of items written by human experts. However, whether the quality of AI-generated MCIs is equally good across various domain-…
Descriptors: Item Response Theory, Multiple Choice Tests, Psychology, Textbooks
Marli Crabtree; Kenneth L. Thompson; Ellen M. Robertson – HAPS Educator, 2024
Research has suggested that changing one's answer on multiple-choice examinations is more likely to lead to positive academic outcomes. This study aimed to further understand the relationship between changing answer selections and item attributes, student performance, and time within a population of 158 first-year medical students enrolled in a…
Descriptors: Anatomy, Science Tests, Medical Students, Medical Education
Mimi Ismail; Ahmed Al-Badri; Said Al-Senaidi – Journal of Education and e-Learning Research, 2025
This study aimed to reveal differences in individuals' abilities, their standard errors, and the psychometric properties of a test across two modes of administration (electronic and paper-based). The descriptive approach was used to achieve the study's objectives. The study sample consisted of 74 male and female students at the…
Descriptors: Achievement Tests, Computer Assisted Testing, Psychometrics, Item Response Theory
Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023
Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…
Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests
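A minimal sketch of that supervised baseline, training a classifier on a handful of human-scored responses (scikit-learn, with invented toy data; not the authors' system):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set: short answers with human-assigned labels.
answers = ["the slope is 2", "slope equals 2", "the answer is 5", "i dont know"]
labels = [1, 1, 0, 0]  # 1 = correct, 0 = incorrect

scorer = make_pipeline(TfidfVectorizer(), LogisticRegression())
scorer.fit(answers, labels)
print(scorer.predict(["slope is 2"]))  # expected: [1]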
Wijanarko, Bambang Dwi; Heryadi, Yaya; Toba, Hapnes; Budiharto, Widodo – Education and Information Technologies, 2021
Automated question generation is the task of generating questions from structured or unstructured data. The increasing popularity of online learning in recent years has given momentum to automated question generation in the education field, where it facilitates the learning process, learning-material retrieval, and computer-based testing. This paper reports on the…
Descriptors: Foreign Countries, Undergraduate Students, Engineering Education, Computer Software
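As an illustration, question generation in its simplest, template-based form over structured facts (all names below are invented; the paper's approach may instead use neural generation):

# Fill a question template from structured (entity, attribute, answer) facts.
TEMPLATE = "What is the {attribute} of {entity}?"

facts = [
    {"entity": "binary search", "attribute": "worst-case time complexity",
     "answer": "O(log n)"},
    {"entity": "the TCP protocol", "attribute": "delivery guarantee",
     "answer": "reliable, ordered delivery"},
]

for fact in facts:
    question = TEMPLATE.format(entity=fact["entity"], attribute=fact["attribute"])
    print(question, "->", fact["answer"])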