Showing 1 to 15 of 40 results
Peer reviewed
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
Jing Ma – ProQuest LLC, 2024
This study investigated the impact of delayed scoring of polytomous items on measurement precision, classification accuracy, and test security in mixed-format adaptive testing. Using the shadow-test approach, a simulation study was conducted across various test designs, test lengths, and numbers and locations of polytomous items. Results showed that while…
Descriptors: Scoring, Adaptive Testing, Test Items, Classification
Peer reviewed
Yang Du; Susu Zhang – Journal of Educational and Behavioral Statistics, 2025
Item compromise has long posed challenges in educational measurement, jeopardizing both test validity and test security of continuous tests. Detecting compromised items is therefore crucial to address this concern. The present literature on compromised item detection reveals two notable gaps: First, the majority of existing methods are based upon…
Descriptors: Item Response Theory, Item Analysis, Bayesian Statistics, Educational Assessment
Peer reviewed
PDF on ERIC
Demir, Seda – Journal of Educational Technology and Online Learning, 2022
The purpose of this research was to evaluate the effect of item pools and selection algorithms on computerized classification testing (CCT) performance in terms of several classification evaluation metrics. For this purpose, response patterns for 1,000 examinees were generated using an R package, and eight item pools with 150, 300, 450, and 600 items…
Descriptors: Test Items, Item Banks, Mathematics, Computer Assisted Testing
Peer reviewed
PDF on ERIC
Wang, Wei; Dorans, Neil J. – ETS Research Report Series, 2021
Agreement statistics and measures of prediction accuracy are often used to assess the quality of two measures of a construct. Agreement statistics are appropriate for measures that are supposed to be interchangeable, whereas prediction accuracy statistics are appropriate for situations where one variable is the target and the other variables are…
Descriptors: Classification, Scaling, Prediction, Accuracy
Peer reviewed
Sinharay, Sandip – Educational and Psychological Measurement, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores and hence to incomplete data on mastery tests such as the AP and U.S. Medical Licensing examinations. Investigators are often interested in estimating the probabilities of passing of the examinees with incomplete data on mastery tests.…
Descriptors: Mastery Tests, Computer Assisted Testing, Probability, Test Wiseness
Peer reviewed
Shukla, Vishakha; Long, Madeleine; Bhatia, Vrinda; Rubio-Fernandez, Paula – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2022
While most research on scalar implicature has focused on the lexical scale "some" vs "all," here we investigated an understudied scale formed by two syntactic constructions: categorizations (e.g., "Wilma is a nurse") and comparisons ("Wilma is like a nurse"). An experimental study by Rubio-Fernandez et al.…
Descriptors: Cues, Pragmatics, Comparative Analysis, Syntax
Peer reviewed
Kayla V. Campaña; Benjamin G. Solomon – Assessment for Effective Intervention, 2025
The purpose of this study was to compare the classification accuracy of data produced by the previous year's end-of-year New York state assessment, a computer-adaptive diagnostic assessment ("i-Ready"), and the gating combination of both assessments to predict the rate of students passing the following year's end-of-year state assessment…
Descriptors: Accuracy, Classification, Diagnostic Tests, Adaptive Testing
Peer reviewed
Carioti, Desiré; Stucchi, Natale Adolfo; Toneatto, Carlo; Masia, Marta Franca; Del Monte, Milena; Stefanelli, Silvia; Travellini, Simona; Marcelli, Antonella; Tettamanti, Marco; Vernice, Mirta; Guasti, Maria Teresa; Berlingeri, Manuela – Annals of Dyslexia, 2023
In this study, we validated the "ReadFree tool", a computerised battery of 12 visual and auditory tasks developed to identify poor readers also in minority-language children (MLC). We tested the task-specific discriminant power on 142 Italian-monolingual participants (8-13 years old) divided into monolingual poor readers (N = 37) and…
Descriptors: Language Minorities, Task Analysis, Italian, Monolingualism
Peer reviewed
PDF on ERIC
Fadillah, Sarah Meilani; Ha, Minsu; Nuraeni, Eni; Indriyanti, Nurma Yunita – Malaysian Journal of Learning and Instruction, 2023
Purpose: Researchers discovered that when students were given the opportunity to change their answers, a majority changed their responses from incorrect to correct, and this change often improved overall test results. What prompts students to modify their answers? This study aims to examine answer modification on a scientific reasoning test, with…
Descriptors: Science Tests, Multiple Choice Tests, Test Items, Decision Making
Peer reviewed
Chia-Hsuan Liao; Ellen Lau – Second Language Research, 2024
Event concepts of common verbs (e.g. "eat," "sleep") can be broadly shared across languages, but a given language's rules for subcategorization are largely arbitrary and vary substantially across languages. When subcategorization information does not match between first language (L1) and second language (L2), how does this…
Descriptors: Verbs, Brain Hemisphere Functions, Diagnostic Tests, English
Peer reviewed
Gaillat, Thomas; Simpkin, Andrew; Ballier, Nicolas; Stearns, Bernardo; Sousa, Annanda; Bouyé, Manon; Zarrouk, Manel – ReCALL, 2021
This paper focuses on automatically assessing language proficiency levels according to linguistic complexity in learner English. We implement a supervised learning approach as part of an automatic essay scoring system. The objective is to uncover Common European Framework of Reference for Languages (CEFR) criterial features in writings by learners…
Descriptors: Prediction, Rating Scales, English (Second Language), Second Language Learning
Peer reviewed
Roark, Casey L.; Lehet, Matthew I.; Dick, Frederic; Holt, Lori L. – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2022
Category learning is fundamental to cognition, but little is known about how it proceeds in real-world environments when learners do not have instructions to search for category-relevant information, do not make overt category decisions, and do not experience direct feedback. Prior research demonstrates that listeners can acquire task-irrelevant…
Descriptors: Classification, Learning Processes, Schemata (Cognition), Decision Making
Peer reviewed
Choi, Youn-Jeng; Asilkalkan, Abdullah – Measurement: Interdisciplinary Research and Perspectives, 2019
About 45 R packages for analyzing data with item response theory (IRT) have been developed over the last decade. This article introduces these 45 packages with their descriptions and features. It also describes R packages for advanced IRT models, as well as for dichotomous and polytomous IRT models, and R packages that contain applications…
Descriptors: Item Response Theory, Data Analysis, Computer Software, Test Bias
Peer reviewed
Bao, Yu; Bradshaw, Laine – Measurement: Interdisciplinary Research and Perspectives, 2018
Diagnostic classification models (DCMs) can provide multidimensional diagnostic feedback about students' mastery levels of knowledge components or attributes. One advantage of using DCMs is the ability to accurately and reliably classify students into mastery levels with a relatively small number of items per attribute. Combining DCMs with…
Descriptors: Test Items, Selection, Adaptive Testing, Computer Assisted Testing